Why Pandas feels clunky when coming from R (2024)

Summary

Five years ago I started a new role and I suddenly found myself, a staunch R fan, having to code in Python on a daily basis. Working with data, most of my Python work involved using pandas, the Python data frame library, and initially I found it quite hard and clunky to use, being used to the silky smooth API of R’s tidyverse.

And you know what? It still feels hard and clunky, even now, 5 years later! But, what seems even harder, is explaining to “Python people” what they are missing out on. From their perspective, pandas is this fantastic tool that makes Data Science in Python possible. And it is a fantastic tool, don’t get me wrong, but if you, like me, end up in many “pandas is great, but…”-type discussions and are lacking clear examples to link to; here’s a somewhat typical example of a simple analysis, built from the ground up, that flows nicely in R and the tidyverse but that becomes clunky and complicated using Python and pandas.

Let’s first step through a short analysis of purchases using R and the tidyverse. After that we’ll see how the same solution using Python and pandas compares.

Analyzing purchases in R

We’ve been given a table of purchases with different amounts, where the customer could have received a discount and where each purchase happened in a country. Finance now wants to know: How much do we typically sell in each country? Let’s read in the data and take a look:

    library(tidyverse)
    purchases <- read_csv("purchases.csv")
    purchases |> head()

    # A tibble: 6 × 3
      country amount discount
      <chr>    <dbl>    <dbl>
    1 USA       2000       10
    2 USA       3500       15
    3 USA       3000       20
    4 Canada     120       12
    5 Canada     180       18
    6 Canada    3100       21

Now, without bothering with printing out the intermediate results, here’s how a quick pipeline could be built up, answering Finance’s question.

“How much do we sell..? Let’s take the total sum!”

    purchases$amount |> sum()

“Ah, they wanted it by country…”

    purchases |>
      group_by(country) |>
      summarize(total = sum(amount))

“And I guess I should deduct the discount.”

...
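For comparison, here is a rough sketch of how the first two steps shown above might look in pandas. This is not the article's own pandas code, which is cut off in this summary. It assumes only the six preview rows printed by head(), inlined as a string because the full purchases.csv is not available here, so the numbers it produces describe that preview and nothing more. The names PREVIEW_CSV, total and by_country are made up for illustration.

```python
import io

import pandas as pd

# Assumption: only the six rows shown by head() in the R session above.
# The real purchases.csv may contain more rows.
PREVIEW_CSV = """country,amount,discount
USA,2000,10
USA,3500,15
USA,3000,20
Canada,120,12
Canada,180,18
Canada,3100,21
"""

purchases = pd.read_csv(io.StringIO(PREVIEW_CSV))

# "Let's take the total sum!"
# R: purchases$amount |> sum()
total = purchases["amount"].sum()

# "Ah, they wanted it by country..."
# R: purchases |> group_by(country) |> summarize(total = sum(amount))
# as_index=False keeps country as a regular column, like the tibble R returns.
by_country = (
    purchases
    .groupby("country", as_index=False)
    .agg(total=("amount", "sum"))
)

print(total)
print(by_country)
```

Even in this small sketch the column names have to be passed as strings and the aggregation spelled as a (column, function) tuple, where the R version writes sum(amount) directly. The discount step is left out because the summary does not show how the article handles it.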

First seen: 2025-06-07 17:11

Last seen: 2025-06-07 20:12