by Bootvis
1607 days ago
I agree the example in GP is not convincing. Consider the following table of ordered events:

    | Date | EventType |

and I want to find the count, and the first and last date, of events of each type happening in 2020:

    events[
      year(Date) == 2020L,
      .(first_date = first(Date), last_date = last(Date), count = .N),
      EventType
    ]

Using first and last on ordered data will be very fast thanks to something called GForce.

When exploring data, I wouldn't need or use any whitespace. What would your Pandas approach look like?
|
    mask = events["Date"].dt.year == 2020
    events[mask].groupby("EventType").agg(
        first_date=("Date", "min"),
        last_date=("Date", "max"),
        count=("Date", "size"),
    )

Anyway, I don't understand why terseness is even desirable. We're doing DS and ML; no project ever comes down to keystrokes, but the ability to search the docs and debug does matter.
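
For anyone who wants to run the comparison, here is a minimal self-contained sketch of the pandas side. The sample rows are made up, and it assumes Date is a datetime64 column already sorted by date, so that "first" and "last" line up with first() and last() in the data.table version.

```python
import pandas as pd

# Made-up sample data; the thread assumes `events` already exists.
events = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            [
                "2019-12-31",
                "2020-01-05",
                "2020-02-10",
                "2020-03-01",
                "2020-11-20",
                "2021-01-02",
            ]
        ),
        "EventType": ["a", "a", "b", "a", "b", "a"],
    }
)

# The .dt accessor exposes datetime components of a datetime64 column.
mask = events["Date"].dt.year == 2020

# Named aggregation: output_column=(input_column, aggregation).
# "first"/"last" depend on row order, so the data must be sorted by Date.
summary = (
    events[mask]
    .groupby("EventType")
    .agg(
        first_date=("Date", "first"),
        last_date=("Date", "last"),
        count=("Date", "size"),
    )
)
print(summary)
```

If the rows are not guaranteed to be ordered, "min" and "max" give the same answer without depending on order.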