Hacker News new | ask | show | jobs
by nequo 1459 days ago
> simple things in Pandas or R, like changing column values based on multiple field value conditions, and it is a struggle requiring at least 17 brackets

With dplyr, wouldn’t this be mutate(x = case_when(… ~ y, TRUE ~ x)) or the same but with ifelse?

1 comments

I haven't used much R, but how would you do something like:

  CASE WHEN
  SUM(daily_revenue) OVER (PARTITION BY department, TRIM(SUBSTR(region, 5)) IN ('North','West','Misc')) > 
  AVG(revenue) OVER (ORDER BY sale_time ASC rows BETWEEN 28 PRECEDING AND CURRENT ROW)
  AND NOT COALSECE(had_prev_month_party, FALSE)

  THEN pizza_party_points + 5

  WHEN <above> AND had_prev_month_party THEN pizza_party_points + 3

  WHEN MIN(sale_time) over (PARTITION BY department) = DATE_TRUNC('month', current_date) then 5

  ELSE GREATEST(pizza_party_points - 1, 0)

  END as pizza_party_performance_points_current

this example may be a bit esoteric but it actually draws from lots of real-world cases (comparing a partitioned sum with an overall average, checking conditions on other columns, messy data parsing and manipulation, implicit waterfall nature of CASE WHEN vs. explicit elseif, etc)