|
|
|
|
|
by teej
3011 days ago
|
|
I answer these types of questions literally every day on Redshift. My data set is >1B events and more than doubling every year. You are underestimating the ability of a database to do its job. Here's 18 lines of SQL to save you a week of writing a data pipeline in Go. WITH page_timings AS (
SELECT page_type
, event_timestamp_utc
, LEAD(event_timestamp_utc, 1)
OVER(PARTITION BY session_id
ORDER BY event_timestamp_utc)
AS next_pageview
FROM events
WHERE event_type = 'page_view'
)
SELECT page_type
, MEDIAN(
DATEDIFF(sec,
event_timestamp_utc,
next_pageview)
) AS median_sec_on_page
FROM page_timings
GROUP BY page_type
;
|
|
Can we then take that raw event data, like this query, and identify users with anomalous usage patterns, automatically alerting administrators of odd activity in a timely manner?
Your answer might be "Yes". If so, I applaud your ingenuity and Redshift's ability to process data in a timely manner. It's pretty awesome, I admit!
But I believe that the best way to answer these kind of questions is to track behavioral analytics in a format that is easier to think about and easy to analyze.
Events are too deep in the weeds, I think. I may be wrong about that - But the tools I build with this will either speak for themselves or fall by the wayside as a failed experiment in history!