|
Is TimescaleDB a good fit for events? For example, we have data where each record is a tuple of (time, event, object), where the event is things like "viewed", "performedSearch", etc., and the object is event data as JSON. Let's say the object is a movie, in which case the payload might be something like: {"id": 123,
"name": "The Godfather",
"director_id": 456,
"genre": "crime"}
Our reporting UI lets users do aggregations based on arbitrary dimensions, so we might do the equivalent of: select object->>'genre' as genre,
object->>'director_id' as director_id,
extract(month from time) as month,
count(*) as count
from events
group by month, genre, director_id
Then we do things like pivot/nest values to display the groupings.It's unclear to me whether TimescaleDB fits this use case, or whether we'd have to change how things are indexed. Right now we just index the whole object itself. Another scheme I've considered is that we could index it with the values as keys; so for example, the above event would be {"name:The Godfather": 1, "director_id:456": 1, "genre:crime": 1} and essentially represent the counts. A counting aggregation would then be rewritten as as a sum(). But it's unclear to me how you do intersections here without also creating all the permutations (i.e. something like {"director_id:456/genre:crime": 1}) beforehand. We're currently using Elasticsearch for this. Performance is okay, but we're not entirely happy with it. |
- keep the event as a json object and use a GIN index (this would then be combined with constraint-exclusion on the time field for faster queries if using timescaledb)
- convert each JSON key to a column and use multi-column indexes or bitmap scans
- Normalize out the unique object json into a separate table with the columns id, object (json) and have your time-series table as time, event, object_id.
Happy to talk more on our slack support channel https://slack.timescale.com/