by ahmedNarrator
2086 days ago
Yeah, entity modeling was one of the big inspirations for our approach. The main difference is how you reassemble the single time-series table to create any table. This was quite a challenge, and I think it is what makes the traceability and source-of-truth problems a lot simpler. In Narrator, the data team writes small SQL snippets to create single, customer-centric business concepts that we call activities. These are around 25 lines each and designed to be understood by anyone in the company (e.g. "viewed page", "called us", ...). Now, every question you or a stakeholder has will simply be a rearrangement of these activities. If you can describe what you want, then Narrator can assemble a table that represents it.
Source of truth - whatever is in the activity stream.
Traceability - always the Dataset (activities and how they relate), then the activities themselves (~25 lines of SQL each).
Coherent model - customers doing actions in time. Does that make sense? Some of these things are easier to show in a demo than to describe in text.
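To make the "single time-series table" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. Everything named in it is a hypothetical illustration, not Narrator's actual schema or API: the `activity_stream` table and its four columns, the source tables `web_events` and `phone_calls`, and the question "after each customer's first page view, did they call us?". Each activity is a short SQL transform that appends rows to the stream; a dataset is then assembled by relating activities per customer in time.

```python
import sqlite3

# Hypothetical source tables plus a simplified 4-column activity stream
# (customer, activity, ts, feature); this is NOT Narrator's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE web_events (email TEXT, url TEXT, event_time TEXT);
CREATE TABLE phone_calls (caller_email TEXT, started_at TEXT);
CREATE TABLE activity_stream (customer TEXT, activity TEXT, ts TEXT, feature TEXT);

INSERT INTO web_events VALUES
  ('a@x.com', '/pricing', '2020-01-01 10:00'),
  ('a@x.com', '/docs',    '2020-01-03 09:00'),
  ('b@x.com', '/pricing', '2020-01-02 12:00');
INSERT INTO phone_calls VALUES
  ('a@x.com', '2020-01-02 15:00'),
  ('b@x.com', '2020-01-01 08:00');
""")

# Each activity is a small SQL transform from one source table into the stream.
ACTIVITY_SQL = {
    "viewed_page": """
        INSERT INTO activity_stream
        SELECT email, 'viewed_page', event_time, url FROM web_events""",
    "called_us": """
        INSERT INTO activity_stream
        SELECT caller_email, 'called_us', started_at, NULL FROM phone_calls""",
}
for sql in ACTIVITY_SQL.values():
    conn.execute(sql)

# Assemble a dataset by relating activities in time: for each customer's
# first 'viewed_page', did a 'called_us' happen at any point afterwards?
# (ISO-formatted timestamps compare correctly as strings.)
DATASET_SQL = """
WITH first_view AS (
    SELECT customer, MIN(ts) AS first_viewed_at
    FROM activity_stream
    WHERE activity = 'viewed_page'
    GROUP BY customer
)
SELECT f.customer,
       f.first_viewed_at,
       EXISTS (
           SELECT 1 FROM activity_stream s
           WHERE s.customer = f.customer
             AND s.activity = 'called_us'
             AND s.ts > f.first_viewed_at
       ) AS called_after
FROM first_view f
ORDER BY f.customer
"""
rows = conn.execute(DATASET_SQL).fetchall()
for row in rows:
    print(row)
```

Here `a@x.com` called after their first page view while `b@x.com` called before it, so only the first row gets `called_after = 1`. A different question would reuse the same two activities with a different temporal relationship, rather than adding new modeling.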
This is the problem with EAV/NoSQL/schemaless/etc., and ultimately the problem I think you are going to have to solve. Instead of using ETL to model how the activities relate and reifying that model as database objects, EAV just kicks the can down the road to the query/BI tool.
Sprawl - The BI tool will end up containing most of the real business logic sprawled across many reports.
Single source of truth - A lot of the reports will be very similar, but they will be based on slightly different activities or slightly different filtering logic. Which report is the correct one?
Traceability - I think this is more of an end-to-end "garbage-in, garbage-out" problem that all ETL/BI tools have, and it isn't specific to your tool. It's more of an organizational/people problem.
Coherent model - In my experience, EAV isn't enough to cover the breadth of analyses mature businesses need to do, and most business users won't be able to wrap their heads around it. There will have to be some data person who creates a more coherent, tabular/spreadsheet-like model, and in the case of this tool it looks like that model will have to live in the BI tool. Which brings us back to the sprawl/single-source-of-truth issues.
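The "coherent model" point can be illustrated with a minimal sketch: EAV rows of (entity, attribute, value) have to be pivoted into a one-row-per-entity, spreadsheet-like shape before most business users can work with them, and that pivot logic has to live somewhere (ETL or the BI tool). The entity IDs, attribute names, and `pivot_eav` helper below are hypothetical and only show the shape of the transformation.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity, attribute, value).
eav_rows = [
    ("cust_1", "plan", "pro"),
    ("cust_1", "country", "US"),
    ("cust_2", "plan", "free"),
    ("cust_2", "signup_channel", "ads"),
]

def pivot_eav(rows):
    """Pivot EAV triples into one dict per entity with a fixed set of columns.

    Missing attributes become None, so every output row has the same shape,
    like a spreadsheet or a relational table.
    """
    columns = sorted({attr for _, attr, _ in rows})
    by_entity = defaultdict(dict)
    for entity, attr, value in rows:
        by_entity[entity][attr] = value  # last write wins on duplicate attributes
    table = [
        {"entity": entity, **{col: attrs.get(col) for col in columns}}
        for entity, attrs in sorted(by_entity.items())
    ]
    return columns, table

columns, table = pivot_eav(eav_rows)
print(columns)
for row in table:
    print(row)
```

Even this toy version bakes in decisions (which attributes become columns, how duplicates are resolved) that, if made separately in each report, produce the slightly different versions of the same table described above.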
Just some thoughts. But always glad to see more people working on stuff like this!
Edit - one last thing I wanted to mention. I think in reality you are going to find it takes more than ~25 lines of SQL to define activities. That may hold if the source is a schema that gets spit out of something like Stitch, but many other schemas in the wild will take a lot more than 25 lines of code to massage into your 11-column schema.