by debarshri
2756 days ago
> "Enough blaming the former engineer." I was one of them. I don't work there anymore. I believe this is not the actual situation at bol.com. If it is, I would be disappointed. Last I remember, bol.com has a really good set of ops and dev tooling on Hadoop, HBase, Spark, Flink, etc. for scheduling, running jobs, etc. I don't know why they replicated data across HBase, Elasticsearch, etc. Having read the blog, I don't see how this fits the event sourcing pattern that bol.com was trying to implement, or the idea of self-service BI that they envisioned.
If I am not mistaken, the majority of the PL/SQL glue is owned by Gert, though you might recall better. Quite a lot of VCS history was lost while migrating from SVN to Git. ;-)
The reason we are "replicating" the entire data set is to 1) determine the affected products and 2) re-execute the relevant configurations (facets, synonyms, etc.) when making retroactive changes (for instance, say someone has changed the PL/SQL of the "leeftijd" facet). This requires storage that supports querying on every field, for (1), and on id, for (2). While id-based bulk querying is (almost) supported by every ETL source, querying on every field is not. Hence, we "replicate" the sources on our side to satisfy these needs. Actually, the entire point of the post was to explain this problem, but apparently it was not clear enough.
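To make the two access patterns concrete, here is a minimal Python sketch, assuming a simple in-memory store. `ProductReplica`, `reapply_configuration`, `leeftijd_facet` and the `min_age` field are hypothetical illustrations; the actual pipeline runs PL/SQL configurations against a storage engine not described here.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ProductReplica:
    """Hypothetical local copy of ETL source rows, keyed by product id."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}

    def upsert(self, product_id: str, row: Row) -> None:
        # Store a copy so later mutations by the caller do not leak in.
        self._rows[product_id] = dict(row)

    def find_ids(self, predicate: Callable[[Row], bool]) -> List[str]:
        # Need (1): query on any field. A full scan stands in for whatever
        # secondary indexes a real store would provide.
        return sorted(pid for pid, row in self._rows.items() if predicate(row))

    def get_many(self, ids: List[str]) -> Dict[str, Row]:
        # Need (2): bulk lookup by id, skipping unknown ids.
        return {pid: self._rows[pid] for pid in ids if pid in self._rows}


def reapply_configuration(
    replica: ProductReplica,
    affects: Callable[[Row], bool],
    configuration: Callable[[Row], Any],
) -> Dict[str, Any]:
    """Re-run a changed configuration (e.g. a facet) on affected products only."""
    affected_ids = replica.find_ids(affects)
    rows = replica.get_many(affected_ids)
    return {pid: configuration(row) for pid, row in rows.items()}


def leeftijd_facet(row: Row) -> str:
    # Hypothetical age facet: buckets a made-up "min_age" field.
    return "0-3" if row["min_age"] < 4 else "4+"


if __name__ == "__main__":
    replica = ProductReplica()
    replica.upsert("p1", {"min_age": 2})
    replica.upsert("p2", {"min_age": 8})
    replica.upsert("p3", {"colour": "red"})
    # Only p1 and p2 carry the field the changed facet depends on.
    print(reapply_configuration(replica, lambda r: "min_age" in r, leeftijd_facet))
```

The arbitrary-field lookup in `find_ids` is the part an ETL source usually cannot offer, which is why a local replica is kept; a real store would back it with secondary indexes rather than a scan.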
As for your remarks on event sourcing and BI, I am a little puzzled; could you elaborate on them? We do have event sourcing on our side (that is how we can replay when needed), and BI is not really interested in ETL data. Maybe I misunderstood you?
I am also confused about how you relate scheduling and running PL/SQL jobs to Hadoop, Spark, Flink, etc. Did you see the link to Redwood Explorer I shared in the post?