Hacker News
by sreekanthr 2616 days ago
For your problem, I would suggest you take a look at StreamSets. They have an ETL system with data drift handling built in, which is really interesting.

Ref: https://streamsets.com/
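
To make "data drift" a bit more concrete: in an ETL context it often means the schema of incoming records changes underneath you, for example when a field is added, dropped, or starts arriving with a different type. Below is a minimal, hypothetical sketch in plain Python of the kind of check such a system automates. It is not StreamSets' API; the function name `detect_schema_drift` and the record layout are made up for illustration.

```python
from typing import Any, Dict, List


# Hypothetical helper (not part of StreamSets): compares the fields and value
# types of one incoming record against the schema the pipeline expects.
def detect_schema_drift(expected: Dict[str, type], record: Dict[str, Any]) -> Dict[str, List[str]]:
    # Fields that showed up but are not in the expected schema.
    added = sorted(set(record) - set(expected))
    # Fields the pipeline expects but the record no longer carries.
    missing = sorted(set(expected) - set(record))
    # Fields present in both, but holding a value of a different type.
    retyped = sorted(
        name for name in set(expected) & set(record)
        if not isinstance(record[name], expected[name])
    )
    return {"added": added, "missing": missing, "retyped": retyped}


if __name__ == "__main__":
    expected = {"user_id": int, "amount": float, "country": str}
    record = {"user_id": "42", "amount": 9.99, "currency": "EUR"}
    print(detect_schema_drift(expected, record))
    # {'added': ['currency'], 'missing': ['country'], 'retyped': ['user_id']}
```

What happens after drift is detected (adapting the target schema, quarantining the record, or alerting someone) is a policy choice; check the StreamSets documentation for how it handles each case rather than relying on this sketch.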

>Besides the brittleness of the process, I found that people are reluctant to analyse the data because it takes an unreasonable amount of time.

Is this because of bad queries, or because of the way the data is organized?

>It is only used by data scientists. What do you mean by a feature repository? How would you organize it so people can push new features? This sounds very interesting.

Take a look at Feast by Go-Jek: https://github.com/gojek/feast There are similar projects from other big players in the market; this should give you a good idea of what I was talking about.
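
As a rough illustration of the idea (not Feast's actual API), a feature repository can start as a registry where each team pushes a named, versioned feature definition together with the code that computes it, so that others can discover and reuse it instead of re-deriving it. The class and method names below (`FeatureDefinition`, `FeatureRegistry`, `register`, `latest`, `compute`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    version: int
    owner: str
    description: str
    transform: Callable[[Row], Any]  # computes the feature value from one raw row


@dataclass
class FeatureRegistry:
    # Keyed by (name, version) so a changed definition becomes a new version
    # instead of silently overwriting one that other people depend on.
    _features: Dict[Tuple[str, int], FeatureDefinition] = field(default_factory=dict)

    def register(self, feature: FeatureDefinition) -> None:
        key = (feature.name, feature.version)
        if key in self._features:
            raise ValueError(f"{feature.name} v{feature.version} is already registered")
        self._features[key] = feature

    def latest(self, name: str) -> FeatureDefinition:
        versions = [f for (n, _), f in self._features.items() if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions, key=lambda f: f.version)

    def compute(self, names: List[str], row: Row) -> Row:
        # Apply the latest version of each requested feature to one raw row.
        return {n: self.latest(n).transform(row) for n in names}


if __name__ == "__main__":
    registry = FeatureRegistry()
    registry.register(FeatureDefinition("basket_size", 1, "analytics", "Items per order",
                                        lambda r: len(r["items"])))
    registry.register(FeatureDefinition("order_total", 1, "analytics", "Sum of item prices",
                                        lambda r: sum(r["items"])))
    print(registry.compute(["basket_size", "order_total"], {"items": [2.5, 4.0]}))
    # {'basket_size': 2, 'order_total': 6.5}
```

Versioning is the main design choice here: people can push new features or new versions without breaking anyone who depends on an older definition. Projects like Feast add the parts this sketch skips, such as storing computed values and serving them consistently for both training and online inference.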

PS: Sorry for the delay in answering your question; I was traveling.