by addisonj 1044 days ago
Congrats on the launch!

Interesting project in a space that I am pretty certain is going to change a lot in the coming years. Here is a bit of random feedback and questions.

* Some of your messaging related to python vs yaml is a bit confusing, which results in me not being immediately clear on the value prop. After digging through docs and code I now understand that the yaml is a declarative pipeline calling the underlying python code that can include user defined transformations. Nifty! As someone who has led data platform teams, I understand that this would be a big win for any data platform team to better support data eng/scientists. But you don't tell me any of that. I would look at trying to give more context to what this is and adding more of these use cases and values in your marketing (even if they are pretty nascent at this stage)

* From the loom, the play you are making is clear, and it makes a lot of sense to build a cloud service to easily run these jobs... but that makes me wonder if your licensing choice is maybe a bit too restrictive? IMHO, the most important thing to do when building dev tools is to be very deliberate about your end-to-end user -> customer journey and to design your open source and commercial strategies to dovetail nicely. For a product like this, I would think the faster and bigger I can build a community, the better, and that may mean "giving away" a lot of the initial core innovation, but with a clear plan for the innovation I can drive through integrated services, which would imply as open a license as possible. As is, I think you might find it much harder to get people to take it seriously because, unlike other source-available companies (Elastic, Cockroach, etc.), you aren't yet proven to be worth the effort of getting this approved vs. a fully open source alternative.

* On a similar note, what is in the repo right now seems to be a relatively thin wrapper around spark. That isn't a criticism. Many technologies and communities have started based on a "remix" of a lower level tool that offers simplified UX/DX or big workflow improvements. What sets those apart though, imho, is that they drastically lower the barrier to entry for the underlying technology and are seen as leaders and experts in the space they operate in. I am guessing you probably have lots of features planned, but I would also softly suggest treating learnability as a feature (via features, interactive docs, etc.) as seriously as almost anything else, as that is really where a lot of the value of a higher level interface like this comes from.

* My past experience with really large and complex ETL jobs that essentially required dropping into spark to represent them has me wondering how much actual complexity can be represented by the transformers. I would be curious to know what your most complex pipeline is. There doesn't seem to be an API limitation preventing these pipelines from getting quite a bit larger and representing many sql statements, other than big long spark pipelines getting kind of ugly, and in some cases they could even remove the need for quite a few airflow jobs. I am curious to know if and how you see Serra addressing those types of ETL jobs.

Once again, congrats on launching! Happy to give more context/thoughts in a thread, or reach out to me via my profile.

1 comment

This is super insightful, thanks a ton for this gold mine.

On the python vs yaml part: definitely could've made that way more clear in the demo. Right now our framework lets you call these python objects in your yaml file, but we are also working on a python-centric implementation for those who do not want to interact with yamls.
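To make the "yaml calls python objects" idea concrete, here is a minimal sketch of the general pattern, not Serra's actual API. The registry, the class names (`FilterTransformer`, `RenameTransformer`), and the config shape are all hypothetical, and a plain Python list stands in for what a parsed yaml file might look like.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical registry mapping a name used in the config to a Python class.
TRANSFORMERS: Dict[str, type] = {}


def register(name: str):
    """Class decorator that makes a transformer addressable from the config."""
    def wrap(cls: type) -> type:
        TRANSFORMERS[name] = cls
        return cls
    return wrap


@register("FilterTransformer")
class FilterTransformer:
    """Keep only rows where `column` equals `equals`."""
    def __init__(self, column: str, equals: Any):
        self.column = column
        self.equals = equals

    def transform(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if r[self.column] == self.equals]


@register("RenameTransformer")
class RenameTransformer:
    """Rename the key `old` to `new` in every row."""
    def __init__(self, old: str, new: str):
        self.old = old
        self.new = new

    def transform(self, rows: List[Row]) -> List[Row]:
        return [
            {(self.new if k == self.old else k): v for k, v in r.items()}
            for r in rows
        ]


# Stand-in for a parsed yaml file: an ordered list of {class name: params}.
PIPELINE = [
    {"FilterTransformer": {"column": "country", "equals": "US"}},
    {"RenameTransformer": {"old": "amt", "new": "amount"}},
]


def run_pipeline(steps: List[Dict[str, Dict[str, Any]]], rows: List[Row]) -> List[Row]:
    """Look up each step's class by name, build it from params, and apply it."""
    for step in steps:
        (name, params), = step.items()  # each step has exactly one key
        rows = TRANSFORMERS[name](**params).transform(rows)
    return rows


rows = [{"country": "US", "amt": 10}, {"country": "DE", "amt": 7}]
result = run_pipeline(PIPELINE, rows)
print(result)
```

A user-defined transformation in this sketch is just another registered class, which is why the same objects could be driven from a config file or called directly from python.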

On the loom and licensing choice—that's a great point. One of the main issues we ran into is getting adoption as we originally just tried licensing out the framework (mega fail ofc)—found out the hard way that no dev wants to buy something to try it out. We're definitely flexible on our license and will take all this feedback into account.

On the barrier to entry: also super insightful. We're working on a local UI offering that will be a 'config' block builder and will be free for all installs. We're implementing a DAG view similar to Airflow's at the transform level. We also want to make it super easy to see your code and preview how it changes with this local UI (and have a list of all the params you need for your spark objects without having to go through the docs). We also want to flesh out more features, especially on the translate side, as well as host on the cloud.

The complexity issue is something I ran into at Disney as well! As the product grows we definitely want to flesh out our transformers based on the scripts we see. For now, the developer can make one-off transformers, and we actually have a catch-all "SQL transformer" for cases where you want to just pass in your sql (similar to a dbt model) and run it that way. That way it's a failsafe: if you have one specific transform that you feel is super hard to break down, you can fall back on dbt's way of just modularizing the SQL into a transform, and reference it as many times as you want as an input block later on.
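Roughly, the catch-all idea looks like the sketch below. This is an illustration under assumptions, not our actual implementation: it uses Python's built-in `sqlite3` as a stand-in for Spark SQL, and the helper `run_sql_blocks` and the block names are made up for the example.

```python
import sqlite3
from typing import Dict


def run_sql_blocks(conn: sqlite3.Connection, blocks: Dict[str, str]) -> None:
    """Expose each named SQL block as a temp view so later blocks can use it.

    Block names are interpolated into the statement, so this sketch assumes
    they come from a trusted config, never from user input.
    """
    for name, sql in blocks.items():  # dicts preserve insertion order
        conn.execute(f"CREATE TEMP VIEW {name} AS {sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("a", 5), ("b", 7)],
)

# One hard-to-decompose transform kept as raw SQL, then referenced twice.
blocks = {
    "customer_totals": (
        "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
    ),
    "big_customers": "SELECT customer FROM customer_totals WHERE total > 10",
    "small_customers": "SELECT customer FROM customer_totals WHERE total <= 10",
}
run_sql_blocks(conn, blocks)

big = conn.execute("SELECT customer FROM big_customers").fetchall()
small = conn.execute("SELECT customer FROM small_customers").fetchall()
print(big, small)
```

The point is only that the SQL lives in one named block and both downstream blocks read from it by name, the same way a dbt model gets referenced.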

Thanks so much for the congrats, will definitely reach out and would love to have further discussions in the thread as well.