Hacker News new | ask | show | jobs
by melondonkey 820 days ago
Data scientist here that’s also tired of the tools. We put so much effort in trying to educate DSes in our company to get away from notebooks and use IDEs like VS or RStudio and databricks has been a step backwards cause we didn’t get the integrated version
3 comments

I'm a data scientist and I agree that work meant to last should be in a source-controlled project coded via a text editor or IDE. But sometimes it's extremely useful to get -- and iterate on -- immediate results. There's no good way to do that without either notebooks or at least a REPL.
There is VSCode extension, plus databricks-connect… plus DABs. There are a lot customers doing local only development
Thank you ! I am so tired of all those unmaintainable nor debugable notebooks. Years ago, Databricks had a specific page on their documentation where they stated that notebooks where not for production grade software. It has been removed. And now you have a chatgpt like in their notebooks ... What a step backwards. How can all those developers be so happy without having the bare minimum tools to diagnosis their code ? And I am not even talking about unit testing here.
It’s less about notebooks, but more about SDLC practices. Notebooks may encourage writing throwaway code, but if you split code correctly, then you can do unit testing, write modular code, etc. And ability to use “arbitrary files” as Python packages exists for quite a while, so you can get best of both worlds - quick iteration, plus ability to package your code as a wheel and distribute

P.S. here is a simple example of unit testing: https://github.com/alexott/databricks-nutter-repos-demo - I wrote it more than three years ago.