by breckognize 784 days ago
Shameless plug: If you have bigger data sets, check out rowzero.io.

We implemented something like PySheets initially where the formula language was full Python. But we found the Python interpreter to be the bottleneck during (e.g.) large CSV imports, and the GIL prevented parallelizing evaluation. It was also harder for business users to adopt due to small syntactic differences between Python and the Excel formula language.

So we implemented the spreadsheet engine and formula language in Rust. We have a Python code window that allows you to write arbitrary Python functions. Those functions can be called as formulas from any spreadsheet cell. We seamlessly marshal Pandas dataframes from Python land to spreadsheet land and back. It gives you 90% of the benefits of pure Python without compromising on performance.
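
To make the "Python functions called as formulas" idea concrete, here is a minimal sketch using only the standard library. It is not Row Zero's actual API: the `FORMULAS` registry, the `formula` decorator, `call_formula`, and the `growth` function are all hypothetical names chosen for illustration, and a plain list stands in for a Pandas dataframe.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping formula names to Python callables.
FORMULAS: Dict[str, Callable[[List[float]], float]] = {}


def formula(fn):
    """Register fn so a cell can call it by its upper-cased name."""
    FORMULAS[fn.__name__.upper()] = fn
    return fn


@formula
def growth(values: List[float]) -> float:
    """Percentage change from the first value to the last."""
    return (values[-1] - values[0]) / values[0] * 100


def call_formula(name: str, sheet: Dict[str, float], cells: List[str]) -> float:
    """Marshal cell values into a Python list, call the function, return the result."""
    args = [sheet[c] for c in cells]
    return FORMULAS[name](args)


# A toy sheet; a cell containing =GROWTH(A1:A3) would resolve like this.
sheet = {"A1": 100.0, "A2": 110.0, "A3": 125.0}
result = call_formula("GROWTH", sheet, ["A1", "A2", "A3"])
```

The point of the sketch is the split of responsibilities: the engine owns the cells and the marshalling step, and user code only ever sees ordinary Python values.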

5 comments

Rowzero is a better spreadsheet, while PySheets is a better Jupyter Notebook. Although they converge in certain aspects, their distinct target audiences set them apart. This divergence may create some overlap, but it also leaves ample room for user preference.

PySheets currently runs inside the browser, on top of WebAssembly, and the limitations there are bigger than just Python's slowness. You have only 4 GB of addressable memory, including the interpreter and libraries. Network bandwidth is also a limiting factor for client-side computation.

That said, PySheets can render a sheet based on a 50,000-row Excel sheet in 0.5s and needs about 20s to do a full end-to-end recompute run. There are limits to what you can do in the browser without using an external kernel that can run Polars on large datasets. But I think most people will be fine with what PySheets lets them do.

Finally, as the author of PySheets I am honored that a "competitor" sees us as a threat. I am quite impressed by Rowzero myself. Nice work :-)

Kudos on the technical achievement. We considered the thick client approach you're doing, and one of the reasons we punted was that it was so hard.

One really nice thing about your approach is it minimizes infrastructure cost. That positions you well for embedding use cases, like New York Times visualizations, that we struggle to do economically.

Best of luck!

Yes, my total development bill for EVERYTHING, including DigitalOcean, Google, and OpenAI is about $15.
Kudos to you. I would be quite flattered to have built a thing that competes with what a small startup built.
I am feeling pretty Okay now, indeed. I played golf today. It was on a Par3 course, so it only tested my short game. However, I scored -1, with almost a hole-in-one. I blame it on the success of PySheets :-)
I've been trying to get a platform to create dashboards where some data comes from spreadsheets and some data comes from databases. Something like a notebook interface crossed with a Grafana interface, while also enabling forms for input, is sorely missing. While it can be stitched together, speed/performance and flexibility (in terms of JS or Python) seem to be lacking atm.

I want to use such a thing to create internal dashboards similar to Retool.

Does it need to be live (i.e. when the database or underlying spreadsheet updates, does it need to be reflected in real time on the dashboard), or are you OK with a static display?

Live updating data is a pain. I've messed around with using JavaScript to force-refresh HTML iframes on a timer, but I was never really satisfied with this. I've heard you can do things with websockets, but that is starting to get too complicated for me (I'm not a programmer).

For static stuff, one of the data scientists in my org pointed me to Streamlit (https://streamlit.io/). It's a Python package I found very easy to use. You can easily combine SQL with CSV imports and display them all on one dashboard, and you can use forms, toggle buttons, etc. to control the display.

You can do that today with PySheets. On the PySheets landing page, you can find a live example. The data comes directly out of a sheet that uses a service to convert metrics into charts. For example, one of the three charts shown on https://pysheets.app/#Traction is directly embedded as an iframe from https://pysheets.app/embed?U=uXNuCGO2JU1E5aL7zcOh&k=C12. If I rerun the sheet that produces the charts, the PySheets landing page updates automatically with the latest data.
You should try http://rowzero.io. We connect directly to DBs and data warehouses, support Python natively, and scale up to hundreds of millions of rows.

Lots of people use us for dashboards.

Rowzero seems incredible, but this and PySheets target the wrong users. You are targeting data scientists, while I would target finance people to get traction. So let me explain why I would use it as a data scientist but not as a finance guy:

1) It runs in the cloud. I would go with something that runs locally (or on-premise), since there is sensitive data there, or something integrated into GCP/AWS/Azure. With Rust as a backend, running locally should be fine; with Python you need to ship a set of libraries using Docker.

2) You need to create a PowerPoint/Word alternative as well, where you can just copy/paste stuff, or else make copy/paste into PowerPoint/Word easy.

3) Push hard on big data and DB connections, since right now those are the bottlenecks. Also create Python APIs for popular finance services (Bloomberg, Factset, CapitalIQ, ...) so that they are available out of the box with a subscription.

4) Do something for the text part, like embeddings for similarity and fuzzy matching in Python. The interface could probably also be different for analyzing text (keywords highlighted in green, search in text, and so on). People in finance often work with PDFs too, and having everything in one platform would be nicer than the two windows you need today.
PySheets has been designed to run on-prem and on GCP as well. The beta version you are looking at is just offered as a zero-install experimentation platform. We are actively talking with financial institutions, and both co-founders on the team, https://pysheets.app/#Team, have a long history in Finance, so we are very sensitive to all the (correct) points you make. We will look in more detail at your very helpful suggestions!
Any chance you could expand on how the DAG is implemented in Rust for the execution engine? I'm trying to do something similar (not for spreadsheets but rather for a language: https://docs.yoctoproject.org/bitbake/bitbake-user-manual/bi...). I cannot find any good examples of how to implement something like this in Rust. E.g. should I use a graph library like petgraph, or roll my own?
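
As a rough illustration of the dependency-graph evaluation being asked about, here is a minimal sketch. It is in Python rather than Rust and uses the standard library's `graphlib`; it says nothing about how Row Zero or PySheets actually implement their engines. The cell names, the `cells` table, and `evaluate` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical cells: name -> (names it depends on, function of those values).
cells = {
    "A1": ((), lambda: 2),
    "A2": ((), lambda: 3),
    "B1": (("A1", "A2"), lambda a, b: a + b),
    "C1": (("B1", "A1"), lambda b, a: b * a),
}


def evaluate(cells):
    """Evaluate every cell after all of the cells it depends on."""
    # TopologicalSorter expects node -> set of predecessors (dependencies).
    graph = {name: set(deps) for name, (deps, _) in cells.items()}
    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if the cells reference each other in a cycle.
    order = TopologicalSorter(graph).static_order()
    values = {}
    for name in order:
        deps, fn = cells[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


values = evaluate(cells)
```

The same shape (a map from node to dependencies plus a topological sort) carries over to Rust whether the graph lives in a library such as petgraph or in a hand-rolled adjacency map; which of the two is preferable depends on how much graph functionality beyond ordering is needed.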
PySheets is not based on Rust. It is 100% Python.
I was replying to the rowzero guy; that one is written in Rust.
Both of the solutions seem interesting for different reasons. @breckognize, you said 90% of the benefits. Can you or @laffa give an example of the 10% that would prevent me from using your solution?
Are Row Zero and/or PySheets open source?
A major part is, in the form of Pyscript-LTK. I keep moving more of PySheets to LTK as I find reusable parts. I truly love open-source, but I am also trying to get some revenue for the months of work I spent on developing PySheets.
nah, but it would be nice to have a communist version too.
That was not the point. There is a natural focus on HN towards open source software, and open source is not equal to communism.