Hacker News new | ask | show | jobs
by kpierce 1535 days ago
Trillions of decisions have been made off a tool that has poor error handling and data consistency issues. Blame is not entirely on either the user or the software, but the tool is too trusted without validation.

[Study that was at the core for Europe's austerity and European debt crisis contained excel errors when fixed showed the inverse of original hypothesis.](https://www.nytimes.com/2013/04/19/opinion/krugman-the-excel...)

Programmable commands instead of a data grid would be huge improvement to quality but people use excel in many ways. Python is out of reach for most people. SQL would be an improvement as well. I assumed airtable or similar would replace excel over time. But the sunk cost for existing report and the sharablity seems to keep excel in control.

5 comments

Airtable really ought to be killing Excel, but the SaaS model combined with a stupidly low artificial row count limit (over 50000 rows is listed as "contact us for pricing") means that it will never achieve penetration into weird and wonderful use cases like Excel has.

Like, my default is to throw a dataset I'm hacking on into an SQL database so I can actually query the thing. But no I don't want to upload my 400MB log file. I'll just use grep, or build a CSV and deal with Excel filtering.

Airtable should be awesome at reducing the cost of database-ifying these random datasets to zero. But the sales constraints put it in this niche where it's not the default tool of choice.

You might want to try out Baserow (https://baserow.io). It’s an open source alternative to Airtable, backed by a PostgreSQL database. Main differences are that you can self host it with unlimited rows, it’s modular and it’s made to handle high volumes of data.

Disclaimer, I’m the founder of Baserow.

Baserow is frickin awesome and I wish y'all the best.
100%

I found what Airtable is doing to be deeply attractive. But their costs and their lock in and their pricing model and it's just...

UGH.

Microsoft Access was a good idea with a terrible implementation.

There HAS to be a unfilled niche here.

nocodb looks to be the best answer so far? Because it ties to a backend postgres database, it can be used along side bespoke applications. It still needs development though. I'm watching it like a hawk.

We are trying to answer this with CloudTables (https://cloudtables.com) - which is effectively a GUI for my DataTables library with a Postgres backend. Current work is to address the row limit and allow millions of rows without needing to contact "sales" (me), while also not charging per user (I hate that as a customer). If anyone fancies giving it a go and dropping me some feedback that would be most welcome! There are some rough edges without question that are still being worked on, but I think it has some advantages such as being able to self host with your own Postgres instance.
I just want to say thanks for DataTables - it's a fantastic library!
This is off-topic, but I wanted to say a big Thank you for DataTables. I used it on many a project and still rave about how fully functional it is.
Access was pretty incredible for what it is/was. I could build a structured database with a nice UI for non-data people, reports, and even more advanced things like automated emails, exports, etc., in 1/10th or even 1/20th the time it'd take to build something similar as a web app.

We had an Access database that managed grant funding for an entire public University and in many ways it worked a lot better than the SaaS app that recently replaced it. Need to collect a new set of data? No problem, give me 4 hours and it'll be ready to use :P.

I'd love to have something like Access but that worked very well as a platform-agnostic web app and could easily integrate with cloud infrastructure.

You used to be able to do that with Access 2010 web databases. Of course, Microsoft has deprecated that in favor of Powerapps and Microsoft Dataverse, but it's not clear that actually lets you join an Access database to a low-code frontend. (It should, but there's a lot of marketing speak that I don't quite understand.)
Access wasn’t even that bad of an implementation. It was amazing not just how broadly access was used but the kinds of users who could do real things with it. A bit like HyperCard.
VC-funded SaaS wiped out any real possibility of a sane business model for a product in this space.
Access lacks, IMO, better internal programming and more exposure to the fact that you can use pretty much any database you can access (pun intended) with ODBC or ADO.

Make it easily deliverable over network, and you have killer product.

I also think it’s just kinda clunky compared to excel or Google sheets. Maybe if you get used to it it’s ok to work with, but I guess you run into the issue that any friction makes it a hard sell to those who are used to excel.
I am currently working on a CLI that will allow you to query .csv, .xls and .tsv files and SQL tables via plain English.
What do you mean by plain English? Will it support colloquialisms? Regional dialects? How is a Left Join expressed in English, distinct from a Full Join? Will it accept synonyms and contractions? Or will the Query Language require Structure?
Interesting concept - care to explain in more detail?
I too would like to subscribe to this newsletter.
Yeah their monetization strategy is extremely puzzling. As a casual user I loved their Chrome extension that lets me grab data and put it into a sheet in a click but it only lasted as long as my Pro membership. All of the advanced features seem to be locked behind a subscription.
Wonder how much the lack of pirating or using a friends out of date version prevents ubiquity.
apparently grist can handle 100,000 rows, and that's just a soft limit so you might be able to do more.

being able to have an excel grid and a chart view on the same page would probably suit your use case as well. being able to use python for the formulas is a nice touch too.

the free hosted version has almost all features available as far as i remember. there's also a docker version that's easy to get up and running and doesn't have any limitations

Also Grist isn't limited to the grid visual model https://twitter.com/dsagal0/status/1509924813837635593 (disclaimer1: grist employee) (disclaimer2: April 1)
> Programmable commands instead of a data grid would be huge improvement to quality...

Would it? At the end of the day, someone else still has to proofread and QA the commands/formulas/program or it's just blind trust that the decision is being made on. Trust (or ignorance) that the creator knew what they were doing and developed it in an accurate way before action is taken on the decision being made. The interface really makes no difference, it's the human component and "process" for creation that needs to be fine tuned. Things like the London whale situation was a process failure where one person had too much power to execute trades without oversight, review, QA, testing, etc. [0] All things that are pretty standard in a software developer's day-to-day but the rest of the world has not realized or adjusted to the fact that they are now software developers too.

[0] Excel wasn't the problem with the London whale at all, they made a mathematical error "modelers divided by a sum instead of an average"

The tools are scapegoats.

The Reinhart-Rogoff issue was technically an error in Excel, but also an error by the authors for not actually verifying the results before publishing. It didn't hurt that their particular biases were in line with the results.

The technical problem can be addressed with more warnings and safeguards, but they are meaningless if no one uses them.

I hadn't previously read up on the RR issue. But after some surface level research, I would not say it was Excel as an issue. It sounds like the tool did exactly what they programmed it to. It seems like human error or choices they made to arrive at the conclusion they wanted; which seems to be speculated (or true, I only scratched the surface).

> While using RR’s working spreadsheet, we identified coding errors, selective exclusion of available data, and unconventional weighting of summary statistics. [0]

I'm not a fan of tools giving warnings for these types of "coding errors". Although a warning I can think would be nice is where math just doesn't work as expected. The recent floating point discussion [1] seems appropriate as it's just not very intuitive and as a programmer you need a pretty deep level of understanding to know that the resulting math is likely not accurate. But, it also seems to effect nearly every programming language and is not a quirk of one specific thing.

I'd be interested to read more if you have info outline the actual error within Excel. If there is some 2+2=5 situation, I'd be interested to learn about that. I feel like every time someone says "Excel error", it's actually "human error". It would be like if every car accident was a vehicle malfunction but we all know it's most likely an operator issue.

[0] http://peri.umass.edu/fileadmin/pdf/working_papers/working_p...

[1] https://news.ycombinator.com/item?id=30856434

Go farther. Would the results have ever been validated if the source of truth was not a universal format easily interpretable by millions?
I think what you're touching on is complexity and how many people tend to trust complexity because it's too difficult to validate and you must know what you're doing if you built something so complex.

Even in a corporate environment, I use spreadsheets to support big decisions every day, the format is as you say universal and easily interpretable, but many people never go that far. They trust that I did it correctly and take the "output" as truth. If I screw up big ($), it might cost me my job, but is it really my fault if nobody else even bothered to check my work?

Bikeshed effect writ large, I'm afraid.

It's not our fault, our brains are wired to find quick & dirty; simple not small is beautiful. But people also contemptuous of what they understand, and accept at face value what they don't. Michael Crichton, much disparaged in his later years for contentious political stances, had much of value to say in his Gell-Mann Amnesia Effects' "wet sidewalks cause rain" critique of how we parse information.

Still, I'm glad it's transparent in this case and thus exposable. How much more is hidden, guarded jealously even, as though methodology was trade secret instead of, you know, the underpinnings of proposed outcome?

I think that more precision over application of formulas would solve a lot. Arrays are mapped over by copying the code for each array element by dragging it across a row of cells, and the arguments to the formula are automatically mutated based on where the code is dragged to. This can be error prone.

More concrete definitions of where a formula should apply would be good, for example, leave the formula cell in one place and specify that one argument should come from this range of cells and the other from this range of cells, and the output should be mapped to this range.

Give https://exploratory.io/ a look. It's free/cheap.

It's a nice easy GUI wrapper for R and just works.

I stumbled across it a year ago and now use it daily.

I don't feel that Python is out of reach for most who are using Excel with vlookups. I do feel that most Pandas code is poorly written and thus not at all compelling to replace Excel.

(My background is that I teach Python and Data Science to large corps.)

+1, the Pandas API is somewhere between mediocre and bad, and results in garbage code unless you use it in a carefully constrained way (which is admittedly true of many complete languages, much less libraries that organically evolved several tooling generations ago)
Not sure I agree, but interesting, so who is your target audience in large corps?
Are you disagreeing with "excel is programming" or "most pandas code is bad"?