Hacker News new | ask | show | jobs
Ask HN: How do you handle Excel import/exports in your apps
6 points by NOtherDev 2647 days ago
Hi folks!

I'm implementing Excel-related functionality for at least 4th project in my career. As usual, an ability to save the data set into an XLSX file in a custom format and the way to ingest the data back, using the same data format, or ideally letting the customer map their own formats into our format.

What worries me is that I need to take care about all the hassle around - handling large uploads, managing timeouts, reporting the progress, caring about basics of data validation, file encodings, ensuring the import process doesn't take up all the server resources etc.

The list of impediments for such a "side" feature is unproportionally large, IMO. There are few libraries for almost every stack, but they still require me to handle at least half of these impediments by my own (resources, server config etc.).

I'm wondering, how do you usually handle these stuff? Do we all go through these problems over and over again or are there solutions that I'm not aware of that makes things easier, like a standalone tool or service I can "just use"?

If not, my maker soul asks: maybe there should be one? Maybe I should build one? Would you consider using this kind of service in your applications if one exists? Why not?

4 comments

Been there done that.

Tips (apply common sense based on your specific company):

1. As an engineer, learn to say no. Want to import some random excel sheet into our system? Sorry no you can not.

2. Try to provide an API instead and push the problem to the customer. If they want it badly enough they can pay someone to write something that takes their stuff and imports it via your API.

3. If you have to do the file thing avoid XLSX like the plague. Stick to CSV. It's their problem to convert their XLSX or anything else to CSV.

4. CHOOSE, VALIDATE AND ENFORCE the format (or a few possible formats). Do not negotiate or collaborate with customers about "custom format" and so on. That is a never ending battle you will never win. You give 1 inch they take 10. Eventually they will give an animated gif compressed in a zip file, wrapped in a tarball and ask you to convert that to CSV and import it in.

5. If you are running a typical SaaS don't turn your product into free in-house IT consulting for the random customers. Make the capability possible within the system by exposing an API and leave the rest for them to figure out. If they care about enough they will have to pay someone to create the integrations. 95% don't care and will give up and adapt.

If you have enterprise customers paying 6-7 figure sums it's a different story...

Expose the api and offer consulting to work with them, min 40 hours st $200 hour for example.
Sounds like you’re talking about two separate things, the upload part and the import part. The upload part is solved for most languages and frameworks. The import step; excel parsing solved, getting data into the correct format is highly app specific. Maybe you could clarify what you have in mind?
SheetJs [0]

[0]: http://sheetjs.com/

free as beer, voila !

> If not, my maker soul asks: maybe there should be one? Maybe I should build one? Would you consider using this kind of service in your applications if one exists? Why not?

To answer your "maybe there should be one?" question. What you're describing sounds like this: https://flatfile.io/. It was shared here 2 months ago.

Though to answer your original question:

> How do you handle Excel import/exports in your apps

One way we go about that is: whenever someone uploads a file, create some sort of UploadRequest record which contains the filepath in S3/Azure/GCS/etc. Then the actual parsing and validation is spun off in a worker process. That way the web process isn't locked and we can pub progress ((processed_rows/row_count).percentage) on authorized subs. If that worker process ends, update the UploadRequest record status and dump the array of good_rows/error_rows in some column for that record. Users see that the status for that UploadRequest is updated, then they can review the UploadRequest errors. If there are errors, they download the uploaded file, fix that and re-upload again. At the start you can also provide a template to "gate" your upload, if it doesn't match your template you blow up UploadRequest with the proper error. If all is good, they click "Process" or something, which actually inserts the rows or does whatever.

Again, that’s just one way.

If bizdev asks to make a feature that allows fixing those rows in your app without leaving the browser, we find that it slowly feature creeps into Google Sheets territory, which is a BIG undertaking and a separate feature on it’s own. It's a nice experience to do in-app editing but we try to see if most of our Excel uploaders would prefer to go back to Excel for editing. Sometimes we find that they’ll prefer going back to Excel because they’ll use the same excel sheet for other purposes in their process (send that Excel as an email to another department, use that and aggregate it as a separate report, etc.).

But really, the hardest part in Excel uploads isn't server resources, progress, file encoding, or timeouts, it's the validation. There are different levels of validation: from presence checks, formatting, type-checking to running a row or a cell against business logic, 3rd-party API calls or records in your database. This is where most User errors arise, because you'd have to educate them on what values to put if you're referencing something in your business logic (special abbreviated codes, special combination of codes, date formats, etc.).

Small question, since the developer docs are not available on mobile devices:

When the flatfile.io homepage keeps referring to "XLS" docs, is it actually trying to say they will import modern Excel files ("XLSX") or does it literally mean you would have to ask users to re-save their spreadsheets into the ancient format?

Yes, this is also the process we're taking on and it seems pretty complex for me. But anyway, flatfile.io is probably answering my all question best. Thanks for sharing!