Hacker News
by cowsup 754 days ago
Some of these UI decisions are more technical than tyrannical. I use Linux every day, yet I can understand why Google doesn’t have such detailed progress reports: If you are using Google Docs, and you want to convert a file to Google Sheets, that likely requires several different microservices working in tandem to handle your request.

For them to build out a real-time feed that tells you the progress would perhaps require a complete change in how these microservices behave (so they can all feed real-time, ongoing data to the client), and it wouldn’t provide any real benefit. The only time I really pay attention to my Linux boot sequence is if something is stuck or an error appears, so I can handle it. Seeing what Google Sheets is doing may be “neat,” but I completely understand why that’s not a good reason to build it, and it wouldn’t make anyone other than Google employees more productive.

> If you are using Google Docs, and you want to convert a file to Google Sheets, that likely requires several different microservices working in tandem to handle your request.

This discussion is about UX, but your comment shocked me. Seriously, converting a spreadsheet would be a single process if you ran it from your command line. It’s hard for me to imagine why I would invoke several microservices to perform this on a back-end server.

What would your solution for 10 million such requests within a day be?
You may be conflating microservices and horizontal scaling. You don't need to have multiple (disparate) microservices to scale. Microservices have absolutely nothing to do with scaling. That's a myth started by people who never understood the actual point of microservices, which was partitioning and continuity of developer productivity.
You know: write a program like in the old days.

I would perform each request in a single process: read in the metadata (mainly structure) and then either process each tab sequentially or more likely map the whole thing into memory and spawn a thread for each tab, then write the whole thing out in order.
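A minimal sketch of that single-process approach, assuming the workbook has already been read into memory as a plain dict of tab name to rows. `Workbook`, `convert_tab` and `convert_workbook` are hypothetical names, and CSV-style text stands in for whatever the real output format is:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory workbook: tab name -> rows of cell values.
Workbook = dict[str, list[list[object]]]


def convert_tab(rows: list[list[object]]) -> str:
    """Convert one tab to CSV-like text (a stand-in for the real conversion)."""
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)


def convert_workbook(workbook: Workbook) -> list[tuple[str, str]]:
    """Convert each tab in its own worker thread, then return results in tab order."""
    names = list(workbook)  # dicts preserve insertion order, i.e. the tab order
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # pool.map yields results in input order, regardless of which thread finishes first
        outputs = pool.map(convert_tab, (workbook[name] for name in names))
        return list(zip(names, outputs))
```

One caveat on the design: in CPython, threads only run in parallel when the work releases the GIL (I/O, C extensions), so for pure-Python CPU-bound conversion a `ProcessPoolExecutor` would be the closer fit. Either way it stays one request, one process, no network hops.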

No need for the overhead of microservices: locating, invoking, transferring data, and synchronizing responses, much less dealing with all the pain of lost connections, abnormal termination and so on.

The largest Excel sheet I've worked on is only about 500 MB and (does a quick search of my local filesystem) almost all are less than one MB. So in the (rare) worst case the transmission doesn't justify spreading it around; in the common case there's no benefit.

So what happens when this hypothetical machine of yours, which has enough NIC bandwidth to stream the scale of data that needs to go in both directions, enough CPU power to handle millions of concurrent requests in a process or thread of their own, and enough RAM (or fast enough disks) to map and swap all the files being converted, goes down?

In 2022, Google Workspace apparently had ~3 billion users (8 million of them paying) https://developers.googleblog.com/en/year-in-review-12-aweso... .

Not every solution needs microservices. But also, we have problems today that we did not have solutions for "in the old days".

You keep confusing horizontal scaling with microservices. The two are basically unrelated. You have to scale horizontally regardless of whether you are running regular services or microservices; the goal of microservices is just to increase the granularity of horizontal scaling (or, more often, to solve organisational issues around feature/code ownership).
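To illustrate horizontal scaling without any microservices, here is a toy sketch where several identical workers each run the whole conversion end to end, pulling jobs from one shared queue. Threads in a single process stand in for what would really be separate processes or machines behind a queue or load balancer; `convert` and `run_replicas` are hypothetical:

```python
import queue
import threading


def convert(doc: str) -> str:
    """The entire conversion in one function (hypothetical stand-in)."""
    return doc.upper()


def run_replicas(jobs: list[str], replicas: int) -> dict[int, str]:
    """Run `replicas` identical workers over a shared job queue.

    Every worker runs the same full conversion, so adding capacity means
    adding more identical workers; no request is split across services.
    """
    q: queue.Queue = queue.Queue()
    for job_id, doc in enumerate(jobs):
        q.put((job_id, doc))

    results: dict[int, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                job_id, doc = q.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            out = convert(doc)
            with lock:
                results[job_id] = out

    threads = [threading.Thread(target=worker) for _ in range(replicas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```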
Ten million per day seems unlikely to translate to millions of concurrent requests.
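A back-of-the-envelope check with Little's law (average requests in flight = arrival rate × time each request takes). The 2-second conversion time used below is an assumption for illustration, as is load being spread evenly over the day:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_concurrency(requests_per_day: float, seconds_per_request: float) -> float:
    """Little's law: average in-flight requests = arrival rate * time in system."""
    arrival_rate = requests_per_day / SECONDS_PER_DAY  # requests per second
    return arrival_rate * seconds_per_request
```

`average_concurrency(10_000_000, 2.0)` comes out at about 231 requests in flight on average. Even if peaks ran ten times higher (another assumption), that's in the low thousands, not millions.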

This kind of task is typically best built as a single unit; you don't want partial conversions hanging around in 'microservices' if something goes wrong.
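One common way to guarantee that no partial conversion is ever left behind, sketched under the assumption that output goes to a local file: write to a temporary file in the destination directory and rename it into place only on success. `convert_atomically` and its `convert` callback are hypothetical:

```python
import os
import tempfile
from typing import Callable


def convert_atomically(src: str, dst: str, convert: Callable[[bytes], bytes]) -> None:
    """Write the converted file to a temp file, then rename it into place.

    If `convert` raises, the temp file is deleted and `dst` is left untouched,
    so a half-written output is never visible.
    """
    with open(src, "rb") as f:
        data = f.read()
    # Create the temp file next to dst so os.replace stays on one filesystem
    # (where the rename is atomic on POSIX).
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dst)))
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(convert(data))
        os.replace(tmp_path, dst)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial output, then re-raise
        raise
```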

As for scaling, I'd likely put this in a process definition and run it on the BEAM if I were to make such a product. That way millions of requests per hour can hit my cluster and those that fail somehow will just get cleaned up and the transaction rolled back, the clients get 'sorry, try again' or 'sorry, we're working on fixing it', and the rest happily chug along.
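The BEAM gets this behaviour from isolated, supervised processes. Python has no direct equivalent, so this is only a rough analogy, assuming a thread pool stands in for per-request processes (`handle` and `serve` are hypothetical): each request runs on its own, a failure is caught for that request alone and turned into a 'sorry, try again' reply, and the other requests finish normally.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(request: str) -> str:
    """Hypothetical conversion; raises on input it can't handle."""
    if not request:
        raise ValueError("empty document")
    return request[::-1]


def serve(requests: list[str]) -> list[str]:
    """Run each request in isolation; one failure doesn't affect the others."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(handle, r) for r in requests]
        responses = []
        for fut in futures:
            try:
                responses.append(fut.result())
            except Exception:
                # Per-request cleanup/rollback would go here.
                responses.append("sorry, try again")
        return responses
```

Unlike BEAM processes, these threads share memory, so a bug that corrupts shared state would not be contained the way it would be on the BEAM.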

Apache Beam?
You don't have to run them all on a single device! But you hardly need a bunch of microservices, or any, really, to do any one translation.
Maybe this'll come off as snarky, but I would build Excel! It would spread 10 million requests across the 10 million users, be totally immune to network outages, and my users could rest assured I wasn't thumbing through their data.
Couldn’t the conversion run locally?
My assumption is that the comment I was responding to addressed Google Docs specifically (which afaik is web only?). If so, your options for running it locally would be for the conversion to run in your browser, or for some Google agent on your computer to handle these requests instead of the Google servers themselves, neither of which is a scalable option in this case (due to browser differences / expecting people to be able to install software on their devices)?
> due to browser differences

Surely a file conversion should not be affected much by browser differences. It should be a pure function: a pure calculation that doesn't require many APIs and has little to do with rendering.

> you want to convert a file to Google Sheets, that likely requires several different microservices working in tandem to handle your request.

Well there's your problem, right there.

Why?
Because file conversion is pure (in the functional programming sense). It produces an output based solely on the given input, without side effects.

If you have to implement a pure function by splitting it across multiple services, there is something very, very wrong with your software architecture.
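To make the "pure function" point concrete, a minimal sketch using CSV-to-TSV as a stand-in (real spreadsheet formats are far more involved, and `csv_to_tsv` is a hypothetical name). Because the output depends only on the input, the same call can run in any process or on any machine, or be retried after a failure, without coordinating with anything else:

```python
import csv
import io


def csv_to_tsv(text: str) -> str:
    """Pure conversion: output depends only on `text`; no I/O, no shared state."""
    rows = csv.reader(io.StringIO(text))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```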

A progress bar doesn’t need to literally report precise progress, just that milestones/checkpoints have been reached: for example, that 1 of 7 microservices has completed.

Now, you do need to consider expected timing and weight the milestones accordingly; otherwise you’ll run into the classic issue where 1-99% takes one second and then it sits at 99% for 10 minutes. Beyond that, though, it’s doable.
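A minimal sketch of time-weighted milestones, assuming each stage comes with an expected duration (the `weighted_progress` helper is hypothetical, and the seven durations in the note below are made up to echo the "1 of 7 microservices" example):

```python
def weighted_progress(expected_seconds: list[float], completed: int) -> float:
    """Fraction of expected total time covered by the first `completed` milestones."""
    total = sum(expected_seconds)
    done = sum(expected_seconds[:completed])
    return done / total if total else 1.0
```

With hypothetical stage durations `[0.5, 2.0, 1.0, 0.5, 3.0, 1.0, 2.0]` (10 s total), finishing the first stage moves the bar to 5% rather than the 1/7 ≈ 14% an unweighted count would show.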

And if you can’t report that level of progress, then you’ve got other issues (namely, that you yourself have no idea what the hell the system(s) are up to and working on at a given moment).

Kind of off topic but this assumes that the ideal progress bar should be a smooth continuous movement from 0% to 100% when I don't think that's the case. If you have a 100-step process, and the first 99 take 1 second and the last one takes 10 minutes, the progress bar should be at 99% for 10 minutes because the process is 99% complete, right?
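Putting numbers on the two readings of that example (99 one-second steps, then one 10-minute step), measured right after step 99 finishes:

```latex
\text{by steps: } \frac{99}{100} = 99\%
\qquad
\text{by time: } \frac{99 \times 1\,\mathrm{s}}{99 \times 1\,\mathrm{s} + 10 \times 60\,\mathrm{s}} = \frac{99}{699} \approx 14.2\%
```

So a time-based bar would sit near 14% at the moment a step-based bar already reads 99%.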
Different devs have implemented this in different ways over the years. The real answer depends on context: who's the target audience, and what do they want out of that progress bar?

You can have a progress bar that shows the milestones plus an ETA; multiple progress bars plus a log box showing what the background process is doing; or just a single disconnected bar driven by a timer, based on how long you estimated the task would take; etc.

As a user:
- How do you know whether the UI/process is stuck or just taking a while?
- How much time is left? You've got other things to do after five more tasks like the current one and want a rough estimate of when you'll be done.

What does a human want to know when they look at a progress bar? How long (or how much longer) they need to wait, typically. So yeah, the progress value displayed should ideally be the percentage of total time.
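If the bar does track the fraction of total time, "how much longer" has a naive answer: assume the rest proceeds at the average rate seen so far. A minimal sketch (`estimate_remaining` is a hypothetical helper); fed step-count progress like the 99-of-100-steps example instead, it would estimate about 1 second left when the real answer is 10 minutes:

```python
from typing import Optional


def estimate_remaining(elapsed_seconds: float, fraction_done: float) -> Optional[float]:
    """Naive ETA: extrapolate the average rate observed so far.

    Only meaningful when `fraction_done` is measured in time, not step count.
    """
    if fraction_done <= 0.0:
        return None  # no progress yet, so no rate to extrapolate from
    if fraction_done >= 1.0:
        return 0.0
    return elapsed_seconds * (1.0 - fraction_done) / fraction_done
```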