by taeric 895 days ago
I'm struggling to think how I'd meaningfully do any of that with 3.2M rows. :(

I'm reminded of old books that were various number sequences for checking calculations. Clearly somewhat useful, but incredibly niche and seems to lose to other tools rather quickly.


I don't really know how to interpret this comment. 3.2M rows are not loading into the browser at once, only about 10k. The page size is configurable. The frontend and backend have a contract to agree on how this works, so as the user scrolls (and frontend needs another page) it asks the backend for more. The frontend will keep up to N pages (also configurable) cached in the client. Works shockingly well without any fuss at all. The AG-Grid team really did a great job here (it is not free for this particular feature, but well worth the cost)
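For anyone wondering what that kind of contract might look like, here is a minimal sketch of the client side. It is not AG-Grid's actual API. `PageCache`, `FetchPage`, and `Row` are hypothetical names, and the eviction policy (least recently used) is my assumption; the comment above only says that up to N pages are kept.

```typescript
// Hypothetical types and names for illustration; this is not AG-Grid's API.
type Row = { id: number };
type FetchPage = (pageIndex: number, pageSize: number) => Promise<Row[]>;

class PageCache {
  // A Map iterates in insertion order, so its first key is the least recently used page.
  private pages = new Map<number, Row[]>();
  private fetchPage: FetchPage;
  private pageSize: number;
  private maxPages: number;

  constructor(fetchPage: FetchPage, pageSize: number, maxPages: number) {
    this.fetchPage = fetchPage;
    this.pageSize = pageSize;
    this.maxPages = maxPages;
  }

  // Returns the row at rowIndex, asking the backend only when its page is not cached.
  // Concurrent requests for the same missing page are not deduplicated in this sketch.
  async getRow(rowIndex: number): Promise<Row | undefined> {
    const pageIndex = Math.floor(rowIndex / this.pageSize);
    let page = this.pages.get(pageIndex);
    if (page === undefined) {
      page = await this.fetchPage(pageIndex, this.pageSize);
    } else {
      this.pages.delete(pageIndex); // re-inserted below to mark it most recently used
    }
    this.pages.set(pageIndex, page);
    if (this.pages.size > this.maxPages) {
      const oldest = this.pages.keys().next().value;
      if (oldest !== undefined) this.pages.delete(oldest);
    }
    return page[rowIndex % this.pageSize];
  }

  cachedPageCount(): number {
    return this.pages.size;
  }
}
```

The grid would call something like `getRow` as the user scrolls, so memory stays bounded by page size times the number of cached pages, no matter how many rows the backend holds.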
Apologies, it was not the technical side of it that had me musing on it. I was genuinely curious why and how I would "scroll" through that much data in a meaningful way.

And I don't mean this as a heavy criticism of the idea. I'm assuming it is useful to you. Always fun to hear about how this sort of thing is used.

You don't scroll through such a list - you use filters, pivoting etc. For end users, it's often quite comfortable to unify this in an Excel-like interface, as they're used to it. In most CRUD projects I had, users had an aversion to paged/pre-filtered displays and would rather have everything in one list where they can dynamically filter it if necessary.
Right. But that just brings us back to "how do people work with a scrolling list of 3.2m records?"

I get the point of wanting it locally to use power tools. And I get that the browser is probably capable of implementing a lot of power tools. Seems silly to insist on doing it all "in memory" on the browser, though?

That is, if the idea is you are doing pivots and filters, I don't know why a server side hit wouldn't be better for that. Similarly, when I look at something like a stock ticker for the day, I don't expect every single transaction was sent to my browser to create the graph. It /could/ be done that way, but why?

More directly to the question I had here, why and how would someone need a scroll list of every market transaction? For fine audits, I would get it, but even then I'd expect some sort of search or anomaly detection?

Still, I think if the answer is to "get it into the user's hands and let them do what they will with the data," I can accept that. The goal isn't necessarily to let the users scroll the data endlessly, but for them to use any bespoke tooling they are already using.

Hope I do not misunderstand your comment, but I think the point is not that you use millions of rows but that you should be able to use all of your rows/data without having to use pagination as a workaround to hide the problem that the GUI is incapable of rendering too many rows at once.

Imagine having such a terrible UX while editing a large source code file: the editor loads only 20 lines, then you need to click a button or scroll down to load the next 20 lines. Search would also be slow, since it would make a request to the backend.

Sorry again if I misunderstood your comment, it's not super clear to me what your point was.

Apologies, that was not what I meant. I meant it more directly in assuming the technical side works and you are able to scroll just fine, how do you meaningfully do so with that much data?

And I don't mean this as an attack. I presume there are some techniques that I just don't know. Or data sets I just don't typically interact with.

For the ones I am used to, aggregates are key to working with them. That and graphical visualizations. (Though, it is frustrating how many visualizations can be reduced quickly to "top N" or similar.)

Say in my web app I can save "projects" and I have 200 projects in total. Because of DOM performance, the devs implemented pagination with, say, 40 items max per page, so I have 5 pages of projects. We all know the issues with pagination.

If all the projects are loaded I can:

1 scroll all of them, exactly like I scroll a document or how I scroll in my File Manager or Image Viewer

2 do Instant Filter and Search with no backend requests. Don't you hate it when, say, you open a YouTube account's video list and you just can't use the browser Find function because the content is not in the page?

One example is YouTube: in a specific account's Videos section you are forced to scroll and wait, scroll and wait, scroll and wait. They can try to hack this to be smoother, but it could be instant, since the JSON for all the videos could be returned at once. It is just text, and returning it all at once is more efficient than returning it in chunks. Then, like in competent GUI toolkits, you have a big grid with all the results, and the toolkit/framework Grid does the work of making sure that everything is rendered efficiently and smoothly.

I would prefer that these apps at least offer the customer a setting to decide the page size. Maybe I want a slower initial load but have everything loaded in one screen. Imagine a paginated File Manager.

Agreed on most points. But 200 is a very different number from 3.2m. I would expect I could load all 200 in the client and get fast local operations. For 3.2m, I'd expect sending the filter off client would be faster.

Essentially, for performance there is a tradeoff between sending the data to where you will perform operations, and sending operations to the data. I don't know where the cutoff is, but I'd expect 3.2m to be on the "send operations to the data" side.

If there are 3 million simple objects like title, url, thumbnail url, why would it be faster to send the request, do the filtering in the backend db and then send the response back? I suspect that an Array.filter would be as fast as a db search + the request overhead.
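This is easy enough to measure rather than guess. A rough sketch, assuming objects shaped like the ones described; the `Video` type, `makeVideos`, `filterByTitle`, and the example.com URLs are made up for illustration. It only times the in-memory filter, not the cost of downloading and parsing the data first, which is the other half of the comparison.

```typescript
// Hypothetical shape matching "title, url, thumbnail url".
type Video = { title: string; url: string; thumbnailUrl: string };

// Builds synthetic data; real titles would be longer and less regular.
function makeVideos(count: number): Video[] {
  return Array.from({ length: count }, (_, i) => ({
    title: `Video ${i}`,
    url: `https://example.com/v/${i}`,
    thumbnailUrl: `https://example.com/t/${i}.jpg`,
  }));
}

// Case-insensitive substring match over every title, using Array.filter.
function filterByTitle(videos: Video[], query: string): Video[] {
  const needle = query.toLowerCase();
  return videos.filter((v) => v.title.toLowerCase().includes(needle));
}

// Kept small here so it runs anywhere; raise the count to 3000000 to
// approximate the case discussed (memory use grows accordingly).
const sample = makeVideos(100000);
const start = Date.now();
const hits = filterByTitle(sample, "video 2999");
console.log(`${hits.length} matches in ${Date.now() - start} ms`);
```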

I just wish that the DOM would have some built-in List, Table, and Grid components, like you have in Qt, .NET WPF, or Flex 4. Today we just have divs in divs in divs.

I mean, fair, you could have tiny objects? My assertion was to not send all 3m rows to the frontend, but only the results of any aggregations and such. Want to know the top N? That should require sending N rows, period. Same for filtering and such. This is basically just a restatement of the "send the operations to the data" approach. It is a big part of why doing joins in the database is so much faster than doing them in the application layer.
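To make the "top N" case concrete, here is a small sketch of the request/response shape. `Trade`, `TopNRequest`, and `handleTopN` are hypothetical names, and the in-memory sort stands in for whatever the data store would really do (in SQL, an ORDER BY with a LIMIT).

```typescript
// Hypothetical row and request shapes for illustration.
type Trade = { symbol: string; volume: number };
type TopNRequest = { n: number };

// Runs where the data lives. The client sends a tiny request and
// receives at most n rows, however large allTrades is.
function handleTopN(allTrades: Trade[], req: TopNRequest): Trade[] {
  return allTrades
    .slice() // copy so the stored data is not reordered
    .sort((a, b) => b.volume - a.volume)
    .slice(0, req.n);
}

const trades: Trade[] = [
  { symbol: "AAA", volume: 120 },
  { symbol: "BBB", volume: 900 },
  { symbol: "CCC", volume: 450 },
  { symbol: "DDD", volume: 30 },
];

// Only two rows cross the wire, not all four (or all 3.2m).
const top2 = handleTopN(trades, { n: 2 });
console.log(top2.map((t) => t.symbol).join(","));
```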

And, as you allude, the shuffling of all of the DOM overhead to manage what is visible is non-trivial. Yes, you can basically flyweight it to save memory, but it is only a matter of time before the user wants Ctrl-F to work and then you try to find a way to put the whole object in a place for the user to directly work against. (Yeah, you would probably try to capture Ctrl-F in the application and fake the native search. But then case folding and other concerns now have to be reimplemented by you.)
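For reference, the flyweight part usually boils down to arithmetic like the following. This is a sketch assuming fixed-height rows; `visibleRange` and its parameters are names I made up, not any particular grid's API.

```typescript
type VisibleRange = { first: number; last: number };

// Given the scroll offset, returns the inclusive range of row indices that
// intersect the viewport, plus an optional overscan margin on each side.
// Assumes every row has the same height in pixels.
function visibleRange(
  scrollTop: number,
  rowHeight: number,
  viewportHeight: number,
  totalRows: number,
  overscan: number = 0,
): VisibleRange {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const last = Math.min(
    totalRows - 1,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1 + overscan,
  );
  return { first, last };
}

// With 3.2m rows of 20px in a 100px viewport, only a handful of rows are in the DOM.
const range = visibleRange(30, 20, 100, 3200000);
console.log(`render rows ${range.first} to ${range.last}`);
```

Only the rows in that range exist as DOM nodes at any moment, which is exactly why the browser's native Ctrl-F cannot see the rest.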

It's something that's both a common B2B requirement because users keep asking for it ("I want to see all of my data at once"), and something functionally ~useless.