| HN Mirror

by jnowlan 4142 days ago

Zeek,

I've been lucky enough to have benefited from SQLAlchemy and Mako (though it's been a while). Thanks.

This article looked like it was going to hit the sweet spot of things I'm curious about, but I was still left with questions. If you (or anyone) will indulge me, I'll try to ask a question to help clarify matters.

I work at a university on legacy ERP system(s). During registration there are 800+ concurrent connections, but normally it floats around 200. Almost all of these connections are idle. As you pointed out, a db may not be IO/context bound (I'm still hazy on that one). At the end of the day I consider myself technically astute, but basically a CRUD business programmer. I understand ACID and transactions; threads and async, maybe not so much.

Where I've always thought async could provide a benefit is the following scenario. Our apps make a large number of procedural db calls today. If, after studying them, I realize that many are independent (i.e. reads) and could be "batched", could that not provide a big performance/latency improvement? That is, instead of the serial sequence of calls that happens now (even inside a stored proc), async would let me submit multiple SQL calls at once. That is what I'm calling a batch. In this ideal world, SQLAlchemy would take care of the details (perhaps with some guidance directives as to whether the ordering of results was important) and assemble the results.

Is this not a possible future "async" SQLAlchemy with superior responsiveness? Don't threads block on each SQL request?
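A minimal sketch of the "batch" idea, assuming the reads really are independent. No real database or driver is involved: `fake_query` is a hypothetical stand-in that sleeps to simulate a round trip, and in a real system each in-flight query would need its own connection and a non-blocking driver. It shows why batching helps latency: serial calls cost roughly the sum of their latencies, concurrent ones roughly the slowest single call, and `asyncio.gather` hands results back in submission order, so ordering comes for free.

```python
import asyncio
import time

# Hypothetical stand-in for one independent read; a real version would run
# its SQL on its own connection through a non-blocking driver.
async def fake_query(name: str, latency: float) -> str:
    await asyncio.sleep(latency)  # simulate waiting on the database
    return f"rows for {name}"

async def run_serial(queries):
    # One call after another: total time is roughly the sum of latencies.
    return [await fake_query(n, t) for n, t in queries]

async def run_batched(queries):
    # All calls in flight at once: total time is roughly the slowest call.
    # gather() returns results in the order the calls were submitted.
    return await asyncio.gather(*(fake_query(n, t) for n, t in queries))

QUERIES = [("students", 0.1), ("courses", 0.1), ("fees", 0.1)]

def timed(coro_fn):
    # Run one strategy on its own event loop and measure wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(QUERIES))
    return result, time.perf_counter() - start

if __name__ == "__main__":
    serial, t_serial = timed(run_serial)
    batched, t_batched = timed(run_batched)
    print(f"serial:  {t_serial:.2f}s")
    print(f"batched: {t_batched:.2f}s")
```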

zzzeek 4142 days ago

If you want to send out a series of long-running SQL calls and wait for them all in a batch, that is doable with PostgreSQL's async support, but they'd all be on distinct database connections, so you wouldn't get transactional consistency between these calls; maybe that's not important. You can do the same thing with threads, though it would mean spinning up that many threads. At least that is something you could test in the short term to see whether it is in fact feasible.
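The thread-based version of the same experiment could look like the sketch below. `blocking_query` is a hypothetical stand-in for a blocking driver call; `time.sleep` releases the GIL much as C database drivers typically do while waiting on the network, so the waits overlap. As with the async approach, each thread would need its own connection in practice, since a single connection runs one query at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(name: str, latency: float = 0.1) -> str:
    # Hypothetical stand-in for a blocking call such as cursor.execute();
    # sleeping releases the GIL, so other threads keep running meanwhile.
    time.sleep(latency)
    return f"rows for {name}"

def run_in_threads(names, max_workers):
    # One worker per concurrent query; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(blocking_query, names))

if __name__ == "__main__":
    names = ["students", "courses", "fees"]
    start = time.perf_counter()
    print(run_in_threads(names, max_workers=len(names)))
    # Roughly one query's latency rather than three.
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```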

The rudimentary SQLAlchemy-emulation system within aiopg can probably accommodate this use case right now, but it is PostgreSQL-specific. "Legacy ERP system" sounds like there's a different database in play; if you are relying on closed-source drivers, you'd need access to a non-blocking API within them. Otherwise you're stuck with threads.

acveilleux's point about caching here is very relevant, and I thought of that too. If these are indeed independently read sets of fairly unchanging data, pulling them from a cache is the more traditional approach to reducing latency (as well as the need for 800+ database connections).
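For read-mostly reference data, the simplest form of that cache is a time-to-live (TTL) read-through cache in the application process. The sketch below is illustrative only: `TTLCache` and `load_course_catalog` are hypothetical names, the clock is injectable so expiry is easy to test, and it is not thread-safe.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

class TTLCache:
    """Read-through cache: each entry is reloaded `ttl` seconds after loading.

    A sketch only: not thread-safe, and it never evicts keys that stop
    being requested.
    """

    def __init__(self, loader: Callable[[Hashable], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # e.g. a function that runs the real query
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]                # fresh: no database round trip
        value = self._loader(key)        # missing or stale: load once
        self._entries[key] = (now + self._ttl, value)
        return value

if __name__ == "__main__":
    calls = []

    def load_course_catalog(term):       # hypothetical query function
        calls.append(term)
        return f"catalog for {term}"

    cache = TTLCache(load_course_catalog, ttl=60)
    cache.get("fall")
    cache.get("fall")
    print(len(calls), "database call(s)")
```

The design choice is to accept up to `ttl` seconds of staleness in exchange for serving repeat reads without touching the database or holding a connection.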

laurencerowe 4142 days ago

Postgres lets you synchronize snapshots across connections so that they all see the same data (though subsequent changes are not visible to the other transactions unless you export another snapshot). http://www.postgresql.org/docs/9.4/static/functions-admin.ht...

This lets you parallelize work across multiple processes while maintaining consistency. When the client is the limiting factor, you can use this with multiprocessing. When the db server is the limiting factor, you can just use threads (or async): Postgres backend processes are essentially single-threaded, one per connection, so spreading the work over several connections is what lets the server use more than one core.
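A sketch of the statement sequence behind snapshot synchronization, assuming PostgreSQL 9.2 or later (where `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT` exist). The helper names are hypothetical and the functions only build SQL text; you would execute the lists in order on separate connections, and how `BEGIN` is issued depends on your driver's transaction handling, which is outside this sketch. The exporting transaction must stay open while workers import the snapshot, and each importer must be `REPEATABLE READ` or `SERIALIZABLE` and set the snapshot before running any query.

```python
import re
from typing import List

# Snapshot ids look like '000003A1-1' in the 9.4 docs; treating them as
# dash-separated hex groups is an assumption of this sketch.
_SNAPSHOT_ID = re.compile(r"^[0-9A-Fa-f]+(-[0-9A-Fa-f]+)+$")

def exporter_statements() -> List[str]:
    # Run on the "leader" connection. Keep this transaction open for as
    # long as workers may still need to import the snapshot.
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ",
        "SELECT pg_export_snapshot()",
    ]

def worker_statements(snapshot_id: str) -> List[str]:
    # Run on each worker connection, before any other query in the
    # transaction. Validate the id since it is spliced into the SQL text.
    if not _SNAPSHOT_ID.match(snapshot_id):
        raise ValueError(f"unexpected snapshot id: {snapshot_id!r}")
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ",
        f"SET TRANSACTION SNAPSHOT '{snapshot_id}'",
    ]
```

After the worker statements, each worker runs its share of the reads and sees exactly the data the leader saw at export time.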

jnowlan 4141 days ago

So caching. Doesn't the db do that? And as much as I hate to say it, are the added complexities (webserver caches) better than the even more traditional approach - throw hardware at it? Always lots to think about. Thanks!

meowface 4141 days ago

>And as much as I hate to say it, are the added complexities (webserver caches) better than the even more traditional approach - throw hardware at it?

That depends, do you want to save (some) development time or do you want to save (a lot of) money?

aidos 4141 days ago

When I was more junior, I was always told to use caching as a last resort. It's a good attitude to take to make sure you're not doing something stupid and hiding it with caching. These days, though, I look for caching opportunities up front. In fact, I'll design with them in mind.

Some time ago I did some work for a client who was expecting a lot of read load. Their backend had a bunch of constantly changing data in Mongo, but it only refreshed every 10 seconds. I initially told them to just output the aggregated data to an S3 object and have all the clients access it from there. They decided to run loads of servers instead; they were muttering something about AWS Autoscale (even though I told them that wouldn't help).

As expected, I got a call one Friday evening asking if I could take a look at why their servers were timing out. When I got there, there were about 15 frontend servers hammering one poor Mongo box that was running the same aggregation query again and again, and within any 10-second window always getting the same result. I stripped it down to one frontend box with an nginx cache (after jumping through a hoop to support JSONP).

After the dust settled they apparently didn't want to admit that it was something that could just be solved with caching so it was described as a configuration issue to the business.
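The "publish the aggregate as a static object" approach described above can be sketched as a single writer that runs the expensive aggregation once per refresh and writes the result where readers fetch it. Here a local JSON file stands in for the S3 object (an assumption, to keep the sketch self-contained), and `publish_snapshot` is a hypothetical name. Writing to a temporary file and then calling `os.replace` means readers see either the old or the new version, never a half-written file.

```python
import json
import os
import tempfile
from typing import Any, Callable

def publish_snapshot(aggregate: Callable[[], Any], path: str) -> None:
    """Run the expensive aggregation once and publish it as a static file.

    Readers fetch the file and never touch the database, so the query cost
    is paid once per refresh instead of once per request.
    """
    data = aggregate()
    directory = os.path.dirname(os.path.abspath(path))
    # Write next to the target so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # don't leave partial temp files behind
        raise
```

A single scheduled job (cron, or a loop that sleeps 10 seconds between calls) would invoke it, matching the data's own refresh cadence.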

acveilleux 4142 days ago

Ultimately, those queries you want to batch are either:

1. Not critical to the logic of your app, and thus fairly obvious candidates for caching.

2. Critical to the logic of your app, and quite possibly done within a specific DB connection / transaction for locking reasons.

Trying to be too clever and parallel often flounders on mundane things like running out of DB connections.
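If you do parallelize, the usual guard against running out of connections is to cap in-flight queries at the number of connections you can afford. A sketch with `asyncio.Semaphore`; `POOL_SIZE` and `guarded_query` are hypothetical, and the sleep stands in for query time. A real connection pool (for example SQLAlchemy's, configured through `create_engine(..., pool_size=..., max_overflow=...)`) enforces a similar cap by making callers wait for a free connection.

```python
import asyncio

POOL_SIZE = 5  # assumption: how many connections this app may hold at once

async def guarded_query(sem: asyncio.Semaphore, stats: dict, i: int) -> int:
    # Hypothetical query: take a "connection slot" before touching the DB.
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulated query time
        stats["in_flight"] -= 1
        return i

async def run_all(n: int, limit: int = POOL_SIZE):
    sem = asyncio.Semaphore(limit)  # never more than `limit` queries at once
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(guarded_query(sem, stats, i) for i in range(n)))
    return results, stats["peak"]

if __name__ == "__main__":
    results, peak = asyncio.run(run_all(50))
    print(len(results), "queries, peak concurrency", peak)
```

Excess work waits its turn instead of failing, which trades some latency for staying inside the database's connection limit.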
