by bassdropvroom 1902 days ago
There is aioboto3 which wraps boto3 in asyncio calls. It's a lot simpler than all of this.

Having used aiobotocore, it's pretty good. It can be a bit sketchy about releasing connection pools, though. Normally you shouldn't run into any issues, but we've had trouble with running out of file handles on AWS Lambda.

I would also point out that for many cases, using a ThreadPoolExecutor works fine.
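
To make the thread pool option concrete, here is a minimal sketch. `invoke_worker` is a hypothetical stand-in for any blocking call (for example, a boto3 request); the sleep only simulates network wait, and the pool size is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def invoke_worker(payload):
    """Hypothetical blocking call; the sleep stands in for network latency."""
    time.sleep(0.05)
    return payload * 2


def run_all(payloads, max_workers=8):
    """Run every payload through invoke_worker on a pool of threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the payload that produced it.
        futures = {pool.submit(invoke_worker, p): p for p in payloads}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Because the threads spend nearly all their time waiting, the calls overlap even though only one thread runs Python code at a time.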

I maintain a simulation engine that distributes work out to AWS Lambda. I had two distributors, one thread-based and one async-based. (Also one based on multiprocessing that runs work on a local machine.)

From my tests, the performance is basically equivalent, which is not surprising: most of it was just waiting and then processing incoming responses. Threads work great at this, and botocore is designed to work with threads.

I eventually went with async because Python doesn't let you prioritize threads. That means that if you get a "stop" signal, in the threaded model, the command thread is competing with many worker threads. In the async model, the workers are all in one thread, so the command thread only competes with that one thread and will be woken up per the switch interval[1].

So, broadly, if you're going to need a command channel that must respond in a timely fashion, I'd recommend piling workers into an event loop through async.
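
A rough sketch of that layout, not the actual engine: all workers are coroutines on one event loop thread, and a separate command thread delivers "stop". The `threading.Timer` is only a stand-in for whatever really receives the stop command, and `worker`, `run_workers`, and the timings are hypothetical.

```python
import asyncio
import threading


async def worker(stop, done, idx):
    """Hypothetical worker: keep processing until the stop event is set."""
    while not stop.is_set():
        await asyncio.sleep(0.01)  # stands in for awaiting a remote response
        done.append(idx)


async def run_workers(n_workers, stop_after):
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    done = []
    # Command thread. asyncio.Event is not thread-safe, so the set() call
    # is handed to the loop thread via call_soon_threadsafe.
    timer = threading.Timer(
        stop_after, loop.call_soon_threadsafe, args=(stop.set,)
    )
    timer.start()
    await asyncio.gather(*(worker(stop, done, i) for i in range(n_workers)))
    timer.join()
    return done


def main():
    return asyncio.run(run_workers(n_workers=50, stop_after=0.1))
```

However many workers there are, the interpreter only has two threads to schedule here: the loop thread and the command thread.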

The other possibility is to have a command channel run in another process, but then you need to get the fork right, do IPC, etc.

[1]: https://docs.python.org/3/library/sys.html#sys.setswitchinte...

In this case I just wanted to keep it simple and do it the "native" way. I've been hearing a lot about aioboto3 and I'll surely check it out. However, it seems to have some limitations like https://github.com/boto/botocore/issues/458

That issue is just botocore not supporting async, which aiobotocore addresses. It's also very long; is there a particular comment on that issue that you wanted to point out?

I just mentioned the issue because a colleague of mine did some quick tests with aioboto3 and he told me it was downloading the files sequentially. We thought it might be related to that.
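
One possible explanation for sequential downloads with any async client (an assumption, not a diagnosis of those tests) is awaiting each call in a loop instead of scheduling them together. A minimal sketch, with `fake_download` as a hypothetical stand-in for an async download:

```python
import asyncio
import time


async def fake_download(key):
    """Hypothetical download; the sleep stands in for transfer time."""
    await asyncio.sleep(0.1)
    return key


async def sequential(keys):
    # Each await finishes before the next download starts.
    return [await fake_download(k) for k in keys]


async def concurrent(keys):
    # All downloads are scheduled at once and overlap their waits.
    return await asyncio.gather(*(fake_download(k) for k in keys))


def timed(coro_fn, keys):
    """Run coro_fn(keys) to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(keys))
    return result, time.perf_counter() - start
```

Both versions are "async", but only the `asyncio.gather` one overlaps the waits, so the first can look sequential in a quick test.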

You made a great point about going async if you need a working command channel.

Have you benchmarked this solution on a high bandwidth connection? If so, what's the max throughput that it achieved?

Unfortunately I haven't. If someone wants to benchmark this solution I'd be glad to hear about that!