
by simonw 575 days ago

Wrote some notes on this here: https://simonwillison.net/2024/Nov/22/amazon-s3-append-data/

Key points:

- It's just for the "S3 Express One Zone" bucket class, which is more expensive (16c/GB/month compared to 2.3c for S3 standard tier) and less highly available, since it lives in just one availability zone

- "With each successful append operation, you create a part of the object and each object can have up to 10,000 parts. This means you can append data to an object up to 10,000 times."

That 10,000 parts limit means this isn't quite the solution for writing log files directly to S3.

3 comments

jiggawatts 575 days ago

Wow, I'm surprised it took AWS this long to (mostly) catch up to Azure, which had this feature back in 2015: https://learn.microsoft.com/en-us/rest/api/storageservices/u...

Azure supports 50,000 parts, zone-redundancy, and append blobs are supported in the normal "Hot" tier, which is their low-budget mechanical drive storage.

Note that both the 10K and 50K part limits mean that you can use a single blob to store a day's worth of logs and flush every minute (1,440 parts). Conversely, hourly blobs can support flushing every second (3,600 parts). Neither supports daily blobs with per-second flushing for a whole day (86,400 parts).
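To make the arithmetic above concrete, here is a minimal sketch that counts the appends each layout needs and checks them against the two limits quoted in this thread. The function name and the dictionary keys are made up for illustration; only the numbers come from the comments.

```python
# Append limits as quoted in this thread (parts per object or blob).
PART_LIMITS = {"S3 Express One Zone": 10_000, "Azure append blob": 50_000}

DAY, HOUR = 86_400, 3_600  # seconds


def appends_needed(blob_span_seconds: int, flush_interval_seconds: int) -> int:
    """Number of append operations needed to cover one blob's time span."""
    # Ceiling division: a partial final interval still needs one flush.
    return -(-blob_span_seconds // flush_interval_seconds)


scenarios = {
    "daily blob, flush every minute": appends_needed(DAY, 60),
    "hourly blob, flush every second": appends_needed(HOUR, 1),
    "daily blob, flush every second": appends_needed(DAY, 1),
}

for name, parts in scenarios.items():
    for service, limit in PART_LIMITS.items():
        verdict = "fits" if parts <= limit else "exceeds limit"
        print(f"{name}: {parts} parts vs {service} ({limit}): {verdict}")
```

The first two scenarios fit under both limits; the third exceeds both.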

Typical designs involve a per-server log, per hour. So the blob path looks like:

    "{account}/{path}/{year}/{month}/{day}/{hour}_{servername}.txt"
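A rough sketch of how a path like that could be built from a timestamp. The function name, the example account, path, and server name are all hypothetical, and zero-padding the month, day, and hour is an assumption (the template above doesn't say either way).

```python
from datetime import datetime, timezone


def log_blob_path(account: str, path: str, servername: str, when: datetime) -> str:
    """Build a per-server, per-hour blob path following the template above."""
    # Assumption: date parts are zero-padded so paths sort lexicographically.
    return (
        f"{account}/{path}/{when:%Y}/{when:%m}/{when:%d}/"
        f"{when:%H}_{servername}.txt"
    )


# Hypothetical values, purely for illustration.
example = log_blob_path(
    "myaccount", "logs/web", "web01",
    datetime(2024, 11, 22, 14, 5, tzinfo=timezone.utc),
)
print(example)  # myaccount/logs/web/2024/11/22/14_web01.txt
```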

This seems insane, but it's not a file system! You don't need to create directories, and you're not supposed to read these using VIM, Notepad, or whatever.

The typical workflow is to run a daily consolidation into an indexed columnstore format like Parquet, or send it off to Splunk, Log Analytics, or whatever...

sofixa 575 days ago

> Wow, I'm surprised it took AWS this long to (mostly) catch up to Azure, which had this feature back in 2015:

Microsoft had the benefit of starting later and learning from Amazon's failures and successes. S3 dates from 2006.

That being said, both Microsoft and Google learned a lot, but also failed at learning different things.

GCP has a lovely global network, which makes multi-region easy. But they spent way too much time on GCE and lost the early advantage they had with Google App Engine.

Azure severely lacks in security (check out how many critical cross-tenant security vulnerabilities they've had in the past few years) and reliability (how many times have there been various outages due to a single DC in Texas failing; availability zones still aren't the default there).

ak217 574 days ago

Microsoft did this by sacrificing other features of object storage that S3 and GS had since the beginning, primarily performance, automatic scaling, unlimited provisioning and cross-sectional (region wide) bandwidth. Azure blob storage did not have parity on those features back in 2015 and data platform applications could not be implemented on top of it as a result. Since then they fixed some of these, but there are still areas where Azure lacks scaling features that are taken for granted on AWS and GCP.

jiggawatts 574 days ago

Today I learned that there is a 5 PB soft capacity limit for Azure blob storage: https://learn.microsoft.com/en-us/azure/storage/common/scala...

Also, a 200 Gbps egress limit.

How does that compare to S3?

Mind you, at this scale the storage cost is about $15K/mo, so it would be cost effective to throw some developer time at the problem of scaling out between multiple storage accounts. Or just call support to have the soft limit cap raised…

cedilla 575 days ago

If I need to consolidate anyway, is this really a win for this use case? I could just upload with {hour}_{minute}.txt instead of appending every minute, right?

jiggawatts 574 days ago

Consolidation is for archival cost efficiency and long-term analytics. If you don't append regularly, you can lose up to 59 minutes of data.

zaphirplane 574 days ago

In all fairness, shipping unreliable features for unreliable services is a lot easier.

kochie 575 days ago

AWS ranks features based on potential income from customers. Normally there’s a fairly big customer PFR needed to get a service team to implement a new feature.

jiggawatts 575 days ago

I always found it strange that AWS seems to have 2-3x as many products or services as Azure, but it has these bizarre feature gaps where as an Azure user I think: "Really? Now? In this year you're finally getting this?"

(Conversely, Azure's low-level performance is woeful in comparison to AWS and they're still slow-walking the rollout of their vaguely equivalent networking and storage called Azure Boost.)

bradleyjg 574 days ago

Low-level performance is underselling the issues. Blob storage is not infinitely scalable. That means it’s just not the same thing as S3.

WaxProlix 575 days ago

I've only used Azure a little bit, and mostly liked it - but I'd love to know what kinds of things you're referring to here (I'm mostly on AWS only, so I probably don't even know what I'm missing out on).

jiggawatts 574 days ago

What Azure has that from what I've seen AWS does not:

Resource Groups that actually act like folders, not just as special tags.

Resources with human-readable names instead of gibberish identifiers.

Cross-region and cross-subscription (equiv. to AWS account) views of all resources as the default, not as a special feature.

Single pane-of-glass across all products instead of separate URLs and consoles for each thing. E.g.: a VM and the S3 bucket dedicated to it are "far apart" from each other in AWS consoles, but the equivalent resources are directly adjacent to each other in an Azure Resource Group when viewed in its Portal.

Azure Application Insights is a genuinely good APM, and the Log Analytics workspace it uses under the hood is the consistent log collection platform across everything in Azure and even Entra ID and parts of Microsoft 365. It's not as featureful as Splunk, but the query language is up there in capability.

Azure App Service has its flaws, but it's by far the most featureful serverless web app hosting platform.

Etc...

withinboredom 574 days ago

Don’t forget, you don’t pay for a stopped VM in Azure! You only pay while it is running. This makes things like dev environments much more affordable, since you won’t be paying for nights/weekends.

twisteriffic 574 days ago

Kusto is wonderful. I'd love to be able to use it outside of Log Analytics.

omeid2 575 days ago

Not directly, but enough to write once every hour for more than a year!

simonw 575 days ago

Yeah, or I guess log rotation will work well - you can write 10,000 lines to one key and then switch to a new key name.
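A minimal sketch of that rotation idea, assuming a made-up key naming scheme (`prefix-NNNNNN.log`). It only tracks which key the next append should go to; the actual append call to S3 is deliberately left out, and the class and method names are hypothetical.

```python
class RotatingKey:
    """Hand out a key name per append, switching keys once the limit is hit."""

    def __init__(self, prefix: str, max_appends: int = 10_000):
        self.prefix = prefix
        self.max_appends = max_appends  # per-object append limit quoted above
        self.index = 0                  # which key we are currently writing to
        self.appends = 0                # appends already made to that key

    def next_key(self) -> str:
        """Return the key for the next append, rotating when the current one is full."""
        if self.appends >= self.max_appends:
            self.index += 1
            self.appends = 0
        self.appends += 1
        # Hypothetical naming scheme: zero-padded so keys sort in write order.
        return f"{self.prefix}-{self.index:06d}.log"


# Small limit just to show the rotation happening.
rotator = RotatingKey("app", max_appends=3)
keys = [rotator.next_key() for _ in range(7)]
print(keys)
```

With a limit of 3, the seven calls yield three appends to `app-000000.log`, three to `app-000001.log`, and one to `app-000002.log`. Each append could carry a batch of lines rather than a single line, which stretches how long one key lasts.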

santiagobasulto 575 days ago

This will require some serious buffering.