So basically BigQuery from AWS. Looks good at first sight, if a bit late. I personally worked for a large org which has just moved to BigQuery from Redshift, and I have to say that BigQuery is the much better product.
The only problem with BigQuery is that it's on GCP, so either you have to migrate your whole workload over or you end up eating a lot of data transfer costs.
Having used both, I do think BigQuery is better in a lot of ways (although it's also easier to make it very expensive), but I'm really excited to see Redshift catch up. Adding the serverless option is really great too, since my biggest complaint with Redshift was managing the quantity and type of the underlying instances.
I have not used it, but my understanding is that BigQuery Omni is "BigQuery running on AWS/Azure". As I understand it, they are running Anthos on AWS (managed by Google) and offering BigQuery as a service from that Google-managed AWS infra.
Athena is not BigQuery. Athena is just a fancy wrapper around Hive (you can see this right from the log output) and just runs map reduce over your S3 data. It's a great tool for what it is.
BigQuery is a full database. It is significantly faster than running anything from Athena. The closest comparison on AWS is Redshift.
I think Athena has a lot of value (especially given its pricing), but you're not wrong that it has limitations.
Getting data into Athena isn't something that is just done for you. Athena just takes what you've put on S3 and queries over it - and leaves getting it onto S3 (and into an efficient format) as an exercise for the reader.
Athena's speed varies a lot depending on what format you put things in. Querying over CSVs will mean that you're slow and reading a lot of data. Querying over ORC (column-store) files is pretty quick.
The big thing is Athena's pricing. They price it on how much data you actually read, not on how much data would be read if things weren't optimized. BigQuery charges you based on how much data would be read if it weren't optimized. With BigQuery, an integer is always 8 bytes. It doesn't matter if they're able to optimize it down to nothing using RLE (run-length encoding). You still pay for the full 8 bytes. With Athena, if your ORC files make that integer column tiny, you get the benefit of that.
BigQuery is great, but Athena's pricing is a lot cheaper given that you get to benefit from any storage optimization you do.
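To put rough numbers on the pricing point above, here is a minimal back-of-the-envelope sketch. The row count, the 20x compression ratio, and the flat per-TB rate are all made-up assumptions for illustration, not quoted prices, so plug in the current rates for your region before drawing conclusions.

```python
# Sketch: billing on logical size (8 bytes per integer, as described above for
# BigQuery) versus billing on bytes actually read from compressed columnar
# files (as described above for Athena). All inputs are hypothetical.

TB = 10**12  # decimal terabyte, used here only for simplicity


def scan_cost_usd(bytes_billed: int, usd_per_tb: float) -> float:
    """Cost of a scan that is billed for `bytes_billed` bytes."""
    return bytes_billed / TB * usd_per_tb


ROWS = 10_000_000_000        # hypothetical table size
INT64_BYTES = 8              # logical size of one integer
COMPRESSION_RATIO = 20       # assumed RLE/ORC gain for this column
ASSUMED_USD_PER_TB = 5.0     # placeholder rate, applied to both for comparison

logical_bytes = ROWS * INT64_BYTES                   # billed regardless of encoding
physical_bytes = logical_bytes // COMPRESSION_RATIO  # bytes actually read

logical_cost = scan_cost_usd(logical_bytes, ASSUMED_USD_PER_TB)
physical_cost = scan_cost_usd(physical_bytes, ASSUMED_USD_PER_TB)

print(f"billed on logical size:  ${logical_cost:.2f}")
print(f"billed on bytes read:    ${physical_cost:.2f}")
```

With the same per-TB rate on both sides, the cost gap is just the compression ratio, which is why the storage format matters so much on the Athena side.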
Out of curiosity, how have you used Athena that you're seeing it be so much slower? In my experience, BigQuery is faster (maybe 2x faster), but I've been using column-oriented data with Athena. If you're using CSVs with Athena, it will be way slower than BigQuery.
I'm always a little surprised that AWS doesn't build Athena out more, but I guess if they did they'd want money and margin for the value add. Still, Athena is a pretty decent serverless Presto and Presto can work pretty well over data in column formats.
Yeah, I get that they want to go after Snowflake, Databricks and BigQuery, but AWS is not known for delivering high quality software. They have a few things that are very good (EC2, S3, Lambda), and the rest is done by their B-team and is barely holding together.
I have to say, speaking for Redshift, I largely agree with the critique.
There are too many simple problems which should have been caught in testing, and the problems which have been found over time strongly imply unprofessional, even amateurish, software development standards.
For example, the format of the version string was recently changed. This broke a lot of existing software which had hard-coded parsing - SQLAlchemy stopped working - and so did AWS's own JDBC driver.
This on the face of it indicates the RS test suite does not include any connections over JDBC.
It then turned out the version string had anyway been inaccurate for months, because RS had moved from GCC 3.4.2 (I think it was) to 7.3. But the version string kept reporting the old numbers.
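For anyone on the client side of this, a minimal sketch of less brittle parsing: search for the Redshift version token wherever it appears instead of assuming a fixed position in the string. Both sample strings below are illustrative guesses at an old-style and a new-style layout, not verbatim server output, and `parse_redshift_version` is a hypothetical helper, not part of SQLAlchemy or any AWS driver.

```python
import re
from typing import Optional, Tuple

# Match "Redshift <dotted number>" anywhere in the string rather than at a
# fixed offset, so surrounding text can change without breaking the parse.
_REDSHIFT_VERSION = re.compile(r"Redshift\s+(\d+(?:\.\d+)*)")


def parse_redshift_version(version_string: str) -> Optional[Tuple[int, ...]]:
    """Return the Redshift version as a tuple of ints, or None if absent."""
    match = _REDSHIFT_VERSION.search(version_string)
    if match is None:
        return None  # let the caller decide how to degrade
    return tuple(int(part) for part in match.group(1).split("."))


# Illustrative strings only; the real output may differ.
old_style = (
    "PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2, "
    "Redshift 1.0.12345"
)
new_style = "Redshift 1.0.12345 on Amazon Linux, compiled by gcc-7.3.0"

print(parse_redshift_version(old_style))
print(parse_redshift_version(new_style))
print(parse_redshift_version("PostgreSQL 13.4"))
```

Returning `None` instead of raising means an unexpected format degrades to "version unknown" rather than a failed connection, which is the failure mode described above.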
I can't even begin to describe how many issues there are in the official docs - flat factual errors and profoundly meaningful gobbledygook.
The whole thing just feels too much like amateur hour.