Hacker News
by TuringNYC 6 days ago
I'm trapped on Azure at work and we're constantly waiting for Azure pg to catch up with modernity.

For example, you can't use this: https://www.paradedb.com/blog/hybrid-search-in-postgresql-th...

Also, for example, you don't get ultra-wide, high-dimensional vectors.

It's nice that they're open-sourcing pg_durable, but how about adopting the table stakes I'd get with AWS?


ParadeDB is AGPL-licensed, so it's not generally available on the hyperscalers. However, you can use https://github.com/timescale/pg_textsearch on Azure HorizonDB (and likely soon on Flex). Disclosure: I'm the pg_textsearch maintainer and now at Azure.

I didn't quite follow your comment about vector support, are you asking for something beyond what pgvector + diskann provide (both available on Azure)?

ParadeDB maintainer here :). We would happily make it available on Azure (and all other cloud providers!) if there were a way for us to earn a living doing so.

FYI, we are in discussions with some hyperscalers about making this possible.

>> I didn't quite follow your comment about vector support, are you asking for something beyond what pgvector + diskann provide (both available on Azure)?

You don't support ultra-wide vectors from the largest embedding models. We have to do weird stuff like chopping up vectors across fields.
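The chopping workaround is sound for exact scoring because inner product and squared L2 distance both decompose into per-slice sums (cosine similarity does too, if the full vectors are normalized first). A minimal Python sketch of that arithmetic; the 2,000-dimension slice size mirrors pgvector's HNSW limit for `vector`, and the helper names and the 3,072-dimension demo are assumptions for illustration. It does not make separate per-slice ANN indexes equivalent to one index over the full vector, since each index only finds neighbors within its own slice.

```python
import math
import random

CHUNK_DIMS = 2000  # assumption: pgvector's HNSW limit for the `vector` type


def split_vector(vec, chunk_dims=CHUNK_DIMS):
    """Split one wide embedding into consecutive slices, one per column."""
    return [vec[i:i + chunk_dims] for i in range(0, len(vec), chunk_dims)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def chunked_dot(query_chunks, doc_chunks):
    """Inner product of the full vectors, summed slice by slice."""
    return sum(dot(q, d) for q, d in zip(query_chunks, doc_chunks))


def chunked_sq_l2(query_chunks, doc_chunks):
    """Squared L2 distance of the full vectors, summed slice by slice."""
    return sum(
        sum((x - y) ** 2 for x, y in zip(q, d))
        for q, d in zip(query_chunks, doc_chunks)
    )


if __name__ == "__main__":
    random.seed(0)
    dims = 3072  # assumed width of a large embedding model's output
    q = [random.uniform(-1, 1) for _ in range(dims)]
    d = [random.uniform(-1, 1) for _ in range(dims)]
    q_chunks, d_chunks = split_vector(q), split_vector(d)
    print([len(c) for c in q_chunks])  # [2000, 1072]
    # Same score as the unsplit vectors, up to float rounding
    assert math.isclose(chunked_dot(q_chunks, d_chunks), dot(q, d),
                        rel_tol=1e-9, abs_tol=1e-9)
```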

Something I've learned but rarely seen explained anywhere: storing the vectors is most likely not the issue. More likely you're running into limits on the indexes on top of them, in which case you can use quantized vector indexes[0] (handled by pgvector), which get past the limits imposed by PostgreSQL.

I had to switch away from pgvecto.rs at some point and figured that out then.

I don't have specific experience with the Azure environment here, but this probably applies if you have access to pgvector.

[0]: Supported index types and their maximum dimensions are listed at the bottom of this section: https://github.com/pgvector/pgvector#hnsw
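To make the quantization point concrete: pgvector's documented HNSW limits (2,000 dimensions for `vector`, 4,000 for `halfvec`, 64,000 for `bit`) all work out to roughly 8,000 bytes of vector payload, which is consistent with an index entry needing to fit in PostgreSQL's default 8 kB page. A rough Python sketch under those assumptions; `pick_index_type` and `hnsw_ddl` are hypothetical helpers, `items`/`embedding` are placeholder names, and the emitted statements follow the half-precision and binary-quantization expression-index patterns described in the pgvector README.

```python
# pgvector's documented HNSW limits per type (see the pgvector README)
HNSW_MAX_DIMS = {"vector": 2000, "halfvec": 4000, "bit": 64000}

# Storage per dimension: 4-byte float, 2-byte float, 1 bit
BYTES_PER_DIM = {"vector": 4.0, "halfvec": 2.0, "bit": 0.125}


def raw_bytes(dims, vec_type):
    """Approximate payload of one indexed value, ignoring tuple headers."""
    return dims * BYTES_PER_DIM[vec_type]


def pick_index_type(dims):
    """Return the least lossy type whose HNSW limit covers `dims`."""
    for vec_type in ("vector", "halfvec", "bit"):
        if dims <= HNSW_MAX_DIMS[vec_type]:
            return vec_type
    raise ValueError(f"{dims} dimensions exceeds every HNSW-indexable type")


def hnsw_ddl(table, column, dims):
    """Hypothetical helper: L2 ops for float types, Hamming ops for binary."""
    vec_type = pick_index_type(dims)
    if vec_type == "vector":
        return f"CREATE INDEX ON {table} USING hnsw ({column} vector_l2_ops);"
    if vec_type == "halfvec":
        # Expression index: the table keeps full precision, the index stores halfvec
        return (f"CREATE INDEX ON {table} USING hnsw "
                f"(({column}::halfvec({dims})) halfvec_l2_ops);")
    # Binary quantization: one bit per dimension, compared by Hamming distance
    return (f"CREATE INDEX ON {table} USING hnsw "
            f"((binary_quantize({column})::bit({dims})) bit_hamming_ops);")


if __name__ == "__main__":
    for dims in (1536, 3072, 8000):
        vec_type = pick_index_type(dims)
        print(dims, vec_type, raw_bytes(dims, vec_type))
        print("  ", hnsw_ddl("items", "embedding", dims))
```

Queries have to use the same expression (for example the `::halfvec(...)` cast) for the planner to pick the index. With binary quantization, the pgvector README suggests re-ranking candidates by the original vectors, since Hamming distance on one bit per dimension is much coarser than the full-precision distance.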

I'm sorry, I'm sure you've considered this, but why couldn't you create a bare VM with Postgres vCurrent installed?
You could. But then you’re also building HA failover, backups, replica management, monitoring, etc. from scratch. A cloud vendor’s managed RDBMS comes with lots of niceties. You can set all of them up yourself, but it’s a hassle, and difficult to make bulletproof.
Wouldn't Azure Cosmos DB be better suited for vector searches?
Never ever use Azure Cosmos DB. The entire point is to lock you in. This isn’t some paranoid shit either. We use Azure a lot, and I have worked with many people designing systems on Azure. Always avoid cloud providers’ lock-in services. That’s their bread and butter. They want you to use them. They want you using Azure Cosmos DB, Azure Event Hubs, Azure Apps, Azure DataLake, etc. Same with AWS. Don’t be naive. Use Azure VMs, Azure Postgres, Azure Redis. Those are fine. You’re just paying someone for the operational cost of a service, but you can migrate off. There is no migration from Cosmos or DataLake. They tell you that you can abstract your code, but that never works. They know you will be locked in. That’s the entire business model. Also, resist the temptation of the offers they’ll throw at you to link those services with all their other crap. Don’t be naive.
No. Locking yourself into proprietary single-vendor solutions is never the better option.
I’m not sure why you’re getting downvoted, because CosmosDB doesn’t even have a local install edition. Conversely, the cloud-hosted offering is slower than cold molasses and costs most of your body parts… per month.
Hey! I'm a PM on the Azure PG team and work on AI features in Postgres. I wanted to address your points directly, because we actually ship the capabilities you're asking about; we've made A LOT of progress in the last 3-6 months:

Hybrid search (BM25 + vector): Worth noting that ParadeDB's pg_search isn't an AWS-native feature either; you'd need to self-host it on EC2. On Azure PostgreSQL, we built pg_textsearch, which provides the same BM25 ranking model (term frequency saturation, document-length normalization, IDF) natively. Fun fact: the main contributor of pg_textsearch is now on the Azure Postgres team :)

Docs: https://learn.microsoft.com/en-us/azure/horizondb/ai/full-te...
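The three BM25 ingredients named above (term frequency saturation, document-length normalization, IDF) are the textbook Okapi BM25 formula. A minimal Python sketch, assuming pre-tokenized input and the common defaults k1 = 1.2 and b = 0.75; it is not pg_textsearch's implementation, whose tokenizer, parameters and IDF variant may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many docs contain each term at least once
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Document-length normalization: long docs need more hits to score well
        dl_norm = 1 - b + b * len(doc) / avgdl
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            n = df[term]
            # IDF: rarer terms weigh more (the "+ 1" keeps it non-negative)
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1)
            # TF saturation: each repeat adds less, bounded by idf * (k1 + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * dl_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["postgres", "bm25", "search"],
        ["postgres", "vector"],
        ["bm25", "bm25", "bm25", "ranking"],
    ]
    print(bm25_scores(["bm25"], docs))
```

Note how the third document mentions the term three times but scores well under three times the first document: that is the saturation effect, on top of a small length penalty.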

High-dimensional vectors: This is actually an area where we're ahead. pgvector with HNSW caps at 2,000 dimensions. We support pgvector for vector storage and search, and for high-dimensional / large-scale workloads we ship pg_diskann — Microsoft's graph-based vector index that supports up to 16,000 dimensions and also does advanced in-index filtering (your WHERE clauses get evaluated during graph traversal, so you don't lose recall on selective predicates).

pgvector: https://learn.microsoft.com/en-us/azure/horizondb/ai/vector-...

DiskANN high-dimension support: https://learn.microsoft.com/en-us/azure/horizondb/ai/vector-...
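Why evaluating the WHERE clause during traversal matters for recall: if an index returns the k nearest rows and the filter is applied afterwards, a selective predicate can leave fewer than k rows. A brute-force Python sketch of the two strategies; sorting every row by distance stands in for graph traversal order, the `tenant` column is hypothetical, and this is not how pg_diskann is implemented internally.

```python
import math


def by_distance(query, rows):
    """Stand-in for index traversal order: all rows, nearest first."""
    return sorted(rows, key=lambda row: math.dist(query, row["vec"]))


def post_filter(query, rows, predicate, k):
    """Fetch the k nearest rows first, then apply the WHERE clause."""
    return [row for row in by_distance(query, rows)[:k] if predicate(row)]


def in_search_filter(query, rows, predicate, k):
    """Apply the WHERE clause during the walk, stopping at k matches."""
    matches = []
    for row in by_distance(query, rows):
        if predicate(row):
            matches.append(row)
            if len(matches) == k:
                break
    return matches


if __name__ == "__main__":
    # 100 rows along a line; only every 10th row belongs to tenant "a"
    rows = [{"id": i, "vec": (float(i), 0.0), "tenant": "a" if i % 10 == 0 else "b"}
            for i in range(100)]
    wants_a = lambda row: row["tenant"] == "a"
    print([r["id"] for r in post_filter((0.0, 0.0), rows, wants_a, 5)])       # [0]
    print([r["id"] for r in in_search_filter((0.0, 0.0), rows, wants_a, 5)])  # [0, 10, 20, 30, 40]
```

Post-filtering asked for 5 rows and got 1, because 4 of the 5 nearest rows failed the predicate; filtering while walking keeps going until it has 5 matches.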

These are available today on Azure PostgreSQL, specifically Azure HorizonDB (Preview). Happy to dig into specifics if you have a particular workload in mind.

> we built pg_textsearch

Maybe you meant to word this differently and I’m nitpicking, but didn’t TJ Green build this while he was still at Tiger Data?

Great callout! I meant to say we built support for the pg_textsearch extension on HorizonDB.