HN Mirror


by TuringNYC 54 days ago

I'm trapped on Azure at work and we're constantly waiting for Azure pg to catch up with modernity.

For example, you can't use this: https://www.paradedb.com/blog/hybrid-search-in-postgresql-th...

Also for example, you don't get ultra-wide, high-dimensional vectors.

It is nice they are open sourcing pg_durable, but how about adopting table stakes I'd get with AWS?

4 comments

tjgreen 54 days ago

ParadeDB is AGPL, so not generally available on the hyperscalers. However, you can use https://github.com/timescale/pg_textsearch on Azure HorizonDB (and likely soon Flex). Disclosure: I'm the pg_textsearch maintainer and now at Azure.

I didn't quite follow your comment about vector support, are you asking for something beyond what pgvector + diskann provide (both available on Azure)?


philippemnoel 53 days ago

ParadeDB maintainer here :). We would happily make it available on Azure (and all other cloud providers!) if there were a way for us to earn a living in doing so.

FYI, we are in discussions with some hyperscalers on making this possible.


TuringNYC 53 days ago

>> I didn't quite follow your comment about vector support, are you asking for something beyond what pgvector + diskann provide (both available on Azure)?

You don't support ultra-wide vectors from the largest embedding models. We have to do weird stuff like chop up vectors across fields.
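To make the "chop up vectors across fields" workaround concrete, here is a minimal Python sketch (not the poster's actual code). It relies only on the fact that inner products decompose into sums over chunks, so cosine similarity can be rebuilt exactly from per-chunk pieces. The names `split_vector`, `chunked_inner`, and `chunked_cosine` are hypothetical, and `CHUNK = 2000` is an assumed per-field limit, not a documented Azure value.

```python
import math
import random

CHUNK = 2000  # assumption: max dimensions that fit in one field/index


def split_vector(vec, size=CHUNK):
    """Split one wide vector into consecutive chunks of at most `size` dims."""
    return [vec[i:i + size] for i in range(0, len(vec), size)]


def inner(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def chunked_inner(a_chunks, b_chunks):
    """Inner product of the full vectors, computed as a sum over chunks."""
    return sum(inner(a, b) for a, b in zip(a_chunks, b_chunks))


def chunked_cosine(a_chunks, b_chunks):
    """Cosine similarity of the full vectors, rebuilt from per-chunk sums."""
    dot = chunked_inner(a_chunks, b_chunks)
    norm_a = math.sqrt(chunked_inner(a_chunks, a_chunks))
    norm_b = math.sqrt(chunked_inner(b_chunks, b_chunks))
    return dot / (norm_a * norm_b)


random.seed(0)
a = [random.gauss(0, 1) for _ in range(5000)]
b = [random.gauss(0, 1) for _ in range(5000)]
a_chunks, b_chunks = split_vector(a), split_vector(b)
```

The exact distance survives the split, but an approximate index built on any single chunk only sees part of the vector, which is presumably why the workaround feels awkward in practice.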


0xCMP 53 days ago

Something I've learned, but rarely seen explained anywhere: storing the vectors is most likely not the issue. Most likely you're having a problem with the indexes on top of them, in which case you can use quantized vector indexes[0] (handled by pgvector), which will get past the limits imposed by PostgreSQL.

I had to switch off pgvecto.rs at some point and figured that out.

I don't have specific experience with the Azure environment here, but this probably applies if you have access to pgvector.

[0]: Types of indexes + number of bits supported at bottom of this section: https://github.com/pgvector/pgvector#hnsw
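As a rough illustration of the quantized-index idea above, the following Python sketch does binary quantization with re-ranking: each vector is reduced to one bit per dimension, candidates are shortlisted by Hamming distance on those bits, and the shortlist is re-ranked with full-precision cosine distance. This is a toy in-memory version under stated assumptions, not pgvector's implementation. The function names and the `candidates` parameter are hypothetical, and a real index would precompute and index the bit codes rather than scan them.

```python
import math
import random


def binary_quantize(vec):
    """Pack a vector into an int: one bit per dimension, 1 if the value is > 0."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def cosine_distance(a, b):
    """1 - cosine similarity, computed at full precision."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query, vectors, k=3, candidates=20):
    """Shortlist by Hamming distance on bit codes, then re-rank the shortlist."""
    qbits = binary_quantize(query)
    # Assumption: computed on the fly here; a database would store and index these.
    codes = [binary_quantize(v) for v in vectors]
    shortlist = sorted(range(len(vectors)), key=lambda i: hamming(qbits, codes[i]))
    shortlist = shortlist[:candidates]
    return sorted(shortlist, key=lambda i: cosine_distance(query, vectors[i]))[:k]


random.seed(1)
vectors = [[random.gauss(0, 1) for _ in range(64)] for _ in range(200)]
top = search(vectors[7], vectors)
```

The index only ever sees the compact bit codes, which is how this approach sidesteps the dimension cap on full-precision indexes. The full vectors are touched only for the small re-ranked shortlist.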


moron4hire 54 days ago

I'm sorry, I'm sure you've considered this, but why couldn't you create a bare VM with Postgres vCurrent installed?


oofbey 53 days ago

You could. But then you’re also building HA failover, backups, replica management, monitoring, etc. from scratch - cloud vendor managed RDBMSs come with lots of niceties. All of these are possible to set up yourself, but they're a hassle and difficult to make bulletproof.


FuriouslyAdrift 54 days ago

Wouldn't Azure Cosmos DB be better suited for vector searches?


eddythompson80 53 days ago

Never ever use Azure Cosmos DB. The entire point is to lock you in. This isn’t some paranoid shit either. We use Azure a lot, and I have worked with many people designing systems on Azure. Always avoid cloud providers' lock-in services. That’s their bread and butter. They want you to use them. They want you using Azure Cosmos DB, Azure Event Hubs, Azure Apps, Azure DataLake, etc. Same with AWS. Don’t be naive. Use Azure VMs, Azure Postgres, Azure Redis. Those are fine. You’re just paying someone for the operational cost of a service, but you can migrate off. There is no migration from Cosmos or DataLake. They tell you you can abstract your code, but that never works. They know you will be locked in. That’s the entire business model. Also resist the temptation of the offers they’ll throw at you to link those services with all their other crap. Don’t be naive.


antonkochubey 54 days ago

no - locking yourself into proprietary single-vendor solutions is never a better option


jiggawatts 53 days ago

I’m not sure why you’re getting downvoted because CosmosDB doesn’t even have a local install edition. Conversely the cloud hosted offering is slower than cold molasses and costs most of your body parts… per month.


abeomor 54 days ago

Hey! I'm a PM on the Azure PG team and work on AI features on Postgres. Wanted to address your points directly because we actually ship the capabilities you're asking about, and we have made A LOT of progress in the last 3-6 months:

Hybrid search (BM25 + vector): Worth noting that ParadeDB's pg_search isn't an AWS-native feature either, you'd need to self-host it on EC2. On Azure PostgreSQL, we built pg_textsearch which provides the same BM25 ranking model (term frequency saturation, document-length normalization, IDF) natively. Fun fact, the main contributor of pg_textsearch is now on the Azure Postgres team :)

Docs: https://learn.microsoft.com/en-us/azure/horizondb/ai/full-te...
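For readers unfamiliar with the three BM25 ingredients named above (term frequency saturation, document-length normalization, IDF), here is a small Python sketch of the textbook Okapi BM25 formula. It is not pg_textsearch's implementation. The whitespace tokenizer, the defaults `k1=1.2` and `b=0.75`, and the `ln(1 + ...)` IDF variant are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with textbook Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF: rarer terms contribute more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average docs are penalized.
            norm = 1 - b + b * len(tokens) / avg_len
            # Saturation: repeated occurrences add less and less.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = ["a b c d", "a a c d", "x y z w"]
scores = bm25_scores("a", docs)
```

In this toy corpus the second document mentions the query term twice as often as the first, yet scores less than twice as high. That is the saturation effect at work.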

High-dimensional vectors: This is actually an area where we're ahead. pgvector with HNSW caps at 2,000 dimensions. We support pgvector for vector storage and search, and for high-dimensional / large-scale workloads we ship pg_diskann — Microsoft's graph-based vector index that supports up to 16,000 dimensions and also does advanced in-index filtering (your WHERE clauses get evaluated during graph traversal, so you don't lose recall on selective predicates).

pgvector: https://learn.microsoft.com/en-us/azure/horizondb/ai/vector-...

DiskANN high-dimension support: https://learn.microsoft.com/en-us/azure/horizondb/ai/vector-...

These are available today on Azure PostgreSQL, specifically Azure HorizonDB (Preview). Happy to dig into specifics if you have a particular workload in mind.


jbonatakis 53 days ago

> we built pg_textsearch

Maybe you meant to word this differently and I’m nitpicking, but didn’t TJ Green build this while he was still at Tiger Data?


abeomor 53 days ago

Great callout! I meant to say we built support for the pg_textsearch extension on HorizonDB.
