| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by chaxor 1101 days ago
	Is there a decent way of converting to a structure with a very constrained vocabulary? For example, given some input text, converting it to something like {"OID-189": "QQID-378", "OID-478":"QQID-678"}. Where OID and QQID dictionaries can be e.g. millions of different items defined by a description. The rules for mapping could be essentially what looks closest in semantic space to the descriptions given in a dictionary. I know this should be able to be solvable by local LLMs and bert cosine similarity (it isn't exactly, but it's a start on the idea), but is there a way to do this with decoder models rather than encoder models with other logic?

1 comments

jiggawatts 1101 days ago

You can train custom GPT 3 models, and Azure now has vector database integration for GPT-based models in the cloud. You can feed it the data, and ask it for the embedding lookup, etc...

You can also host a vector database yourself and fill it up with the embeddings from the OpenAI GPT 3 API.

link

chaxor 1100 days ago

Unfortunately this doesn't really work, as the model is not limited in it's decoding vocabulary.

Does anyone have other suggestions that may work in this space?

link