| HN Mirror


	by txtai 1224 days ago
	If speed and price are concerns, use the FOSS models available on the Hugging Face Hub: https://hf.co/models. Thousands of models, different sizes and tasks. Download locally and fine-tune, if necessary. For those specifically interested in text embeddings, here is a good analysis: https://medium.com/@nils_reimers/openai-gpt-3-text-embedding...

2 comments

wongarsu 1224 days ago

Normally I'd point out how these are a lot less capable than GPT3. But the article ends up fine-tuning GPT Babbage, and multiple free models can outperform Babbage, so this is very solid advice.

txtai 1224 days ago

Starting with HF models and moving to a large model like GPT3 when the task calls for it is a good approach to take with almost all tasks.

joedevon 1224 days ago

How about testing fine-tuning with Davinci first, then scaling down to the other models or HF once you've proven it works? I believe the OpenAI docs propose this approach (minus HF, of course).

txtai 1223 days ago

That's up to you. Many don't want to open an account and pay in order to explore what's possible. There are LLMs available on the HF Hub, such as google/flan-t5-xl.

rozgo 1223 days ago

This works well. Starting with bigger models allows us to explore what's possible. I find it easier to scale down from there.

nafizh 1223 days ago

The Medium article you posted analyzes OpenAI's old embeddings. OpenAI has a single new embedding model [0] that replaces all the old models, and it's also super cheap ($0.0004 / 1K tokens).

0. https://openai.com/blog/new-and-improved-embedding-model/
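
For anyone who wants to sanity-check that price against their own data, here is a rough back-of-the-envelope sketch. Only the $0.0004 / 1K tokens rate comes from the comment above; the record count and the average tokens per record are made-up assumptions, and `embedding_cost` is a hypothetical helper, not part of any OpenAI library.

```python
# Price quoted in the comment above, in USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = 0.0004


def embedding_cost(total_tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Estimate the cost in USD of embedding `total_tokens` tokens."""
    return total_tokens / 1000 * price_per_1k


# Hypothetical workload: both numbers below are assumptions for illustration.
records = 1_000_000
avg_tokens_per_record = 100

cost = embedding_cost(records * avg_tokens_per_record)
print(f"Estimated cost: ${cost:,.2f}")
```

The real bill depends on how your text tokenizes, so measure the average token count on a sample of your own records before trusting the estimate.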

txtai 1223 days ago

The first comment in that article has details on the new model. I'm not the original author, but per their testing they said they paid $70 to encode 1M records. The embeddings have 1536 dimensions, which requires a lot of vector storage. The HF Hub has open models with 384 or 768 dimensions that work well for a lot of use cases.
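
To put the storage point in numbers, here is a minimal sketch of the raw vector size for 1M records at each dimension count mentioned above. It assumes uncompressed float32 values (4 bytes each) and ignores index overhead, metadata, and any quantization a vector store might apply; `raw_vector_size_gb` is a hypothetical helper name.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as uncompressed float32


def raw_vector_size_gb(num_vectors: int, dims: int,
                       bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw size of the vectors alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_vectors * dims * bytes_per_value / 1e9


num_records = 1_000_000
for dims in (384, 768, 1536):
    size_gb = raw_vector_size_gb(num_records, dims)
    print(f"{dims:>4} dims: {size_gb:.3f} GB")
```

Under those assumptions, 1536-dimension vectors take four times the space of 384-dimension ones for the same number of records, since raw size grows linearly with the dimension count.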
