I just shipped a new llm-embed-jina plugin for my LLM tool which provides access to these new Jina models: https://github.com/simonw/llm-embed-jina

Here's how to try it out. First, install LLM. Use pip or pipx or brew:

    brew install llm

Next, install the new plugin:

    llm install llm-embed-jina

You can confirm the new models are now available to LLM by running:

    llm embed-models

You should see a list that includes "jina-embeddings-v2-small-en" and "jina-embeddings-v2-base-en".

To embed a string using the small model, run this:

    llm embed -m jina-embeddings-v2-small-en -c 'Hello world'

That will output a JSON array of 512 floating point numbers (see my explainer here for what those are: https://simonwillison.net/2023/Oct/23/embeddings/#what-are-e...)

Embeddings are only really interesting if you store them and use them for comparisons. Here's how to use the "llm embed-multi" command to create embeddings for the 30 most recent issues in my LLM GitHub repository:

    curl 'https://api.github.com/repos/simonw/llm/issues?state=all&filter=all' \
      | jq '[.[] | {id: .id, title: .title}]' \
      | llm embed-multi -m jina-embeddings-v2-small-en jina-llm-issues - \
        --store

This creates a collection called "jina-llm-issues" in a default SQLite database on your machine (the path to that can be found using "llm collections path").

To search for issues in that collection with titles most similar to the term "bug":

    llm similar jina-llm-issues -c 'bug'

Or for issues most similar to another existing issue by ID:

    llm similar jina-llm-issues 1922688957

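If you're curious what a similarity search is doing conceptually, here's a rough sketch in plain Python. It assumes cosine similarity as the comparison metric and uses made-up 3-dimensional vectors in place of real 512-dimensional embeddings. The function names and IDs are hypothetical, chosen for illustration, and this is not a copy of LLM's internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, collection, n=3):
    """Rank stored {id: vector} entries by similarity to the query vector."""
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in collection.items()
    ]
    # Highest score first: the closest vectors come out on top.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy vectors standing in for stored embeddings (values are invented).
collection = {
    "issue-a": [0.9, 0.1, 0.0],
    "issue-b": [0.1, 0.9, 0.0],
    "issue-c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = most_similar(query, collection)
for item_id, score in ranking:
    print(item_id, round(score, 3))
```

The vector pointing in nearly the same direction as the query ranks first, which is the same idea as asking for the issues "most similar" to a piece of text or to another stored item.
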
Full documentation on what you can do with LLM and embeddings here: https://llm.datasette.io/en/stable/embeddings/index.html

Alternative recipe: this creates embeddings for every single README.md in the current directory and its subdirectories. Run this somewhere with a node_modules folder and you should get a whole lot of interesting stuff:

    llm embed-multi jina-readmes \
      -m jina-embeddings-v2-small-en \
      --files . '**/README.md' --store

Then search them like this:

    llm similar jina-readmes -c 'backup tools'

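To preview which files a pattern like '**/README.md' would pick up before embedding anything, here's a small sketch using Python's pathlib. It assumes the pattern behaves like pathlib's recursive globbing, which I have not checked against the CLI's own matching. The helper name and the sample directory tree (including the "left-pad" package folder) are invented for the example.

```python
import tempfile
from pathlib import Path


def find_readmes(root):
    """Return README.md paths under root (relative, POSIX-style, sorted)."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix() for path in root.glob("**/README.md")
    )


# Build a throwaway directory tree to show what the pattern matches.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for rel in ["README.md", "node_modules/left-pad/README.md", "docs/guide.md"]:
        path = base / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder")

    matches = find_readmes(base)
    print(matches)
```

The "**" part matches zero or more directory levels, so the top-level README.md is included alongside the nested ones, while docs/guide.md is skipped.
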