by simonw 974 days ago
I just shipped a new llm-embed-jina plugin for my LLM tool which provides access to these new Jina models: https://github.com/simonw/llm-embed-jina

Here's how to try it out.

First, install LLM. Use pip or pipx or brew:

    brew install llm
Next install the new plugin:

    llm install llm-embed-jina
You can confirm the new models are now available to LLM by running:

    llm embed-models
You should see a list that includes "jina-embeddings-v2-small-en" and "jina-embeddings-v2-base-en"

To embed a string using the small model, run this:

    llm embed -m jina-embeddings-v2-small-en -c 'Hello world'
That will output a JSON array of 512 floating point numbers (see my explainer here for what those are: https://simonwillison.net/2023/Oct/23/embeddings/#what-are-e...)

Embeddings are only really interesting if you store them and use them for comparisons.
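To make "comparisons" concrete, here is a minimal sketch of cosine similarity, which as far as I know is the measure LLM uses when ranking results. The function name and the toy three-dimensional vectors are made up for illustration; real vectors from the small Jina model have 512 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example (not real model output).
hello = [1.0, 0.0, 1.0]
hi = [2.0, 0.0, 2.0]          # same direction as `hello`, different length
unrelated = [0.0, 1.0, 0.0]   # orthogonal to `hello`

print(cosine_similarity(hello, hi))
print(cosine_similarity(hello, unrelated))
```

Vectors pointing the same way score close to 1 regardless of their length, and orthogonal vectors score 0, which is why the score works as a rough "how similar is the meaning" number.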

Here's how to use the "llm embed-multi" command to create embeddings for the 30 most recent issues in my LLM GitHub repository:

    curl 'https://api.github.com/repos/simonw/llm/issues?state=all&filter=all' \
    | jq '[.[] | {id: .id, title: .title}]' \
    | llm embed-multi -m jina-embeddings-v2-small-en jina-llm-issues - \
    --store
This creates a collection called "jina-llm-issues" in a default SQLite database on your machine (the path to that can be found using "llm collections path").
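If you don't have jq handy, the reshaping step in that pipeline is small enough to sketch in Python. This assumes "llm embed-multi" accepts a JSON array of objects with an "id" key plus content keys, which matches the input produced above. The function name and the sample issues are invented for illustration, not real data from the repository.

```python
import json


def to_id_title_records(issues):
    """Keep only the id and title of each issue, like the jq filter does."""
    return [{"id": issue["id"], "title": issue["title"]} for issue in issues]


# Made-up sample shaped like a GitHub issues API response (fields trimmed).
sample = json.loads("""
[
  {"id": 101, "title": "Example bug report", "state": "open"},
  {"id": 102, "title": "Example feature request", "state": "closed"}
]
""")

records = to_id_title_records(sample)
print(json.dumps(records, indent=2))
```

The printed JSON is the kind of input you could pipe into "llm embed-multi ... -" in place of the jq output.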

To search for issues in that collection with titles most similar to the term "bug":

    llm similar jina-llm-issues -c 'bug'
Or for issues most similar to another existing issue by ID:

    llm similar jina-llm-issues 1922688957
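
Conceptually, a similarity search by ID looks up the stored vector for that ID and ranks everything else against it. Here is a rough sketch of that idea. It is not LLM's actual implementation, and the `most_similar` helper, the IDs and the two-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def most_similar(collection, query_id, n=3):
    """Rank the other items in `collection` against the item `query_id`."""
    query = collection[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in collection.items()
        if other_id != query_id  # an item is trivially identical to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical stored collection: ID -> embedding vector.
collection = {
    "issue-a": [1.0, 0.0],
    "issue-b": [0.9, 0.1],
    "issue-c": [0.0, 1.0],
}

for item_id, score in most_similar(collection, "issue-a"):
    print(item_id, round(score, 3))
```

Searching by a string with -c works the same way, except the query vector comes from embedding the string first instead of being looked up.
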
Full documentation on what you can do with LLM and embeddings here: https://llm.datasette.io/en/stable/embeddings/index.html

Alternative recipe - this creates embeddings for every single README.md in the current directory and its subdirectories. Run this somewhere with a node_modules folder and you should get a whole lot of interesting stuff:

    llm embed-multi jina-readmes \
      -m jina-embeddings-v2-small-en \
      --files . '**/README.md' --store
Then search them like this:

    llm similar jina-readmes -c 'backup tools'
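
If you want to preview which files a pattern like '**/README.md' will pick up before embedding them, this sketch uses Python's pathlib glob. I'm assuming that behaves like the --files matching, so treat any differences as possible. The `find_readmes` name is just for this example.

```python
from pathlib import Path


def find_readmes(root="."):
    """Return README.md files under `root` (including `root` itself), sorted."""
    return sorted(Path(root).glob("**/README.md"))


# Preview the matches for the current directory.
for path in find_readmes("."):
    print(path)
```

In a directory with a node_modules folder this can print thousands of paths, which gives you a sense of how long the embedding run will take.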
7 comments

The only feedback I had from your embedding post was

    wish we could create the array of floating points without openai

Great, timely turnaround, good sir. H/t

Thank you so much for all the work you've put into llm!

Thanks, this is wonderfully simple to use. I just managed to package this up using Docker and was able to use it without a lot of drama. Nice how simple this is to use.

I've dabbled a bit with Elasticsearch dense vectors before, and this model should work great for that. Basically, I just need to feed it a lot of content and index the resulting vectors, and then vector search should work.

FYI, it seems that `llm install llm-embed-jina` is missing the yaml dependency:

      File "/opt/homebrew/Cellar/llm/0.11_1/libexec/lib/python3.12/site-packages/llm/default_plugins/openai_models.py", line 17, in <module>
        import yaml
    ModuleNotFoundError: No module named 'yaml'

Thanks! I wonder if the Python 3.12 upgrade broke something.

The pyyaml package is correctly listed on the formula page though: https://formulae.brew.sh/formula/llm
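
One way to narrow this down is to ask the interpreter directly which modules it can find. This is a small diagnostic sketch, and `missing_modules` is a made-up helper name. It only tells you something useful if you run it with the same Python that the Homebrew llm uses. The traceback suggests that interpreter lives under the formula's libexec directory, but I'm not certain of its exact location.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Show which interpreter is running, then which of the two modules
# mentioned in this thread are missing from it.
print(sys.executable)
print(missing_modules(["yaml", "typing_extensions"]))
```

An empty list means both modules are importable from that interpreter, so the problem would be elsewhere.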

Excellent! And you were just saying how risky it is to rely long-term on OpenAI text embeddings in your post on the topic. The timing for this open source option worked out nicely.

JFYI, this is what happens on my M1 MacBook:

    $ brew install llm
    $ llm
    ModuleNotFoundError: No module named 'typing_extensions'

Not sure where to report it.

Whoa, that is a weird one. Do you know what version of Python you have from Homebrew?

It looks like that package is correctly listed in the formula: https://github.com/Homebrew/homebrew-core/blob/a0048881ba9a2...

    % python3 --version
    Python 3.11.6
    
    % which python3
    /opt/homebrew/bin/python3

    % brew info python-typing-extensions
    ==> python-typing-extensions: stable 4.8.0 (bottled)

Probably not this, but check with `which llm` what you're actually running. I had weird issues that didn't match the documentation, and it turned out I had some other random Python CLI tool called llm that I'd put in my home bin and forgotten about.

    % which llm
    /opt/homebrew/bin/llm