I had similar ideas on the back burner for per-site scripts. MV3 sure makes things annoying at times! I want to build something that's kind of a mix between what you're doing here and what I've been experimenting with.
Have you experimented with the sidePanel API yet? You can open it, but the <input> won't get focus the way it does in a popup. You'd think this would at least be possible when opening it from a commands keybinding or in response to a browser action icon click?!
I asked Lumos some questions and it hit the embeddings endpoint many times. Then when I asked a follow-up question, it hit the same endpoint again, with each request taking a few hundred ms. Is it not caching them? TBD?
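To make the caching idea concrete, here is a minimal sketch of an in-memory embedding cache. It is not how Lumos works; `EmbeddingCache` and `EmbedFn` are hypothetical names, and the embed function is assumed to be whatever wrapper already calls the embeddings endpoint. It stores the in-flight promise per text, so repeated or concurrent requests for the same chunk trigger only one call.

```typescript
// Hypothetical signature: wraps whatever call hits the embeddings endpoint.
type EmbedFn = (text: string) => Promise<number[]>;

class EmbeddingCache {
  // Cache the promise rather than the result, so concurrent requests
  // for the same text share a single in-flight call.
  private readonly cache = new Map<string, Promise<number[]>>();
  private readonly embed: EmbedFn;

  constructor(embed: EmbedFn) {
    this.embed = embed;
  }

  get(text: string): Promise<number[]> {
    const key = text.trim();
    let pending = this.cache.get(key);
    if (pending === undefined) {
      pending = this.embed(key);
      this.cache.set(key, pending);
      // Evict failed requests so a later call can retry.
      pending.catch(() => this.cache.delete(key));
    }
    return pending;
  }

  get size(): number {
    return this.cache.size;
  }
}
```

With something like this, a follow-up question over the same page would only embed the new question text. The cache lives in memory only, so it would be lost whenever the extension context that holds it is torn down.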
P.S. IIRC there's some way to set the key in the manifest so that you get a static ID, which is useful for OLLAMA_ORIGINS because you then don't need the * used in the examples.
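A rough sketch of what that buys you, assuming the extension ID has already been pinned with the manifest key field: Chrome extension IDs are 32 characters in the range a-p, and the extension's origin is chrome-extension:// followed by that ID. The helper name `ollamaOriginFor` is made up for illustration, and it is an assumption here that Ollama accepts the resulting exact origin in OLLAMA_ORIGINS in place of the wildcard.

```typescript
// Chrome extension IDs are 32 characters drawn from the letters a-p.
const EXTENSION_ID_PATTERN = /^[a-p]{32}$/;

// Hypothetical helper: builds the exact origin string for a pinned extension ID,
// intended as the OLLAMA_ORIGINS value instead of chrome-extension://*.
function ollamaOriginFor(extensionId: string): string {
  if (!EXTENSION_ID_PATTERN.test(extensionId)) {
    throw new Error("not a valid Chrome extension ID: " + extensionId);
  }
  return `chrome-extension://${extensionId}`;
}
```

The returned string is what you would put in OLLAMA_ORIGINS when starting the server, so only this one extension is allowed rather than every installed extension.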
p.p.s starling-lm is really great for a default local model for these types of things
I did not look into the sidePanel API yet. I will look though. I'm still fairly new to Chrome extension development. I just discovered the Options page heh...
Regarding the caching: the entire RAG workflow needs an update. I'll spend some time on it. Thanks for calling that out, and thanks for the PR!
I elaborate a bit more on my thinking and approach to parsing in this blog post (4 min read): https://medium.com/@andrewnguonly/local-llm-in-the-browser-p...