| HN Mirror

Hi, thanks for the comment.

I put this project pretty quickly and I don't want to pretend there is tremendous depth behind any of the decisions I made :/.

For now the only source is the Stats Pearl book published on ncbi.nlm.nih.gov (the only place this is mentioned is here: https://github.com/clint-llm/clint-cli/blob/main/README.md#u...). It contains about 11,000 peer reviewed articles about anatomy and conditions: https://www.ncbi.nlm.nih.gov/books/NBK430685/. The copyright terms are CC BY-NC-ND 4.0. I might add some Wikipedia articles to this in the future.

I chunk the documents by section, and embed only the first 2048 tokens that fit in the OpenAI embeddings. I'm using OpenAI for embedding as opposed to something like all-minilm-l6-v2 because I don't want to have to ship a model to the clients (transfer times could be large and supporting this would increase the complexity of the library).

I didn't experiment with different chunk sizes, and I suspect something smaller would be more beneficial as you point out. But it would also complicate the logic, and most choices I made in this project were to remove complexity and get this done quickly. If I revisit this I might chunk by paragraph on your advice :).

RAG is indeed what is being used. But it a few different ways. The diagnoses are refined using a pretty straightforward RAG prompt: consider these notes ... consider this diagnosis ... can you improve on it etc.

But in a way the entire program is RAG-based. In most prompts some documents are added to the system message for context. It's not clear that the information in the documents is always used, but based on a bit of experimentation it seems to improve various responses.

I have no plans to fine tune. I'm not sure how beneficial would be fine tuning here. The model needs a fair bit of general knowledge to reason about descriptions of symptoms. Fine tuning could over-specialize it. And hallucinations could come up even with fine-tuning, so you would probably want a RAG-like prompt to get it to focus on real details.

This this is very much a hobby, so I haven't dug deep enough to look into other models. But I'd be _very_ curious to see how GPT 3.5 with RAG compares to vanilla MedPALM. In my experience GPT 3.5 can reason quite well about with the right documents in the context.