Show HN: Chat with GPT about medical issues, get answers from medical literature | HN Mirror

Y	Hacker News new \| ask \| show \| jobs

Show HN: Chat with GPT about medical issues, get answers from medical literature (github.com)

45 points by garrinm 1070 days ago

Clint is an open-sourced medical information lookup and reasoning tool.

Clint enables a user to have an interactive dialogue about medical conditions, symptoms, or simply to ask medical questions. Clint helps connect regular health concerns with complex medical information. It does this by converting colloquial language into medical terms, gathering and understanding information from medical resources, and presenting this information back to the user in an easy-to-understand way.

One of the key features of Clint is that its processing is local. It's served using GitHub pages and utilizes the user's OpenAI API key to make requests to directly to GPT. All processing, except for that done by the LLM, happens in the user's browser.

I recently had a need to lookup detailed medical information and found myself spending a lot of time translating my understanding into the medical domain, then again trying to comprehend the medical terms. That gave me the idea that this could be a task for an LLM.

The result is Clint. It's a proof-of-concept. I currently have no further plans for the tool. If it is useful to you as-is, great! If it is useful only to help share some ideas, that's fine too.

8 comments

TuringNYC 1070 days ago

Dear @garrinm Firstly, thank you for sharing this! We built something like this at work, and i'd love if you could share details on your effort.

I tried to follow: https://github.com/clint-llm/clint-cli/blob/main/clint/scrip... but it wasnt clear -- are you indexing the medical literature in entirety or just abstracts?

We tried to do this with arXiv and others, but getting commercial rights was difficult and we got stuck on that, could you share which medical literature source you used.

I tried to follow the code and it looks like you embed, so i'm assuming you're using RAG, is that it, or are you trying to fine-tune also? I didnt see any fine-tune code. (We didnt fine tune due to cost)

Did you benchmark different embedding chunk sizes, etc? (Yes for us! We've tried a matrix search of chunk sizes, including sliding window and found the sweet spot for different types of media, usually a single paragraph)

Did you manage to get access to a fine-tuned model like MedPALM and benchmark that? (we are still awaiting access)

garrinm 1070 days ago

Hi, thanks for the comment.

I put this project pretty quickly and I don't want to pretend there is tremendous depth behind any of the decisions I made :/.

For now the only source is the Stats Pearl book published on ncbi.nlm.nih.gov (the only place this is mentioned is here: https://github.com/clint-llm/clint-cli/blob/main/README.md#u...). It contains about 11,000 peer reviewed articles about anatomy and conditions: https://www.ncbi.nlm.nih.gov/books/NBK430685/. The copyright terms are CC BY-NC-ND 4.0. I might add some Wikipedia articles to this in the future.

I chunk the documents by section, and embed only the first 2048 tokens that fit in the OpenAI embeddings. I'm using OpenAI for embedding as opposed to something like all-minilm-l6-v2 because I don't want to have to ship a model to the clients (transfer times could be large and supporting this would increase the complexity of the library).

I didn't experiment with different chunk sizes, and I suspect something smaller would be more beneficial as you point out. But it would also complicate the logic, and most choices I made in this project were to remove complexity and get this done quickly. If I revisit this I might chunk by paragraph on your advice :).

RAG is indeed what is being used. But it a few different ways. The diagnoses are refined using a pretty straightforward RAG prompt: consider these notes ... consider this diagnosis ... can you improve on it etc.

But in a way the entire program is RAG-based. In most prompts some documents are added to the system message for context. It's not clear that the information in the documents is always used, but based on a bit of experimentation it seems to improve various responses.

I have no plans to fine tune. I'm not sure how beneficial would be fine tuning here. The model needs a fair bit of general knowledge to reason about descriptions of symptoms. Fine tuning could over-specialize it. And hallucinations could come up even with fine-tuning, so you would probably want a RAG-like prompt to get it to focus on real details.

This this is very much a hobby, so I haven't dug deep enough to look into other models. But I'd be _very_ curious to see how GPT 3.5 with RAG compares to vanilla MedPALM. In my experience GPT 3.5 can reason quite well about with the right documents in the context.

TuringNYC 1070 days ago

Thanks for the details. We're also trying to compare GPT 4 vs GPT 3.5 vs LLAMA2 and we've been putting together "exams" though one other thing on our mind is what happens when the next foundational model comes and scrapes our exams and the exam questions make their way into the training set.

This really does make me wonder about all the attempts to administer the USMLE to LLMs -- what are the chances the USMLE administered was uniquely created vs just put together from exam questions online? Ultimately no LLM is required if these exam questions are in the training set...just need a k-v lookup :-)

linsomniac 1070 days ago

My wife is an RN and asked it some medical questions and pronounced it "pretty neat" and also said "We're going to get so much better doctoring in the future."

She did find that some of the follow-up questions seemed to lead it down the wrong path, where the follow-ups seemed to be much more important than the initial messages, and she got better results by taking the original message and follow-ups and combining them into a new message to submit to a cleared session.

garrinm 1069 days ago

Thank you! That's great feedback. Clint is very much a proof-of-concept, I'm sure this idea can be taken much further when done properly. But to get a "pretty neat" from an RN feels like an accomplishment xD.

I see what you mean about going down the wrong path. I can explore a couple of modifications to help it have better context of the conversation as a whole.

jcutrell 1070 days ago

The health anxiety in me just found a whole new lifeline.

garrinm 1069 days ago

Unfortunately you can ask Clint to tell you just about anything. But fortunately it will at least try to tell you that some things are less plausible than others.

gumballindie 1069 days ago

There is a failed startup in the uk called babylon health that tried using ai chat bots for medical diagnosis. Turns out there aren't many people out crazy enough to rely on it for health advice.

garrinm 1069 days ago

Clint should not be used for diagnosis. Only for personal information. Like an web search but more interactive.

clint 1070 days ago

why am I an llm ??

NBJack 1070 days ago

I regret to inform you we have decided to replace you with a model. It's nothing personal, but we had such a good time training it on your comment history, and your replacement model offered us cookies. They're made with flour, sugar, and Windex, but their heart is in the right place.

More importantly, the new Clint is able to respond to 20 comments a minute, and engagement on the platform is going up!

/s

SOLAR_FIELDS 1070 days ago

Funny, but more generally it’s a mildly interesting question - why did the creators choose that name for their AI?

garrinm 1070 days ago

Clint could be short for CLINician Tool. I didn't put much effort into this name xD.

whycome 1070 days ago

just don't write it in all-caps with a sans-serif font and bad kerning.

TRiG_Ireland 1070 days ago

All I want is something to reassure me that I don't have housemaid's knee.

ChildOfChaos 1070 days ago

Seems good, but I need to pay OpenAI for API access to use this right?

garrinm 1069 days ago

Yes, Clint is a proof-of-concept, meant to showcase this use case of LLMs more than anything else.

soared 1070 days ago

What are the implications of hallucinations in this context?

garrinm 1069 days ago

Clint should be used only to research information. It provides links to resources. It uses Retrieval Augmented Generation which is less prone providing incorrect information, though it can still happen.