Hacker News new | ask | show | jobs
by andy99 902 days ago
Also, I should promote the code I wrote for running this. It runs models in ggml format, the one I made available is an older checkpoint though. It's easy to convert the newer one. And it's in Fortran but it should be easy to get gfortran if you don't have it installed.

https://github.com/rbitr/llm.f90/tree/optimize16/purefortran

2 comments

Here is another inference implementation in Python (only dependency is PyTorch).

https://github.com/99991/SimpleTinyLlama

The new checkpoints did not seem much better and they changed the chat format for some reason, so I did not port the new checkpoints yet. Perhaps I'll get to it this weekend.

Man I didn't recognize your username but once you said Fortran I recognized you immediately. You are an inspiration, an example of true software engineering vis a vis what in day to day becomes what you can hire for.

Edit: you have some rare knowledge, I'm curious if you have any thoughts on small models good enough for RAG. Mistral 7B is in my testing buts it's laughably slow and 7B is just too much for mobile, both iOS and Android get crashy. (4 tkns/s on Pixel Fold, similar on iOS). Similar problems on web from a good-enough 2 year old i7.

I'd try Phi-2 but I want to charge for my app and the non-commercial usage license bars that. (all these hours building ain't free! And I can't responsibly give search away, scraping locally is too risky for the user, and the free search API I know of has laudable goals, but ultimately, is "trust me bro" as far as privacy goes)

I'm starting to think we might not get an open, RAG capable model sub 7B without a concerted open source effort. Stabilitys distracted and spread thin, MS is all in on AI PCs(tm), and it's too commercially valuable for the big boys to give away

So much changed in a day. What a field!