Alfred-40B, an OSS RLHF version of Falcon40B


	Alfred-40B, an OSS RLHF version of Falcon40B (lighton.ai)
	80 points by nuitblanche 1052 days ago

7 comments

paulcjh 1052 days ago

Any performance benchmarks compared to other LLMs? Also, any performance increases on the orig Falcon model in inference speed?

We ditched most of our focus on Falcon 40B after Llama 2 70B came out; both the tokens per sec and the quality of results are not even close.


brianjking 1052 days ago

I'm assuming this will have similar scores to the original 40B model, in which case LLaMa2 70b would outperform it. The avg score on the Open LLM Leaderboard of LLaMa2 70b instruct is 72.3.

Falcon-40B scores 63.4 for the instruct version, or 61.5 for the non-instruction-tuned version.


Tepix 1051 days ago

Did you fine-tune your own Llama 2? Llama2-chat is awfully gimped.


tysam_and 1052 days ago

Will every comment here be a question? Will someone break the trend?


version_five 1052 days ago

It's a good observation: there are so many unknowns about all these models. Every day a new wizard_uncensored_rhlf_alpaca_tuned_best_one_use_this_13B_4.6-bit_rqm.pth gets released, and it's almost impossible to know the relative merits and which are worth paying attention to.


0cf8612b2e1e 1052 days ago

How true. Every time I browse a model listing there are four-word descriptions with little sense of versioning, provenance, hardware requirements, or any reason why I would choose one over the other.


Ilasky 1052 days ago

What kind of hardware do I need to run this sufficiently well? I.e. say I want 10 tokens/s, what specs am I looking at?


brianjking 1052 days ago

This particular model has 83.66 GB of model weights, so you'll need 2x Nvidia 80 GB A100s at a minimum unless you're loading it in 8-bit mode.
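
For a rough sense of the arithmetic, here is a minimal sketch that scales the 83.66 GB figure to other precisions. It assumes the checkpoint stores 16-bit weights (that is my assumption, not something stated above) and counts weights only, so the KV cache, activations, and framework overhead come on top. The function names are made up for illustration.

```python
# Weights-only memory estimate at different precisions.
# ASSUMPTION: the 83.66 GB checkpoint holds 16-bit (2-byte) weights.
CHECKPOINT_GB = 83.66


def weights_gb(bits_per_weight, checkpoint_gb=CHECKPOINT_GB):
    """Approximate weights-only size in GB at the given bits per weight."""
    return checkpoint_gb * bits_per_weight / 16


def fits(bits_per_weight, total_memory_gb, checkpoint_gb=CHECKPOINT_GB):
    """True if the weights alone are smaller than the available memory."""
    return weights_gb(bits_per_weight, checkpoint_gb) < total_memory_gb


if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit: ~{weights_gb(bits):.1f} GB of weights")
```

Under that assumption the 16-bit weights do not fit on a single 80 GB card, while 8-bit comes to roughly half the size. Real quantized files usually come out a bit larger than this estimate because formats store scales and keep some tensors at higher precision.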


brianjking 1052 days ago

With that said, there are ggml/gptq and other optimization techniques.


brucethemoose2 1052 days ago

Pretty much anything with 32GB (?) total RAM+VRAM:

https://github.com/cmp-nct/ggllm.cpp

But it's going to be slow without even a small Nvidia GPU (a 2060?). CPUs are really slow at prompt ingestion, and that can't be hidden with streaming.
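
On the 10 tokens/s question, a common rule of thumb (not something from this thread, so treat it as an assumption) is that single-stream generation has to read every weight once per token, which makes memory bandwidth the ceiling. It says nothing about prompt ingestion, which is compute-bound. A hedged sketch with hypothetical function names:

```python
# Rule-of-thumb bound for token generation speed.
# ASSUMPTION: each generated token reads all model weights once, so speed
# is limited by memory bandwidth. Ignores prompt ingestion, KV cache reads,
# and any CPU/GPU split of the layers.


def max_tokens_per_second(weights_gb, bandwidth_gb_per_s):
    """Upper bound on tokens/s for a given weights size and bandwidth."""
    return bandwidth_gb_per_s / weights_gb


def required_bandwidth_gb_per_s(weights_gb, target_tokens_per_s):
    """Memory bandwidth needed to reach the target generation speed."""
    return weights_gb * target_tokens_per_s


if __name__ == "__main__":
    # ~20.9 GB is a hypothetical 4-bit size (83.66 GB of 16-bit weights / 4).
    size_gb = 83.66 * 4 / 16
    needed = required_bandwidth_gb_per_s(size_gb, 10)
    print(f"~{needed:.0f} GB/s needed for 10 tokens/s at ~{size_gb:.1f} GB")
```

Compare the result against the memory bandwidth on your own hardware's spec sheet. Actual throughput will be lower than this bound.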


brianjking 1052 days ago

Doesn't this new version of falcon need to be ggml'ed first?


version_five 1052 days ago

The architecture is the same, I believe; it's just a fine-tune, so there's nothing special to be done for this version. That said, ggml doesn't support Falcon, but I saw today there is a fork that claims to, though I didn't try it.


brucethemoose2 1052 days ago

That link above is the fork ^

It uses the ggml library, just like llama.cpp does, and is indeed a fork of llama.cpp's implementation of ggml.


version_five 1052 days ago

Right, I'm being stupid; I didn't realize that's the fork I saw earlier today. Have you tried it? IIRC the documentation mentioned 2-bit quantization of the 40B model performing well. I've been using a 5-bit 7B Llama 2, which I'm generally happy with (because it can run on a pretty crappy machine), but I'm interested to see the differences.


lp251 1052 days ago

Igor, is LightOn still working on optical computing? Or have you fully pivoted to genai without optics?


brianjking 1052 days ago

Will the team be releasing the momentum-internal dataset?


techn00 1052 days ago

Anyone keeping track of all these LLM releases?


sanity31415 1052 days ago

This is the closest thing I'm aware of: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...


version_five 1052 days ago

Also this one: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...


viraptor 1052 days ago

Kind of. There's https://github.com/Hannibal046/Awesome-LLM and you can follow the subreddit too. They're not amazing though, so I'm in the process of making my own catalogue.


kken 1052 days ago

So it's open source? Where is the link to the weights?

The article was mildly annoying because of the many underlined sentences and words that looked like links.


brianjking 1052 days ago

The weights are literally linked in the article. The license is Apache 2.0

https://huggingface.co/lightonai/alfred-40b-0723/tree/main


llm_nerd 1052 days ago

The fact that this empty SEO blogspam bizarrely underlines loads of content (i.e. just like links) kind of obscures the single link to HuggingFace.


kken 1052 days ago

thanks!
