Hacker News

by mark_l_watson 313 days ago

Wow, Sebastian Raschka's blog articles are jewels - much appreciated.

I use the gpt-oss and Qwen3 models a lot: the smaller models locally with Ollama and LM Studio, and commercial APIs for the full-size models.

For local model use, I get very good results with gpt-oss when I "over prompt," that is, when I provide more context information than I usually would. Qwen3 is simply awesome.

Until about three years ago, I had always (starting in the 1980s) understood neural network models, such as GANs, recurrent networks, and LSTMs, well enough to write my own implementations. I really miss the feeling that I could develop at least simpler LLMs on my own. I am slowly working through Sebastian Raschka's excellent book https://www.manning.com/books/build-a-large-language-model-f... but I will probably never finish it (to be honest).

2 comments

imtringued 312 days ago

For me it is the opposite. I'm shocked by how simple transformer-based models are, and by how small the architectural differences between the latest models are. Almost nothing has changed since late 2023.
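To make the "simple" point concrete, here is a rough sketch of a single pre-norm decoder block in plain Python: causal single-head self-attention followed by a small feed-forward layer, each wrapped in a residual connection. This is an illustration under stated assumptions, not the code of any particular model or of Raschka's book. The weights are random, the layer norm has no learned scale or shift, ReLU stands in for the GELU/SwiGLU variants real models use, and all helper names and sizes are hypothetical.

```python
import math
import random

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain nested lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]  # exp(-inf) == 0.0, so masked slots vanish
    s = sum(exps)
    return [e / s for e in exps]

def layer_norm(row, eps=1e-5):
    # Simplified: no learned gain/bias (an assumption of this sketch)
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def causal_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention with a causal mask."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    weights = []
    for i, row in enumerate(matmul(q, transpose(k))):
        # Token i may only attend to positions 0..i
        masked = [s / scale if j <= i else float("-inf") for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return matmul(weights, v), weights

def transformer_block(x, p):
    # Attention sub-layer: x + Attn(LN(x)) @ Wo
    h = [layer_norm(r) for r in x]
    attn, _ = causal_self_attention(h, p["wq"], p["wk"], p["wv"])
    x = add(x, matmul(attn, p["wo"]))
    # Feed-forward sub-layer: x + ReLU(LN(x) @ W1) @ W2
    h = [layer_norm(r) for r in x]
    return add(x, matmul(relu(matmul(h, p["w1"])), p["w2"]))

# Hypothetical toy sizes and random weights, for illustration only
random.seed(0)
seq_len, d_model, d_head, d_ff = 4, 8, 8, 16

def rand(r, c):
    return [[random.gauss(0, 0.1) for _ in range(c)] for _ in range(r)]

params = {"wq": rand(d_model, d_head), "wk": rand(d_model, d_head),
          "wv": rand(d_model, d_head), "wo": rand(d_head, d_model),
          "w1": rand(d_model, d_ff), "w2": rand(d_ff, d_model)}
x = rand(seq_len, d_model)
y = transformer_block(x, params)
print(len(y), len(y[0]))  # 4 8
```

Stacking several of these blocks between a token embedding and an output projection gives the overall decoder-only shape. Many of the differences between recent models live in the details this sketch simplifies away: the normalization, the activation, the positional encoding, and the attention variant.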


lvl155 313 days ago

He does an amazing job of keeping me up to date on this insanely fast-paced space.
