by eps 487 days ago
I don't see any description of the resulting model in the post. Or any results for that matter. Reads more like a book plug.

Am I missing something?


I get the same vibe, especially after reading the update where the book author contacts him to clarify stuff.

The whole post reads like hasty, clumsy grey marketing.

I wrote the blog post, and I did it of my own free will, and am receiving no compensation. The main reason I wrote it was to help cement my own learnings from the book. I've heard that the best way to learn something is to teach it, so I wanted to see how much I could regurgitate on my own. Turns out, not a whole lot. It was hastily written, and more of a "brain dump" than anything else. I'm entering a new-to-me field, and wanted a place to document the things I'm learning. If anyone finds it interesting, great. If not, no big deal.

As for the specifics of the model I trained, I would be hard pressed to recall them off the top of my head. I believe I trained a small model locally, but after completing that as a PoC, I downloaded the GPT-2 model weights, then trained / fine-tuned those locally. That is what the book directed. All the steps are in my GitHub repo, which (unsurprisingly) looks like the author's repo. His repo actually has more explanation. Mine is more or less just code.