Show HN: Create-LLM – Train your own LLM in 60 seconds (github.com)
54 points by theaniketgiri 243 days ago
https://medium.com/@theaniketgiri/three-months-ago-i-wanted-...
8 comments

Thanks everyone for the feedback and discussion. For those asking technical questions - happy to help! The tool works on Mac/Linux/Windows, check the README for setup. For those concerned about the architecture - it follows standard scaffolding patterns (create-next-app, etc). TypeScript CLI generates Python projects. 82+ stars in 24 hours - grateful for everyone trying it out. Keep the feedback coming!
The blogpost is some of the best LLM greentext I have seen for targeting the hn hivemind. Everything about this is :chefs kiss:
Thanks! The blog post is just my honest journey - spent way too much time trying to understand LLMs, figured others had the same frustration.

If you try create-llm, would love your feedback. Always looking to make it better.

This is a great initiative. Don't let anyone here discourage you from doing something great like this.
Thanks a lot, really appreciate that. Building something new always gets mixed reactions, but messages like yours keep me going.
2 questions: how much of this project is AI generated and how much of only the readme is AI generated?
Mostly the repetitive stuff like README generation and pushing code with meaningful commit messages was handled by AI. The actual work and logic were done by me.
What about the commit that added tens of thousands of lines of markdown claiming to be an AI summary?

Or the meaningful commit message of “.”

And the commit editing 1,000s of lines of python code mislabeled as a docs change?

Totally fair question!

Docs / Markdown: AI handled repetitive stuff like READMEs and summaries.

Core logic / Python: fully written by me.

Commit messages: some minimal ones just for quick iterations — the real work is in the code.

AI helped with boilerplate so I could ship faster; all functionality is hand-crafted.

If the AI did the boilerplate that implies it was not fully written by you.

Again, the “meaningful commit messages” are a single period as the message for the single commit covering the entire Python portion of the codebase.

My question was rhetorical. Whether the AI did it or a human did, it burns credibility to refer to things that don’t exist (like “meaningful commit messages”).

Hacker News is a better place when we don’t attack people sharing their work. Your point was made.

Well done to the author for shipping code. I look forward to trying it out.

I don't quite understand how you get from this:

> I wanted to understand how these things work by building one myself.

Directly to this:

> What if training an LLM was as easy as npx create-next-app?

I mean that the second thought seems to be the opposite of the first (what if the entirety of training an LLM was abstracted behind a simple command).

Great question - I should've been clearer.

When I started, I wanted to understand LLMs deeply. But I hit a wall: tutorials were either "hello world" toys or "here's 500 lines of setup before you start."

What I needed was: "give me working code quickly, THEN let me modify and learn."

That's what create-llm does. It scaffolds the boilerplate (like create-next-app), so you can spend time learning the interesting parts:
- Why does vocab size matter? (adjust config, see results)
- What causes overfitting? (train on small data, see it happen)
- How do different architectures perform? (swap templates, compare)
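
To make the vocab-size point concrete, here is a rough sketch. It is not code from create-llm; the `embedding_params` helper and the example sizes are assumptions for illustration. A token embedding table holds vocab_size × d_model weights, so in a model with only a few million parameters the vocabulary alone can account for a large share of them.

```python
def embedding_params(vocab_size: int, d_model: int) -> int:
    """Number of weights in a token embedding table of shape (vocab_size, d_model)."""
    return vocab_size * d_model


# Hypothetical sizes: one small model width, two different vocabularies.
d_model = 128
for vocab_size in (5_000, 50_000):
    n = embedding_params(vocab_size, d_model)
    print(f"vocab={vocab_size:>6}  embedding params={n:,}")
```

With these assumed numbers the embedding table grows from 640,000 to 6,400,000 weights, which is the kind of change you would notice after adjusting the config and retraining.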

It's "easy to start, deep to master." The abstraction gets you running in 60 seconds, then you dig into the code.

How did you test this? Did you train something?
Yeah, I did! I trained a few small ones — mostly the “nano” and “tiny” templates (a few million params) on datasets like Shakespeare and Alpaca. The goal was to make sure the training loop, tokenizer, and evaluation all worked smoothly.

Didn’t go for massive models — more about making the whole setup process quick and reliable. You can actually train the nano one on CPU in a few minutes just to see it working.

How does this differ from nanochat?
Good question! I think you mean nanoGPT (Karpathy's minimal GPT implementation)?

Key differences:

nanoGPT:
- Minimal reference implementation (~300 lines)
- Educational code for understanding transformers
- Requires manual setup and configuration
- Great for learning the internals

create-llm:
- Production-ready scaffolding tool (like create-next-app)
- One command: npx create-llm → complete project ready
- Multiple templates (nano/tiny/small/base)
- Built-in validation (warns about overfitting, vocab mismatches)
- Includes tokenizer training, evaluation, deployment tools
- Auto-detects issues before you waste GPU time
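
For a sense of what a vocab-mismatch warning could look like, here is a minimal sketch. It is an assumption about the general idea, not the actual create-llm validation code; the `check_vocab` function, its parameters, and the message wording are all hypothetical.

```python
def check_vocab(tokenizer_vocab_size: int, model_vocab_size: int) -> list[str]:
    """Return human-readable warnings; an empty list means no mismatch was found."""
    warnings = []
    if tokenizer_vocab_size > model_vocab_size:
        # Token ids >= model_vocab_size would index past the embedding table.
        warnings.append(
            f"tokenizer vocab ({tokenizer_vocab_size}) exceeds model vocab "
            f"({model_vocab_size}): some token ids would be out of range"
        )
    elif tokenizer_vocab_size < model_vocab_size:
        # Not an error, but the extra embedding rows never receive any tokens.
        warnings.append(
            f"model vocab ({model_vocab_size}) is larger than tokenizer vocab "
            f"({tokenizer_vocab_size}): unused embedding rows"
        )
    return warnings


# Hypothetical sizes: a tokenizer trained with more tokens than the config allows.
for message in check_vocab(tokenizer_vocab_size=8192, model_vocab_size=8000):
    print("WARNING:", message)
```

Running a check like this before training starts is cheap, which is the point of catching the problem before any GPU time is spent.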

Think of it as: nanoGPT is the reference, create-llm is the framework.

nanoGPT teaches you HOW it works. create-llm lets you BUILD with what you learned.

You can actually use nanoGPT's architecture in create-llm templates - they're complementary tools!

Unlike nanochat this is purely vibe-coded, improving vibes by 110%, with 112x more emoji. A key innovation that gets to the heart of the problem is that this project stores Python files as strings in TypeScript files to help improve workflows. I imagine the author solved this engineering challenge to overcome existing limitations—more efficient, interpretable, and maintainable code—in existing projects.
> Unlike nanochat this is purely vibe-coded, improving vibes by 110%

Karpathy clearly said that it wasn't vibe coded. Apparently it was more time consuming to fix gpt bugs than to do it by himself.

The Python-in-TS bit made me smile. But to clarify, it’s a standard TypeScript CLI — no such hacks involved, just template-based generation.
Ok, but there is no reason to bake it into the TS scripts. You could write the python scripts and package them using standard tools. In my experience only an LLM would do that, since it makes sense to generate the code and templates to insert in one go. However, if a human were to do it, the python scripts would be their own files and they would be bundled / read in as strings when/as required. A gigantic lump of text in a string makes no sense in human paradigms, even if it makes perfect sense for an LLM to do it. For humans it is incredibly hostile to update and maintain.

As a side note, without looking it up, on your device, what is the process for typing an emdash?

While you are not wrong about this not being the wisest choice, I have seen it done countless times before, long before LLMs arrived. So I don’t think it’s such a clear sign of intense LLM usage as you make it out to be.
Fair point, I agree embedding code as strings isn’t ideal. I did it mainly to make npx create-llm portable without needing a Python setup during scaffolding. Definitely open to improving that, and happy to refactor if you have suggestions.
That makes zero sense. Am I even speaking with a natural person right now?!? Your comments sound like llm bullshit and everything about this project reeks of it as well, from code to readme.
Does this work on Mac?
Yep, works fine on Mac. Try the nano or tiny templates if you want quicker training runs.