by SamDc73 102 days ago
https://talimio.com/ Generate fully personalized courses from a prompt. Fully interactive.

New features shipped last month:

- Adaptive practice: an LLM generates and grades questions in real time, then uses Item Response Theory (IRT) to estimate your ability and schedule the optimal next question. Replaces flashcards, especially for math and other topics where each question needs to be fresh even when covering the same concept.
- Interactive math graphs (JSXGraph) that are gradable
- Single-image Docker deployment for easy self-hosting
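
To make the IRT part concrete, here is a minimal sketch of the idea. The post doesn't say which IRT model Talimio uses, so this assumes a two-parameter logistic (2PL) model, a grid-search maximum-likelihood ability estimate, and "pick the item with the most Fisher information" as the scheduling rule. All function names and item parameters are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta).
    a = discrimination, b = difficulty (both assumed known per item)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, grid=None):
    """Grid-search maximum-likelihood estimate of theta.
    responses: list of (a, b, correct) tuples.
    The grid bounds keep the estimate finite when every answer is
    right (or every answer is wrong)."""
    if grid is None:
        grid = [i / 10 for i in range(-40, 41)]  # -4.0 .. 4.0

    def log_likelihood(theta):
        total = 0.0
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            total += math.log(p if correct else 1.0 - p)
        return total

    return max(grid, key=log_likelihood)

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next_item(theta, items):
    """Choose the candidate item that is most informative at theta."""
    return max(items, key=lambda it: item_information(theta, it["a"], it["b"]))

# Example: right on an easy and a medium item, wrong on a hard one.
history = [(1.0, -1.0, True), (1.0, 0.0, True), (1.0, 1.0, False)]
theta_hat = estimate_ability(history)
candidates = [
    {"id": "easy", "a": 1.0, "b": -2.0},
    {"id": "matched", "a": 1.0, "b": 0.9},
    {"id": "hard", "a": 1.0, "b": 3.0},
]
next_item = pick_next_item(theta_hat, candidates)
```

With equal discrimination, the most informative item is the one whose difficulty sits closest to the current ability estimate, which is what makes the next question "optimal" in this sense.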

Open source: https://github.com/SamDc73/Talimio

1 comments

The IRT angle is interesting: most adaptive learning tools just do basic spaced repetition, but using Item Response Theory to estimate ability level in real time is a much more honest approach to "personalized." The JSXGraph integration for gradable math graphs is a nice touch too; that's a hard problem. Quick question: how do you handle subjects where the "right answer" is more ambiguous? Does the LLM grading struggle with open-ended questions outside of math?
Yeah, we use an LLM for grading the free-form questions.

the flow is basically:

When practice questions are generated, the model generates the question + the reference answer together, but the user only sees the question. Then on submit, a smaller model grades the learner's answer against that reference answer + the grading criteria.
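
Roughly, that flow could look like the sketch below. This is not the actual Talimio code: `PracticeItem`, `build_grading_prompt`, the JSON verdict shape, and the `judge` callable (a stand-in for whatever sends a prompt to the grading model) are all made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class PracticeItem:
    question: str
    reference_answer: str  # generated with the question, never sent to the learner
    criteria: str          # grading criteria generated alongside

def learner_view(item: PracticeItem) -> dict:
    """Only the question leaves the server."""
    return {"question": item.question}

def build_grading_prompt(item: PracticeItem, learner_answer: str) -> str:
    """Assemble the text the smaller judge model sees on submit."""
    return (
        "Grade the learner answer against the reference answer.\n"
        f"Question: {item.question}\n"
        f"Reference answer: {item.reference_answer}\n"
        f"Grading criteria: {item.criteria}\n"
        f"Learner answer: {learner_answer}\n"
        'Reply with JSON: {"correct": true|false, "feedback": "..."}'
    )

def grade(item: PracticeItem, learner_answer: str,
          judge: Callable[[str], str]) -> dict:
    """Call the judge and validate its structured output."""
    raw = judge(build_grading_prompt(item, learner_answer))
    verdict = json.loads(raw)
    if not isinstance(verdict.get("correct"), bool):
        raise ValueError("judge reply is missing a boolean 'correct' field")
    return verdict

# Stub judge so the sketch runs without any model behind it.
def fake_judge(prompt: str) -> str:
    return json.dumps({"correct": True, "feedback": "Matches the reference."})

item = PracticeItem(
    question="Why does ice float on water?",
    reference_answer="Ice is less dense than liquid water.",
    criteria="Must mention lower density of ice.",
)
verdict = grade(item, "Because ice has a lower density.", fake_judge)
```

Validating the reply before trusting it is where structured-output reliability matters: a judge that returns malformed JSON fails loudly instead of silently mis-grading.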

I benchmarked a bunch of judge models for this on a small multi-subject set, and `gpt-oss-20b` ended up being a very solid sweet spot for quality/speed/structured-output reliability. on one of the internal benchmarks it got ~98.3% accuracy over 60 grading cases, with ~1.6s p50 latency, so it feels fast enough to use live.
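
For reference, ~98.3% over 60 cases works out to 59 of 60 graded correctly (59/60 ≈ 0.983), assuming the figure is a plain fraction of cases. Computing both numbers from a run is a few lines; the records below are invented placeholders, not the real benchmark data, and `summarize` is a hypothetical helper.

```python
from statistics import median

def summarize(results):
    """results: list of (judge_agreed_with_label, latency_seconds) pairs.
    Returns (accuracy, p50_latency)."""
    accuracy = sum(1 for agreed, _ in results if agreed) / len(results)
    p50 = median([latency for _, latency in results])
    return accuracy, p50

# Placeholder run: 59 agreements and 1 miss, with made-up latencies.
fake_results = [(True, 1.6)] * 59 + [(False, 2.0)]
accuracy, p50 = summarize(fake_results)
```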

for math, it’s not just LLM grading though:

- `SymPy` for LaTeX/math expressions, so if the learner writes an equivalent answer in a different form, it still gets marked correct: `(x+2)(x+3)` and `x^2 + 5x + 6` can both pass. (but I might remove that one since it could probably be replaced by an LLM, and it's a niche use that adds some maintenance cost)

- tolerance-based checks for the JSXGraph board state stuff: if you plotted x = 5.2 instead of 5.3 on the graph, it's within the margin of error, so it passes, but you get a message about the difference
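
Both checks are small enough to sketch. This assumes SymPy is installed and parses plain-text expressions rather than LaTeX (SymPy's LaTeX parser needs extra dependencies). The 0.15 tolerance and the function names are made up for illustration; the real thresholds aren't stated here.

```python
from sympy import simplify
from sympy.parsing.sympy_parser import (
    parse_expr,
    standard_transformations,
    implicit_multiplication_application,
    convert_xor,
)

# convert_xor lets learners write x^2; implicit multiplication
# accepts 5x and (x+2)(x+3) without an explicit *.
_TRANSFORMS = standard_transformations + (
    convert_xor,
    implicit_multiplication_application,
)

def equivalent(learner: str, reference: str) -> bool:
    """True if the two expressions simplify to the same thing.
    Note: parse_expr evaluates its input, so real learner input
    should be sanitized or sandboxed first."""
    a = parse_expr(learner, transformations=_TRANSFORMS)
    b = parse_expr(reference, transformations=_TRANSFORMS)
    return simplify(a - b) == 0

def check_plotted_value(plotted: float, expected: float, tol: float = 0.15):
    """Tolerance check for a graph answer. Returns (passed, message).
    tol is an assumed value, not the project's actual margin."""
    error = abs(plotted - expected)
    if error > tol:
        return False, f"Expected about {expected}, got {plotted}."
    if error > 0:
        return True, f"Accepted, but the exact value is {expected}."
    return True, "Exact."

same_form = equivalent("(x+2)(x+3)", "x^2 + 5x + 6")
different = equivalent("x^2 + 5x", "x^2 + 5x + 6")
close_enough = check_plotted_value(5.2, 5.3)
too_far = check_plotted_value(4.0, 5.3)
```

The symbolic check is deterministic, which is the main thing an LLM replacement would give up: the factored and expanded forms compare equal every time, not just most of the time.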

I also tried embedding/similarity checking early on, but it was noticeably worse on tricky answers, so I didn’t use that as the main path.