
Sounds like an interesting master's thesis. Is it available online somewhere?

Well, I'm not sure about the final document that went to the university, but this is the almost-final draft.

https://docs.google.com/document/d/e/2PACX-1vSyWbtX700kYJgqe...

Since it's in Cyrillic, you should perhaps use a translation service. There are some screenshots showing results, though since I was on a really tight deadline, and it's a master's thesis rather than a PhD, I decided not to go into an in-depth evaluation of the proposed methodology against Spider (https://yale-lily.github.io/spider). Even so, you can find the simplified GBNF grammar and some of the outputs. Interestingly, the grammar exploits a bug in llama.cpp that allows some sort of recursively chained rules. The bibliography is in English, but really, there is so much written on the topic that it is by no means comprehensive.

Sadly, no open inference engine (at the time of writing) was good enough at both beam search and grammars, so this whole thing perhaps needs to be redone in PyTorch.

If I find myself in a position to do this commercially, I'd also explore having human-curated SQL queries against the particular schema, in order to guide the model better, and then doing RAG on the DB for more context. Note: I'm already reducing the E/R model to the minimal connected graph that includes all entities of interest to the query at hand.
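To make the E/R reduction idea concrete, here is a minimal sketch. It assumes the schema is an undirected graph of tables linked by foreign keys, and it approximates the "minimal connected graph" by taking the union of shortest paths between every pair of tables of interest. That is a simple heuristic for what is really a Steiner tree problem, not necessarily what the thesis does. The schema and the names `reduce_schema` and `shortest_path` are made up for illustration.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, start, goal):
    """BFS shortest path in an unweighted graph; returns a list of nodes or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # sorted() only makes the traversal order deterministic
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def reduce_schema(graph, wanted):
    """Keep the wanted tables plus every table on a shortest path between them.

    Heuristic only: the result is connected but not guaranteed to be minimal.
    """
    keep = set(wanted)
    for a, b in combinations(sorted(wanted), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            raise ValueError(f"no join path between {a!r} and {b!r}")
        keep.update(path)
    return keep


# Hypothetical schema: an edge means "joined by a foreign key".
SCHEMA = {
    "customer": {"order"},
    "order": {"customer", "order_item", "shipment"},
    "order_item": {"order", "product"},
    "product": {"order_item", "supplier"},
    "supplier": {"product"},
    "shipment": {"order"},
}

reduced = reduce_schema(SCHEMA, {"customer", "product"})
print(sorted(reduced))
```

Only the reduced set of tables would then go into the prompt, so the model never sees `supplier` or `shipment` for a question about customers and products.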

And finally, since you got this far: the real, real problem with restricting LLM output with grammars is tokenization. The parsers work by reading one character at a time, while tokens are very often several characters long, so the parser in a way needs to be able to look ahead, which it normally does not. I believe OpenAI wrote that they had realized this too, but I can't find the article at the moment.
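A toy illustration of the mismatch, as a sketch rather than how any particular engine does it: the grammar below is checked one character at a time, and each multi-character token is accepted only if all of its characters can be fed through the parser from the current state. The grammar (`SELECT <ident> FROM <ident>`), the vocabulary, and the function names are all made up for the example.

```python
# Character-level grammar: "SELECT " ident " FROM " ident, where ident = [a-z]+
SEGMENTS = ["SELECT ", None, " FROM ", None]  # None marks an identifier
START = (0, 0)  # (segment index, characters consumed in that segment)


def step(state, ch):
    """Advance the parser by one character; return the new state or None if rejected."""
    seg, pos = state
    if seg >= len(SEGMENTS):
        return None
    part = SEGMENTS[seg]
    if part is None:  # inside an identifier
        if "a" <= ch <= "z":
            return (seg, pos + 1)
        if pos > 0:  # identifier is non-empty, so it may end here
            return step((seg + 1, 0), ch)
        return None
    if ch != part[pos]:  # inside a literal
        return None
    return (seg + 1, 0) if pos + 1 == len(part) else (seg, pos + 1)


def feed_token(state, token):
    """Feed a whole token through the parser, one character at a time."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def allowed_tokens(state, vocab):
    """Tokens the sampler may pick next without leaving the grammar."""
    return [tok for tok in vocab if feed_token(state, tok) is not None]


# Hypothetical vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["SELECT", "SEL", " ", " name", " FROM", " users",
         "name", "FROM ", "users", " FROM users", "DROP"]

after_select = feed_token(START, "SELECT")
print(allowed_tokens(START, VOCAB))         # ['SELECT', 'SEL']
print(allowed_tokens(after_select, VOCAB))  # [' ', ' name', ' users']
```

Note that `" name"` straddles two grammar elements (the end of the `"SELECT "` literal and the start of the identifier), which is exactly the case a character-at-a-time parser has to handle by simulating the whole token. Doing that for every vocabulary entry at every step is also where the cost comes from.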

Thanks. Took a quick look and definitely needed to use Google Translate, but it seems to have worked well enough to get the gist of it.

There are local applications of parallel processing; your average chatbot wouldn't use it, but a research bot with multiple simultaneous queries will, for example.

Better local beam search would be really nice to have, though.

I do wonder if recursion is particularly hard for LLMs, given that they have a hard limit on how much they can loop for a given token. (Absent beam search, reasoning models, and other trickery.)

Given that a Prolog source (not ProbLog, but the non-stochastic one) is a parametric grammar, we can perhaps* argue that inference at the logic-programming level can be unfolded with pen and paper. Think of L-systems: they are self-similar and recursively defined. The catch is that the whole sequence gets rewritten on each step. If you can get the LLM to do this as it progresses with generation, you get recursion. The question is whether you can get the LLM to rewrite the context window, and my bet would be that someone is already working on it.

* I say perhaps because Prolog engines normally don't rewrite strings like crazy while doing inference, so my statement may be somewhat off.
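For anyone who hasn't met L-systems, a minimal sketch of the "whole sequence gets rewritten on each step" point, using Lindenmayer's classic algae rules (A -> AB, B -> A). The function names are my own; nothing here is LLM-specific.

```python
def rewrite(sequence, rules):
    """One L-system step: every symbol is replaced in parallel; unknown symbols stay as-is."""
    return "".join(rules.get(symbol, symbol) for symbol in sequence)


def generate(axiom, rules, steps):
    """Return the axiom followed by each successive rewrite of the full sequence."""
    history = [axiom]
    for _ in range(steps):
        history.append(rewrite(history[-1], rules))
    return history


ALGAE_RULES = {"A": "AB", "B": "A"}
generations = generate("A", ALGAE_RULES, 4)
for generation in generations:
    print(generation)
# A
# AB
# ABA
# ABAAB
# ABAABABA
```

Each generation replaces the previous one entirely instead of appending to it. An autoregressive model that only appends tokens cannot do that directly, which is why the question above comes down to rewriting the context window.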