|
LLMs are essentially text completion machines, they get some text in as a list of tokens, and their job is to predict what token comes next. Typically, you're interested in getting more than just one token out of them, so you run them in a loop. You start with the token list containing just the user's message, run the LLM with that list, append the newly obtained token at the end, run the LLM again to see what comes after that new token and so on, until a special end-of-sentence token is generated and the loop terminates. There's no reason why you have to start with nothing but the users' message, though. You can let the user specify the beginning of the LLM's supposed completion, and ask the LLM to start from that point, instead of generating its completion from scratch. This essentially ensures that what the LLM says begins with a specific string. Not alll APIs expose this feature, there are good safety reasons not to, but all LLMs are capable of doing it in principle, and doing it with the open ones is trivial. LLMs are typically trained to output markdown, which uses ```language_name to denote code blocks in language_name, so that user interfaces like Chat GPT's web UI can do proper syntax highlighting. Therefore, if you make your LLM think that it already started a completion, and that completion began with ```json, it will predict what's most likely to come after that delimiter, and that would be a JSON block. |