| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ianbicking 653 days ago

Lots of broken links in the doc, though I guess the YAML file specifies everything: https://github.com/open-llm-initiative/open-message-format/b...

The metadata tokens is a string [1]... that doesn't seem right. Request/response tokens generally need to be separated, as they are usually priced separately.

It doesn't specify how the messages have to be arranged, if at all. But some providers force system/user/assistant/user... with user last. But strict requirements on message order seem to be going away, a sort of Postel's Law adaptation perhaps.

Gemini has a way of basically doing text completion by leaving out the role [2]. But I suppose that's out of the standard.

Parameters like top_p are very eclectic between providers, and so I suppose it makes sense to leave them out, but temperature is pretty universal.

In general this looks like a codification of a minimal OpenAI GPT API, which is reasonable. It's become the de facto standard, and provider gateways all seem to translate to and from the OpenAI API. I think it would be easier to understand if the intro made it more clear that it's really trying to specify an emergent standard and isn't proposing something new.

[1] https://github.com/open-llm-initiative/open-message-format/b...

[2] https://ai.google.dev/gemini-api/docs/text-generation?lang=r...

1 comments

un1imited 653 days ago

hey @ianbicking - thanks a lot for the feedback. I've merged a change to fix the links [1].

> The metadata tokens is a string [1]... that doesn't seem right. Request/response tokens generally need to be separated, as they are usually priced separately.

For the metadata you are right. Request and response tokens are billed separately and should be captured accordingly. I've put a PR to address that [2]

> It doesn't specify how the messages have to be arranged, if at all. But some providers force system/user/assistant/user... with user last. ...

We do assume that last message in the array to be from user. But we are not forcing it at the moment.

[1] https://github.com/open-llm-initiative/open-message-format/p...

[2] https://github.com/open-llm-initiative/open-message-format/p...

link

ianbicking 653 days ago

I've hit cross-LLM-compatibility errors in the past with message order, multiple system messages, and empty messages.

Multiple system messages are kind of a hack to invoke that distinct role in different positions, especially the last position. I.e., second to last message is what the user said, last message is a system message telling the LLM to REALLY FOLLOW THE INSTRUCTIONS and not get overly distracted by the user. (Though personally I usually rewrite the user message for that purpose.)

Multiple user messages in a row is likely caused by some failure in the system to produce an assistant response, like no network. You could ask the client to collapse those, but I think it's most correct to allow them. The user understands the two messages as distinct.

Multiple assistant messages, or no trailing user message, is a reasonable way to represent "please continue" without a message. These could also be collapsed, but that may or may not be accurate depending on how the messages are truncated.

This all gets even more complicated once tools are introduced.

(I also notice there's no max_tokens or stop reason. Both are pretty universal.)

These message order questions do open up a more meta question you might want to think about and decide on: is this a prescriptive spec that says how everyone _should_ behave, a descriptive spec that is roughly the outer bounds of what anyone (either user or provider) can expect... or a combination like prescriptive for the provider and descriptive for the user.

Validation suites would also make this clearer.

link

aayushwhiz 653 days ago

Yeah, I can completely see this, the goal of this was to be specifically for the messages object, and not a completions object, since in my experience, you usually send messages from front end to backend and then create the completion request with all the additional parameters when sending from backend to an LLM provider. So when just sending from an application to the server, trying to just capture the messages object seemed ideal. This was also designed to try and maximize cross compatibility, so it is not what the format "should be" instead, it is trying to be a format that everyone can adopt without disrupting current setups.

link

ianbicking 653 days ago

Huh, that's a different use case than I was imagining. I actually don't know why I'd want a standard API from a frontend and backend that I control.

In most applications where I make something chat-like (honestly a minority of my LLM use) I have application-specific data in the chat, and then I turn that into an LLM request only immediately before sending a completion request, using application-specific code.

link

sparacha 653 days ago

Well, in the case of the front-end (like streamlit, gradio, etc) they send conversational messages in their own custom ways - this means I must develop against them each specifically, and that slows down any quick experimentation I would want to do as a developer. This is the client <> server interaction.

And then the conversational messages sent to the LLM are also somewhat unique to each provider. One improvement for simplicity purposes could be that we get a standard /chat/completions API for server <> LLM interaction and define a standard "messages" object in that API (vs the stand-alone messages object as defined in the OMF").

Perhaps that might be simpler, and easier to understand

link