HN Mirror

by CMay 11 days ago

The Openrouter website says that y'all do not train on the data, but it does not make it clear that the data is not shared with any 3rd parties (other than the LLM provider) who might train on it.

For comparison, Apple and Google provide the transport for push notifications but claim to delete the content and preserve only the metadata.

What is Openrouter's policy on this? Is logging user data an essential part of the business model, or is the primary business model really just acting as a proxy between multiple services and nothing beyond that? If everything is logged, do y'all store it securely, so that if one database is stolen (by China, for example) it's not useful on its own?

With the race for AGI and everyone training on each other's outputs, Openrouter is clearly in a position to abuse all of that even though the major providers weaken their output to limit the value of distilling them.

numlocked 11 days ago

We have never sold any prompt data to anyone, in any form, and have no plans to do so. Full stop.

segmondy 11 days ago

Can you also confirm that you do not log or retain it, i.e. that you are a 100% pass-through? If you are logging it, you could one day change your position on that.

numlocked 11 days ago

We have two mechanisms whereby we retain data. Both are opt-in and off by default.

With the first, you get a discount and we can use the data. In theory that does include selling it; our intent is to use it to build efficient dynamic routing, but we absolutely could sell it one day. With the second, we retain the data for you so you can see it in your logs. We have no rights to that data in any way, similar to how any tracing/logging solution works.

Both are opt-in. If you don’t opt in, we don’t retain anything and are a pass-through with regard to your prompt data.

All of this is carefully documented, and I encourage you to explore and chat with the docs.

CMay 10 days ago

Do you specify prompt data because prompt data is subject to user copyright while LLM outputs are not? LLM outputs can still leak user data back through the response. Y'all don't log those either, correct?