Hacker News
by bitforger 1230 days ago
Pretty cool.

I once worked on AI Dungeon and we had a similar idea to parse the story so far into a graph, so that we could manage long-term memory outside of the context window (which was only 2048 tokens).

Coreference is hard. ("he took the sword"... who is he?) Updating the graph is also hard. (As the story progresses, new facts contradict old facts. Jenny was dating Tom, but now she's dating Mike.)

And knowing what to do with the knowledge graph is hard too, especially if you don't know the schema up front. The only thing we could think to use it for was... programmatically turning relevant sections back into text and prepending it to the context window. (There were easier ways to get a similar effect.)
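For illustration, here is a minimal sketch of that "turn relevant sections back into text and prepend" step. Everything in it is an assumption made up for the example (the triple format, the naive substring matching in `relevant_facts`, the function names); it is not how AI Dungeon actually did it.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object), a hypothetical schema


def relevant_facts(graph: List[Triple], recent_text: str) -> List[Triple]:
    """Keep triples whose subject or object appears in the recent text (naive substring match)."""
    lowered = recent_text.lower()
    return [t for t in graph if t[0].lower() in lowered or t[2].lower() in lowered]


def facts_to_text(facts: List[Triple]) -> str:
    """Render each triple as a plain sentence."""
    return " ".join(f"{s} {r} {o}." for s, r, o in facts)


def build_prompt(graph: List[Triple], story_tail: str) -> str:
    """Prepend the rendered facts to the most recent part of the story."""
    memory = facts_to_text(relevant_facts(graph, story_tail))
    return f"{memory}\n\n{story_tail}" if memory else story_tail


graph = [
    ("Jenny", "is dating", "Mike"),
    ("Tom", "owns", "the sword"),
    ("Ava", "lives in", "Oakvale"),
]
prompt = build_prompt(graph, "Jenny walked into the tavern.")
print(prompt)
```

Substring matching is about the crudest relevance test possible, and it does nothing for the coreference problem; a real system would need something smarter to decide which facts earn a place in a small context window.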

5 comments

It's really fascinating hearing about this and what the issues were. I have played a lot of AI Dungeon on and off and this always felt like part of what was missing: some way for it to keep a structured view of the story to help consistency. The biggest problem has always been that it keeps contradicting itself or loses track of the plot. It's gotten a bit better with the manageable context being fed back each step, but it's still not nearly good enough.
Handling state (especially long-term) is really a struggle for LLMs right now. This issue should become easier to work with as context windows scale up in the next couple years (or months, who knows!).
People are already making progress on this, e.g. the H3 project[1].

[1] https://arxiv.org/abs/2212.14052

This is the most excited I've ever been about sequence models! If the claims of the H3 (and S4) authors are true then we are on the cusp of something very big that will provide another quantum leap in LLM performance. I worry that the claims may come with a hidden catch, but we just have to work with these systems to know.

I'll venture that once truly long range correlations can be managed (at scales 100-1000x what's possible with current GPTs), all the issues about logical reasoning can be answered by training on the right corpus and applying the right kinds of human guided reinforcement.

Google scaled context to 40K tokens
Using tokens as context still sounds to me like you're asking someone to read back text that someone else wrote and continue the story. It might work but it's not the best way to get a coherent narrative.
How can you have a coherent narrative if you can't link things across very large contexts?
I'm saying the context should consist of more than just tokens.
The new facts contradicting old facts thing is fascinating to me.

Why can’t graphs properly model time or sequences?

It's possible to model by annotating facts in the database with a timestamp (Wikidata has this, as well as qualifiers for e.g. the source of a statement, or that it applies within a restricted context) but you still need to somehow integrate the information if you want to know the state right now. E.g. if you have (Jenny, date, Tom) from a year ago and (Jenny, date, Mike) from yesterday, does that mean (Jenny, date, Tom) is no longer valid? Or are both simultaneously true? Or is (Jenny, date, Mike) invalid too, because yesterday was like ages ago?

You could have some heuristics to handle this and then you add another relation "has met" and suddenly you need a whole new set of heuristics.
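To make the heuristics point concrete, a rough sketch: each relation gets its own update policy, and every new relation forces a new decision. The policy names (`exclusive`, `cumulative`) and the `POLICY` table are invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical per-relation policies: "exclusive" means the newest fact replaces
# older ones for the same subject; "cumulative" means facts pile up.
POLICY: Dict[str, str] = {"date": "exclusive", "has met": "cumulative"}

TimedFact = Tuple[int, str, str, str]  # (timestamp, subject, relation, object)


def current_state(facts: List[TimedFact]) -> Dict[Tuple[str, str], Set[str]]:
    """Fold timestamped facts, oldest first, into a 'right now' view."""
    state: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for _, subj, rel, obj in sorted(facts):
        if rel not in POLICY:
            # A relation nobody wrote a heuristic for yet.
            raise KeyError(f"no heuristic defined for relation {rel!r}")
        if POLICY[rel] == "exclusive":
            state[(subj, rel)] = {obj}
        else:
            state[(subj, rel)].add(obj)
    return dict(state)


facts = [
    (1, "Jenny", "date", "Tom"),
    (2, "Jenny", "has met", "Tom"),
    (3, "Jenny", "date", "Mike"),
    (4, "Jenny", "has met", "Mike"),
]
state = current_state(facts)
print(state)
```

Whether `date` really is exclusive is of course exactly the open question; the table just moves the ambiguity into a place where someone has to answer it per relation.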

You can have a date_start and date_end to handle this ambiguity. But yes, the complexity lies in the interpreter/reasoner that has to deal with these facts and with the evolution of this (meta)schema.

But RDF-style and labeled property graph data modeling approaches both have multiple ways of dealing with this.
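A small sketch of the `date_start` / `date_end` idea, assuming an open-ended fact is stored with `date_end=None` and intervals are half-open. The `Fact` class, the `valid_on` helper, and the dates are invented for illustration; RDF and property-graph stores each have their own ways to express this.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    date_start: date
    date_end: Optional[date] = None  # None means "still valid as far as we know"


def valid_on(facts: List[Fact], day: date) -> List[Fact]:
    """Return facts whose [date_start, date_end) interval contains the given day."""
    return [
        f for f in facts
        if f.date_start <= day and (f.date_end is None or day < f.date_end)
    ]


facts = [
    Fact("Jenny", "date", "Tom", date(2022, 1, 10), date(2023, 1, 5)),
    Fact("Jenny", "date", "Mike", date(2023, 1, 9)),
]
then = [f.obj for f in valid_on(facts, date(2022, 6, 1))]
now = [f.obj for f in valid_on(facts, date(2023, 1, 10))]
print(then, now)
```

The interval makes the query easy. Someone still has to decide when to set `date_end`, and that is the reasoner's job.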

The way Datomic handles facts, accumulating them and providing point-in-time queries, is very effective.

Facts can contradict each other. Old facts are not lost. Querying requires a notion of time - “as of”.

It's a combination of reification and bitemporal modeling.
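A toy sketch of the accumulate-and-query-"as of" idea. This is not Datomic's API; the `FactLog` class, its integer transaction counter, and the assert/retract flag are assumptions for illustration, and it tracks a single time axis only, so it is not full bitemporal modeling.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


class FactLog:
    """Append-only log of (tx, triple, added) entries; nothing is overwritten."""

    def __init__(self) -> None:
        self._log: List[Tuple[int, Triple, bool]] = []
        self._tx = 0

    def _append(self, triple: Triple, added: bool) -> int:
        self._tx += 1
        self._log.append((self._tx, triple, added))
        return self._tx

    def assert_fact(self, triple: Triple) -> int:
        """Record that a fact became true; returns the transaction number."""
        return self._append(triple, True)

    def retract_fact(self, triple: Triple) -> int:
        """Record that a fact stopped being true; the old entry stays in the log."""
        return self._append(triple, False)

    def as_of(self, tx: int) -> Set[Triple]:
        """Replay the log up to and including transaction tx."""
        view: Set[Triple] = set()
        for entry_tx, triple, added in self._log:
            if entry_tx > tx:
                break
            if added:
                view.add(triple)
            else:
                view.discard(triple)
        return view


log = FactLog()
t1 = log.assert_fact(("Jenny", "date", "Tom"))
t2 = log.retract_fact(("Jenny", "date", "Tom"))
t3 = log.assert_fact(("Jenny", "date", "Mike"))
print(log.as_of(t1), log.as_of(t3))
```

Because the log only grows, the old Jenny/Tom fact is still there to be queried at `t1`, while the view at `t3` shows only Mike.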
Correctomundo. See RDF-Star for progress on state-in-time. During summer 2022 there was extensive discussion in the W3C working group about different state conditions.
Cool story! Feeding context back into the 0 shot is the hotness. I’ve had a lot of success with that.

Curious what other (easier) ways you found to accomplish the same effect?

> programmatically turning relevant sections back into text

I can't help but think, is this the voice in our heads?