Hacker News
by ctoth 853 days ago
For solving long-term tasks like finding things that aren't there, you can turn the annotated scene into a templated description and feed it to a large-enough model trained on interactive fiction.

You are standing in a kitchen. Ahead of you to your right there is a large refrigerator with the handle on the right side. There is a set of cabinets to your left with a plate sitting on the counter above them.

> get beer

You don't see any beer here.

<< COT: I know that beer is often found in the fridge. I should try opening the refrigerator.

> open fridge

Opening the refrigerator reveals 4 cans of beer.

> get beer

Taken.

Obviously we're still several years from this working, but it's very exciting to consider. Interactive Fiction narrative fed by real sensors plus chain-of-thought blocks as internal monologue.
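
A minimal sketch of the templating step, assuming the perception side already hands you labeled objects with a rough direction. Everything here (`SceneObject`, `describe_scene`, `build_prompt`) is a hypothetical name made up for illustration, and the model call itself is left out; the prompt just ends on an open `>` for the model to complete.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SceneObject:
    """One annotated object from a (hypothetical) scene detector."""
    name: str         # e.g. "large refrigerator"
    direction: str    # e.g. "ahead of you to your right"
    detail: str = ""  # optional extra annotation


def describe_scene(room: str, objects: list[SceneObject]) -> str:
    """Render the annotations as an interactive-fiction room description."""
    sentences = [f"You are standing in a {room}."]
    for obj in objects:
        article = "an" if obj.name[0].lower() in "aeiou" else "a"
        # Capitalize only the first letter so the rest is left untouched.
        where = obj.direction[0].upper() + obj.direction[1:]
        sentence = f"{where} there is {article} {obj.name}"
        if obj.detail:
            sentence += f" {obj.detail}"
        sentences.append(sentence + ".")
    return " ".join(sentences)


def build_prompt(description: str, history: list[tuple[str, str]]) -> str:
    """Append earlier (command, response) turns and leave an open prompt."""
    parts = [description]
    for command, response in history:
        parts.append(f"> {command}")
        parts.append(response)
    parts.append(">")  # the model is expected to continue from here
    return "\n\n".join(parts)


kitchen = describe_scene(
    "kitchen",
    [SceneObject("large refrigerator", "ahead of you to your right",
                 "with the handle on the right side")],
)
print(build_prompt(kitchen, [("get beer", "You don't see any beer here.")]))
```

The chain-of-thought lines would be whatever the model emits before its next `>` command; parsing those is not shown here.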

2 comments

Great, now we can teach robots to wander around rooms looking for things, saying "keys, keys, keys... where would I put keys?"
I can pick up and place objects myself if only I could remember where I put them.

You could take video data and do fuzzy identification of objects as they move around, then throw away the video and just keep track of the objects: the blue floppy things (gloves) and the shiny, deforming metal things (keys). Then you could have a more constructive dialog about the keys, with a voice responding: What do the keys look like? Is there a blue square thing on the key ring? The less identifiable the object, the funnier the discussion. What shirt? You have many shirts! Oh, the blue one? You have 4 of those: one in the sink, one behind the bed, one in the laundry basket, one in the closet. Oh, the one with stripes! Why didn't you say so? It's behind the bed, bro.

It could also ask you if they are supposed to be on the outside of the front door after you close it.
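
A rough sketch of the "throw away the video, keep the objects" idea, assuming some upstream tracker already assigns each object a stable ID, a label, and a few fuzzy attributes. `ObjectMemory`, `Sighting`, and their methods are hypothetical names for illustration only, not any real library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sighting:
    """Last known state of one tracked object; no video is retained."""
    label: str
    attributes: set[str]
    location: str
    timestamp: float


class ObjectMemory:
    def __init__(self) -> None:
        self._last_seen: dict[str, Sighting] = {}

    def observe(self, object_id: str, label: str, attributes: set[str],
                location: str, timestamp: float) -> None:
        """Keep only the most recent sighting of each object."""
        previous = self._last_seen.get(object_id)
        if previous is None or timestamp >= previous.timestamp:
            self._last_seen[object_id] = Sighting(
                label, set(attributes), location, timestamp)

    def find(self, label: str, *required: str) -> list[Sighting]:
        """All objects with this label that have every required attribute."""
        matches = [s for s in self._last_seen.values()
                   if s.label == label and set(required) <= s.attributes]
        return sorted(matches, key=lambda s: s.timestamp)

    def answer(self, label: str, *required: str) -> str:
        """Answer directly if unambiguous, otherwise ask to narrow it down."""
        matches = self.find(label, *required)
        if not matches:
            return f"I haven't seen any {label} like that."
        if len(matches) == 1:
            return f"It's {matches[0].location}."
        places = ", ".join(f"one {m.location}" for m in matches)
        return f"You have {len(matches)} of those: {places}. Which one?"


memory = ObjectMemory()
memory.observe("shirt-1", "shirt", {"blue"}, "in the sink", 1.0)
memory.observe("shirt-2", "shirt", {"blue", "striped"}, "behind the bed", 2.0)
memory.observe("shirt-3", "shirt", {"blue"}, "in the laundry basket", 3.0)
memory.observe("shirt-4", "shirt", {"blue"}, "in the closet", 4.0)
print(memory.answer("shirt", "blue"))
print(memory.answer("shirt", "blue", "striped"))
```

The vaguer the query, the more matches come back, which is where the back-and-forth about "which blue shirt" would come from.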

Get a Tile. I have one attached to my keys, and saying "hey Alexa, find my keys" has been really nice. We also have one taped to our remote, which turned out to be excellent since our couch constantly eats it. I just wish it lit up, but sound-only is fine.

It would be really cool if the robot could just know where your keys are from some kind of Tile-type thing attached to them. If it already has a scan of your home, theoretically it could show a photo. But I have no idea if it’s possible to pinpoint an object via RFID.

I have exactly one place I put my keys in the house - the handle of a certain door. As soon as I get into the house, I put the keys there. This hasn't failed me yet.
Multimodal LLMs already excel at these sorts of tasks. Try taking a picture of your kitchen and asking ChatGPT where to find the beer.

I use this quite a lot actually. Being lazy, I take photographs of components and boards and ask it how to wire them to my ESP32. It’s able to distinguish the board, chip, etc., as well as the pinouts, from a set of photos and tell me which wires go where and anything of note. It’ll often even suggest helpful libraries for the parts. It’s essentially magic.