Hacker News
by mirekrusin 546 days ago
This "draw pelican riding on bicycle" is quite deep if you think about it.

Phi is all about synthetic training, and a prompt -> SVG -> render -> evaluate image -> feedback loop feels like an ideal fit for synthetic learning.

You can push it quite far with things like basic 2D physics (plotting the scene after N seconds), optics/rays, magnetic forces, etc.
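A minimal sketch of that kind of synthetic loop, assuming a simple projectile task: the generator knows the exact answer ("where is the ball after N seconds?"), renders it as an SVG, and a scorer compares a model's SVG against it. The function names, the one-`<circle id="ball">` scene layout and the distance-based reward are hypothetical choices, not anything Phi is known to use; the scorer also skips rasterising and reads the circle straight from the SVG, which is a simplification.

```python
import math
import xml.etree.ElementTree as ET

G = 9.81  # gravitational acceleration, m/s^2
NS = {"svg": "http://www.w3.org/2000/svg"}

def ball_position(v0, angle_deg, t):
    """Exact projectile position after t seconds (no drag), launched from the origin."""
    a = math.radians(angle_deg)
    x = v0 * math.cos(a) * t
    y = v0 * math.sin(a) * t - 0.5 * G * t * t
    return x, y

def scene_svg(x, y, height=100):
    """Render the ball as an SVG circle; SVG's y axis points down, so flip y."""
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="{height}">'
            f'<circle id="ball" cx="{x:.2f}" cy="{height - y:.2f}" r="2"/></svg>')

def ball_xy(svg):
    """Read the ball's centre back out of an SVG string."""
    c = ET.fromstring(svg).find(".//svg:circle[@id='ball']", NS)
    return float(c.get("cx")), float(c.get("cy"))

def score(candidate_svg, reference_svg):
    """Hypothetical reward in (0, 1]: 1.0 when the candidate ball sits on the reference."""
    (cx, cy), (rx, ry) = ball_xy(candidate_svg), ball_xy(reference_svg)
    return 1.0 / (1.0 + math.hypot(cx - rx, cy - ry))

x, y = ball_position(v0=20.0, angle_deg=45.0, t=1.0)
reference = scene_svg(x, y)
print(score(reference, reference))  # 1.0
```

In a real loop the candidate SVG would come from the model's answer to the prompt, and the reward would feed back into training.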

SVG as LLM window to physical world.

> SVG as LLM window to physical world.

What? Let’s try not to go full forehead into hype.

SVGs would be an awfully poor analogy for the physical world…

SVGs themselves are just an image format, but because of their vector nature they could easily be mapped onto values from a simulation in a physics engine, at least in the game-physics sense of the word (rods, springs, etc.); a fluid simulation clearly maps better to raster formats.
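To make the "rods and springs" mapping concrete, here is a sketch assuming a tiny mass-spring system integrated with semi-implicit Euler. Each point mass becomes an SVG `<circle>` and each spring a `<line>`, so the SVG is a direct readout of the simulation state. The constants (`k`, `mass`, `dt`) and function names are illustrative only.

```python
import math

def step(pos, vel, springs, dt=0.01, k=50.0, mass=1.0, gravity=9.81, pinned=(0,)):
    """One semi-implicit Euler step for point masses joined by Hookean springs.

    pos, vel: lists of [x, y] (y up); springs: list of (i, j, rest_length).
    Masses whose index is in `pinned` never move.
    """
    forces = [[0.0, -gravity * mass] for _ in pos]
    for i, j, rest in springs:
        dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
        length = math.hypot(dx, dy) or 1e-9
        f = k * (length - rest)  # positive when stretched: pulls i toward j
        fx, fy = f * dx / length, f * dy / length
        forces[i][0] += fx
        forces[i][1] += fy
        forces[j][0] -= fx
        forces[j][1] -= fy
    for n, (fx, fy) in enumerate(forces):
        if n in pinned:
            continue
        # Update velocity first, then position with the new velocity.
        vel[n][0] += fx / mass * dt
        vel[n][1] += fy / mass * dt
        pos[n][0] += vel[n][0] * dt
        pos[n][1] += vel[n][1] * dt

def to_svg(pos, springs, height=200):
    """Map simulation state onto SVG primitives (y flipped for SVG's downward axis)."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="{height}">']
    for i, j, _ in springs:
        (x1, y1), (x2, y2) = pos[i], pos[j]
        parts.append(f'<line x1="{x1:.1f}" y1="{height - y1:.1f}" '
                     f'x2="{x2:.1f}" y2="{height - y2:.1f}" stroke="black"/>')
    for x, y in pos:
        parts.append(f'<circle cx="{x:.1f}" cy="{height - y:.1f}" r="3"/>')
    parts.append("</svg>")
    return "".join(parts)

pos = [[100.0, 180.0], [100.0, 130.0]]  # mass 0 is pinned, mass 1 hangs below it
vel = [[0.0, 0.0], [0.0, 0.0]]
springs = [(0, 1, 50.0)]
for _ in range(500):
    step(pos, vel, springs)
print(to_svg(pos, springs))
```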

If that physics engine were itself a good model for the real world, then you could do simulated evolution to get an end result that is at least as functional as a bike (though perhaps it wouldn't look like a traditional bike) even if the only values available to the LLM were the gross characteristics like overall dimensions and mass.
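The simulated-evolution idea can be sketched as a (1+1) evolution strategy over a handful of gross characteristics. The `fitness` function below is a made-up stand-in for "score from a physics engine", and the genes (wheelbase, wheel radius, mass) and their target values are hypothetical.

```python
import random

def fitness(design):
    """Toy stand-in for a physics-engine score; higher is better (hypothetical targets)."""
    wheelbase, wheel_radius, mass = design
    return -((wheelbase - 1.0) ** 2 + (wheel_radius - 0.35) ** 2 + 0.01 * mass ** 2)

def evolve(start, generations=2000, sigma=0.05, seed=0):
    """(1+1) evolution strategy: mutate the parent, keep the child if it is no worse."""
    rng = random.Random(seed)
    parent, best = list(start), fitness(start)
    for _ in range(generations):
        # Gaussian mutation, clamped so no characteristic goes negative.
        child = [max(0.0, g + rng.gauss(0.0, sigma)) for g in parent]
        child_score = fitness(child)
        if child_score >= best:
            parent, best = child, child_score
    return parent, best

design, best = evolve([0.5, 0.1, 1.0])
print(design, best)
```

Swapping `fitness` for a real simulation run is where all the actual difficulty lives; the search loop itself stays this small.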

But I'd say the chance of getting a pelican SVG out of a model like this is mostly related to lots of text describing the anatomy of pelicans, and it would not gain anything from synthetic data.

> but because of their vector nature, they could easily be mapped onto values from a simulation in a physics engine.

I don’t think the fact that the images are described with vectors magically makes them better for representing physics than any other image representation. If anything, less so, since there will be so much textual information unrelated to the physical properties of the object.

What about them makes it easier to map to physics than an AABB (axis-aligned bounding box)?
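For comparison with the AABB: an SVG `<rect>` with no transform already is an axis-aligned box, so pulling one out and running the standard overlap test takes a few lines. This sketch assumes untransformed rects with plain numeric attributes (no units, no `transform`); the element ids are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"svg": "http://www.w3.org/2000/svg"}

def rect_to_aabb(elem):
    """Convert an untransformed SVG <rect> into (min_x, min_y, max_x, max_y)."""
    x = float(elem.get("x", 0))
    y = float(elem.get("y", 0))
    return (x, y, x + float(elem.get("width")), y + float(elem.get("height")))

def aabbs_overlap(a, b):
    """Separating-axis test for two axis-aligned boxes (touching counts as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

svg = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<rect id="wheel" x="0" y="0" width="10" height="10"/>'
       '<rect id="frame" x="8" y="5" width="20" height="4"/>'
       '<rect id="rider" x="40" y="40" width="5" height="5"/></svg>')
boxes = {r.get("id"): rect_to_aabb(r) for r in ET.fromstring(svg).iterfind("svg:rect", NS)}
print(aabbs_overlap(boxes["wheel"], boxes["frame"]))  # True
print(aabbs_overlap(boxes["wheel"], boxes["rider"]))  # False
```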

For soft-body physics, I’m pretty sure a simpler sort of distance-field representation would be even better. (I’m not as familiar with soft body as with rigid body.)
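A distance-field representation can be as simple as a signed distance function (SDF) per shape: negative inside, zero on the boundary, positive outside. This is a minimal sketch for circles only, plus a helper that samples any SDF onto a grid; using the SDF to get penetration depth is one common approach to collision, not the only one.

```python
import math

def circle_sdf(px, py, cx, cy, r):
    """Signed distance from point (px, py) to a circle: <0 inside, 0 on edge, >0 outside."""
    return math.hypot(px - cx, py - cy) - r

def circles_penetration(c1, c2):
    """Penetration depth between two circles given as (cx, cy, r); 0.0 if they don't touch."""
    # Circle 1's SDF at circle 2's centre, minus circle 2's radius,
    # is the signed gap between the two boundaries.
    gap = circle_sdf(c2[0], c2[1], *c1) - c2[2]
    return max(0.0, -gap)

def sample_field(sdf, width, height, step=1.0):
    """Sample any SDF on a regular grid, giving a raster-like distance field."""
    return [[sdf(x * step, y * step) for x in range(width)] for y in range(height)]

print(circles_penetration((0, 0, 5), (8, 0, 5)))  # 2.0
```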

For rendering them, more than for anything else. There's a convenient 1-to-1 mapping in both directions.

You can of course just rasterise the vector for output; it's not like people view these things on oscilloscopes.
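The rasterising direction is simple to sketch. Assuming a single filled circle and point sampling at pixel centres (no anti-aliasing), the vector description turns into a character grid:

```python
def rasterise_circle(cx, cy, r, width, height):
    """Point-sample a filled circle at each pixel centre; '#' = inside, '.' = outside."""
    rows = []
    for j in range(height):
        row = ""
        for i in range(width):
            px, py = i + 0.5, j + 0.5  # pixel centre
            row += "#" if (px - cx) ** 2 + (py - cy) ** 2 <= r * r else "."
        rows.append(row)
    return rows

print("\n".join(rasterise_circle(cx=4, cy=4, r=3, width=8, height=8)))
```

Going back from the raster to the exact circle parameters is the harder direction, which is part of why the vector form is convenient.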

SVGs are just 2D geometries, and I can assure you that almost every GIS project in the world uses vector data to model the physical world.

Whether it's a good model, or a model that LLMs can manipulate, I have no idea. But it's entirely plausible that I could feed it an SVG map and ask geospatial reasoning questions like how far it is between A and B, where the nearest grocery store is, etc.
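As a concrete version of those questions, here is a sketch assuming an SVG map where each place is a `<circle>` with an `id` and an optional `class` such as `grocery`, and where SVG units map linearly to metres via a hypothetical `METRES_PER_UNIT` scale. Real GIS data would also need a projection and a coordinate reference system; this ignores both.

```python
import math
import xml.etree.ElementTree as ET

METRES_PER_UNIT = 10.0  # hypothetical map scale: 1 SVG user unit = 10 m
NS = {"svg": "http://www.w3.org/2000/svg"}

def load_places(svg_text):
    """Return {id: (x, y, class)} for every <circle> in the map."""
    root = ET.fromstring(svg_text)
    return {c.get("id"): (float(c.get("cx")), float(c.get("cy")), c.get("class", ""))
            for c in root.iterfind(".//svg:circle", NS)}

def distance_m(places, a, b):
    """Straight-line distance between two places, in metres."""
    (ax, ay, _), (bx, by, _) = places[a], places[b]
    return math.hypot(ax - bx, ay - by) * METRES_PER_UNIT

def nearest(places, origin, kind):
    """Id of the closest place of the given class to `origin`."""
    candidates = [pid for pid, (_, _, cls) in places.items() if cls == kind and pid != origin]
    return min(candidates, key=lambda pid: distance_m(places, origin, pid))

svg_map = ('<svg xmlns="http://www.w3.org/2000/svg">'
           '<circle id="A" cx="0" cy="0" r="1"/>'
           '<circle id="B" cx="30" cy="40" r="1"/>'
           '<circle id="shop1" class="grocery" cx="5" cy="5" r="1"/>'
           '<circle id="shop2" class="grocery" cx="50" cy="0" r="1"/></svg>')
places = load_places(svg_map)
print(distance_m(places, "A", "B"))     # 500.0
print(nearest(places, "B", "grocery"))  # shop1
```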