Hacker News new | ask | show | jobs
by iliketrains 471 days ago
This is awesome! I like the idea of abstracting the factory building with a code-like structure. I wonder if supplemental 2D image (mini-map style) as an input to the policy would help with the spatial reasoning?

I work on a similar factory game (Captain of Industry) and I have always wanted an agent that can play the game for testing and balancing reasons. However, pixels-to-mouse-actions RL policy (similar to Deep Mind's StarCraft agent) always seemed like a very hard and inefficient approach. Using code-like API seems so much better! I might try to find some time to port this framework to COI :) Thanks for sharing!

2 comments

Regarding the 2d image - the issue is that these frontier models don't tend to support supplemental image inputs, and the ones that do aren't sufficiently well trained on (high precision) Factorio visuals to add that much information.
I see, integrating image inputs can be very challenging in this case as the models work with text input. I was not even thinking about the full isometric image, but just some simple 2D map where each pixel can be color-coded based on the entity type. I guess the problem is that these maps would look like nothing the models were trained on, so as you say, it might not provide any value.

The reason I was suggesting this is that I worked in robotics making RL policies, and supplying image data (be it maps, lidar scans, etc.) was a common practice. But our networks were custom made to ingest these data and trained from scratch, which is quite different from this approach.

Indeed I think the trade-off here is the more "pure factorio" types of images we give to the agents, the more likely it is that they've seen it during training (from google etc), however the signal-to-noise ratio is low and hence the current models get confused as the map complexity (amount of entities) and level of detail grows. If we start to create custom images, we can reduce the unneeded noise, but then risk giving something completely OOD to the agent (unless we train a visual encoder) and the performance also tanks
Thats really cool! Please shoot me an email if you get around to doing this, always happy to assist however I can.