by ponta17 388 days ago
Thanks a lot for the thoughtful feedback — I really appreciate it!

I think there might be a small misunderstanding about how the LLM is actually being used here (and in many agent-based setups). The LLM itself isn’t directly executing Twist commands or handling motion; it acts as a decision-maker that chooses from a set of callable tools (Python functions) based on the task description and intermediate results.

In this case, yes, one of the tools happens to publish Twist commands, but that’s just one of many modular tools the LLM can invoke. Whether it’s controlling motion or running object detection, from the LLM’s point of view it’s simply choosing which function to call next. So the computational load of each step depends on what the tool does internally, not on the LLM’s reasoning process itself.
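
To make that loop concrete, here is a minimal sketch. It is not the project's actual code: the tool names, the `run_agent` helper, and the scripted `fake_llm_choose` stand-in for the LLM are all made up for illustration, and the "Twist" tool only returns a string instead of publishing to ROS.

```python
from typing import Callable, Dict, List, Optional


# Hypothetical tools. On a real robot these would wrap a ROS publisher
# or a detector; here they only return strings so the sketch runs anywhere.
def publish_twist(linear: float, angular: float) -> str:
    """Stand-in for a tool that would publish a Twist message."""
    return f"twist sent: linear={linear}, angular={angular}"


def detect_objects() -> str:
    """Stand-in for an object-detection tool."""
    return "objects: none"


# Registry the agent can choose from: tool name -> Python function.
TOOLS: Dict[str, Callable[..., str]] = {
    "publish_twist": publish_twist,
    "detect_objects": detect_objects,
}


def fake_llm_choose(task: str, history: List[str]) -> Optional[dict]:
    """Scripted stand-in for the LLM (assumption: a real agent would send
    the task and history to a model and parse its tool call instead)."""
    script = [
        {"tool": "detect_objects", "args": {}},
        {"tool": "publish_twist", "args": {"linear": 0.2, "angular": 0.0}},
    ]
    return script[len(history)] if len(history) < len(script) else None


def run_agent(task: str, max_steps: int = 5) -> List[str]:
    """Decision loop: ask which tool to call next, run it, record the result."""
    history: List[str] = []
    for _ in range(max_steps):
        call = fake_llm_choose(task, history)
        if call is None:  # the "LLM" decided the task is done
            break
        result = TOOLS[call["tool"]](**call["args"])
        history.append(result)
    return history


if __name__ == "__main__":
    for line in run_agent("check surroundings, then move forward"):
        print(line)
```

The loop never looks inside a tool. Swapping `publish_twist` for something heavier changes the cost of that step but not the decision logic.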

Of course, I agree with your broader point: we should push toward more meaningful high-level tasks where LLMs can orchestrate complex pipelines — and I think your examples (like fetch-a-beer or map annotation via VLMs) are spot-on.

My goal with this project was to explore that decision-making loop in a minimal, creative setting — kind of like a sandbox for LLM-agent behavior.

Actually, I’m currently working on something along those lines using a TurtleBot3. I’m planning to provide the agent with tools that let it scan obstacles via 3D LiDAR and recognize objects through image processing, so that it can make more context-aware decisions.
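
As a rough idea of what an obstacle-scanning tool could look like, here is a sketch under my own assumptions, not working robot code. It assumes the LiDAR points are already available as `(x, y, z)` tuples in meters in the robot frame (x forward, y left, z up). The function name, the four sectors, and the height thresholds are hypothetical. It reduces a point cloud to the nearest obstacle distance per direction, which is the kind of compact result an LLM can reason about.

```python
import math
from typing import Dict, Iterable, Tuple

# Assumed point format: (x, y, z) in meters, robot frame (x forward, y left, z up).
Point = Tuple[float, float, float]


def nearest_obstacle_by_sector(
    points: Iterable[Point],
    min_height: float = 0.05,  # hypothetical threshold to skip floor returns
    max_height: float = 0.5,   # hypothetical threshold to skip overhead returns
) -> Dict[str, float]:
    """Return the horizontal distance to the closest point in each 90-degree sector."""
    nearest = {name: math.inf for name in ("front", "left", "back", "right")}
    for x, y, z in points:
        if z < min_height or z > max_height:
            continue  # outside the height band the robot could collide with
        angle = math.degrees(math.atan2(y, x))  # 0 = straight ahead, positive = left
        if -45 <= angle < 45:
            name = "front"
        elif 45 <= angle < 135:
            name = "left"
        elif -135 <= angle < -45:
            name = "right"
        else:
            name = "back"
        nearest[name] = min(nearest[name], math.hypot(x, y))
    return nearest


if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.1), (0.0, 2.0, 0.2), (0.5, 0.0, 2.0)]
    print(nearest_obstacle_by_sector(cloud))
```

Sectors with no points in the height band stay at infinity, so the agent can tell "nothing detected" apart from "something very close".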

Really appreciate the push for deeper use cases — that’s definitely where I want to go next!