Hacker News
by marioskales 109 days ago
Great question! Multi-agent group chat (3-5 AI personas discussing a topic) works well with no latency issues, even though my main PC only has 8GB of RAM. Each round is one API call per participant, so latency depends mostly on your chosen model and provider. With faster options like Gemini Flash or Groq, responses come in 1-2 seconds per turn. Heavier models like Claude or GPT-4 take a bit longer but still feel smooth. Each AI in the group chat has a 'response timeout': if a participant times out, they are skipped for that round and the next one continues the discussion.
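To make the round logic concrete, here is a rough Python sketch of one group-chat round with a per-participant timeout. This is not Skales' actual code: `run_round`, the `Responder` type, and the transcript format are all hypothetical names picked for illustration, and each responder stands in for a single API call to whatever provider you use.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical: a responder wraps one API call and returns the persona's reply.
Responder = Callable[[str, List[str]], Awaitable[str]]


async def run_round(
    topic: str,
    participants: Dict[str, Responder],
    transcript: List[str],
    timeout_s: float,
) -> List[str]:
    """Run one round: one call per participant, in order.

    Participants that exceed timeout_s are skipped for this round.
    Returns the names of the skipped participants.
    """
    skipped: List[str] = []
    for name, respond in participants.items():
        try:
            # wait_for cancels the call and raises if it takes too long.
            reply = await asyncio.wait_for(
                respond(topic, transcript), timeout=timeout_s
            )
        except asyncio.TimeoutError:
            skipped.append(name)  # skip this persona, move on to the next
            continue
        transcript.append(f"{name}: {reply}")
    return skipped
```

Because the calls run one after another, a round takes at most roughly the number of participants times the timeout, which is why the provider's speed dominates.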

For Execute Mode (multi-step autonomous tasks), Skales queues steps sequentially, so there's no parallel bottleneck: it plans, you approve, then it runs through each step. There's also a desktop buddy (think of Microsoft's Clippy, but actually useful) that, if activated, sits in your system tray as soon as you minimize or close the main window, so you can ask it quick questions without even opening the main interface. It runs within the same Electron process, so zero additional RAM overhead. Idle RAM sits around 300MB (I had 400MB at least), which keeps things snappy.
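The Execute Mode flow mentioned above (plan, approve, then run each step in order) can be sketched roughly like this in Python. Again, this is only an illustration under my own assumptions, not how Skales is implemented: `execute_plan`, `approve`, and `run_step` are hypothetical names.

```python
from typing import Callable, List


def execute_plan(
    steps: List[str],
    approve: Callable[[List[str]], bool],
    run_step: Callable[[str], str],
) -> List[str]:
    """Run planned steps one at a time, only after the user approves.

    Returns the result of each step in order, or an empty list if the
    plan was rejected.
    """
    if not approve(steps):
        return []  # nothing runs without approval
    results: List[str] = []
    for step in steps:
        # Strictly sequential: the next step starts after this one returns.
        results.append(run_step(step))
    return results
```

For example, `execute_plan(["fetch", "summarize"], approve=lambda s: True, run_step=str.upper)` runs both steps in order, while an `approve` that returns `False` runs nothing.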

The main speed factor is honestly the LLM provider, not Skales itself. With local Ollama models it's purely your hardware.

Happy to answer more specific questions. Thank you for asking, jlongo78!