by trq_ 601 days ago
This is awesome, can't wait for evals against Claude Computer Use!

Can we first test this with basic sysadmin work in a simple shell?

Can't wait to replace "apt-get install" with "gpt-get install" and then have it solve all the dependency errors by itself.

This has been possible for a year already. My project gptme does it just fine (like many other tools), especially now with Claude 3.5.
I know that it exists. I was just hoping we could make such interactions (practically) bug-free before we move on to the next big thing.
Threat actors can't wait for you to start doing this either.
How can you write metrics against something that's non-deterministic?