Hacker News
by hodgehog11 47 days ago
I plan to publish the problems, but I have no plans to publish how well the LLMs perform on them. The standard for publishing benchmarks is very high, and I'm really just posting vibes here. Still, I hope my experiences are useful to some people, as others' experiences have been useful to me.