by nomand 1062 days ago
My benchmark is a pair programming session with ChatGPT, spanning days and dozens of queries, in which we co-created a custom static site generator that works really well for my requirements. It was able to hold context for a while and not "forget" code it had given me dozens of messages earlier, it was able to "remember" corrections and refactors that I gave it, and overall it was incredibly useful for working out things like recursion over folder hierarchies and building data trees. This and similar use cases, where the model is used as a genuine assistant, are the ones where memory matters.
1 comment

Excellent! That sounds like a very useful personal benchmark, then. You could test Llama 2 by pasting in snippets of different lengths from that conversation and checking how useful you find its outputs.
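
A minimal sketch of how you might prepare those snippets, assuming you have saved the conversation as a list of message strings. The function name `build_prompts`, the sample messages, and the choice to measure length in messages rather than tokens are all my own assumptions, and the code only builds the prompts; how you send them to the model is up to you.

```python
def build_prompts(messages, lengths, question):
    """Return one prompt per context length, measured in messages.

    Each prompt holds the n most recent messages followed by the question,
    so the prompts differ only in how much history the model sees.
    """
    prompts = {}
    for n in lengths:
        n = min(n, len(messages))  # clamp to the size of the conversation
        # messages[-0:] would return the whole list, so handle n == 0 explicitly
        context = messages[-n:] if n > 0 else []
        prompts[n] = "\n\n".join(context + [question])
    return prompts


# Hypothetical stand-ins for messages copied out of the real session.
conversation = [
    "User: here is my folder walker ...",
    "Assistant: here is a refactored version ...",
    "User: please rename build_tree to make_tree",
]
question = "User: what is the tree-building function called now?"

for n, prompt in build_prompts(conversation, [1, 2, 3], question).items():
    print(f"--- {n} message(s) of context, {len(prompt)} characters ---")
```

Asking the same question at every length makes it easier to see at what point the model stops using the earlier corrections.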