by vunderba
8 days ago
I use LLMs more for peer review and came to a similar conclusion: gpt-5.5 codex with xhigh reasoning seemed to catch more edge cases and went "deeper" into its analysis than Opus 4.7/4.8. My preliminary tests of Fable were pretty promising, but that's DOA for everyone for now.
and most of its findings were false positives or outright wrong, as in the screenshot I posted above.