I think LLMs usually give very simple ideas if you don't do "deep research" and just give them one paper. I assume that's what you're doing? At least, I saw that some of my projects were not good, and I need to run more steps on deduplication/improvement.
Oh, that's cool! I will have to read your article on scoring more carefully a bit later, but in short, I decided to take a simpler, more heuristic approach for now, based on TRL (a well-established metric) and a combination of some other factors. I also didn't want the user to treat the score too much as a kind of "divine word". In general, I found that the heuristic is OK: the score it gives is right to within plus or minus maybe 1 point.