| HN Mirror


	by thruway516 883 days ago
	I think this would be more useful to most people as a prompt/RAG testing service than as an LLM testing service. If I ran a test and found that the LLM I was using was 60% accurate on some topic, what would I do with that knowledge: build a more accurate LLM? Switch to another? On the other hand, if a service suggested ways to improve accuracy by scoring various prompts or RAG inputs, I think that would be very useful to many people. It could even uncover a general prompting strategy that depends on the underlying LLM or the inputs available, which would be really useful.
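
	To make the idea concrete, here is a rough sketch of what scoring prompt variants against a small labeled test set might look like. Everything in it is hypothetical: `score_prompts`, the template names, and `fake_model` (a stand-in for a real LLM call) are made up for illustration, and the substring match is only the crudest possible accuracy check.

```python
from typing import Callable, Dict, List, Tuple


def score_prompts(
    templates: Dict[str, str],
    cases: List[Tuple[str, str]],
    ask_model: Callable[[str], str],
) -> Dict[str, float]:
    """Return, per prompt template, the fraction of test cases answered correctly."""
    if not cases:
        raise ValueError("need at least one test case")
    scores = {}
    for name, template in templates.items():
        correct = 0
        for question, expected in cases:
            answer = ask_model(template.format(question=question))
            # Crude check (an assumption): the expected string appears in the answer.
            if expected.lower() in answer.lower():
                correct += 1
        scores[name] = correct / len(cases)
    return scores


def fake_model(prompt: str) -> str:
    """Hypothetical stub standing in for a real LLM call.

    It only answers correctly when the prompt contains "Answer briefly",
    so the two templates below get different scores.
    """
    facts = {"capital of France": "Paris", "capital of Japan": "Tokyo"}
    for key, value in facts.items():
        if key in prompt and "Answer briefly" in prompt:
            return value
    return "I'm not sure."


templates = {
    "bare": "{question}",
    "brief": "Answer briefly. {question}",
}
cases = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]

scores = score_prompts(templates, cases, fake_model)
best = max(scores, key=scores.get)  # name of the highest-scoring template
print(scores, best)
```

	Swapping `fake_model` for a real model call, and the templates for different RAG context choices, would give the kind of per-prompt score described above. A real service would presumably need a far better grading step than substring matching.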