


	by padolsey 306 days ago
	Hi jph! Not intending to hijack the thread, but I'd love to chat to someone like you. The non-profit I work at has been working on democratizing evals. This wouldn't be about ensuring your in-house AI is up to scratch (parachute looks ideal!), but about ensuring the general landscape of models is up to date on best practice, e.g. NICE and other guidance, so that everyday model users aren't misled. A demo of such an eval is here: https://weval.org/analysis/uk-clinical-scenarios/08278696ca2... We're looking for domain experts, especially in high-risk domains like healthcare, education, and therapy. We'd then work together to co-author an eval in your specialism, to expose shortcomings and motivate AI labs to do better.