Hacker News
by truskovskiyk 808 days ago
This is a great project, a bit similar to https://github.com/ludwig-ai/ludwig, but it also includes testing capabilities and ablation studies.

A few questions about the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this part of the project? Do you offer any guarantees, or is it considered an open-ended problem?

Would love to see more progress toward this direction!


Thanks for the feedback! Yes, it is similar to Ludwig, but we think our toolkit is a more lightweight solution for fine-tuning and ablation studies. In most cases, finding the right LLM with the right config for your dataset requires multiple runs (grid search). Our toolkit offers this capability through a single YAML file.
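For readers unfamiliar with the idea, here is a minimal sketch of how a grid search can be driven from one config: each parameter lists candidate values, and every combination becomes one fine-tuning run. The dict below stands in for a parsed YAML file; its keys (`model`, `lora_r`, `learning_rate`) and the model names are hypothetical and are not the toolkit's actual schema.

```python
import itertools
from typing import Any

# Hypothetical ablation config, standing in for what a parsed YAML file might
# contain. Keys and values are illustrative only, not the toolkit's schema.
ABLATION_CONFIG: dict[str, list[Any]] = {
    "model": ["model-a", "model-b"],
    "lora_r": [8, 16],
    "learning_rate": [1e-4, 2e-4],
}


def expand_grid(config: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Expand a dict of candidate values into every combination (grid search)."""
    keys = list(config)
    combos = itertools.product(*(config[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


if __name__ == "__main__":
    for i, run in enumerate(expand_grid(ABLATION_CONFIG)):
        # Each dict describes the settings for one fine-tuning run.
        print(i, run)
```

Note that the number of runs is the product of the list lengths (2 × 2 × 2 = 8 here), so each extra parameter multiplies the compute cost of the sweep.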

As for test coverage, the toolkit currently includes property-based unit tests. For instance, for an LLM fine-tuned on summarization, a property test checks whether the generated summary is shorter than the input text.
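A rough sketch of what such a length property could look like, assuming length is measured in whitespace-separated words (the toolkit may count characters or tokens instead). The model is replaced by a hypothetical `fake_summarize` stub; the function names here are illustrative, not the toolkit's API.

```python
from typing import Callable


def summary_is_shorter(source: str, summary: str) -> bool:
    """Property: a summary should contain fewer words than its source text.

    Word count via whitespace splitting is an assumption for this sketch.
    """
    return len(summary.split()) < len(source.split())


def run_length_property(
    summarize: Callable[[str], str], inputs: list[str]
) -> list[int]:
    """Return the indices of inputs whose summaries violate the property."""
    failures = []
    for i, source in enumerate(inputs):
        if not summary_is_shorter(source, summarize(source)):
            failures.append(i)
    return failures


def fake_summarize(text: str) -> str:
    # Hypothetical stand-in for a fine-tuned model: keep the first five words.
    return " ".join(text.split()[:5])


if __name__ == "__main__":
    docs = [
        "The quarterly report shows revenue grew while costs stayed flat.",
        "Short note.",
    ]
    # The second input is already shorter than five words, so its "summary"
    # is not shorter than the source and gets flagged.
    print(run_length_property(fake_summarize, docs))
```

The check does not compare against a reference summary; it only asserts a property every valid output should satisfy, which is what makes it usable on unlabeled inputs.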

We have a handful of property-based tests similar to this one. Of course, the list is not exhaustive yet. As we make more progress on the testing side, we aim to distill the most relevant tests for each use case.

Hope this helps.