## Core features

* Collaborative playground with automatic versioning and multi-provider support.
* One-click evals, such as context recall, llm-rubric (LLM-as-a-judge), and latency, among many others. They run against datasets to produce a definitive score and run regression tests.
* Datasets that integrate with logs, so you can create ‘good’ and ‘bad’ examples from real data in one click.
* Logs that feed the analytics dashboard to track token usage, latency, cost, and more.
* Continuous evals sampled from live logs to monitor performance in real time.

## Coming soon

* One-click synthetic data generation to make iteration even faster.
* Auto-suggested prompt instructions, based on the eval results and prompt diffs already in the workspace.
* Multi-modality across the workspace: playground, datasets, evals, logs, history, and analytics.
* Prompt engineering → flow engineering, for multi-turn, chain-of-thought (CoT), and tree-of-thought use cases.