* Collaborative playground with automatic versioning and multi-provider support.
* One-click evals such as context recall, llm-rubric (LLM-as-a-judge), latency, and more, run against datasets to produce a score and catch regressions.
* Datasets that play well with logs and enable one-click creation of ‘good’ and ‘bad’ examples from real data.
* Logs that feed the analytics dashboard for tracking token usage, latency, cost, and more.
* Continuous evals run on samples of live logs to monitor quality in production.
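
To make the eval and sampling ideas above concrete, here is a minimal sketch in Python. It is not the product's API: `context_recall`, `run_eval`, `sample_logs`, the row fields, and the 0.8 threshold are all hypothetical names and values chosen for illustration, and the substring match is a simple stand-in for a real context-recall metric (which would typically rely on an LLM or semantic matching).

```python
import random
from statistics import mean


def context_recall(expected_facts, retrieved_context):
    """Fraction of expected facts found verbatim in the context.

    Assumption: a case-insensitive substring match stands in for a
    real context-recall metric.
    """
    if not expected_facts:
        return 1.0
    context = retrieved_context.lower()
    hits = sum(1 for fact in expected_facts if fact.lower() in context)
    return hits / len(expected_facts)


def run_eval(rows, threshold=0.8):
    """Score every row and compare the mean to a (hypothetical) threshold."""
    scores = [context_recall(r["expected_facts"], r["context"]) for r in rows]
    score = mean(scores)
    return {"score": score, "passed": score >= threshold}


def sample_logs(logs, rate, seed=0):
    """Keep each log with probability `rate`; seeded for reproducibility."""
    rng = random.Random(seed)
    return [log for log in logs if rng.random() < rate]


# Hypothetical log rows: expected facts plus the context that was retrieved.
logs = [
    {"expected_facts": ["Paris", "France"],
     "context": "Paris is the capital of France."},
    {"expected_facts": ["Berlin", "Germany"],
     "context": "Berlin is a large city."},
]

result = run_eval(sample_logs(logs, rate=1.0))
print(result)
```

Running the same `run_eval` on a fixed dataset before and after a prompt change gives a regression check; running it on `sample_logs(...)` output approximates a continuous eval on live traffic.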
## Coming soon
* One-click synthetic data to make iteration even faster.
* Auto-suggestions for prompt instructions, based on the eval and prompt-diff data already in the workspace.
* Multi-modality across the workspace: playground, datasets, evals, logs, history, and analytics.
* Prompt engineering → flow engineering for multi-turn, chain-of-thought (CoT), and tree-of-thought use cases.