A Notion-like workspace to iterate on, evaluate, and monitor LLM prompts.

## Core features

* Collaborative playground with automatic versioning and multi-provider support.

* One-click evals, such as context recall, llm-rubric (LLM as a judge), latency, and many more, run against datasets to score each prompt version and catch regressions.

* Datasets that integrate with logs, so ‘good’ and ‘bad’ examples can be created from real data in one click.

* Logs that feed the analytics dashboard to track token usage, latency, cost, and more.

* Continuous evals that run on samples of live logs to monitor prompt quality in production.
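
To make the eval, regression, and log-sampling ideas above concrete, here is a minimal Python sketch. It is not the workspace's API: every function name, the dataset row shape, and the substring-based stand-in for context recall are assumptions chosen for illustration only.

```python
import random
from statistics import mean


def context_recall(expected_facts, retrieved_context):
    """Simplified stand-in for a context-recall eval (an assumption):
    the fraction of expected facts found verbatim in the context."""
    if not expected_facts:
        return 1.0
    text = retrieved_context.lower()
    hits = sum(1 for fact in expected_facts if fact.lower() in text)
    return hits / len(expected_facts)


def run_eval(dataset):
    """Average the per-row scores into a single score for the dataset."""
    return mean(
        context_recall(row["expected_facts"], row["context"]) for row in dataset
    )


def is_regression(baseline_score, candidate_score, tolerance=0.05):
    """Flag a candidate prompt version that scores below the baseline
    by more than the tolerance (the tolerance value is hypothetical)."""
    return candidate_score < baseline_score - tolerance


def sample_logs(logs, rate, seed=None):
    """Keep each live log with probability `rate`, for continuous evals."""
    rng = random.Random(seed)
    return [log for log in logs if rng.random() < rate]


# Hypothetical two-row dataset: the second row is missing one expected fact.
dataset = [
    {"expected_facts": ["Paris", "1889"],
     "context": "The Eiffel Tower in Paris opened in 1889."},
    {"expected_facts": ["Berlin", "1961"],
     "context": "The wall divided Berlin for decades."},
]

score = run_eval(dataset)
print(score)
print(is_regression(baseline_score=0.9, candidate_score=score))
```

The same `run_eval` function could be applied to the output of `sample_logs`, which is one way to read "continuous evals": score a random fraction of production traffic rather than every request.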

## Coming soon

* One-click synthetic data to make iteration even faster.

* Auto-suggested prompt instructions, based on the eval and prompt-diff data already in the workspace.

* Multi-modality across the workspace: playground, datasets, evals, logs, history, and analytics.

* Prompt engineering → flow engineering for multi-turn, chain-of-thought (CoT), and Tree of Thoughts use cases.