| While building Mendable, we found that feeding LLMs well-structured markdown improved accuracy. We also found that producing that markdown was surprisingly hard. We found some great tools online, but none reliably handled the entire process. We wanted an API that took a URL, crawled the pages under it, and gave us easy-to-use, up-to-date markdown we could feed into our index.

So we released an open-source repo and an API that crawls entire websites and turns them into markdown with just a few lines of code.

The API handles:
- Crawling without consistent sitemaps
- Infra to handle running many crawling jobs
- Proxying, hosting headless browsers at scale
- Conversion to clean markdown
- Caching
- Handling images, videos (soon), and tables (soon)
- LLM extraction (soon)

It is open source, and we also offer an easy-to-use API that starts free. It has built-in loaders for both @llama_index and @langchain. Excited to see people try it. |