Show HN: API to turn entire websites into Markdown (firecrawl.dev)
23 points by cpeffer 798 days ago
While building Mendable, we found that feeding LLMs well-structured markdown improved accuracy. We also found that producing that markdown was surprisingly hard.

We found some great tools online, but none reliably handled the entire process. We wanted an API that took a URL, crawled the pages under that URL, and gave us easy-to-use, up-to-date markdown we could feed into our index.

So, we released an open-source repo and an API that crawls entire websites and turns them into markdown with just a few lines of code.

The API handles:

- Crawling without consistent sitemaps
- Infra to handle running many crawling jobs
- Proxying, hosting headless browsers at scale
- Conversion to clean markdown
- Caching
- Handling images, videos (soon), and tables (soon)
- LLM extraction (soon)
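To make the first and fourth items concrete, here is a minimal sketch of crawling without a sitemap and converting each page to rough markdown. It is not Firecrawl's implementation: the names `MarkdownExtractor`, `crawl_to_markdown`, and the `fetch` callable are hypothetical, it uses only the Python standard library, and it ignores JavaScript rendering, proxying, and caching.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class MarkdownExtractor(HTMLParser):
    """Collects a rough markdown rendering of a page plus the links it contains."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style", "nav", "footer", "aside"}  # text in these is dropped

    def __init__(self):
        super().__init__()
        self.lines = []        # finished markdown lines
        self.links = []        # raw href values found anywhere on the page
        self._prefix = None    # markdown prefix of the block being read, if any
        self._buf = []         # text fragments of the current block
        self._skip_depth = 0   # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.flush_block()
            self._prefix = self.HEADINGS.get(tag, "- " if tag == "li" else "")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.flush_block()

    def handle_data(self, data):
        if self._skip_depth == 0 and self._prefix is not None:
            self._buf.append(data)

    def flush_block(self):
        """Emit the current block as one markdown line with collapsed whitespace."""
        text = " ".join("".join(self._buf).split())
        if text and self._prefix is not None:
            self.lines.append(self._prefix + text)
        self._buf, self._prefix = [], None


def crawl_to_markdown(start_url, fetch, max_pages=50):
    """Breadth-first crawl of same-host pages by following links (no sitemap).

    `fetch` is an assumed callable mapping a URL to an HTML string, or None
    if the page cannot be retrieved. Returns {url: markdown}.
    """
    host = urlparse(start_url).netloc
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = MarkdownExtractor()
        parser.feed(html)
        parser.close()
        parser.flush_block()
        pages[url] = "\n\n".join(parser.lines)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # resolve and drop #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the crawl logic separate from how pages are retrieved, so the same loop could sit on top of a plain HTTP client, a headless browser, or a cache.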

It is open source, and we also offer an easy-to-use API that starts free. It has built-in loaders for both @llama_index and @langchain.

Excited to see people try it.

4 comments

This is meaningful work. I use markdown in many places. It would be more appealing to me if the core capabilities were trimmed down and kept open source for the long term.
Can you tell me what the hardest part of this was? I'm only using a single function and it solves my needs, but I'd like to learn from your experience.
This is pretty cool. The scrape with only main content needs some work: scraping something like a CNN article comes back with a lot of excess, like repeated advertising messages.

but cool

cpeffer - What about sites requiring auth (e.g., Confluence, staging environments)?

I'm honestly really impressed.