Show HN: API to turn entire websites into Markdown

Show HN: API to turn entire websites into Markdown (firecrawl.dev)

23 points by cpeffer 798 days ago

While building Mendable, we found that feeding LLMs well-structured markdown improved accuracy. We also found that producing that markdown was surprisingly hard.

We found some great tools online, but none reliably handled the entire process. We wanted an API that took a URL, crawled the pages under that URL, and gave us easy-to-use, up-to-date markdown we could feed into our index.
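To make the "take a URL, crawl the pages under it" idea concrete, here is a minimal sketch of a breadth-first, same-host crawl that follows links instead of relying on a sitemap. It is an illustration only, not Firecrawl's implementation or API: `crawl`, `LinkCollector`, and the injected `fetch` callable (URL in, HTML string or `None` out) are hypothetical names, and a real crawler would also need politeness delays, robots.txt handling, and JavaScript rendering.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl of same-host pages reachable from start_url.

    `fetch` is a hypothetical callable (url -> HTML string, or None on
    failure), injected so the sketch runs without network access.
    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing something like `dict.get` over an in-memory mapping of URL to HTML as `fetch` is enough to try it out; swapping in a real HTTP client is left as an assumption about your environment.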

So we released an open-source repo and an API that crawl entire websites and turn them into markdown with just a few lines of code.

The API handles:

- Crawling without consistent sitemaps
- Infra to handle running many crawling jobs
- Proxying, hosting headless browsers at scale
- Conversion to clean markdown
- Caching
- Handling images, videos (soon), and tables (soon)
- LLM extraction (soon)
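As a rough illustration of the "conversion to clean markdown" step, the sketch below converts a small subset of HTML (headings, paragraphs, links, list items) to Markdown using only the Python standard library, and drops `script`, `style`, and `nav` content. It is not how Firecrawl does it: `MarkdownConverter` and `html_to_markdown` are hypothetical names, and the tag subset and the set of skipped elements are assumptions chosen to keep the example short.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MarkdownConverter(HTMLParser):
    """Converts a small subset of HTML (h1-h6, p, a, li) to Markdown."""

    SKIP = {"script", "style", "nav"}  # assumed "noise" elements

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished Markdown blocks
        self.current = []    # text fragments of the block being built
        self.prefix = ""     # Markdown prefix ("# ", "- ", ...) of that block
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.href = None     # target of the link currently open, if any

    def flush(self):
        """Close the current block, collapsing runs of whitespace."""
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.flush()
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "p":
            self.flush()
        elif tag == "li":
            self.flush()
            self.prefix = "- "
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.current.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a":
            self.current.append(f"]({self.href or ''})")
            self.href = None
        elif tag in HEADINGS or tag in {"p", "li"}:
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def html_to_markdown(html):
    """Return Markdown for the supported subset of `html`."""
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    converter.flush()  # emit any trailing text outside a block element
    return "\n\n".join(converter.blocks)
```

Blocks are joined with blank lines, so list items come out as a loose list. Tables, images, and nested lists are deliberately out of scope here.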

It is open source, and we also offer an easy-to-use API that starts free. It has built-in loaders for both @llama_index and @langchain.

Excited to see people try it.

4 comments

QuantumLeapOG 798 days ago

This is a meaningful thing. I use markdown in many places. But if the core capabilities could be trimmed down and kept open source for the long term, it would be more appealing to me.


onuratakan 795 days ago

Can you tell me what the hardest thing about this was? I am using only a single function and it solves my use case, but I would like to learn from your experience.


cchance 795 days ago

This is pretty cool. The scrape with only main content needs some work: scraping something like a CNN article comes back with a lot of excess, such as repeated advertising messages.

but cool


brianjking 794 days ago

cpeffer - What about sites requiring auth (e.g., Confluence, staging environments)?

I'm honestly really impressed.
