Hello everyone, I created a tool called "site2pdf" over the weekend. This tool converts the main page and sub-pages of a website that match a specified URL pattern into a PDF file. It is particularly suitable for AI-based RAG (Retrieval-Augmented Generation) and QA (Question Answering) tasks.

GitHub: https://github.com/laiso/site2pdf/

# Features

- Generate PDFs of main and sub-pages
- Based on Node.js and Puppeteer
- Easy-to-use CLI tool

I want to make this software available online for my friends, but I'm struggling with the best architecture to use. I want to meet the following requirements:

- Cost-effective
- Use Cloudflare Workers' Browser Rendering API (Managed Puppeteer)
- Save to Workers Queue -> R2 bucket

I have already created a prototype, but it encounters ExceededCpu errors when running the consumer for a long time. It seems I need to implement a distributed architecture including merging, which seems challenging.

I would appreciate any advice you can give. Thank you! I look forward to your feedback!

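To make the "distributed architecture including merging" idea more concrete, here is a rough sketch of the planning side only, not a tested design. It assumes the crawl is split into small batches, one queue message per batch, with each batch written to R2 under an ordered key so a later merge step can read the parts back in page order. All names (`RenderJob`, `planJobs`, `partKey`, `isComplete`) and the key layout are hypothetical; the actual Queue send, Browser Rendering call, R2 write, and PDF merge are left out.

```typescript
// Hypothetical message shape: one queue message per small batch of pages.
interface RenderJob {
  jobId: string;      // identifies one site2pdf run
  part: number;       // position of this batch in the final merged PDF
  totalParts: number; // how many batches the run was split into
  urls: string[];     // pages this consumer invocation should render
}

// Split the crawled URL list into small batches so that each consumer
// invocation only renders a few pages instead of the whole site.
function planJobs(jobId: string, urls: string[], batchSize: number): RenderJob[] {
  if (batchSize < 1) {
    throw new Error("batchSize must be >= 1");
  }
  const totalParts = Math.ceil(urls.length / batchSize);
  const jobs: RenderJob[] = [];
  for (let part = 0; part < totalParts; part++) {
    jobs.push({
      jobId,
      part,
      totalParts,
      urls: urls.slice(part * batchSize, (part + 1) * batchSize),
    });
  }
  return jobs;
}

// Zero-padded object key (assumed layout) so that listing keys in
// lexicographic order returns the parts in page order.
function partKey(jobId: string, part: number): string {
  const padded = ("00000" + part).slice(-5);
  return `${jobId}/parts/${padded}.pdf`;
}

// The merge step can start once every expected part key has been stored.
function isComplete(storedKeys: string[], jobId: string, totalParts: number): boolean {
  for (let part = 0; part < totalParts; part++) {
    if (storedKeys.indexOf(partKey(jobId, part)) === -1) {
      return false;
    }
  }
  return true;
}
```

With this layout, a producer would enqueue each `RenderJob`, each consumer would write its output to `partKey(job.jobId, job.part)`, and a final step would merge the parts once `isComplete` returns true. Whether small batches are enough to stay under the CPU limit would need to be measured against the prototype.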
You mention that the generated PDFs are "particularly suitable for AI-based RAG and QA tasks" - can you please share your preferred method/tools for that?