by davedx
724 days ago
I’ve checked out quite a few RAG projects now, and what I haven’t seen really solved is ingestion. It’s usually like “this is an endpoint or some connectors, have fun!” How do I do a bulk/batch ingest of, say, 10k HTML documents into this system?
Ingestion is pretty straightforward: you can call R2R directly, or use the client-server interface to pass the HTML files straight to the ingest_files endpoint (https://r2r-docs.sciphi.ai/api-reference/endpoint/ingest_fil...).
The data parsers are all fairly simple and easy to customize. Right now we use bs4 for handling HTML but have been considering other approaches.
What specific features around ingestion have you found lacking?
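For the 10k-document case asked about above, one possible shape is a small batching loop around whatever call sends files to the ingest_files endpoint. This is only a rough sketch and not R2R's API: `upload` is a hypothetical callable you would supply yourself, and `batched`, `ingest_directory`, and the default batch size of 100 are illustrative names and values, not anything from the project.

```python
from itertools import islice
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def ingest_directory(
    root: Path,
    upload: Callable[[List[Path]], None],  # hypothetical: wraps your ingest call
    batch_size: int = 100,                 # assumed value, tune for your setup
) -> List[Path]:
    """Send every .html file under `root` to `upload` in batches.

    Returns the paths belonging to batches whose upload raised,
    so they can be retried later.
    """
    failed: List[Path] = []
    html_files = sorted(root.rglob("*.html"))
    for batch in batched(html_files, batch_size):
        try:
            upload(batch)
        except Exception:
            # Record the whole batch as failed and keep going with the rest.
            failed.extend(batch)
    return failed
```

Calling `ingest_directory(Path("docs"), upload)` walks the directory recursively and hands `upload` one list of paths at a time. Collecting failures instead of aborting means a single bad batch does not stop a long run.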