by alexmarquardt
162 days ago
I kept running into the same demo problem: it's hard to find a product catalog that behaves like a real e-commerce catalog (titles, working images, usable categories/attributes), is easy to ingest, and is safe and clear to reuse.

So I built two small OSS pipelines that convert open product sources into a clean, stable NDJSON schema you can bulk-index into Elasticsearch/OpenSearch. One outputs ~100K grocery products (from Open Food Facts) and the other ~1M electronics-style products (from Open Icecat). Both apply strict "no image = no entry" quality gates and share the same schema contract.

Would love feedback on:
• what fields you consider essential for a convincing search/relevance demo dataset
• whether the schema choices (flat attrs for faceting + searchable description) match what you've seen work in practice
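
To make the "no image = no entry" gate and the bulk-indexing step concrete, here is a minimal Python sketch. It is not the pipelines' actual code: the field names (`id`, `title`, `description`, `image_url`, `attrs`), the index name, and the sample records are all assumptions made up for illustration. It drops records without a usable image URL and emits the action/document line pairs that the Elasticsearch/OpenSearch `_bulk` endpoint expects.

```python
import json
from typing import Iterable, Iterator


def has_image(product: dict) -> bool:
    """Quality gate: keep a product only if it carries an http(s) image URL.

    `image_url` is a hypothetical field name, not the real schema.
    """
    url = product.get("image_url")
    return isinstance(url, str) and url.startswith(("http://", "https://"))


def to_bulk_lines(products: Iterable[dict], index_name: str) -> Iterator[str]:
    """Yield NDJSON lines for the _bulk API: one action line, then one document line."""
    for product in products:
        if not has_image(product):
            continue  # no image = no entry
        action = {"index": {"_index": index_name, "_id": product["id"]}}
        yield json.dumps(action)
        yield json.dumps(product, ensure_ascii=False)


# Made-up sample records; only the first one passes the image gate.
sample = [
    {
        "id": "p1",
        "title": "Organic oat drink 1L",
        "description": "Unsweetened oat drink in a one litre carton.",
        "image_url": "https://example.com/images/p1.jpg",
        "attrs": {"brand": "ExampleBrand", "category": "beverages"},
    },
    {"id": "p2", "title": "Product without an image", "image_url": None},
]

# The _bulk body must end with a trailing newline.
bulk_body = "\n".join(to_bulk_lines(sample, "products-demo")) + "\n"
print(bulk_body, end="")
```

The resulting string can be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header. Keeping `attrs` as a flat object of scalar values is one way to get simple keyword facets, though whether that matches the real schema is exactly the kind of thing the questions above are asking about.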