by alexmarquardt 162 days ago
I kept running into the same demo problem: it’s hard to find a product catalog that behaves like a real e-commerce catalog (titles, working images, usable categories/attributes), is easy to ingest, and is safe/clear to reuse.

So I built two small OSS pipelines that convert open product sources into a clean, stable NDJSON schema you can bulk-index into Elasticsearch/OpenSearch. One outputs ~100K grocery products (Open Food Facts) and the other ~1M electronics-style products (Open Icecat), with strict “no image = no entry” quality gates and a shared schema contract.
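For anyone unfamiliar with the bulk format, here is a minimal sketch of the last step: applying a "no image = no entry" gate and emitting a `_bulk` request body. The field names (`id`, `title`, `image_url`, `attrs`), the helper name `to_bulk_ndjson`, and the index name `products` are assumptions for illustration only, not the pipelines' actual schema or code.

```python
import json


def to_bulk_ndjson(products, index_name="products"):
    """Build an Elasticsearch/OpenSearch _bulk body from product dicts.

    Field names ("id", "image_url") are hypothetical, not the real schema.
    """
    lines = []
    for product in products:
        # Quality gate: drop any product that has no usable image URL.
        if not product.get("image_url"):
            continue
        # Each document is preceded by an action line naming index and id.
        action = {"index": {"_index": index_name, "_id": product["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(product))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


sample = [
    {"id": "1", "title": "Oat milk", "image_url": "https://example.com/1.jpg",
     "attrs": {"brand": "ExampleBrand"}},
    {"id": "2", "title": "Mystery item", "image_url": ""},  # filtered out
]
body = to_bulk_ndjson(sample)
```

The resulting string can be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header.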

Would love feedback on:
• what fields you consider essential for a convincing search/relevance demo dataset
• whether the schema choices (flat attrs for faceting + searchable description) match what you’ve seen work in practice