| HN Mirror



	by kingcauchy 91 days ago
	I'd be super interested to hear more about what you all do in this space. Currently Antfly (and Termite) don't handle custom content types explicitly, because we've mostly focused on supporting the "classic" ones (application/pdf, image/png, image/jp2, etc.), and even for those we've had to build a lot of custom support into the system. For instance, I chose jsonschema for the schema so users could do exactly what you're suggesting: custom content types indexed differently. The ML side of things also has to know how to support them (e.g. does a PDF get rendered, OCR'd, and then embedded, or does it go through text extraction as some fallback). Would love to hear about what you all do and the types of media you make searchable today!