Show HN: Searchbase – Plug-n-play semantic/fuzzy search for your data (searchbase.dev)
63 points by giulioco 854 days ago
10 comments

> Say goodbye to the complexity of crafting queries by code.

IDK about this angle. Just because Elasticsearch/SQL has fucky syntax doesn't mean crafting queries in code is something we wanna say goodbye to?

I like what Google Malloy has done: it made writing queries in code enjoyable by reinventing the syntax and language while keeping it compatible with existing tech.

Otherwise this is cool! As a newcomer to search, I found it daunting to think about optimizing search and making sense of how users use it.

But even with Elasticsearch, it wasn't that hard to craft queries for a faceted search, and it was cheap and easy to set up, so the value I see is more in the analysis. And if you're going beyond a faceted search with basic grouped AND/OR logic, aren't vector searches the way to go anyway?
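For reference, this is roughly what such a faceted Elasticsearch query looks like when written by hand: a Query DSL body with a `bool` query for the AND/OR grouping and `terms` aggregations for the facet counts. A minimal sketch; the index fields (`title`, `price`, `brand`, `category`) and brand values are hypothetical, and it assumes `brand` and `category` are keyword fields.

```python
import json

def faceted_query(text, brands, max_price):
    """Elasticsearch Query DSL body: text match AND price cap AND (any of the brands), plus facets."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],          # scored full-text part
                "filter": [                                     # non-scoring, ANDed together
                    {"range": {"price": {"lte": max_price}}},
                    {"bool": {                                  # OR group: at least one brand must match
                        "should": [{"term": {"brand": b}} for b in brands],
                        "minimum_should_match": 1,
                    }},
                ],
            }
        },
        "aggs": {                                               # facet counts over the matching docs
            "by_brand": {"terms": {"field": "brand"}},
            "by_category": {"terms": {"field": "category"}},
        },
    }

body = faceted_query("running shoes", ["acme", "zoom"], 120)
print(json.dumps(body, indent=2))
```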

I'm probably missing something!

Good point. The visual query builder is meant to be a playground while you craft your queries; at the end of the day it just produces an API request body (JSON) that can also be hand-crafted. The VQB is also a way to craft and save query “aliases” with their own permissions.
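To illustrate "it's just a JSON body you could write yourself": the thread doesn't show Searchbase's request schema, so every field name and the endpoint below are placeholders, not its documented API. The sketch only builds the request with the standard library; it does not send it.

```python
import json
import urllib.request

# Hypothetical body: field names are illustrative, not Searchbase's real schema.
body = {
    "index": "products",
    "query": "wireless headphnes",  # deliberate typo; a fuzzy search should still match
    "fuzzy": True,
    "filters": [{"field": "price", "op": "lte", "value": 200}],
    "limit": 10,
}

req = urllib.request.Request(
    "https://search.example.com/v1/query",  # placeholder endpoint
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer <API_KEY>"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```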

Analytics: our pain point with a few other search solutions was the lack of out-of-the-box insights into user behavior and common searches. We’re addressing this with Searchbase.

Vector search is no replacement for complex logic; it's merely another data type, like BM25 or a regular range filter. Searchbase looks to me like a better fit for intuitive data exploration than a replacement for existing robust, high-throughput search.
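A toy illustration of "just another data type": an ordinary range filter narrows the candidates, and vector similarity only orders what survives. This is brute-force cosine similarity over made-up 2-D vectors (no ANN index), a sketch rather than how any particular engine does it.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def filtered_vector_search(docs, query_vec, min_price, max_price, k=3):
    """Apply a plain range filter first, then rank the survivors by cosine similarity."""
    candidates = [d for d in docs if min_price <= d["price"] <= max_price]
    candidates.sort(key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in candidates[:k]]

docs = [
    {"id": "a", "price": 10, "vec": [1.0, 0.0]},
    {"id": "b", "price": 50, "vec": [0.9, 0.1]},
    {"id": "c", "price": 500, "vec": [1.0, 0.0]},  # best vector match, but excluded by the price filter
    {"id": "d", "price": 30, "vec": [0.0, 1.0]},
]
print(filtered_vector_search(docs, [1.0, 0.0], 0, 100))  # ['a', 'b', 'd']
```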
I'm always a little skeptical of general-purpose search solutions. The hard part about search is ranking and relevance. BM25 and ANN similarity are great algorithms for sourcing documents, but without a ranking step the user is left to scroll through a lot of potentially relevant results that aren't sorted very well.
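
For readers unfamiliar with BM25, here is a compact Okapi BM25 scorer (using the Lucene-style smoothed idf) over pre-tokenized toy documents. It shows the "sourcing" score the comment refers to; any further ranking step would sit on top of numbers like these.

```python
import math
from collections import Counter

def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc for the query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # docs containing each term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

docs = ["red running shoes".split(), "blue running jacket".split(), "red red dress".split()]
scores = bm25_scores(docs, ["red", "shoes"])
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```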
Searchbase also supports user-defined ranking: the user can provide keyword, geo-distance, and number “ranking modifiers”, each with its own “weight” (similar to Elasticsearch’s “RANK” filters).
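
One way weighted keyword, geo-distance, and number modifiers could combine is as a weighted sum of per-modifier scores normalized to [0, 1]. The thread doesn't describe Searchbase's actual formula, so the decay curve, normalization, field names, and weights below are all assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rank(docs, keyword, origin, weights):
    """Hypothetical scheme: each modifier yields a score in [0, 1], combined as a weighted sum."""
    max_rating = max(d["rating"] for d in docs) or 1
    def score(d):
        kw = 1.0 if keyword in d["tags"] else 0.0
        dist = haversine_km(origin[0], origin[1], d["lat"], d["lon"])
        geo = 1.0 / (1.0 + dist / 10.0)  # 1.0 at the origin, 0.5 at 10 km
        num = d["rating"] / max_rating
        return weights["keyword"] * kw + weights["geo"] * geo + weights["number"] * num
    return sorted(docs, key=score, reverse=True)

docs = [
    {"name": "cafe_near", "tags": ["coffee"], "lat": 45.0, "lon": 9.0, "rating": 3},
    {"name": "cafe_far", "tags": ["coffee"], "lat": 46.0, "lon": 9.0, "rating": 5},
    {"name": "bar_near", "tags": ["beer"], "lat": 45.0, "lon": 9.0, "rating": 5},
]
weights = {"keyword": 2.0, "geo": 1.0, "number": 1.0}
ranked = [d["name"] for d in rank(docs, "coffee", (45.0, 9.0), weights)]
print(ranked)  # ['cafe_near', 'cafe_far', 'bar_near']
```

Raising `weights["geo"]` or `weights["number"]` shifts the order, which is the point of exposing per-modifier weights.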
What I struggle with in solutions like this is permissions. When you have a permission system like row-level security in place for SQL, how do you rewrite all that logic for the search queries?
Right now a Searchbase api key has read access to an entire index or to an “alias”, which is a filtered subset of an index. I agree that row-level security could be really interesting… Could you give an example of your use-case?
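
A sketch of how a key-scoped "alias" can behave like row-level security: the server binds each API key to a mandatory filter and merges it last, so a caller can narrow results but never widen them. Key names, fields, and data are hypothetical; this isn't a description of Searchbase internals.

```python
# Hypothetical: each API key is bound server-side to an "alias", i.e. an index plus a mandatory filter.
ALIASES = {
    "key_tenant_a": {"index": "tickets", "filter": {"tenant_id": "a"}},
    "key_tenant_b": {"index": "tickets", "filter": {"tenant_id": "b"}},
}

DOCS = [
    {"id": 1, "tenant_id": "a", "status": "open"},
    {"id": 2, "tenant_id": "b", "status": "open"},
    {"id": 3, "tenant_id": "a", "status": "closed"},
]

def search(api_key, user_filter):
    """Return ids of docs matching the caller's filter, restricted to the key's alias."""
    alias = ALIASES.get(api_key)
    if alias is None:
        raise PermissionError("unknown API key")
    # The alias filter is merged last, so a caller cannot override it (the RLS-like part).
    effective = {**user_filter, **alias["filter"]}
    return [d["id"] for d in DOCS if all(d.get(k) == v for k, v in effective.items())]

print(search("key_tenant_a", {"status": "open"}))  # [1]
print(search("key_tenant_a", {"tenant_id": "b"}))  # [1, 3]: the attempt to widen access is ignored
```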
I’m also interested in how to handle this, and not just RLS but any authorization mechanism.
Interesting! Any idea of what will be the pricing? Is it going to be a service that's also available for small userbases (does the pricing scale down)?

What does "real time" mean? Are we talking about minutes or seconds?

Finally, how does it work for data that needs to be joined, should I create prejoined tables locally?
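The thread doesn't answer this, but the usual approach for search indexes is denormalization: flatten the join into one self-contained document per searchable entity. A minimal sketch with an in-memory SQLite database and made-up `customers`/`orders` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id), item TEXT);
INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Linus', 'Helsinki');
INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'monitor'), (12, 2, 'laptop');
""")

# One flat document per order, with the customer fields copied in, ready to index.
rows = conn.execute("""
    SELECT o.id, o.item, c.name, c.city
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
docs = [{"order_id": r[0], "item": r[1], "customer": r[2], "city": r[3]} for r in rows]
print(docs[0])  # {'order_id': 10, 'item': 'keyboard', 'customer': 'Ada', 'city': 'London'}
```

The trade-off is that a change to a customer row has to be re-propagated to every order document that copied it.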

Scalable pricing is one of the core issues we encountered with existing search solutions. While the details for pricing are still TBD, we do want to lower the barrier to entry so that earlier-stage startups have a simple and affordable way to integrate their first search solution.
Thank you! That sounds appealing :)
The landing page doesn't mention self-hosting, is it an option considered for the future?

Maybe the AI revolution placed convenience over data ownership, but there's no way I'm giving full read access to my databases to a third party.

TL;DR: We’re prioritizing convenience atm.

I’m very open to the idea of an on-prem version of Searchbase as an option in the future! We decided to start with a cloud version for our MVP to enable users to get started with Searchbase in a few minutes.

As for sharing DB credentials, I get the concern. “Subscribing” to your database is meant for developer convenience. But if there’s demand, we’d also like to support the traditional ETL method of “manually bulk uploading” :)

Curious to know more about the choice to go with limit/offset; given all the limitations of this way of querying data, it seems shortsighted.
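The commenter doesn't list the limitations, but two usual ones are cost (the engine still walks past every skipped row) and drift (pages shift when rows change between requests). Keyset ("seek") pagination is the standard alternative. A small SQLite sketch with a made-up `items` table shows the drift:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 11)])

def page_offset(page, size):
    # LIMIT/OFFSET: the engine still walks past page * size rows, and the window
    # shifts if rows are inserted or deleted between requests.
    sql = "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(sql, (size, page * size))]

def page_keyset(after_id, size):
    # Keyset pagination: resume strictly after the last id seen; the
    # primary-key index lets the engine jump straight to that position.
    sql = "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?"
    return [row[0] for row in conn.execute(sql, (after_id, size))]

first = page_keyset(0, 3)                       # [1, 2, 3]
conn.execute("DELETE FROM items WHERE id = 2")  # a write lands between page requests
print(page_offset(1, 3))                        # [5, 6, 7]: item 4 is silently skipped
print(page_keyset(first[-1], 3))                # [4, 5, 6]
```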
Your GitHub is still just a clone of a doc generator tool; maybe just don't link it until there's something there. It leaves a bad first impression.
Thanks for the heads up. Removed.
Your signup form isn’t working on iOS. Would be great to give this a go!
Uh, oh. Taking a look, sorry about that. What error are you getting? In the meantime, you can email me at giulio at searchbase.dev and I can add you manually.
Am I missing something? Where is the source code?
Benchmarks? Multi-tenancy?
Multi-tenancy: I assume you mean organizations with user roles, which we do support.

Benchmarks: this is a big one! We have some initial data that shows we’re comparable to elastic for fuzzy-search at medium scale. However, one of our value-propositions is out-of-the-box semantic search, so we’re working on a blog post with the full benchmarking story.
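
For context on the "fuzzy" side of such a benchmark: fuzzy matching is typically defined by edit distance (Elasticsearch's `fuzziness` is a maximum edit count). Below is a plain Levenshtein implementation with a brute-force vocabulary scan. It is illustrative only; real engines use automata or n-gram indexes instead of scanning, and the vocabulary here is made up.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def fuzzy_match(term, vocabulary, max_edits=2):
    """Vocabulary words within max_edits of term, closest first (ties broken alphabetically)."""
    hits = [(levenshtein(term, w), w) for w in vocabulary]
    return [w for d, w in sorted(hits) if d <= max_edits]

vocab = ["search", "research", "sketch", "starch"]
print(fuzzy_match("serch", vocab))               # ['search', 'sketch', 'starch']
print(fuzzy_match("serch", vocab, max_edits=1))  # ['search']
```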

Benchmark hybrid (lexical + semantic) search while you're at it.
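
Hybrid search needs a way to merge the lexical and semantic result lists. Reciprocal rank fusion (RRF) is one common, score-free method, used here purely as an example (nothing in the thread says which fusion Searchbase would use). The id lists are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to every doc it contains."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

lexical = ["d1", "d2", "d3"]   # e.g. BM25 order
semantic = ["d3", "d1", "d4"]  # e.g. vector-similarity order
print(reciprocal_rank_fusion([lexical, semantic]))  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and cosine similarities live on different scales.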