Hacker News
by krishna2 3253 days ago
That's a big claim, and kudos if you really pulled it off. There is also the aspect of relevancy in addition to speed.

I think the best way to showcase it is to build a few sample proof-of-concept search engines. For example, how about a search engine for Wikipedia? Project Gutenberg? StackOverflow? All these datasets are freely available. You can set up a search engine for each of them and let anyone easily verify your search engine's speed and relevancy.
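As a rough illustration of what sits at the core of such a proof-of-concept engine, here is a minimal in-memory inverted index. It is a sketch under stated assumptions, not ResinDB: the `tokenize` helper, the `InvertedIndex` class and the toy documents are hypothetical, and a real demo over Wikipedia or StackOverflow would also need persistence, a proper analyzer and ranking.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-alphanumeric characters (a deliberately naive analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class InvertedIndex:
    """Hypothetical toy index: maps each term to the set of document ids containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        # Store the raw text and record doc_id in the posting set of every term.
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[int]:
        """Boolean AND search: ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # Copy the first posting set so intersecting does not mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Project Gutenberg offers free ebooks")
    index.add(2, "Wikipedia is a free encyclopedia")
    index.add(3, "StackOverflow answers programming questions")
    print(index.search("free"))         # [1, 2]
    print(index.search("free ebooks"))  # [1]
```

Intersecting posting sets is the simplest way to answer multi-term queries; ranking the matches by relevance is a separate step on top of this.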

Lastly, in addition to speed and relevance, there is how easy it is to install, customize and extend.

Hope that helps!

Yes, it helped. And speed, i.e. querying and indexing performance, is certainly only a USP if you also have relevance.

I'm confident the relevance is as good as or better than Lucene's. I especially like my phrase queries and how they seem more relevant than Lucene's phrase queries. The scoring is a half-way implementation of word2vec (in a lot of ways similar to the scoring mechanics of Lucene's tf-idf scheme). I'm aiming for a full word2vec implementation in vNext.
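For readers unfamiliar with the tf-idf scheme mentioned above, here is a minimal sketch of classic tf-idf ranking. It is not ResinDB's scorer and not word2vec; the sqrt-dampened term frequency and the `1 + ln(N / (df + 1))` idf are a common textbook variant assumed here for illustration, and Lucene's exact formulas differ between versions. The `tfidf_scores` function and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive analyzer: lowercase, split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_scores(query: str, docs: dict[int, str]) -> list[tuple[int, float]]:
    """Rank docs by the sum over query terms of tf * idf (textbook variant, assumed).

    tf  = sqrt(term frequency in the doc)   dampens repeated terms
    idf = 1 + ln(N / (df + 1))              rarer terms weigh more; always > 0
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores: dict[int, float] = {}
    for term in set(tokenize(query)):
        # Document frequency: how many docs contain the term at least once.
        df = sum(1 for counts in term_counts.values() if term in counts)
        if df == 0:
            continue
        idf = 1.0 + math.log(n_docs / (df + 1))
        for doc_id, counts in term_counts.items():
            freq = counts[term]
            if freq:
                scores[doc_id] = scores.get(doc_id, 0.0) + math.sqrt(freq) * idf
    # Highest score first; ties broken by doc id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = {
        1: "free ebooks and free audiobooks",
        2: "a free encyclopedia",
        3: "programming questions and answers",
    }
    # doc 1 (two "free" hits plus the rarer "ebooks") ranks above doc 2
    print(tfidf_scores("free ebooks", docs))
```

Documents that match no query term get no score at all, so doc 3 is absent from the result rather than ranked last.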

I have only my own benchmark tests telling me I'm faster than Lucene. That is why I'm contemplating writing a formal proof of both ResinDB's performance and its relevance.

My test data has been the English version of Wikipedia plus Project Gutenberg. I suppose I could publish those indices to the world as a demo search engine, though I don't think a soul would care about a properly searchable Project Gutenberg. I'm looking into Common Crawl now.

I'm a part-time father of two, employed doing tedious, unmotivating work, and focusing completely on my spare-time project. I need some advice on what the next step should be if I want to turn this into a business I could spend all my time on, not only nights and weekends. Formal proof? Demo?

Side note: one of the most approachable people in the database building community is Oren Eini, creator of RavenDB. He's reviewing ResinDB on his blog. I've read a preview of the entire series of posts, implemented solutions for the best parts of the critique and just released v2. Blog is here: http://ayende.com/blog

Great that you already have those datasets. Yes, putting it on a small public server where people can search and evaluate it would be good. Honestly, the "speed" part cannot truly be verified on a standard public server, but at least the other aspects can. As I see it, you are primarily up against Elasticsearch and Postgres's full-text search. [Putting out a full index could be costly, so you could always try a smaller subset that is still a good-sized chunk, say a million docs or so.]

Another thing to keep in mind is how easy it is to install and plug in. I know you have designed it as a library, but I think a small wrapper around it with its own HTTP server, so anyone can start it as a service and access it over HTTP with JSON, would be useful too. [At least, everyone these days seems to do everything in containers.] Also, Elasticsearch sets a good bar for how easy it is to spin up a search engine and get started. Again, I'm not sure how far you want to go in making ResinDB that easy to install, use and document.
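A minimal sketch of the suggested HTTP/JSON wrapper, using only Python's standard library. Everything here is an assumption for illustration: `search` is a hypothetical stand-in for the embedded library's query call, and the `/search?q=` route, the response shape and port 8080 are invented, not ResinDB's API.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def search(query: str) -> list[dict]:
    """Hypothetical stand-in for the library's query call (toy AND match over two docs)."""
    corpus = {1: "free ebooks", 2: "free encyclopedia"}
    terms = query.lower().split()
    return [
        {"id": doc_id, "text": text}
        for doc_id, text in corpus.items()
        if terms and all(t in text.split() for t in terms)
    ]


class SearchHandler(BaseHTTPRequestHandler):
    """Serves GET /search?q=... and answers with a JSON object holding the hits."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404, "unknown route")
            return
        # parse_qs returns a list per key; take the first "q" value or "".
        query = parse_qs(url.query).get("q", [""])[0]
        body = json.dumps({"query": query, "hits": search(query)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    # ThreadingHTTPServer (Python 3.7+) handles each request in its own thread.
    with ThreadingHTTPServer(("", port), SearchHandler) as server:
        server.serve_forever()
```

Calling `serve()` blocks; a client could then query it with something like `curl 'http://localhost:8080/search?q=free'`. Wrapping this in a container image is what would give the "start it as a service" experience described above.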

One way to get adoption is to approach a few open-source projects and non-profit orgs (or for-profit orgs, though you might have to start out for free) and see if you can convince them to use your search engine. Once you have a couple or more, it helps in two ways. First, you get good feedback on the steps someone other than you needs to take to get it into production and to keep it updated and maintained. Second, you can use them as reference customers.

Feel free to contact me via email [same as hn id but with the popular email service from another search engine out there! :)].

Thank you so much for this feedback. My eyes have been on Elasticsearch ever since their first funding of 80 million bucks. But I have also noticed that Postgres is the only database engineering team that seems to care about full-text search. Their indexing capabilities are just awesome: they have a library of index types you can use, and they all seem well constructed, as does Postgres itself (and the team).

An HTTP wrapper. Sure. It's in the backlog; I can push it up a bit.

What do you like most about ELK? The easy-peasy install where you can immediately start writing data, the HTTP JSON API, or something else?