by wolfgarbe 1637 days ago
A laudable effort. Two questions:

1. What is the rationale behind choosing Python as an implementation language? Performance and efficiency are paramount for keeping operational costs low and ensuring a good user experience, even when the search engine is used by many people. I guess Python is not the best choice for this, compared to C, Rust or Java.

2. What is the rationale behind implementing a search engine from scratch versus using existing open-source search software such as the Apache Lucene library, Apache Solr and Apache Nutch (crawler)?


Premature optimization is the root of all evil. Best to concentrate on the algorithm first, and then, maybe, improve it with a faster language.

Apart from that, the misconception that "python is slow" should die :-)

> Premature optimization is the root of all evil.

Is keeping performance in mind and choosing the tech stack accordingly really a premature optimization?

This might be the most abused phrase in CS history. Perhaps we should add "Premature optimization fallacy" to the list of cognitive errors programmers use as an excuse to not seriously think about performance.

> Is keeping performance in mind and choosing the tech stack accordingly really a premature optimization? This might be the most abused phrase in CS history.

And it's often misquoted, and/or taken out of context. The full quote goes like this: "The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming."

Back then - this was the 70s, remember - CPU cycles were costly.

But, as with all good theories, it has stood the test of time. We have the same "problem" today, just in a different form. Today we call it "agile": make sure the customer is happy before the programmer is happy. If the programmer spends too much time trying to become happy, the customer is either gone or someone else has come up with a better solution.

Regarding your specific "is keeping performance in mind and choosing the tech stack accordingly really a premature optimization" question, and keeping in mind OP's endeavour, you're on the right track. But the real question is how programmers _get there_.

By experience.

And in turn, now that they relate to clients more directly, programmers have to adhere to the rules they were shielded from back in the day. And your business isn't worth shit without customers, even if you have the best programmers writing the best code from day 1.

Hence Knuth's quote, translated: "if you spend so much time planning your journey that you miss your flight, you are getting nowhere."

It's a trade-off between speed of development and performance. Isn't speed of development a sensible thing to optimize for in an experimental project?

Agreed, this was my thinking - and since I'm better at Python, it's faster for me to get stuff done. I would like to rewrite it in Rust, though; all help from Rustaceans gladly accepted!

In general, speed isn't the problem with search (at least for the retrieval aspect), but memory efficiency is. Low per-object overhead and the ability to memory-map large data ranges are extremely beneficial in a language if you want to implement a search index.

But I agree, get it working first, then re-implement it in another language if it turns out to be necessary.
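To make the small-object-overhead and memory-mapping point concrete, here is a minimal CPython sketch. The helper names (`list_overhead_bytes`, `packed_bytes`, `write_postings`, `read_posting`) and the postings file layout (a flat array of unsigned ints, one document ID each) are hypothetical, chosen only for illustration. It assumes the `array` typecode "I" is 4 bytes, as on common platforms, and the `sys.getsizeof` figures are CPython-specific.

```python
import array
import mmap
import os
import sys
import tempfile

# Start above CPython's small-int cache so every int is a separate heap object.
BASE = 10**6


def list_overhead_bytes(n):
    """Approximate bytes for a list of n distinct Python ints (CPython-specific)."""
    values = list(range(BASE, BASE + n))
    # The list holds pointers; each int is its own object with a header.
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


def packed_bytes(n):
    """Bytes for the same n values packed into a C array of unsigned ints."""
    # Assumes typecode "I" is 4 bytes, as on common platforms.
    packed = array.array("I", range(BASE, BASE + n))
    return sys.getsizeof(packed)


def write_postings(path, doc_ids):
    """Write doc IDs as a flat binary array (hypothetical on-disk layout)."""
    with open(path, "wb") as f:
        array.array("I", doc_ids).tofile(f)


def read_posting(path, index):
    """Read one doc ID through mmap; the OS pages in only what is touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Release the memoryviews before the mmap closes, or close() fails.
            with memoryview(mm) as raw, raw.cast("I") as ids:
                return ids[index]


if __name__ == "__main__":
    n = 100_000
    print(f"list of ints: ~{list_overhead_bytes(n):,} bytes")
    print(f"packed array: ~{packed_bytes(n):,} bytes")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "postings.bin")
        write_postings(path, [3, 17, 42, 99])
        print(read_posting(path, 2))  # prints 42
```

On a typical 64-bit CPython the list version comes out several times larger, because each int carries its own object header plus a pointer slot in the list, while the array stores raw 4-byte values. Exact numbers vary by interpreter version.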

> Apart from that, the misconception that "python is slow" should die :-)

Yeah, it's not Python that is slow, it's the interpreter.

Which interpreter? There are several. I found PyPy to be quite reasonable, often faster than the standard CPython interpreter.
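One way to answer "which interpreter?" for your own workload is to run the same script under each implementation (for example with `python3` and `pypy3`) and compare. The sketch below is a hypothetical micro-benchmark: `busy_loop` and `describe_run` are illustrative names, and a tight pure-Python integer loop is the kind of code where PyPy's JIT tends to help most, so it says little about code dominated by C extensions or I/O. Absolute timings will vary by machine.

```python
import platform
import timeit


def busy_loop(n):
    """Hypothetical workload: a tight pure-Python integer loop."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def describe_run(n=1_000_000, repeat=3):
    """Return the interpreter name and the best wall-clock time for busy_loop(n)."""
    impl = platform.python_implementation()  # e.g. "CPython" or "PyPy"
    # Take the minimum of several runs to reduce noise from other processes.
    best = min(timeit.repeat(lambda: busy_loop(n), number=1, repeat=repeat))
    return impl, best


if __name__ == "__main__":
    impl, seconds = describe_run()
    print(f"{impl}: best of 3 runs took {seconds:.3f} s")
```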