| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by _delirium 5329 days ago
	There aren't very many statisticians/MLers who suggest (or practice) reimplementing your own algorithms, except for quite simple things, because the risk of getting something wrong is pretty high, and the work to make things efficient is non-trivial. If anything, the current push is in the other direction, towards encouraging more people to share their code, and more people to use well-tested code, through initiatives like http://jmlr.csail.mit.edu/mloss/ , http://www.jstatsoft.org/ , and CRAN. For example, you could reimplement your own SVM instead of using http://svmlight.joachims.org/ , but your chance of producing something correct and as efficient is pretty low...

2 comments

fauigerzigerk 5329 days ago

I think the choice between using existing libraries and implementing your own mostly depends on how central particular algorithms are to your product. If a better algorithm makes a great difference for my customers then it's insane for me to use an existing library.

I don't even find much value in looking at existing code as a starting point because it's bound to be either obscured by lots of optimizations or naive or it's university code left behind by someone finishing their thesis in a hurry. For code beyond a certain level of complexity I prefer to either use it as a black box or implement it myself.

Obviously, if the algorithm is not a core component of my product it's insane to waste time on reimplementing it, provided there is a good quality implementation that has the right license.

law 5329 days ago

It's a fine line to walk. On the one hand, community-vetted code is a spectacular idea for the core algorithms, but on the other, overly-restrictive licenses (like [L]GPL) effectively preclude the maximum utility being derived from them.

dubya 5329 days ago

OTOH, if you do need to write your own code, you could use the existing test suites to make it a lot easier. I think the GPL would allow this.