| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by graycat 4839 days ago

The article criticizes too soon. Here I explain this point, give some references where can get good explanations of the topics in the article, mention some more topics, and then explain what I believe is a more important guide to success in computing for the foreseeable future and, in particular, a low risk approach to very successful information technology startups. For this last, venture partners listen up.

Maybe the standard reference on design patterns is:

     o    Erich Gamma, Richard Helm, Ralph
          Johnson, John Vlissides, 'Design
          Patterns:  Elements of Reusable
          Object-Oriented Software', ISBN
          0-201-63361-2, Addison-Wesley,
          Reading, Massachusetts, 1995.

Most of the rest of what is mentioned in the article is in any of, say,

     o    Donald E. Knuth, 'The Art of Computer
          Programming, Volume 3, Sorting and
          Searching', ISBN 0-201-03803-X,
          Addison-Wesley, Reading,
          Massachusetts, 1969.

     o    Robert Sedgewick, Kevin Wayne,
          'Algorithms, 4th Edition', ISBN-10:
          032157351X, ISBN-13: 978-0321573513,
          Addison-Wesley, Reading,
          Massachusetts, 2011.

     o    Thomas H. Cormen, Charles E.
          Leiserson, Ronald L. Rivest, and
          Clifford Stein, 'Introduction to
          Algorithms, Second Edition', The MIT
          Press Cambridge.

          Also known as CLRS.

          Sometimes can find this on the
          Internet as a PDF. Amazon currently
          sells the third edition from 2009.

Two lectures of 90 minutes each are about enough to cover what the article has from the last three.

There's an ocean of more such stuff can stuff between ears.

E.g., once I needed something, worked it out, and later discovered I'd reinvented k-D trees. It's in Sedgewick. One project I was on made important use of extendible hashing as in

     Fagin, R.; Nievergelt, J.; Pippenger,
     N.; Strong, H. R. (September, 1979),
     "Extendible Hashing - A Fast Access
     Method for Dynamic Files", ACM
     Transactions on Database Systems 4
     (3):  315–344,
     doi:10.1145/320083.320092

or in

     Extendible hashing
     From Wikipedia, the free encyclopedia

     http://en.wikipedia.org/wiki/Extendible_hashing

In my current project, I needed to be able to read a list of, say, 10 million numbers and end up with the, say, 100 largest. To do that I borrowed the heap data structure from heap sort (in Knuth above).

Later I heard that a standard Google interview question asks much the same.

So, since I got some extra mileage out of the heap data structure in heap sort, I would criticize the article for asking for quick sort but not also heap sort: For sorting n items, heap sort is guaranteed to run in time proportional to (n)ln(n), but the corresponding expression for quick sort is n^2 (the guarantee including worst case) -- a bummer. Yes, usually in practice quick sort is significantly faster than heap sort.

There's too much of such stuff to carry it all around between one pair of ears. So, get an overview and use books such as above as references. Each of those algorithms takes only about an hour to understand and an hour to program. So, just wait until need such an algorithm.

Alas, CLRS tries to cover the simplex algorithm of linear programming, and my view that they botch the effort. Good coverage of linear programming takes more than an hour, but there are several good texts, e.g.,

     Vasek Chvatal, 'Linear Programming', ISBN
     0-7167-1587-2, W. H. Freeman, New York,
     1983.

But I believe that the article's author is missing a still bigger point: He is implicitly assuming that for writing a program, it is sufficient (1) to look at the real problem and (2) use knowledge of programming languages and algorithms to write the software. For problems we understand well enough how to do in principle essentially manually, sure. Otherwise, heavily not.

Here's an example (which we will generalize below): Write software to say how to feed update information to the control of the trajectory of a spacecraft touring the outer planets with, also, the ability to fly between Saturn and its inner most ring.

For this will also need to know (1) Newton's second law of motion, (2) Newton's law of gravity, and (3) how to setup and solve numerically with sufficient accuracy the resulting initial value problems for some nonlinear ordinary differential equations. So, need some physics, differential equations, and numerical analysis.

Note the 'pattern' that is going on here: (1) We start with the real problem, finding updates to the control of the trajectory of the spacecraft, (2) convert that real problem to a mathematical problem, (3) get a mathematical solution to the mathematical problem (e.g., with theorems and proofs from differential equations and numerical analysis), (4) program the mathematical solution.

So, we do not try to go directly from the real problem to a real solution and, instead, take a side detour from the real problem, into some mathematics, into some software, and then back to the real problem.

I claim that this 'pattern' will be of greatly increasing importance and value for computing starting now and for the foreseeable future. That is, we need ways to connect from the real problem to the real solution more powerful than just what we used to do manually or what we might do intuitively, with heuristics, genetic programming, 'machine learning', artificial intelligence, etc.

For an application of this pattern to information technology startups:

(1) Problem. Pick a "big ass" problem, one, say, 1+ Internet users in the US and around the world want solved. Pick a problem where for those users the first good or a much better solution will be a 'must have' instead of just a 'nice to have'.

The example from biomedical technology would be a safe, effective, cheap, patentable one pill cure for any cancer.

Note: Now with such a "big ass" problem we have a good shot at being able to f'get about subtle issues of UI/UX and 'product/market' fit. So, here we reduce 'market' risk.

(2) Do a faithful conversion of this problem into a precisely stated mathematical problem. The math involved might be original with advanced prerequisites. This conversion is usually from challenging to impossible. If cannot make this conversion, then return to (1) and pick another problem. Else continue.

(3) Solution. Find a mathematical solution to the mathematical problem. Want the solution to result, for the real problem, in the first good one or a much better one than anything else. The math involved might be original with advanced prerequisites. Finding this solution is usually from challenging to impossible. If cannot find such a solution, then return to (1) and pick another "big ass" problem. Else continue.

Note: A mathematical solution to a precisely stated mathematical problem is usually easy to check (follow theorems and proofs) with high reliability. So, if we have such a solution that checks, we have lowered project risk.

(4) Computing. Convert the mathematical solution to software to do the data manipulations. This work might be regarded as resulting in an 'algorithm' for the problem, but an 'algorithm' is just some code to do something, and without the mathematics such code has next to nothing to recommend it. So, we don't really want just an 'algorithm'; instead we want something logically solid from some mathematics.

The computing involved might be original with advanced prerequisites. This conversion is commonly from challenging to impossible. If cannot make this conversion, then return to (1) and pick another problem. Else continue.

(5) Deployment. Deploy the software, go live (say, on the Internet), get publicity, users, maybe ads, and revenue.

Note: Starting at step (1), this sequence has high risk. But given success through step (4), due to the big ass problem and the good or much better solution, step (5) has low risk.

So, if we can get through step (4), then we are GO for a valuable solution to a big ass problem, a solution 1+ billion people regard as a "must have" and not just a "nice to have", get to f'get about subtle issues of UI/UX and 'product/market fit' or a 'lean' development process with lots of feedback from the market and revisions of the work.

My view is that this sequence of (1)-(5) will grow rapidly in importance for computing and startups for the foreseeable future. In this case, the key is not some computing skills, classic algorithms, or computer science but some mathematics. possibly new with advanced prerequisites.

So, net my view is that for the future of computing the crucial academic material is from departments of mathematics instead of departments of computer science.

To be more clear, the advantage of the 'detour' into mathematics is get a solid, low risk 'logical chain' from the real problem to the computing and the real solution. That's part of what we want, right?

The only thing that surprises me about this sequence of (1)-(5) is that it has long been so standard in applications of applied mathematics, physical science, and engineering to the solution of important real problems but seems to be totally missing or nearly so from current venture funded information technology startups.