| I've been doing data analysis and machine learning client work for quite some time now, for clients ranging from a 3-person startup to a department of the Canadian government. Almost always, numpy matrix math + Cython or C or Java on a single machine is enough. Not always-always, but it usually is if you can relax the requirements slightly: say, by accepting a 45-minute lag before new data affects the full model, or by caching the results of the top 10k most likely queries, or by putting more effort into stripping out the garbage parts of the data, or, sometimes, by just throwing a $10k-a-month server or a mathematician at the problem (which sure is cheaper than a bunch of cheap servers + a larger infrastructure team). The times you need real scalability, you know you need it. You'd laugh at how silly someone would be for trying to put it onto one machine. You're solving the travelling salesman problem for UPS (although I can think of some hacks here, I probably can't get it down to a single machine), or you're detecting logos in every YouTube video ever made, or you're working for the NSA. Even if you know for sure you're going to need scalability, I don't think it hurts to just do it on a single box at first. Iterating quickly on the product is more important, and once you have something proven you can get money from the market or from VCs to distribute it. |
We could write 30 microservices deployed on 30 Docker images with load balancing and fault tolerance and all that magic for a basic webapp...
Or we could just write a pretty fast webserver and do it with 1 server. (Or, if it is stateless, do it with a few servers, which is still a lot less work than a giant microservice cluster.)
I think in the last year or so microservices have become a little less cool, and people are more along the lines of "code cleanly so we can microservice if we need to down the road, but don't deploy it like that for 1.0"... The same seems to apply here.