Is it just me or is every big release from Uber just a custom rewrite of an existing technology? It seems their engineering department has a large not-invented-here attitude. I could be wrong - they're certainly large enough to have custom requirements that aren't met with what's on the market but the pattern is just becoming suspect.
I think that what is going on there is also a bit political. They grew their engineering department so fast that they now need to justify the headcount, so each team is trying to invent new projects all the time. Anecdotally, this was partially confirmed to me by a friend working there.
I said this before, but I still cannot understand why a service like Uber needs so many engineers in the backend (multiple thousands). It is a complex distributed application, but nowhere near the scale or complexity of a Facebook or Google.
>I said this before, but I still cannot understand why a service like Uber needs so many engineers in the backend (multiple thousands). It is a complex distributed application, but nowhere near the scale or complexity of a Facebook or Google.
Thank you so much, I thought I was going crazy. I understand the demands of running a service on the level Uber has, but well, for instance I can't imagine what kind of computational workload / infrastructure requirements would make developing your own resource scheduler a reasonable option - for a Taxi app? With non-essential (to the core product) machine learning?
Forgive me if I'm ignorant, but what exactly does Uber's engineering team do?
edit: On their blog I was able to find that they "forecast rider demand", according to a relatively small article [0]. Small, that is, compared to the article [1] about what is essentially "just" data visualization, which doesn't help my confusion much.
Those cars generate a lot of sensor data (TB per drive?). I'd imagine that data needs to be made actionable and separated into training and simulation sets pretty quickly. Mapping is a massive problem to automate.
Makes it possible for me to get a ride, process payments and refunds even when the data centers are having issues or when there are temporary internet problems.
Sure but that doesn't require thousands of backend engineers unless they are reinventing everything... and I'd be left wondering what they are rebuilding since, over the past year, every one of my trips with Uber has been a bar-lowering experience...
To be fair, operations becomes a much bigger deal as you get bigger. It’s not just the app, it’s keeping your infrastructure from falling over (because a 0.1% failure rate means losing a lot of money).
Think of all the random bugs you’ve seen in your job and told yourself “eh, this would take someone 2 weeks to fix and is almost never hit by customers”
I think one of the bigger challenges is that when you become bigger you launch more projects to handle the scale, and each of those projects introduces new bugs for which you need a new team of engineers.
Basically, once you hit scaling I think you end up either with a super small team that manages to keep things simple (Instagram, for example) or with a huge team that explodes in complexity and needs to grow exponentially to handle all the extra complexity. Uber is very obviously in the latter bucket.
I periodically see the same content on my FB feed because well, it is NBD if I see the same update from my friend several times.
Let me assure you, it is a BFD if I get billed twice for the same trip. So I am pretty sure Uber needs quite a few engineers to make sure that their stuff works correctly every time in every market for every customer.
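To make the billing point concrete, here is a minimal sketch of the usual guard against double charges, an idempotency key. Everything in it is hypothetical (the `PaymentLedger` class, the `charge_trip` method, using the trip ID as the key); it is not Uber's code, just an illustration of why a retried request must not bill twice.

```python
class PaymentLedger:
    """Hypothetical in-memory ledger; a real system would use a durable store."""

    def __init__(self):
        # Maps idempotency key (here: the trip ID) -> amount charged, in cents.
        self._charges = {}

    def charge_trip(self, trip_id: str, amount_cents: int) -> bool:
        """Charge a trip at most once.

        Returns True if a new charge was recorded, False if this trip was
        already billed (for example, because the request was retried).
        """
        if trip_id in self._charges:
            return False
        self._charges[trip_id] = amount_cents
        return True

    def total_billed(self) -> int:
        """Sum of all recorded charges, in cents."""
        return sum(self._charges.values())


ledger = PaymentLedger()
first = ledger.charge_trip("trip-123", 1850)
retry = ledger.charge_trip("trip-123", 1850)  # same request delivered twice
```

The second call is a no-op, so the rider is billed once even if the network delivers the request twice. Getting that same guarantee across data centers and payment providers is where the real engineering effort goes.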
I appreciate their efforts on open-source projects. Jaeger is wonderful and the effort they put into both making something great, and supporting the open standards (OpenTracing and the legacy Zipkin propagation) is greatly appreciated. I recently had the need to write a service in TypeScript (most everything else is Go), and I felt very at home using the Jaeger node bindings. It felt like I wasn't losing any features for using a less-popular language and everything just worked.
Sure, they just reinvented Dapper from Google... but unlike Dapper I can download and use Jaeger. That counts for a lot. Do I use their ride sharing service? Nope. But I do like their open source projects.
* Everything they do is low data (no video, image or anything high bandwidth).
* Their whole model can be subdivided into smaller local problems (all users/drivers in the Bay Area have nothing to do with the users/drivers currently in NYC).
Yes, there are a couple of algorithms to develop for Uber Pool and for the real-time matching, but everything else looks like a fairly simple app backend to me.
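A rough sketch of the "smaller local problems" claim, assuming every rider and driver carries a region tag. The `Party` type, the region names, and the naive pair-in-arrival-order matching are all made up for illustration; real matching is a much harder problem.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    name: str
    region: str  # hypothetical region tag, e.g. "sf" or "nyc"


def match_by_region(riders, drivers):
    """Group riders and drivers by region, then match within each region only.

    The matching itself is a naive zip in arrival order; it stands in for
    whatever real algorithm would run inside a single region.
    """
    drivers_by_region = defaultdict(list)
    for d in drivers:
        drivers_by_region[d.region].append(d)

    riders_by_region = defaultdict(list)
    for r in riders:
        riders_by_region[r.region].append(r)

    matches = {}
    for region, local_riders in riders_by_region.items():
        local_drivers = drivers_by_region.get(region, [])
        matches[region] = [
            (r.name, d.name) for r, d in zip(local_riders, local_drivers)
        ]
    return matches


riders = [Party("alice", "sf"), Party("bob", "nyc"), Party("carol", "sf")]
drivers = [Party("dave", "nyc"), Party("erin", "sf")]
result = match_by_region(riders, drivers)
```

Each region is processed without looking at any other region's data, which is the property that would let the workload be split across independent shards.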
> Everything they do is low data (no video, image or anything high bandwidth).
So is everyone else? Storage and CDN aren't nearly as complex as ad serving on Facebook (which, like Uber's matching, is a realtime marketplace). Ad serving takes up relatively few bits.
I'm not a network engineer so I'm unfamiliar with how problems scale by bandwidth. I do know that solving NP-hard problems is difficult, so I respect Uber engineers for that at least.
I wouldn't say this is a rewrite of existing technology. They borrowed concepts from other well-known open source projects, but this is substantially a wrapper around Mesos, not a competing project. The technical overview of Peloton[1] is clearer about this than the open source announcement, which is what's featured here.
Anyone who had to take an Uber after they switched away from Google Maps and onto their in-house half-baked mapping/navigation solution knows this is a huge problem.
I'm assuming that, like all mapping solutions, it'll get better, but for now it's just full of bad routes, over-optimized turns, out-of-date detours (for MONTHS!) and nonsensical U-turns.
They could just use a third-party (and at their scale, they can definitely negotiate a custom deal where they feed back usage data to improve the third-party’s service) or even use open source solutions like OpenStreetMap. Even the latter (with the overhead of hosting it themselves) makes total sense at their scale.
The problem with a comment like this is that it only takes into account when something was released publicly and not when the problem was first worked on and implemented internally.
Many internal projects that eventually become open source are not NIH projects, because when the project was proposed there may have been no public open source projects, or at least none that were mature enough. Even if something exists but is still in its early stages, it presents a lot of risk because your company isn't in the driver's seat building and maintaining it.
Claiming something is NIH based on when it first became polished enough to be open-source ignores all the history behind the state of the world when a project was first worked on.
> Is it just me or is every big release from Uber just a custom rewrite of an existing technology?
I'm a little sad that this is the top comment here. I mean, maybe you're right. But so what? Some people find this useful, and some won't. Same as anything else.
At the end of the day, every line of code added to the world's pool of OSS code is a Good Thing™ as far as I'm concerned. Even if it's something I personally don't have a use for.
I think we should encourage companies to release code as open source, and give Uber at least some small measure of "props" for the stuff they release. Maybe none of their stuff is a game changer like Linux, but it doesn't need to be.
I’ve worked with Mesos pretty extensively before and when Uber first announced Peloton last year I was intrigued. Peloton seems to be a wrapper around Mesos that allows for running smaller, unique jobs without having to write a Mesos framework for each. Writing a Mesos framework for every small job you have can get annoying when you just want to define how your job should run and don’t really care about the resource or task allocation of the job, and it seems like Peloton solves this on Mesos. It’s similar to YARN but not limited to Hadoop. It would have been useful for the project that I worked on because it was more geared toward our use case and shifting from Mesos to k8s would’ve been a huge engineering project.
Just call it (Uber) customized Mesos. I find this article somewhat deceptive and boring. I am pretty sure I can run this Peloton thingy with most Mesos API calls.
It pretty much is, but the appeal (at least to me) is abstracting away all the framework writing. It seems like it’s easier to run small, unique jobs on it similar to something like YARN.
This post doesn't exactly tell us the true nature of their workloads (other than the crude categorization - batch, stateful, stateless), nor does it talk about the inflection points where off-the-shelf solutions don't cut it anymore and such customization is required. I mean some before & after numbers / graphs on resource utilization would have really helped.
The best thing I can think of that fits what you're asking is an Actor framework (which abstracts the compute and message passing between objects for you).
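For anyone unfamiliar with the term, here is a toy sketch of the actor idea using only Python's standard library. The `Actor` class, its `send`/`stop` methods, and the `Counter` example are hypothetical names for illustration, not any particular framework's API.

```python
import queue
import threading


class Actor:
    """Toy actor: private state, a mailbox, and one thread draining it."""

    _STOP = object()  # sentinel telling the actor to shut down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronously deliver a message; callers never touch the state."""
        self._mailbox.put(message)

    def stop(self):
        """Let the actor finish its queued messages, then wait for it to exit."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, in the order they were sent.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    def __init__(self):
        self.total = 0  # initialized before the mailbox thread starts
        super().__init__()

    def receive(self, message):
        self.total += message


counter = Counter()
for n in (1, 2, 3):
    counter.send(n)
counter.stop()
```

The caller only ever sends messages. Where the work runs and how the messages get there is hidden behind `send`, which is the abstraction an actor framework provides (real ones also handle distribution across machines, supervision, and so on).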
Immediate reaction: Well, now there's Peloton, the fitness tech company, Peloton, the self-driving truck caravan company, and Peloton, the cluster scheduler...
I remember when Mozilla got such heat for “usurping” the Firebird name (due to the name already being used by Firebird DB) - they then changed to Firefox.
Does Uber get held to the same standard or do we just assume all names are overloaded now?
Peloton is a somewhat common French word for "ball" that became common in English as a sports term for a grouping in a bicycle race, by way of the Tour de France. https://www.merriam-webster.com/dictionary/peloton
In addition to what others have said here, "peloton" also means "fearless" in Finnish.
I have no idea if or why they would've used that or if they're just referring to the cycling thing, but I guess "fearless" could also be kind of fitting for this project.
My first reaction was "this just sounds like Mesos?". And it's cited in their page (which on first read I thought meant they were trying to act as a single pane of glass for Mesos/k8s/etc.):
In the OP blog post though, they assert "to our knowledge, there is no other open source scheduler which combines all types of workloads for web-scale companies like Uber."
And then, when you dig...it's just Mesos. They built a framework for Mesos. So, that's cool. But man, the puff piecery borders on dishonesty. I mean--Singularity has existed, and is implemented at very large scales, for a while. I'm sure Peloton is a fine scheduler, but there's a lot of huffing-one's-own-farts in the documentation here.
Would people use open source stuff from a morally questionable company? Especially when it's just a rewrite of existing technology posted to a different GitHub repo?
EDIT: People get so up in arms about Google and Microsoft working with China and the military, but Uber has done some horrendous stuff on their own. Just curious where people think the line should be.
Arguably it does indirectly - e.g. being able to say that you're responsible for Popular Tool X might help your brand, make it easier to make sales, etc