by SuperscalarMeme 1376 days ago
Not to mention that in-house silicon is all about economies of scale, which makes this an even more puzzling move.

Tesla has a path to economies of scale: they already announced that if Dojo works as expected they'll make it available to others as an AWS-style service.

Which is brilliant: they might end up making money on this.

AI is clearly here to stay. The demand for AI training will clearly explode in the future.

Running training in-house is not easy or cheap. You don't just plug in 1000 NVIDIA GPUs. You need a massive up-front payment for GPUs, and you're basically running your own extremely energy-hungry datacenter.

Tesla might build and operate massive datacenters. They'll use as much as they need internally and sell the remaining capacity to others.

This might take 5 years but the path to do it is clear.

I don’t see how they’re going to commercialise this as a cloud compute service.

For one, they’ve built a chip that operates in a fundamentally different way to other chips. So any other company that wanted to use it would have to invest a considerable amount of resources in building up the institutional knowledge to use it effectively.

Additionally, the lack of virtual memory and multi-tasking support renders it pretty much impossible to divide up compute between multiple customers. So, commercialising this would require customers renting out the whole unit, which is contrary to how cloud computing usually works.

Are there companies out there that have the capital and use cases necessary to fit into Dojo Cloud? Maybe, though not one I’ve worked for. Would they trust the stable genius currently heading up Tesla enough to make such an investment? Perhaps, but I wouldn’t, but what do I know?

> Additionally, the lack of virtual memory and multi-tasking support renders it pretty much impossible to divide up compute between multiple customers. So, commercialising this would require customers renting out the whole unit, which is contrary to how cloud computing usually works.

Only if you want to subdivide the compute on each dojo chip. You can still provide multi-tenant support by allocating entire dojo chips to a single customer at a time. Even traditional time-division multi-tasking is possible as long as you’re happy to accept multi-second time slices. Then the overhead of clearing an entire dojo chip (or batch of chips) and setting up a new application isn’t too high.

If you’re doing AI workloads, then none of the above are an issue. Training a large net takes days to weeks of continuous, single task computation. So selling dojo access in whole 1 hour blocks is a perfectly reasonable thing to do.
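To make the whole-chip idea concrete, here is a minimal sketch of what such a booking scheme could look like. Everything in it is an assumption for illustration: `ChipPool`, `reserve`, and `switch_overhead` are made-up names, the 10-second reset time is a placeholder, and none of this reflects how Tesla actually schedules Dojo. It only shows that exclusive, block-granular allocation needs no virtual memory or on-chip multi-tasking, and that the cost of wiping a chip between tenants shrinks as the time slice grows.

```python
from dataclasses import dataclass, field


@dataclass
class ChipPool:
    """Whole-chip, whole-block booking: a chip is never shared within a block."""
    num_chips: int
    # bookings[block_index] maps chip id -> tenant name (hypothetical layout)
    bookings: dict = field(default_factory=dict)

    def reserve(self, tenant: str, block: int, chips_needed: int) -> list:
        """Give `tenant` exclusive use of `chips_needed` chips for one time block."""
        taken = self.bookings.setdefault(block, {})
        free = [c for c in range(self.num_chips) if c not in taken]
        if len(free) < chips_needed:
            raise RuntimeError(f"only {len(free)} chips free in block {block}")
        granted = free[:chips_needed]
        for chip in granted:
            taken[chip] = tenant
        return granted


def switch_overhead(slice_seconds: float, reset_seconds: float) -> float:
    """Fraction of wall-clock time lost to clearing and reloading a chip."""
    return reset_seconds / (slice_seconds + reset_seconds)


if __name__ == "__main__":
    pool = ChipPool(num_chips=4)
    print(pool.reserve("alice", block=0, chips_needed=3))
    print(pool.reserve("bob", block=0, chips_needed=1))
    # Assumed 10 s reset: compare a 5 s slice with a 1 hour block.
    print(switch_overhead(5, 10), switch_overhead(3600, 10))
```

With the assumed 10-second reset, a 5-second slice wastes about two thirds of the time on switching, while a 1 hour block wastes well under 1%.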

Part of the plan is to have PyTorch compatibility.

Dojo has its own IR but they also have a PyTorch to Dojo compiler.

People's opinion of Musk won't matter: either Dojo will be a capable service at a good price or it won't.

People will use it based on merits.

Reliability is an important factor here, and I don't mean technology. Things don't look so good for everything that has to do with Musk. Today like this, tomorrow like that
>Things don't look so good for everything that has to do with Musk. Today like this, tomorrow like that

Such as? Except for FSD, his record is unmatched AFAIK when you take into account the novelty / complexity / difficulty.

One example, and certainly his main achievement: he said Tesla would sell and produce half a million cars by 2020, back in early 2014, and they hit that number with a 93.6% precision. https://youtu.be/BwUuo6e10AM?t=156

Some of Musk's stuff is great - other stuff isn't.

SpaceX? Great. Starlink? Sounds neat. Tesla? Pioneered electric cars with respectable performance and range.

But on the other hand, where's the hyperloop? Where's the affordable tunnelling? Where's the $35k Tesla - not available for order on the website, that's for sure. Where's the miniature submarine for rescuing children trapped in caves? Why has my buddy in Europe been waiting over a year for his Powerwall to be delivered? Why are these Norwegian Tesla owners on hunger strike? Where's the full self driving, with taxi service? Why on earth would anyone want to buy Twitter?

Makes it very difficult to know which of Musk's statements are just spitballing, which are unrealistic timescale guesses and which can be relied on.

Getting any serious project architecturally 'locked in' to a special type of CPU you can only get from Tesla would be a bold move.

How is the Las Vegas tunnel going? Or the brain implant?

He is good at marketing and developing existing technology.

But of his announced revolutions, none works.

Reliability? Name one major cloud service where you can count on not getting randomly banned overnight. And yet people still use them.
Like I wrote, it's not about the technology but its chief.

Next week he tweets he will take the service offline to buy AWS and then calls it off, that kind of reliability

> I don’t see how they’re going to commercialise this as a cloud compute service.

The simplest, most obvious approach would be “give me your datasets and we’ll train your model”.

I assume you never actually built any cloud infrastructure yourself. Plus Tesla (aka Elon), well, says a lot of stuff, not all of it necessarily correct.

An internal research product is very far from any actual production usage. Especially if you go against established paradigms, which requires an enormous amount of effort (more than developing the silicon) to build tooling around it, so people can design, program, debug, and monitor it.

But that’s internal usage. Cloud is a totally different ballgame. You have to deal with thousands more requirements (and you generally cannot tell a customer to do something else instead, as you can with internal teams). And customers have operating procedures totally different from yours, zero access to your internal knowledge, and infinitely less tolerance for BS answers (as they are paying customers, not someone in the same boat).

Building cloud is extremely hard, and there’s a reason why Google is still losing money on it.

Plus, let’s even say that your 5 year estimate is correct, Dojo is amazing and the future of tech, and they have a viable product by then. Do you think that Nvidia won’t advance their AI offering by then? That Google TPU will stop being developed? Or will Tesla keep investing to churn out a new generation of Dojo every year?

> You have to deal with thousands more requirements (and you cannot generally tell customer to do something else instead, as you can with internal teams).

You can. AWS started with S3 when everyone was using databases. As long as it’s cheaper than its competition, a single use-case (you won’t serve a website on these) has a market.

> You can. AWS started with S3 when everyone was using databases.

AWS started when there was no competitor.

Google started with a ton of world-class expertise when AWS was already up and running, while already operating a colossal network of server farms using special-purpose hardware, none of which Tesla has, and after all these years it has barely reached a 10% market share.

What they want is a training engine that is cheaper than whatever AWS or Google (or anyone else) can offer. If I can point my PyTorch to it instead of an AWS GPU for less money, why not?
> What they want is a training engine that is cheaper than whatever AWS or Google (or anyone else) can offer.

Bold assumption, considering Tesla's hardware does not exist, the market is limited, and Google already has years of experience providing machine learning services with special-purpose hardware.

> Tesla has a path to economies of scale: they already announced that if Dojo works as expected they'll make it available to others as an AWS-style service.

"If we manage to put together a working processor, supporting hardware, OS, and possibly ad-hoc programming language, our next step is to also develop a bunch of web services to provide cloud hosting services."

Not very credible. As if the key to offering competing cloud hosting services is developing the whole hardware and software stack.

And network infrastructure, isolation between customers, scheduling hardware allocation, etc. Running one's own data center is quite different from inviting all sorts of third parties in.
Yeah, but it’s not like this is rocket science or anything.
> Yeah, but it’s not like this is rocket science or anything.

The key difference between this goal and SpaceX is that Elon Musk bought a private space company that already had the know-how and the market presence in a market with virtually no competitor.

In this case, Tesla is posting wild claims about vertically integrating the whole tech stack, stopping barely short of mining the semiconductor raw materials, with what goal? Competing with the likes of AWS, Google, and Microsoft, in a very niche market?

Digging holes in the ground is hardly rocket science either.

He did what? The pre-existing know-how to build reusable rockets? Are you confusing SpaceX and Tesla?
You say that Tesla might do this for others, AWS style.

Then you talk about the up-front in-house costs of setting up for GPU ops, but ignore that if an AWS-style model works for you, well, AWS is already capable of giving it to you in GPUs.

They aren't going after economies of scale. If you look closely at their design choices, they are building a pure scale-out vector machine unlike anything else currently on the market. I'm guessing they expect it to be head and shoulders ahead for their in-house workload.
Cerebras could decide to compete in that space.