by foobard 2601 days ago
> In a media briefing ahead of today’s announcement at Oak Ridge, the partners revealed that Frontier will span more than 100 Shasta supercomputer cabinets, each supporting 300 kilowatts of computing.

So 30 megawatts of computing, plus cooling and other supporting services. How do you power something like this? Does ORNL have their own power station (given they have reactor(s) on site)? If power comes from an external station do they coordinate with the station operator when bringing a system like this online?
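
For anyone who wants to check the 30 MW figure, here is a minimal sketch of the arithmetic. It assumes exactly 100 cabinets (the briefing only says "more than 100"), so treat the result as a lower bound; the variable names are purely illustrative.

```python
# Rough check of the compute power quoted in the briefing.
# Assumption: exactly 100 cabinets (the article says "more than 100").
CABINETS = 100
KW_PER_CABINET = 300  # kilowatts per Shasta cabinet, from the quote above

total_kw = CABINETS * KW_PER_CABINET
total_mw = total_kw / 1000

# This covers compute only; cooling and supporting services come on top.
print(f"{total_mw:.0f} MW of compute")
```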

5 comments

As has been noted in other comments, we do not have a power station at ORNL. We buy power from TVA at about 5.5 cents per kWh, in part because of the lab's proximity to TVA power plants.

TVA recently completed a 210 MW substation on ORNL's campus to better serve our needs. We do not need to coordinate with them for large runs on the machines.
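
To put the 5.5 cents per kWh rate in perspective, here is a rough sketch of what the 30 MW from the original question would cost. It assumes a constant 30 MW compute load with no cooling overhead and a flat rate, none of which is confirmed here, so the output is an order-of-magnitude estimate rather than an actual bill.

```python
# Order-of-magnitude energy cost at the quoted rate.
# Assumptions (not from ORNL): constant 30 MW load, flat rate, no cooling.
RATE_USD_PER_KWH = 0.055
LOAD_KW = 30_000
HOURS_PER_YEAR = 365 * 24

cost_per_hour = LOAD_KW * RATE_USD_PER_KWH
cost_per_year = cost_per_hour * HOURS_PER_YEAR

print(f"~${cost_per_hour:,.0f} per hour, ~${cost_per_year / 1e6:.1f}M per year")
```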

Nice :-) Back in the day in the UK, the RAE (Royal Aircraft Establishment) at Twinwoods had a direct line to a local power station for their wind tunnels and used to control the speed from the power station.
With that much gear and those kinds of loads, do you still have a traditional UPS / transfer switch / genset arrangement for everything in the room? If not, how do you manage short-duration power outages?
Yep, we have battery-backed generators for UPS and a transfer switch at the 480-V feed that comes into the room, but it is not enough to power the compute nodes. The UPS allows the cluster management nodes and the parallel filesystem (which is a small cluster by itself) to ride through full outages and other power quality events (PQE).
Thanks for taking the time to provide context in thread!
Oak Ridge National Laboratory was built where it is partly because they could get lots of cheap power from the TVA, so probably from that. (TVA is a regional electricity provider that operates a lot of hydro plants.)
For those who are curious, a typical American home uses on the order of a kilowatt, time-averaged (10,400 kWh per year = 1.2 kW). So 30 MW is roughly the average power usage of a city of 30,000 homes, or 80,000 people, although total capacity will be larger to handle fluctuations.
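
A quick sketch of that comparison, for anyone who wants to plug in their own numbers. The 10,400 kWh per year and 30 MW figures are the ones quoted above; the variable names are just illustrative. Rounding a home to 1 kW gives the 30,000 figure, while using the unrounded average gives somewhat fewer homes.

```python
# Convert annual household energy use to average power,
# then compare against a 30 MW machine.
HOURS_PER_YEAR = 365 * 24
HOME_KWH_PER_YEAR = 10_400  # figure quoted above
MACHINE_KW = 30_000         # 30 MW

avg_home_kw = HOME_KWH_PER_YEAR / HOURS_PER_YEAR

homes_rounded = MACHINE_KW / 1.0        # "of order a kilowatt" per home
homes_unrounded = MACHINE_KW / avg_home_kw

print(f"average home draw: {avg_home_kw:.2f} kW")
print(f"homes at 1 kW each: {homes_rounded:,.0f}")
print(f"homes at the unrounded average: {homes_unrounded:,.0f}")
```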
...and yet, even a machine at this scale, or even 100 times that, could not come close to the human brain in terms of neural simulation.

We are definitely "doing something wrong" when it comes to artificial neural networking; even though our models are much simpler, it still takes an enormous amount of computing power, both in terms of raw CPU and actual electrical needs, just to simulate things at a small scale (and if we use more accurate models, based on what we know about the brain and neurons, then at best our simulations need a long stretch of wall-clock time to reproduce what would be mere fractions of a second of real brain activity).

That our brains can do so much using so little power (wattage), with a number of nodes and interconnections that dwarfs anything we have so far managed to simulate - it's a bit mind-boggling and humbling.

I just wonder where and what the issue actually is.

Why do our current practical models of a neuron, which are vastly simplified, require so much power to run at scale?

Is the issue related to the fact that they are simplified models, and actual neurons with their complexity are able to do things we don't yet understand or know about?

All of this is also related to back-propagation; such a thing doesn't seem to exist in nature (jury is still out on the theory, though) - so how do biological neural nets "learn"?

If we could eliminate or reduce the need for backpropagation, would that lower our power requirements for artificial network implementations?

As someone who has merely dabbled with artificial neural networks, these questions and conundrums fascinate me, and cause me to attempt to think up potential solutions, however far-fetched.

I highly doubt I will be the one to solve the issue, but I do hope to see it solved within my lifetime.

The main difference between the brain and a CPU is that the brain runs at a much lower, variable clock speed (10-100 Hz) to reduce switching costs and makes up for this with extensive pipelining and parallelization when possible. The high node count and necessary connectivity are then possible due to the use of directed self-assembly.

At any rate, it is quite likely that the neocortex of the brain simply computes one or more functions recursively upon sensory input (see Chomsky's minimalist program for suggestions on what they could be). What is unclear is what this function is, and how it comes about - for this reason the approach by some has been to attempt to simulate an entire brain to see what it does. But without the necessary abstractions, this will be inherently wasteful and will yield nothing new beyond validating your experimental data.

Connectivity. ICs are more or less 2D structures. That leads to less efficient connectivity and packing.
Current AI is probably based on FP64 and FP32 arithmetic, while biological neurons are likely a large network of nodes carrying only one or a few bits each. Once we figure out how to build, use, and optimize large networks of nodes that each process only a few bits, the power consumption might just go way down.

Someone might also find that this kind of network can run at kHz rates, like human brain cells, instead of MHz or GHz. The power usage would go down even more.

It seems you’re confusing AI and brain simulation.
brains can't run high resolution simulations of brains, but supercomputers can.

neural networks are only a small component of what we use supercomputers for.

They do not have their own power station. They have the Bull Run coal plant and hydro plants in the area. They do coordinate with TVA before a run.
They might coordinate with TVA for transients (i.e. going from an idle machine to a full-machine run), but in normal operations these systems are at least 50% full. In my experience as a user these machines are more than 80% full most of the time. I don't have hard numbers on the average utilization over the last week/month/year (these might even be classified), but you don't buy and build such a machine to be idle.

I could find the numbers for two German supercomputers I have used in the past. SuperMUC (Phase 1 and 2 combined) had "above the desired 85%" utilization in 2017 and Hazel Hen at HLRS in Stuttgart reported a utilization "between 92% and 98%" in 2017.

> When I took a tour of the Oak Ridge Leadership Computing Facility a few years ago, Buddy Bland, who is Project Director, told me he could tell when he comes to the Lab in the morning if they are running the LINPACK benchmark by looking at how much steam is coming out of the cooling towers.

https://blogs.mathworks.com/cleve/2013/06/24/the-linpack-ben...

Different programs consume dramatically different amounts of power.

That is true. But very few supercomputers track power consumption for different codes and do power-aware scheduling. SuperMUC and IBM actually do a lot of research on this, because it is a rather new field in HPC.
Most of the supercomputers today have their own power station on site. I know Blue Waters at UIUC had one, which I believe caused a power outage at one point.