Hacker News new | ask | show | jobs
by bsder 1522 days ago
What is extremely telling is what is missing ... Design Rule Checking (DRC) and Layout Vs Schematic (LVS).

These require:

1) Longer bit length arithmetic

32-bit float simply isn't enough. 64-bit float is close, but limited. You really want 128-bit integer. And nVidia isn't delivering that.

2) Real algorithmic improvements

We're still stuck with computational geometry algorithms that don't parallelize. It would be awfully useful if nVidia would actually research some new algorithms instead of just waving around the ML/AI marketing wand.

But, then, this is the company that built itself on benchmarketing, so ...

4 comments

Can you explain why such large numbers are required?

Back-of-the-napkin maths is that a chip that is 3cm on each side -- which is huge -- can be subdivided into 0.007 nanometre increments using 32 bit integers. That's 1/7th of the diameter of a hydrogen atom!

The resolution with 64-bit floats (let alone integers) would be absurd, roughly a million times finer-grained still. That's probably enough to simulate individual electrons zipping around in their orbitals with acceptable precision.

Even if the simulation codes did something silly like simply assigning 1.0 = 1cm, a 64-bit float still allows resolutions of something like a billionth of a nanometre...

> Can you explain why such large numbers are required?

Absolutely.

Even if you start with 32 bits, you often have polygons with many sides. In the worst case, you are modeling a "circle" and have to increase your precision to enough level to be accurate (please note that nobody in the right mind in VLSI would ever draw a "circle"--however, you wind up with an "implied" one due to DRC, more down below ...)

The problem is that line sweep intersection checks in DRC require approximately 3n+a couple bits to differentiate intersections that may be close to degenerate or have multple intersections near to each other. So, if you start with 32-bit numbers, you require approximately 96 bits plus a little for your intermediate calculations. (See: Hobby -- "Practical segment intersection with finite precision output" -- I'll let people find their own copy of the paper so HN doesn't splatter some poor site that I link)

You would think that doesn't matter since VLSI tends to limit itself to rectilinear and 45 degree angles. Unfortunately life isn't that simple.

If you take a simple rectangle and say "Nothing can be within distance x", you get a slightly larger rectangle parallel to the sides. Easy. The problem is that you also wind up with an implied quarter circle (told you this would come back) near each corner. Not so easy.

Put those circles such that they overlap only very slightly and you may have segments that are pretty close to tangent. Super not easy. Unfortunately, VLSI design often consists of putting those metals such that they are riiiight at the limit of spacing. Consequently, your super-not-easy case also becomes a very common case. Ouch.

Of course, you could just move the rectangle completely outward so that you have squares at the corners. However, that gives up a non-trivial amount of area that most places aren't willing to concede.

There is a reason why Siemens (nee Mentor) Calibre is so egregiously expensive.

Disclaimer: I have zero silicon design experience.

However, I have designed computer game engines that use 32-bit floats throughout and encountered rounding errors in practice.

I’ve found that there’s always a solution that avoids the need to go past 64 bits, and even that is a last resort.

So for example the circle could be approximated with a polygon. Or fixed-point arithmetic can be used. Or simply use a quad-tree or related space partitioning algorithms to check for intersections.

There are literally thousands of algorithms that sidestep these issues and are used extensively in computer games, typically at 32-bit precision exclusively.

For example “back in the day” you would often see shimmering due to “z-fighting”. You would also often see white pixels due to “cracks” between adjacent polygons.

All of these are largely gone now. The problems have been solved without the enormous performance hit of 64-bit doubles, let alone 128!

Meanwhile contemporary CAD programs would insist on using 64-bit doubles, even through the OpenGL pipeline out to the screen.

But if you sit down for a second and just divide your screen (or wafer mask) by the range of your chosen numbers you’ll instantly see that you have thousands of steps per pixel (or atom!) to work with.

Any visible noise is your fault for choosing the wrong algorithm, not the fault of the hardware for not providing enough bits.

Please keep in mind that even the slightest error in a silicon mask is likely to cause hundreds of millions of dollars of losses and months of delay in time to market for a modern chip.

With that in mind, does it make more sense to come up with new, experimental, untested algorithms... or just use wider numbers and slowly iterate on well known algorithms? Especially with LVS/DRC you really want the dumbest, easiest to reason about thing that is most likely to catch design issues no matter what. Even if it's excruciatingly slow, it's your last line of defense against writing off a set of masks as a hundreds of millions of dollars loss.

EDA / silicon CAD is a totally different world of design requirements compared to video games or even MCAD software.

EDIT: and just for context, here's a DRC set for the (very much not modern) SKY130 process: https://github.com/RTimothyEdwards/open_pdks/blob/master/sky...

The exact same arguments were made by CAD people insisting on 64-bit maths for OpenGL. They were wrong. They too were working on projects worth billions of dollars, over decades, where mistakes were very costly.

Your link to a "DRC set" doesn't mean much to me out of context. I see some basic looking code with small-ish numeric constants in it. So what? This is not that different to the input to a simple physics simulation or a computer game.

So let's get this straight. You know nothing about this area and you assume the experts in it are wrong? Do you know what happens if you accidentally couple lines during one of the manufacturing steps? The wafer can, in the absolute worst case scenario, explode from super heating destroying not just the wafer but potentially the entire chamber it is in (any defect beyond what was designated as allowable by the design engineers means the chamber and everything in it is now scrap).

For somewhat obvious reasons, we have a vested interest in this never occurring. So we default to safety over speed. Meanwhile in the CAD world with 64-bit math not making it into OpenGL, they just wrote a library to do 64-bit math anyways on-top of or in parallel to OpenGL. They didn't switch away from 64-bit math, they just reduced its use where it isn't needed and kept it where it is needed. The semiconductor industry is full of absolutely brilliant engineers who know far too much about all of the problems and if they could use 64-bit instead of 128-bits for a data structure, they'd switch in a heartbeat to save massive amounts of compute time (and thus money).

Replying to myself to add:

Here's a Design Rule Checking (DRC) paper that mentions using 32-bit floats for things like checking distances between polygonal traces: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49...

If 32-bit is sufficient, then 64-bit definitely is, even for the latest & greatest silicon processes...

Of course when you're doing such intersection calculations you know the things you're intersecting are very close. You don't need a general method that can test arbitrarily sized and spaced polygons against each other. You need a method to determine what is sufficiently close to each other to be worthy of a more detailed check. Then a more specific method to do this check.

You could use 32 bit integers with all shapes specified vs say a 0.1 nm grid giving you around a maximum 0.4m x 0.4m chip size which seems ample. Then when you want to check for rules violations in the cases like you specify with very fine precision use a dedicated check that can assume the relevant geometry is within a small number of grid points of each other. For example the check could work using relative coordinates rather than absolute so say switch to a grid on a 0.00001nm basis (to pull an arbitrary precision out of a hat) and convert the 32-bit absolute 0.1nm coords to relative 32-bit 0.00001nm coords.

Easier said then done to be sure (as you say the tools are egregiously expensive!) but just saying I need a 64-bit or a 128-bit float isn't trying to get to the grips with the problem, just hoping you wave it away with more bits.

> You need a method to determine what is sufficiently close to each other to be worthy of a more detailed check.

Your line sweep data structure effectively already does that. I recommend reading the Hobby paper and thinking about how it works. And then you should think about how you differentiate inside from outside when you union/difference your polygons.

Any segments in the line sweep data structure simultaneously have already demonstrated that they need the detailed check.

If you want to argue this, you're going to need to study up on about 40 years of prior art. Given how much money this is worth and how many really smart people went after it (it basically drove the field of Computational Geometry for decades), the probability of you contributing something new to the current algorithms is basically zero.

However, the probability of you contributing something new to parallel DRC algorithms is really quite decent. Nobody I know of has yet come up with "good" parallel algorithms for DRC--most of them are hacks that quite often break down when they hit even common cases.

Being able to handle DRC on a billion VLSI polygons/10 billion line segments in a parallel fashion would be quite an advance, and the field is waiting for it.

> The resolution with 64-bit floats (let alone integers) would be absurd, roughly a million times finer-grained still.

Careful there! Floating point numbers do not form a proper field, not even a semi-group. Due to the uneven distribution of elements, the field axioms don't hold (e.g. both commutativity and distributivity can be violated) and great care has to be taken to assure the numeric stability of computations.

My physics professor had some good examples of how numeric precision vastly outstrips reality: if modelling a 1m iron bar using 32-bit numbers, the error in length is substantially less than a dust mote landing on the end of it. It's about the same as a virus (not a bacterium) on the end... or not. The oil from a fingerprint is thicker. The mere presence of a human in the room will warm up the iron rod enough to cause it to expand more than this.

You only get physically significant errors when using iterated algorithms where the errors accumulate, or when doing what amounts to equality comparisons, which is almost always an error.

Note that 64-bit numbers aren't "twice" as precise.[1] They're four billion times more precise. Going to 128 bits is absurd beyond belief. Numbers like these would allow the entire visible universe to be modelled, down to the width of a proton. You do not need 128 bit numbers for anything made on Earth, by humans, ever. If you think you do, you've made a mistake. It's as simple as that.

[1] floating point numbers and integers are obviously different, but the concepts are the same. A 64-bit double is "just" 536 million times more precise that a 32-bit float, but that is still an awful lot of precision for anything made of matter...

> [1] floating point numbers and integers are obviously different, but the concepts are the same.

No they're not. That's the entire point. 32 bit IEEE floats get you 6 to 9 significant digits, whereas 64 bit IEEE floats get you 15 to 17 significant digits. Loss of significance and catastrophic cancellation are real problems in numerical analysis.

Physics in particular doesn't often have closed form solutions, so you're forced to use iterative approximations. Same goes for large matrix operations, which is why you actually have to be very careful with the algorithm you choose and the order of operations.

If you're still not convinced, feel free to try it yourself:

  #include <iostream>
  #include <iomanip>
  #include <cmath>
  #include <tuple>

  // solve ax^2+bx+c=0
  template<typename T> auto
  solve_quadratic(T const& a, T const& b, T const& c) -> std::tuple<T, T> {
    auto t = std::sqrt(b*b - T(4)*a*c);
    return {(-b + t) / (T(2)*a), (-b - t) / (T(2)*a)};
  }

  int main() {
    std::cout << "Expected x1=0.000000075 x2=-200.000000075\n"
    std::cout << "Single precision:\n";
    {
      float a=1.0f, b=200.0f, c=-0.000015f;
      auto [x1, x2] = solve_quadratic<float>(a, b, c);
      std::cout << "x1=" << x1 << ", x2=" << x2 << "\n";
      std::cout << "ax1^2+bx1+c=" << (a*x1*x1 + b*x1 + c) << "\n";
      std::cout << "ax2^2+bx2+c=" << (a*x2*x2 + b*x2 + c) << "\n";
    }
    std::cout << "\nExpected x1=1.000000028975958 x2=1.000000000000000\n";
    std::cout << "Double precision:\n";
    {
      double a=94906265.625, b=-189812534, c=94906268.375;
      auto [x1, x2] = solve_quadratic<double>(a, b, c);
      std::cout << std::setprecision(16);
      std::cout << "x1=" << x1 << ", x2=" << x2 << "\n";
      std::cout << "ax1^2+bx1+c=" << (a*x1*x1 + b*x1 + c) << "\n";
      std::cout << "ax2^2+bx2+c=" << (a*x2*x2 + b*x2 + c) << "\n";
    }
  }
Even 64-bit IEEE floats don't help if you use the standard quadratic formula. Note that you can improve the 32-bit case above significantly by reformulating the solution to

  template<typename T> auto
  solve_quadratic(T const& a, T const& b, T const& c) -> std::tuple<T, T> {
    auto sign_b = b < 0.0 ? -1.0 : 1.0;
    auto t = -b - sign_b*std::sqrt(b*b - T(4)*a*c);
    return {(T(2)*c) / t, t / (T(2)*a)};
  }
This simple change will yield the precise result (within the fp32 precision) for x1 and x2 with a=1, b=200, c=-0.000015 (i.e. the error of ax²+bx+c will be lower than the 9 max significant digits).

However this won't help with the second (64-bit) example, which will still be wrong after the 8th digit (i.e. well within the supposed 12 to 15 significant digits of a 64-bit IEEE float).

Also note that in both cases all numbers involved have less significant digits than supported by the respective FP format. Just to give you a little insight on why available bits ≠ precision in floating point maths.

> You really want 128-bit integer. And nVidia isn't delivering that.

How much slower (per unit area) is that to do in software, compared to a full 128-bit hardware unit?

"as of 11.5, CUDA and nvcc support __int128_t in device code when the host compiler supports it (e.g., clang/gcc, but not MSVC). 11.6 added support for debug tools with __int128_t."

See:

https://developer.nvidia.com/blog/cuda-11-6-toolkit-new-rele... https://developer.nvidia.com/blog/implementing-high-precisio...

DRC and LVS are just logical checks right?

“Is the minimal distance between all metal routing > 10 nm” etc.

Can you explain why high precision is needed for that?

Having worked in EDA myself, though not on these final signoff steps, I agree that purely geometry-based checks really don't need doubles. Most of it should just be done as 32-bit fixed point. Both because it's better for performance, and because it drives home the point that you need to think carefully about precision issues for correctness reasons. Using doubles is just a band-aid.

I'm less confident about it when it comes to anything that involves calculating anything electromagnetic because I just don't know that subfield.

Can you explain why you need greater precision/range?