Hacker News new | ask | show | jobs
by pixelpoet 1684 days ago
Having written realtime ray tracers in the classic demoscene style on e.g. 300 MHz Pentium 2 Celeron (the original one with no cache, that overclocked to 450-550 MHz), I sometimes wonder about how cool it would be to open source a modern rendering engine around the time of the first Pentium 3 with SSE, in 1999.

You could completely revolutionise computer graphics on that era of hardware, with the view to increasing vectorisation, and probably strongly steer it towards ray tracing instead of rasterisation (even skipping over the local minimum of k-D tree methods, the introduction of Surface Area Heuristic and eventually settling on modern BVH building and traversal).

4 comments

I was under the impression that the theory was there but the hardware was not. Like, rasterization was a necessary evil because it gave better results more quickly (and artists needed that feedback).
Yes, absolutely right, and for games it was the only option with no fast floating point and very limited memory.
First time reading about BVH. Sounds like Kirkpatrick's hierarchy, but in arbitrary dimension.

What's the advantage over BSP/kD-trees/octrees?

And what do you mean by rasterization - we still have to deal with pixels in the end, so it has to happen somewhere? (..I'd love to play with a color vector monitor though!).

> What's the advantage over BSP/kD-trees/octrees?

With BVH, the partitioning is fundamentally over lists of objects rather than space; if you split by space, you can/will have objects on both sides of the splitting plane, leading to duplicate references.

Doing it by lists means there are no duplicate references, however the combined bounding volumes can overlap, which is to be minimised, subject to the Surface Area Heuristic cost. It winds up being something like a quicksort, although for the highest quality acceleration structures you also want to do spatial clipping... this is an extremely deep field, and several people have spent considerable part of their professional career to it, for example the amazing Intel Embree guys :)

It also happens to work out best for GPU traversal algorithms, which was investigated by software simulation quite a few years ago by the now-legendary Finnish Nvidia team, and together with improvements on parallel BVH building methods and further refinements is basically what today's RTX technology is. (As far as I'm reading from the literature over the years.)

Here's a fundamental paper to get started: https://research.nvidia.com/publication/understanding-effici... (Note that these are the same Finnish geniuses behind so many things... Umbra PVS, modern alias-free GAN methods, stochastic sampling techniques, ...)

> And what do you mean by rasterization - we still have to deal with pixels in the end, so it has to happen somewhere?

At a high level, you could think about it as where in your nested loops you put the loop over geometry (say, triangles). A basic rasterizer loops over triangles first, and the inner loop is over pixels. A basic ray tracer loops over pixels, and the inner loop is over triangles (with the BVH acting as a loop accelerator). Just swapping the order of the two loops has significant implications.

Octrees divide space into regular sized chunks. BVH divides space into chunks of varying size, but with a balanced population of bodies in each. The idealized BVH divides the population by 2 at each level.

Compared to octrees, BVHs deals well with data that’s unevenly distributed. At each level you split along the axis where you have the most extent. Finding the pivot is the interesting part. When I recently implemented a BVH from scratch, I ended up using Hoare partitioning and median-of-three and it worked really well. The resulting structure is well balanced, splitting the population of bodies roughly in half at each level, and that’s not even the state of the art, that’s just something my dumb ass coded in an afternoon.

The game uses ray casting, not ray tracing. Ray casting is when you send a ray once for every column of pixels to get a distance to a wall. Also, if the walls are only horizontal or vertical the calculations get simpler.

Also I wonder how you can achieve clock frequency like 300 MHz without a cache. Shouldn't CPU stumble on fetching every command?

> The game uses ray casting, not ray tracing. Ray casting is when you send a ray once for every column of pixels

Both ‘ray casting’ and ‘ray tracing’ are overloaded terms, the distinction isn’t as clear as you suggest. You’re talking about 2d ray casting, but 3d ray casting is common, and means to many people the same thing as ‘ray tracing’. Ray casting “is essentially the same as ray tracing for computer graphics”. https://en.wikipedia.org/wiki/Ray_casting

There’s also Whitted-style recursive ray tracing, and path tracing style ray tracing, but ray tracing in it’s most basic form means to test visibility between two points, which is what ray casting also means from time to time.

I've given up on the semantics of "ray-tracing", everyone has their own opinion. However it's fairly common for "ray-casting" to mean non-recursive, and the wiki article you link explicitly says this.

I think the biggest difference between ray-type algorithms is everything vs ray-marching, because regardless of recursion, and strategies towards lighting, texture sampling and physical realism, with ray-marching a single ray is not really a ray at all but lots of little line segments, and you don't usually bother finding explicit and intersections which gets really complex and expensive... that's the whole point, instead you find proximity or depth, which means you can render implicit surfaces like fractals.

Yes, ray casting does not ever imply recursion, it’s simply referring to casting a ray to test visibility from one point to another. Ray tracing now most commonly means exactly the same thing, and expecting anything else will often lead to confusion.

“Ray marching” is also overloaded ;) but what you’re referring to (also and originally called ‘sphere tracing’) is a new and separate idea from either ray casting or ray tracing (if you’re thinking of something other than ray casting when you say that.) Ray casting/tracing is most often done using non-iterative analytic intersection functions, where the style of ray marching you’re referring to is a distance field query, not a point to point visibility query, so ”ray marching” generally implies a different traversal algorithm, and (usually) a different representation of the geometry.

You can use any/all of these to build path tracers, but they come with different tradeoffs.

The Covington Celerons had an L1 cache but no L2 cache. The Pentium II of the era had an off-die L2 cache. So the accelerometer was a binned Pentium II without any L2 cache. A later model of the Celeron was released with a 128k L2 off-die L2 cache.

At its base clock speeds the Celerons were middling chips. But since they readily overclocked you could get them up to 450-466MHz. They wouldn't be equivalent to the same speed Pentium II (because of no L2 cache) but they punched above their weight for the price.

The Celeron-A's had on-die cache, making them pretty much a match for a regular P2 of the same clock/bus speed. There was also a mobile P2 with 256K of on-die L2, predating Coppermine P3's.
I must have misremembered, it's been a long time. I swore there were Celeron 300As that were Covington cores with the external L2 cache but it makes sense they were the Mendocino cores with the on-die L2.

I really wanted an overclocked Celeron set up but for the year or so they were hot shit my upgrade money went to storage. That was more pressing for me at the time. By the time I was due for (and could afford) a system upgrade I was able to go directly to a Pentium III.

Is there somewhere I can read more about the history of these different techniques?