

by doctorpangloss 3615 days ago

I don't know if using Xeon Phi for rendering makes that much sense. It's sort of the problem it's least competitive to solve on a raw performance, performance per watt or development cost basis.

> However, ‘smaller’ is a relative term as current visualizations can occur on a machine that contains less than a terabyte of RAM. Traditional raster-based rendering would have greatly increased the memory consumption as the convoluted shape of each neuron would require a mesh containing approximately 100,000 triangles per neuron.

That sounds like a poor approach to this problem. You could write a shader that renders thick lines for the dendrites, and the rest of the geometry can be conventional meshes. The same shader could have a pass specially designed for lines and depth of field rendering. That's the one unusual shader. It's hard, but not super hard to write. [0]
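For illustration, one common starting point for thick-line rendering is to expand each segment into a screen-space quad. Below is a minimal Python sketch of that geometry step only. The name `thick_line_quad` is hypothetical, and a real version would live in a vertex shader and also handle joins, caps and depth, which this sketch ignores.

```python
import math

def thick_line_quad(p0, p1, width):
    """Return four corners of a quad covering the 2D segment p0 -> p1.

    Hypothetical helper: offsets both endpoints by half the width along
    the segment's unit normal. Joins and caps are not handled.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("degenerate segment: p0 and p1 coincide")
    # Unit normal is the direction rotated by 90 degrees.
    nx, ny = -dy / length, dx / length
    hx, hy = nx * width / 2.0, ny * width / 2.0
    return [
        (p0[0] + hx, p0[1] + hy),
        (p0[0] - hx, p0[1] - hy),
        (p1[0] - hx, p1[1] - hy),
        (p1[0] + hx, p1[1] + hy),
    ]
```

Each quad would then be drawn as two triangles, so a dendrite described by N segments costs roughly 2N triangles under this scheme rather than a full tube mesh.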

Besides, unless you need this to run in real time (which the Xeon Phi doesn't anyway), you could just raster render and page in the mesh data from wherever. So what if it's slow.

I think highly technical platform decisions like Xeon Phi versus NVIDIA CUDA are really about the details. You have to educate the reader both on the differences that matter and why they should choose one over the other. The comment in the article, "no GPU dependencies," is a very PR-esque don't-mention-your-competitor dance around what they're actually trying to say: the CUDA ecosystem can be a pain since you can't buy the MacBook Pro with the GTX 750M easily, installing all its drivers is error-prone, SIP gets in the way of everything, Xcode and CUDA updates tend to break each other, etc. etc.

I sound like I know what I'm talking about, right? Intel's just not getting it. Show a detailed application of where Xeon Phi really excels. NVIDIA's accelerated science examples go back a decade, and some, like the accelerated grid solved Navier-Stokes fluids examples, are still state of the art.

The competition in rendering is intense. Some level of production-ready renderers like Arion, Octane and mental ray (specifically iRay, NVIDIA's GPU accelerated renderer) perform best or are exclusive to the CUDA platform. Conversely, you probably get the most flexibility from a platform like VRay or Renderman, whose support for GPU acceleration is limited. Intel Embree has a great presence today in baked lighting for game engines, but I think NVIDIA's OptiX is a lot faster.

[0] https://mattdesl.svbtle.com/drawing-lines-is-hard

3 comments

ActsJuvenile 3615 days ago

> That sounds like a poor approach to this problem. You could write a shader that renders thick lines for the dendrites, and the rest of the geometry can be conventional meshes. The same shader could have a pass specially designed for lines and depth of field rendering. That's the one unusual shader. It's hard, but not super hard to write. [0]

You will be surprised how bad medical research and visualization is compared to their gaming counterparts. Most medical researchers use 5-10 year old technological approaches they learned in their PhD program.

On a side note, I have yet to see a Phi-vs-CUDA comparison. Intel is comparing Phi to Pentiums, which is utterly ridiculous.

milcron 3615 days ago

Here is a comparison of the previous generation: https://www.xcelerit.com/computing-benchmarks/libor/intel-xe...

They hold their own against GPGPU, but are probably the inferior choice if your code already runs on a GPU (OpenCL/CUDA).

The real advantage of the Phi is of course combining this nearly-as-good-as-GPGPU parallelism with the x86_64 toolchain and infrastructure. x86 supports more languages with more libraries, and is easier to develop for.

berkut 3615 days ago

That's not quite fair - some of the research into volumetric medium interaction and scattering is way ahead of the VFX / Gaming fields...

CyberDildonics 3615 days ago

> It's sort of the problem it's least competitive to solve on a raw performance, performance per watt or development cost basis.

This is not true for anything beyond running compute shaders on large 1D, 2D, or 3D buffers. Just because something is 'graphics' doesn't mean that a GPU is automatically faster.

> Production-ready renderers like Arnold, Octane and mental ray (NVIDIA's renderer) perform best or are exclusive to the CUDA platform.

Arnold is a CPU renderer, Octane is FAR from what I would consider 'production ready' and mental ray is also a software renderer. Renderman does not use any GPU acceleration.

> I sound like I know what I'm talking about, right?

Not even slightly

doctorpangloss 3615 days ago

My bad, I wrote Arnold instead of Arion, I mix them up when writing it out. iRay is sort of a feature of mental ray, I guess if you're being pedantic. I suppose Octane isn't production ready if you're used to building render farms, but it's certainly production ready for someone paying for all those licenses.

> This is not true for anything beyond running compute shaders on large 1D, 2D, or 3D buffers.

Yes, but rendering is a shader on a bunch of those buffers right? That's what I wrote. I'm not 100% confident that you can efficiently render with conventional shaders what they showed in that frame. But I think you can. You could at least cull and tessellate tubes on the GPU, if you really don't want to write a shader.

CyberDildonics 3614 days ago

> Yes, but rendering is a shader on a bunch of those buffers right?

No. Tracing rays is fundamentally a sorting problem when dealing with the acceleration structure. Rasterizing samples means accumulation of values and weights, which means either atomics or separate buffers (and if you are using the GPU creating a buffer for every core is out of the question). You could sort the samples into buckets and rasterize those separately, but you are again faced with GPU partitioning at the very least.
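To make the accumulation point concrete, here is a rough Python sketch of the "separate buffers" option: each worker splats its samples into its own value and weight buffers, the buffers are summed afterwards, and the image is resolved by dividing values by weights. All names here (`splat`, `merge`, `resolve`) are hypothetical, and the sketch says nothing about how a real renderer partitions work.

```python
def splat(samples, width, height):
    """Accumulate (x, y, value, weight) samples into private buffers."""
    values = [0.0] * (width * height)
    weights = [0.0] * (width * height)
    for x, y, value, weight in samples:
        i = y * width + x
        values[i] += value * weight   # weighted sum of sample values
        weights[i] += weight          # running total of filter weights
    return values, weights

def merge(buffers):
    """Sum per-worker (values, weights) pairs into one pair."""
    values, weights = (list(b) for b in buffers[0])
    for v, w in buffers[1:]:
        for i in range(len(values)):
            values[i] += v[i]
            weights[i] += w[i]
    return values, weights

def resolve(values, weights):
    """Normalize each pixel; pixels with no samples resolve to 0.0."""
    return [v / w if w > 0 else 0.0 for v, w in zip(values, weights)]
```

With one buffer pair per worker, memory grows with the worker count, which is cheap for a few dozen CPU threads and impractical for thousands of GPU threads. The alternative, a single shared buffer, needs atomic adds on both arrays.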

There are plenty of ways to use the GPU to do all aspects of rendering, but it is not even remotely as trivial as you are making it out to be.

berkut 3615 days ago

Drawing lines is a stupid way of doing it, as:

1. With the mess of overlapping lines they've got, you'd suffer from severe overdraw (which is where raytracing really shines in terms of efficiency) as you can't efficiently cull lines (without clipping them)

2. You wouldn't get the ambient occlusion look where lines close to each other occlude / darken.

As someone who's previously compared Embree and OptiX (and we were given free hardware and support from Nvidia), Embree stacks up really well, and a dual Xeon can match a single top-of-the-line GPU fairly easily for pure ray-intersection performance.

Once you start putting complex shaders and layered materials on top, GPUs start to really suffer: there's a reason a lot of the GPU renderers are mostly being used for clean renders like archvis / product design / car renders - they're simple to render. As soon as you stick dirt layers on top, their efficiency really starts to plummet.

doctorpangloss 3615 days ago

> stupid way of doing it

I guess it really depends on what the objective is. I'm not talking speculatively, but concretely it seems like a reasonable way to achieve a few images that they show in the press release. They show two relatively flatly rendered lots-of-tubes images. I know SSAO isn't the same, and I get that there's overdraw, but there are a lot of details in the particular objective they want. In one shot, they show a lot of emissive tubes with depth of field, which is harder to achieve. I suppose if they're happy, they're happy.

> interactive performance for all datasets on a regular Intel Xeon processor, which can render images at 20-25 frames per second (FPS)

There's a big difference between interactive performance and a production-quality render. Something tells me it's not producing 25 frames of noise-free render per second. There isn't enough information here.

> and a dual Xeon can match a single top-of-the-line GPU

At what, like 3x-5x the price? At how many watts? And at what I.T. complexity? A GTX 1080, at better performance than a Titan X, is really a phenomenally good deal. Especially considering I can drop it into an existing workstation with all of my existing software installed on it; especially considering I can rent out computation time on Amazon by the hour.

I guess what I'm reacting to is how forced of an example it seems.

berkut 3615 days ago

I think the objective is rendering a ridiculous amount of stuff - the fact they talk about "lots and lots of RAM" indicates there's no way a GPU is going to be able to render it efficiently without an aggressive culling step: GDDR5 might be very fast, but you've got to get the data onto the GPU first and probably page data as well. This is very often a significant bottleneck for GPUs, and is another reason GPUs aren't used for VFX rendering (at high-end), as 16 GB isn't anywhere near enough.
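As a back-of-the-envelope check on the memory argument, the sketch below estimates how many neuron meshes fit in a given amount of GPU memory. The 100,000 triangles per neuron figure comes from the article quote upthread; the 36 bytes per triangle (three vertices, three 32-bit floats each, positions only, no indexing or normals) is an assumption made purely for illustration, and `neurons_that_fit` is a hypothetical name.

```python
TRIANGLES_PER_NEURON = 100_000   # figure quoted from the article
BYTES_PER_TRIANGLE = 3 * 3 * 4   # assumption: 3 vertices x 3 float32 positions

def neurons_that_fit(memory_bytes,
                     triangles_per_neuron=TRIANGLES_PER_NEURON,
                     bytes_per_triangle=BYTES_PER_TRIANGLE):
    """Whole neuron meshes that fit in memory_bytes, ignoring all overhead."""
    bytes_per_neuron = triangles_per_neuron * bytes_per_triangle
    return memory_bytes // bytes_per_neuron

gpu_memory = 16 * 2**30          # 16 GiB
print(neurons_that_fit(gpu_memory))
```

Under those assumptions each neuron costs about 3.6 MB, so 16 GB holds only a few thousand neurons before any acceleration structure, textures or framebuffers are counted.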

Production-quality render implies decent lighting and materials - this stuff has neither, so shading is likely to be negligible, and then you're going to be generally constrained by ray / primitive intersection performance.

No, cheaper (for CPU): two ~$950 CPUs vs a $3,300 GPU. Granted, you need a dual-socket system and twice the RAM to balance, and it's easier to stick multiple GPUs in a system than make the jump to 4 sockets, but GPUs aren't really that much of a win... Thermal output and power usage are often worse for GPUs as well.
