CPU vs GPU Discussion
-
It's not that simple to say that path tracing would offer a better result than photon mapping. But sure, path tracing is far more "user friendly"; photon mapping usually requires some scene-based tuning. I would not say that GPU rendering will be limited to path tracing... it just seems to be the simplest to implement and so is offered first.
NVIDIA has CUDA. OpenCL is open... and ATI supports it, but NVIDIA has also released OpenCL drivers. So for me, in the long run, OpenCL sounds more interesting. I don't know if there will be a significant performance difference between the two.
-
@dale said:
Thanks for the input. But of course I have more questions. CUDA is an NVIDIA format, is it not, so does that mean other video card developers are developing other formats?
And as for my understanding of path tracing, am I correct in my assumption that it follows the photon path from the camera to the light source, vs. photon tracing (mapping?), which does the opposite and traces from the light source to the camera?
What makes one better than the other? Or are they? And am I misinformed that path tracing allows for better shading?
CUDA is indeed an NVIDIA format; OpenCL is the other alternative I know of, but it's not quite as mature as CUDA at the moment.
As you say, path tracing follows the photon from the camera to the light source. I haven't heard of photon tracing, but the major advantage of path tracing is that it's pretty efficient, as by definition every path that's calculated will be part of the final image.
-
With regards to why path tracing is used for GPU-based renderers (at the moment), it's to do with the fact that each path can be calculated independently of the others, so each thread on the GPU can run separately from the others and does not need to wait for results from other threads to continue running.
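To make that concrete, here's a very rough CUDA sketch I've made up purely for illustration (trace_path is a dummy stand-in for the real path-tracing maths, not code from any actual renderer): every pixel gets its own thread, and no thread ever has to wait for another one's result.
```
// Hypothetical illustration only: one GPU thread per pixel, each tracing
// its own path with no dependence on any other thread.
#include <cuda_runtime.h>

__device__ float trace_path(int x, int y, unsigned int seed)
{
    // A real renderer would fire a ray from the camera through pixel (x, y)
    // and bounce it around the scene; this dummy just returns a grey value
    // so the sketch compiles and runs.
    return ((x * 92837u) ^ (y * 68927u) ^ seed) % 256 / 255.0f;
}

__global__ void render(float* image, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Each thread writes only its own pixel: no synchronisation, no waiting.
    image[y * width + x] = trace_path(x, y, 1234u);
}

int main()
{
    const int W = 640, H = 480;
    float* d_image;
    cudaMalloc(&d_image, W * H * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    render<<<grid, block>>>(d_image, W, H);
    cudaDeviceSynchronize();

    cudaFree(d_image);
    return 0;
}
```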
-
@remus said:
With regards to why path tracing is used for GPU-based renderers (at the moment), it's to do with the fact that each path can be calculated independently of the others, so each thread on the GPU can run separately from the others and does not need to wait for results from other threads to continue running.
Are any rendering programs equipped to cut the calculation time involved in rendering by essentially saving a base photon path trace, and only altering what is necessary in a scene if you, for instance, alter a material?
-
Not in the way you're thinking. The problem is that materials with complex properties (SSS, specularity etc.) can send paths off all over the place, interfering in lots of different places, so if you change the material but keep the paths, the paths may well be wrong, leading to an incorrect render.
Edit: a good example of this would be a glass ball. If you render this you will get lots of pretty caustics, but if you then went to re-render the ball with a diffuse material, the caustics shouldn't be there.
Any renderer that has a multi-light style feature is using a similar idea to this, though. Very roughly, it just remembers which paths are from which light source, so when you play with the light intensity/colour it just adjusts the relevant pixels accordingly.
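As a made-up sketch of the bookkeeping (not any particular renderer's code): keep one buffer per light while tracing, and "relighting" is then just a re-weighting of those buffers, with no new paths traced.
```
// Hypothetical multi-light style relighting: each light's contribution was
// accumulated into its own buffer during the render, so changing a light's
// intensity/colour afterwards is just a cheap recombination of pixels.
#include <cstdio>
#include <vector>

struct Colour { float r, g, b; };

void relight(const std::vector<std::vector<Colour> >& per_light,  // one buffer per light
             const std::vector<Colour>& scale,                    // per-light tweak
             std::vector<Colour>& image)                          // final output
{
    for (size_t p = 0; p < image.size(); ++p) {
        Colour sum = {0.0f, 0.0f, 0.0f};
        for (size_t l = 0; l < per_light.size(); ++l) {
            sum.r += per_light[l][p].r * scale[l].r;
            sum.g += per_light[l][p].g * scale[l].g;
            sum.b += per_light[l][p].b * scale[l].b;
        }
        image[p] = sum;
    }
}

int main()
{
    // Two lights, a four-pixel "image", purely illustrative numbers.
    std::vector<std::vector<Colour> > per_light;
    per_light.push_back(std::vector<Colour>(4, Colour{0.2f, 0.2f, 0.2f})); // light 0
    per_light.push_back(std::vector<Colour>(4, Colour{0.1f, 0.0f, 0.0f})); // light 1

    std::vector<Colour> scale;
    scale.push_back(Colour{1.0f, 1.0f, 1.0f});   // leave light 0 alone
    scale.push_back(Colour{3.0f, 3.0f, 3.0f});   // make light 1 three times brighter

    std::vector<Colour> image(4);
    relight(per_light, scale, image);
    printf("pixel 0 = (%.2f, %.2f, %.2f)\n", image[0].r, image[0].g, image[0].b);
    return 0;
}
```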
-
@remus said:
With regards to why path tracing is used for GPU-based renderers (at the moment), it's to do with the fact that each path can be calculated independently of the others, so each thread on the GPU can run separately from the others and does not need to wait for results from other threads to continue running.
Same quote, different question.
Maybe this is at the base of what I am trying to understand in how CPU and GPU processing differ. Because the GPU is solely dedicated to graphics performance, it theoretically has better performance? So this is something that is only to do with speed, not necessarily quality? In other words, even with a slower CPU-based machine you would still be able to produce a rendering of equal quality to that of a GPU-based machine, if you are willing to let it cook for a longer time period? If this is the case, then about how much longer?
-
@notareal said:
NVIDIA has CUDA. OpenCL is open... and ATI supports it, but NVIDIA has also released OpenCL drivers. So for me, in the long run, OpenCL sounds more interesting. I don't know if there will be a significant performance difference between the two.
So OpenCL (boy, there are a lot of topics in this topic), again from my limited understanding, is a cross-platform language which can take advantage of the power of both the CPU and the GPU. Is this what makes it attractive?
-
The name GPU is a bit of a misnomer in that sense. It's best to just forget the 'graphics' part of the name and think of it as lots of small processors. Each individual processor is very slow compared to a CPU, but because there are lots of them the total amount of calculations that can be performed is greater than that of a CPU, thus giving you the rendering performance we're seeing with the current crop of GPU-based unbiased renderers.
The actual work being done by the GPU and the CPU is the same, so if you leave the CPU chugging away to do as many calculations as the GPU it will produce a practically identical result to that of the GPU.
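If you want to see what I mean in code terms, here's a tiny made-up CUDA sketch (the shade function is a throwaway placeholder, not real renderer code): the exact same function runs on the CPU and the GPU, and the answers come out practically the same (tiny floating-point differences are possible because the two compilers may order the maths slightly differently).
```
// "Same work, different processor": one function compiled for both the
// host (CPU) and the device (GPU).
#include <cstdio>
#include <cuda_runtime.h>

__host__ __device__ float shade(float albedo, float light)
{
    // Trivial stand-in for a real shading calculation.
    return albedo * light * 0.5f;
}

__global__ void shade_on_gpu(float* out, float albedo, float light)
{
    out[0] = shade(albedo, light);
}

int main()
{
    // Run the same function on the CPU...
    float cpu_result = shade(0.8f, 2.0f);

    // ...and on the GPU.
    float *d_out, gpu_result;
    cudaMalloc(&d_out, sizeof(float));
    shade_on_gpu<<<1, 1>>>(d_out, 0.8f, 2.0f);
    cudaMemcpy(&gpu_result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("CPU: %f  GPU: %f\n", cpu_result, gpu_result);
    return 0;
}
```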
-
And time is money, I guess.
-
Perhaps an analogy would help: imagine you need to peel a billion potatoes. You could either use a couple of super-duper-peelomatics that can peel 10,000 potatoes a second each, or you could buy 100,000 cheapo-peels that can peel 10 potatoes a second each.
With 2 super-duper-peelomatics it would take 50,000 seconds to peel all the potatoes, whereas with our cheapo-peels it would only take 1,000 seconds.
So although the throughput of each cheapo-peel is far less than that of the peelomatic, because there are loads of them working together the end result is a lot faster. All rather communist.
This is a similar idea to the paintball thing Jamie and Adam demonstrated.
-
@dale said:
Maybe this is at the base of what I am trying to understand in how CPU and GPU processing differ... In other words, even with a slower CPU-based machine you would still be able to produce a rendering of equal quality to that of a GPU-based machine, if you are willing to let it cook for a longer time period? If this is the case, then about how much longer?
They won't differ in quality, only in speed, if the same rendering algorithm is used. At the moment far more advanced rendering algorithms are implemented on the CPU, and GPUs are at the moment somewhat memory limited - but that will change. So in the end, there will only be a speed difference.
CUDA or OpenCL matter mostly to the developer. They won't affect rendering quality. Well... CUDA is only for NVIDIA, so it might affect the end user if he doesn't have the "right" hardware.
-
@remus said:
The name GPU is a bit of a misnomer in that sense. It's best to just forget the 'graphics' part of the name and think of it as lots of small processors. Each individual processor is very slow compared to a CPU, but because there are lots of them the total amount of calculations that can be performed is greater than that of a CPU, thus giving you the rendering performance we're seeing with the current crop of GPU-based unbiased renderers.
The actual work being done by the GPU and the CPU is the same, so if you leave the CPU chugging away to do as many calculations as the GPU it will produce a practically identical result to that of the GPU.
Thank you, that even puts the paintball example in a better light.
But you've really done it now, you said the word "Unbiased".
I know I'm all over the map here, but it appears best just to follow this discussion wherever it goes. You will have to excuse me, I have been doing a lot of reading up on the subject of rendering, and a little knowledge is a dangerous thing (Dad used to say). So once again, as I understand it, with "biased" rendering the meaning is quite literal, in that the algorithm places a predetermined limitation (bias) on the process, mostly to preserve processing power and time. This would then mean that an unbiased renderer places no limitations on the paths it takes to solve the equation?
-
Not sure if this was addressed, but 64-bit means more addressable memory, but speed is a function of the bus width? Doesn't dedicated graphics memory still need to be addressed by the computer, and thus the OS? I read elsewhere that if you have 2GB on a graphics card, that amount of memory addresses is not available to the CPU, a big chunk of the memory in a 32-bit system. Is that right, since the graphics card still needs to run conventional programs? Is the CPU slower, as in clock cycles?
I remember when the computer bus was ISA(?), and special graphics cards were made so that CAD programs could display faster. Isn't this the same thing? As the cost of multi-core comes down, won't this kind of card have less value?
-
Your interpretation of biased vs. unbiased is essentially correct, although to be pedantic even unbiased render engines make a few very basic assumptions. For example, in reality light bounces around for ages, reflecting off loads of things. If this behaviour was modelled exactly, your render would take months to complete and look the same as a render from the current crop of unbiased renderers. To combat this, unbiased renderers have a parameter called 'max number of ray bounces' which limits the number of bounces a ray goes through before it's terminated; the higher you set this value, the less biased your render is, although obviously it will take longer to render. There are more examples, but the above is the one I remember.
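If it helps, here's a toy version of that loop I've sketched out (the "scene" is a random placeholder I've invented, nothing like a real renderer), just to show where the max-bounces cut-off sits:
```
// Illustrative only: the bounce loop at the heart of a path tracer, with
// the 'max number of ray bounces' cut-off discussed above.
#include <cstdio>
#include <cstdlib>

struct Hit { bool hit; float emission; float reflectance; };

// Placeholder scene: every "surface" reflects half the energy and glows a
// little, and each bounce has a 30% chance of escaping into the sky.
static Hit intersect_scene()
{
    Hit h;
    h.hit = (std::rand() % 100) >= 30;
    h.emission = 0.1f;
    h.reflectance = 0.5f;
    return h;
}

// One camera path, cut off after max_bounces bounces. Raising max_bounces
// lowers the bias but makes each path (and so the render) more expensive.
static float trace_path(int max_bounces)
{
    float radiance = 0.0f;
    float throughput = 1.0f;
    for (int bounce = 0; bounce < max_bounces; ++bounce) {
        Hit h = intersect_scene();
        if (!h.hit) break;                      // path left the scene
        radiance += throughput * h.emission;    // light picked up at this hit
        throughput *= h.reflectance;            // energy lost at the surface
    }
    return radiance;
}

int main()
{
    std::srand(42);
    // Average many paths, as a path tracer averages many samples per pixel.
    const int samples = 100000;
    float total = 0.0f;
    for (int s = 0; s < samples; ++s)
        total += trace_path(4);                 // max 4 bounces
    printf("average radiance: %f\n", total / samples);
    return 0;
}
```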
About asking questions, I'm very happy to help, as it helps me organise all the ideas in my head, which I haven't had to do until now.
-
@dale said:
@remus said:
With regards to why path tracing is used for GPU-based renderers (at the moment), it's to do with the fact that each path can be calculated independently of the others, so each thread on the GPU can run separately from the others and does not need to wait for results from other threads to continue running.
Are any rendering programs equipped to cut the calculation time involved in rendering by essentially saving a base photon path trace, and only altering what is necessary in a scene if you, for instance, alter a material?
I am jumping back to this right now because I was searching for a bookmark to a thread on the Kerkythea forum which discussed the above. Since I also take written notes, I found them, and in them I made reference to being able to "lock the photon map" by changing settings in the "Irradiance Estimators", confessing that at the moment this is over my head, but I remember thinking that it could be useful someday. If I find the link I'll post it out of interest.
-
@honoluludesktop said:
Not sure if this was addressed, but 64-bit means more addressable memory, but speed is a function of the bus width?
A 64-bit-word CPU can easily use all 64 bits to address memory if the system is set up to allow all 64 bits on the address bus. More typically I suspect that rather less address range is provided on current actual machines, since 16 exabytes of memory is still a tad unwieldy. The memory data bus can be of various widths. It would be surprising to find a 32-bit data bus in a machine with a 64-bit CPU, but it could be made to function. In the old days it was common to have an 8-bit wide data bus even with a 16 (or even 32) bit CPU. Some machines have 128-bit data busses since that allows filling the CPU caches even faster.
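Just to put rough numbers on the address-space part (a throwaway calculation, nothing more):
```
// 32 address bits reach 4 GiB; 64 address bits reach 16 EiB (exbibytes),
// which is the "16 exabytes" figure mentioned above, loosely speaking.
#include <cstdio>

int main()
{
    const double GiB = 1024.0 * 1024.0 * 1024.0;            // 2^30 bytes
    const double EiB = GiB * 1024.0 * 1024.0 * 1024.0;      // 2^60 bytes
    const double bytes_32 = 4294967296.0;                   // 2^32
    const double bytes_64 = 18446744073709551616.0;         // 2^64
    printf("32-bit address space: %.0f GiB\n", bytes_32 / GiB);  // 4 GiB
    printf("64-bit address space: %.0f EiB\n", bytes_64 / EiB);  // 16 EiB
    return 0;
}
```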
Key thought - they're related but not dependent on each other. You could build an 8-bit CPU that was able to use 48-bit addresses and had a 256-bit wide data bus. You just wouldn't bother these days.
@honoluludesktop said:
Doesn't dedicated graphics memory still need to be addressed by the computer, and thus the OS? I read elsewhere that if you have 2GB on a graphics card, that amount of memory addresses is not available to the CPU, a big chunk of the memory in a 32-bit system. Is that right, since the graphics card still needs to run conventional programs? Is the CPU slower, as in clock cycles?
Not in any system I am familiar with. Dedicated graphics card memory isn't normally part of the CPU's address space at all but is used for texture bitmaps, drawing lists, workspace for the GPU algorithms etc. You'd typically find the icons and backdrop for your screen in there too. Shared graphics memory (often referred to as 'integrated') does exist in the CPU address space and certainly results in a slower overall system. It's cheaper and that is why it gets used. Of course, if your system is fast, integrated graphics may well not result in any noticeable slowdown and be considered adequate. The hardware world is forever cycling between separate and integrated graphics systems as the technology for each changes and the balance moves one way or the other.
@honoluludesktop said:
I remember when the computer bus was ISA(?), and special graphics cards were made so that CAD programs could display faster. Isn't this the same thing? As the cost of multi-core comes down, won't this kind of card have less value?
That's pretty much the typical state now except that the old ISA bus went away decades ago and now it's what, PCIe or PCIx or... whatever.
Multi-core is almost certainly how it will have to be for the future. Although Moore's Law is in no current danger of failing, we have reached a point where making the cycle time of CPUs faster is a real problem. At about the place where we are now you find that the power density - the amount of heat released per square millimetre in this case - gets to where there is no practical cooling fluid that can take away that heat fast enough. So you have to think of another way to use your increasing number of transistors (see Moore's Law above) to improve overall performance. Large caches have been a typical use for ages, since main memory simply isn't anywhere near fast enough to supply all the data a modern CPU can chew through per nanosecond. Dual/quad core CPUs are a fairly cheap and nasty way of improving things a little because they still only have the single main memory and bus. You rely utterly on the caches to keep it all running.
Ironically, back in the dawn of time (well, 1980-ish) there was a serious attempt to solve this problem in the days of 1MHz being the high performance mark. The Transputer was a CPU that was intended to have a little memory connected but a lot of other Transputers to talk to. I had a 128-CPU machine in my office back when I was an IBM Research fellow working on an interactive solid CAD system in '84. These days I suspect you could fit 128 Transputer-like CPUs and a few MB of memory for each one onto a single chip. The practical problems were two-fold -
a) parallel programming is hard and programmers are lazy. So everyone wanted to avoid thinking about it.
b) Intel had their 8086 system and a lot of money and wanted to crush everyone else. They threw many, many billions at the problems and made single CPUs run faster and faster.
Result - the Transputer went away and Intel won the war for the desktop CPU. Now we need to go back to the issue of massively parallel systems. It's what, 5 years since 3GHz CPUs appeared, and we're still stuck there. Dual cores and even 8-core machines simply haven't improved performance that much - we would hope for about an 8-fold improvement in that time. We arguably should have seen a change to hundred+ core machines and a 20+ fold improvement in that time.
-
Thanks, Tim. I will ask you then, given what you have said, does OpenCL seem like a reasonable, or at least short-term, answer (in the context of rendering anyway)? Using the power of both the GPU and CPU, or am I misunderstanding this (which is highly possible)?
-
So in terms of real-time renders, the race is not yet on between multi-core CPUs and dedicated GPUs? And, with a GPU (to render), if my databases are not big, unless I need to run more programs at once, except for the bus size, 32 bits may (not?) do for a couple more years? (I have an unmentionable reason to ask :-) I wondered why we are stuck at 3GHz. I don't know Moore's, and I read about heat but erroneously thought it had something to do with moving information faster than light.
-
Interesting. I just read up a bit on Moore's law, and for anyone who, like me, doesn't know anything about it, it is an observation made by Gordon Moore, co-founder of Intel, in 1965 that the number of transistors per square inch of integrated circuit had doubled every two years since the integrated circuit was invented. His prediction was that this would continue. (He has since apparently altered it to every 18 months.)
What I found most interesting, beside the heat limitation is there appears to be two other limits. Please correct me if I'm wrong.
The way these transistors, and it seems even multiple cores, work is in parallel, so the workload is split in half. However, in one of the articles I read the author pointed out that most computer operations are sequential, and I think this really applies to rendering algorithms. His example is of a spreadsheet, in that a cell in a spreadsheet relies on the value of another cell, which in turn relies on the value of another cell.
Using this as an analogy for a shot photon in rendering, in that the result of the photon's action relies on the surface characteristics of the object it strikes, this quickly becomes sequential, and the calculations must be exponentially more complex.
His assertion is that the only way to increase this is to speed up the calculation.
If to do this requires more transistors, and there is a limitation at the moment because of the amount of heat produced, then this seems to really shackle a CPU speed increase.
The second limit is that in order to take advantage of any increase in computing power, the software must be written in a way that tells the processors what tasks they will handle; without this the computing power just isn't utilized.
-
@dale said:
... does OpenCL seem like a reasonable, or at least short-term, answer (in the context of rendering anyway)? Using the power of both the GPU and CPU, or am I misunderstanding this (which is highly possible)?
Probably OpenCL has value, mostly because making it possible for people outside the fairly small number that normally work on the software for graphics card drivers to actually do things with the GPUs at least offers a chance for some exciting new uses and algorithms to appear. Whether it will last is another matter - I have a suspicion that if and when massively parallel systems become the norm we will see GPUs pretty much disappear and just use a few (dozen/hundred) of the pool of CPUs to do the rendering work. Then again, maybe some of the brain structure research will lead to us building systems with thousands of CPUs of dozens of different types, each specialised for different jobs. Why not imagine processors specialised for doing sound synthesis (we have DSPs that do some of that now), for polygon-to-pixel conversion, for floating point work, for string manipulation, for data moving, for watching user input signals and so on. We're used to thinking of a CPU being able to do all of that because it is cheaper with current designs and production technology. It probably won't always be like that.