CPU vs GPU Discussion

dale

The name GPU is a bit of a misnomer in that sense. It's best to just forget the 'graphics' part of the name and think of it as lots of small processors. Each individual processor is very slow compared to a CPU, but because there are lots of them the total number of calculations that can be performed is greater than that of a CPU, thus giving you the rendering performance we're seeing with the current crop of GPU-based unbiased renderers.

The actual work being done by the GPU and the CPU is the same, so if you leave the CPU chugging away to do as many calculations as the GPU it will produce a practically identical result to that of the GPU.

Thank you, that even puts the paintball example in a better light.

But you've really done it now, you said the word "Unbiased".
I know I'm all over the map here, but it appears best just to follow this discussion wherever it goes.

You will have to excuse me; I have been doing a lot of reading up on the subject of rendering, and a little knowledge is a dangerous thing (Dad used to say). So once again, as I understand it, in "biased" rendering the meaning is quite literal, in that the algorithm places a predetermined limitation (bias) on the process, mostly to save processing power and time. This then would mean that an unbiased renderer places no limitations on the paths it takes to solve the equation?

honoluludesktop

Not sure if this was addressed, but 64 bit means more addressable memory, but speed is a function of the bus width? Doesn't dedicated graphics memory still need to be addressed by the computer, and thus the OS? I read elsewhere that if you have 2 GB on a graphics card, that amount of memory address space is not available to the CPU, which is a big chunk of the memory in a 32 bit system. Is that right, since the graphics card still needs to run conventional programs? Is the CPU slower, as in clock cycles?

I remember when the computer bus was ISA(?), and special graphics cards were made so that CAD programs could display faster. Isn't this the same thing? As the cost of multi-core comes down, won't this kind of card have less value?

remus

Your interpretation of biased vs. unbiased is essentially correct, although to be pedantic even unbiased render engines make a few very basic assumptions. For example, in reality, light bounces around for ages, reflecting off loads of things. If this behaviour was modelled exactly your render would take months to complete and look the same as a render from the current crop of unbiased renderers. To combat this, unbiased renderers have a parameter called 'max number of ray bounces' which limits the number of bounces a ray goes through before it's terminated. The higher you set this value the less biased your render is, although obviously it will take longer to render. There are more examples but the above is the one I remember.
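
A rough sketch of the bounce-limit point above, in Python. It assumes a deliberately artificial scene that is not from this thread: every surface emits one unit of light and reflects a fixed fraction `albedo` of whatever hits it, so the exact answer is a simple geometric series. The names `radiance_with_bounce_limit`, `exact_radiance` and `albedo` are made up for the illustration and do not correspond to any real renderer's settings.

```python
def radiance_with_bounce_limit(albedo, max_bounces):
    """Light gathered along one path that is cut off after max_bounces bounces.

    Toy model (an assumption for illustration): every surface emits 1 unit
    and reflects the fraction `albedo` of incoming light.
    """
    total = 0.0
    throughput = 1.0  # fraction of light that survives the bounces so far
    for _ in range(max_bounces + 1):  # the first hit plus max_bounces bounces
        total += throughput           # pick up the emission at this hit
        throughput *= albedo          # lose some energy at each reflection
    return total


def exact_radiance(albedo):
    """Value with no bounce cap: 1 + a + a^2 + ... = 1 / (1 - a), for a < 1."""
    return 1.0 / (1.0 - albedo)


if __name__ == "__main__":
    albedo = 0.5
    for max_bounces in (1, 4, 16):
        approx = radiance_with_bounce_limit(albedo, max_bounces)
        missing = exact_radiance(albedo) - approx
        print(f"max_bounces={max_bounces:2d}  result={approx:.6f}  missing={missing:.6f}")
```

The "missing" column is the bias introduced by the cap: it shrinks as the bounce limit goes up, while the loop (the render time) gets longer.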

About asking questions, I'm very happy to help, as it helps me organise all the ideas in my head, which I haven't had to do until now.

dale

@dale said:

@remus said:

With regards to why path tracing is used for GPU-based renderers (at the moment), it's to do with the fact that each path can be calculated independently of the others, thus each thread on the GPU can be run separately from the others and does not need to wait for results from other threads to continue running.

Are any rendering programs equipped to cut the calculation time involved in rendering by essentially saving a base photon path trace, and only altering what is necessary in a scene if you, for instance, alter a material?

I am jumping back to this right now because I was searching for a bookmark on a thread at the Kerkythea Forum which discussed the above. Since I also take written notes, I found them, and in them I made reference to being able to "lock the photon map" by changing settings in the "Irradiance Estimators". I confess that at the moment this is over my head, but I remember thinking that it could be useful someday. If I find the link I'll post it out of interest.
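
On the quoted point that each path can be calculated independently of the others, here is a minimal Python sketch of that structure. It is only an analogy, not a renderer: the "scene" is a stand-in where a path counts as a hit if a random point lands inside a quarter circle, and the names `trace_batch` and `render` are invented for the example. What it is meant to show is that each batch carries its own random generator and never waits on another batch.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def trace_batch(seed, n_paths):
    """Trace n_paths independent toy 'paths' and return how many were hits.

    Each batch owns its random generator, so it needs nothing from any
    other batch. Stand-in scene (an assumption): a path is a hit when a
    random point falls inside a quarter circle of radius 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def render(n_batches, paths_per_batch, workers):
    """Average all batches; they can run in any order on any worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(trace_batch, range(n_batches), [paths_per_batch] * n_batches)
        total_hits = sum(results)
    return total_hits / (n_batches * paths_per_batch)


if __name__ == "__main__":
    print(render(n_batches=8, paths_per_batch=20_000, workers=1))
    print(render(n_batches=8, paths_per_batch=20_000, workers=4))
```

Both calls give the same number, because no batch depends on another. Python threads are used here only to show the shape of the work; in the standard interpreter they are unlikely to make this particular loop faster, whereas a GPU would run many such batches at the same time.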

tim

@honoluludesktop said:

Not sure if this was addressed, but 64 bit means more addressable memory, but speed is a function of the bus width?

A 64 bit word CPU can easily use all 64 bits to address memory if the system is set up to allow all 64 bits on the address bus. More typically I suspect that rather less address range is provided on current actual machines, since 16 exabytes of memory is still a tad unwieldy. The memory data bus can be of various widths. It would be surprising to find a 32bit data bus in a machine with a 64 bit CPU, but it could be made to function. In the old days it was common to have an 8bit wide data bus even with a 16 (or even 32) bit CPU. Some machines have 128bit data buses since that allows filling the CPU caches even faster.
Key thought - they're related but not dependent on each other. You could build an 8bit CPU that was able to use 48bit addresses and had a 256bit wide data bus. You just wouldn't bother these days.
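
The arithmetic behind those figures can be checked with a few lines of Python. This is just a worked calculation, assuming byte-addressable memory and binary units (1 EiB = 2^60 bytes); the function names are made up for the example.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits


def bytes_per_transfer(data_bus_bits):
    """Bytes moved in one transfer on a data bus of this width."""
    return data_bus_bits // 8


GIB = 2 ** 30  # gibibyte
TIB = 2 ** 40  # tebibyte
EIB = 2 ** 60  # exbibyte (binary exabyte)

if __name__ == "__main__":
    print(addressable_bytes(32) // GIB, "GiB with 32-bit addresses")
    print(addressable_bytes(48) // TIB, "TiB with 48-bit addresses")
    print(addressable_bytes(64) // EIB, "EiB with 64-bit addresses")
    print(bytes_per_transfer(256), "bytes per transfer on a 256-bit data bus")
```

The two functions take unrelated inputs, which is the point made above: how much memory can be addressed and how much data moves per transfer are separate design choices.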

@honoluludesktop said:

Doesn't dedicated graphics memory still need to be addressed by the computer, and thus the OS? I read elsewhere that if you have 2 GB on a graphics card, that amount of memory address space is not available to the CPU, which is a big chunk of the memory in a 32 bit system. Is that right, since the graphics card still needs to run conventional programs? Is the CPU slower, as in clock cycles?

Not in any system I am familiar with. Dedicated graphics card memory isn't normally part of the CPU's address space at all but is used for texture bitmaps, drawing lists, workspace for the GPU algorithms etc. You'd typically find the icons and backdrop for your screen in there too. Shared graphics memory (often referred to as 'integrated') does exist in the CPU address space and certainly results in a slower overall system. It's cheaper and that is why it gets used. Of course, if your system is fast, integrated graphics may well not result in any noticeable slowdown and be considered adequate. The hardware world is forever cycling between separate and integrated graphics systems as the technology for each changes and the balance moves one way or the other.

@honoluludesktop said:

I remember when the computer bus was ISA(?), and special graphics cards were made so that CAD programs could display faster. Isn't this the same thing? As the cost of multi-core comes down, won't this kind of card have less value?

That's pretty much the typical state now except that the old ISA bus went away decades ago and now it's what, PCIe or PCIx or... whatever.

Multi-core is almost certainly how it will have to be for the future. Although Moore's Law is in no current danger of failing, we have reached a point where making the cycle time of CPUs faster is a real problem. At about the place where we are now you find that the power density - the amount of heat released per square millimetre in this case - gets to where there is no practical cooling fluid that can take away that heat fast enough. So you have to think of another way to use your increasing number of transistors (see Moore's Law above) to improve overall performance. Large caches have been a typical use for ages since main memory simply isn't anywhere near fast enough to supply all the data a modern CPU can chew on per nanosecond. Dual/quad core CPUs are a fairly cheap and nasty way of improving things a little because they still only have the single main memory and bus. You rely utterly on the caches to keep it all running.

Ironically, back in the dawn of time (well, 1980-ish) there was a serious attempt to solve this problem in the days of 1MHz being the high performance mark. The Transputer was a CPU that was intended to have a little memory connected but a lot of other Transputers to talk to. I had a 128-CPU machine in my office back when I was an IBM Research fellow working on an interactive solid CAD system in '84. These days I suspect you could fit 128 Transputer-like CPUs and a few MB of memory for each one onto a single chip. The practical problems were two-fold -
a) parallel programming is hard and programmers are lazy. So everyone wanted to avoid thinking about it.
b) Intel had their 8086 system and a lot of money and wanted to crush everyone else. They threw many, many billions at the problems and made single CPUs run faster and faster.
Result - the Transputer went away and Intel won the war for the desktop CPU. Now we need to go back to the issue of massively parallel systems. It's what, 5 years since 3GHz CPUs appeared and we're still stuck there. Dual cores and even 8-core machines simply haven't improved performance that much - we would hope for about 8-fold improvement in that time. We arguably should have seen a change to hundred+ core machines and 20+ fold improvement in that time.

dale

Thanks Tim. I will ask you then, given what you have said: does OpenCL seem like a reasonable, or at least short-term, answer (in the context of rendering, anyway)? Using both the power of the GPU and CPU, or am I misunderstanding this (which is highly possible)?

honoluludesktop

So in terms of real time renders, the race is not yet on between multi-core CPUs and dedicated GPUs? And, with a GPU (to render), if my databases are not big, unless I need to run more programs at once, except for the bus size, 32 bits may (not?) do for a couple more years? (I have an unmentionable reason to ask :-) I wondered why we are stuck at 3GHz. Don't know Moore's; read about heat but erroneously thought it had something to do with moving information faster than light.

dale

Interesting. I just read up a bit on Moore's law, and for anyone who, like me, doesn't know anything about it, it is an observation made in 1965 by Gordon Moore, co-founder of Intel, that the number of transistors per square inch of integrated circuit had doubled every two years since the integrated circuit was invented. His prediction was that this would continue. (He has since apparently altered it to every 18 months.)
What I found most interesting, besides the heat limitation, is that there appear to be two other limits. Please correct me if I'm wrong.
The way that these transistors, and it seems even the way multi-cores, work is in parallel. So the workload is split in half. However, in one of the articles I read, the author pointed out that most computer operations are sequential, and I think this really applies to rendering algorithms. His example is of a spreadsheet, in that a cell in a spreadsheet relies on the value of another cell, which in turn relies on the value of another cell.
Using this as an analogy for a shot photon in rendering, in that the result of the photon's action relies on the surface characteristics of the object it strikes, this quickly becomes sequential, and the calculations must be exponentially more complex.
His assertion is that the only way to increase this is to speed up the calculation.
If to do this requires more transistors, and there is a limitation at the moment because of the amount of heat produced, then this seems to really shackle a CPU speed increase.
The second limit is that in order to take advantage of any increase in computing power, the software must be written in a way that tells the processors what tasks they will handle; without this the computing power just isn't utilized.
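
The first limit described above, that the sequential part of a job caps what extra processors can do, is usually written down as Amdahl's law. The thread does not name it; it is brought in here only as the standard formula for this idea. A small Python sketch, where `parallel_fraction` is an assumed share of the work that can be split across cores:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Best-case speedup when only part of the work can be split across cores.

    parallel_fraction: share of the run time that parallelises (0.0 to 1.0).
    The remaining share is sequential and takes the same time on any machine.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for parallel_fraction in (0.5, 0.9, 0.99):
        for cores in (2, 8, 128):
            speedup = amdahl_speedup(parallel_fraction, cores)
            print(f"parallel={parallel_fraction:.2f}  cores={cores:3d}  speedup={speedup:6.2f}x")
```

With half the work sequential, even 128 cores stay below a 2x speedup, which fits the complaint earlier in the thread that 8-core machines have not delivered anything like 8 times the performance. The fractions used here are illustrative, not measurements of any real program.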

tim

@dale said:

... does OpenCL seem like a reasonable or at least short term answer? (in the context of rendering anyway) Using both the power of the GPU and CPU, or am I misunderstanding (which is highly possible) this.

Probably OpenCL has value; mostly because making it possible for people outside the fairly small number that normally work on the software for graphics card drivers to actually do things with the GPUs at least offers a chance of some exciting new uses and algorithms to appear. Whether it will last is another matter - I have a suspicion that if and when massively parallel systems become the norm we will see GPUs pretty much disappear and just use a few (dozen/hundred) of the pool of CPUs to do the rendering work. Then again, maybe some of the brain structure research will lead to us building systems with thousands of CPUs of dozens of different types each specialised to different jobs. Why not imagine processors specialised for doing sound synthesis (we have DSPs that do some of that now), for polygon to pixel conversion, for floating point work, for string manipulation, for data moving, for watching user input signals and so on. We're used to thinking of a CPU being able to do all of that because it is cheaper with current designs and production technology. It probably won't always be like that.

tim

@dale said:

Interesting. I just read up a bit on Moore's law, and for anyone who, like me, doesn't know anything about it, it is an observation made in 1965 by Gordon Moore, co-founder of Intel, that the number of transistors per square inch of integrated circuit had doubled every two years since the integrated circuit was invented. His prediction was that this would continue. (He has since apparently altered it to every 18 months.)

That's the guy. Interesting how a simple observation and hypothesis that it might continue for a while turns into a law that the whole industry works frantically to keep going. For many years people assumed that it meant that computer speed would double each time. Not necessarily.
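
To put rough numbers on the doubling, here is a tiny worked calculation in Python using the two doubling periods mentioned in this thread (two years and 18 months). The function name is made up, and the result counts transistors only; as noted above, it says nothing directly about speed.

```python
def growth_factor(years, doubling_period_years):
    """How many times larger the transistor count is after `years` of doubling."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    for period in (2.0, 1.5):
        factor = growth_factor(10, period)
        print(f"doubling every {period} years: about {factor:.0f}x after 10 years")
```

The choice of period matters a lot over a decade: doubling every two years gives 32 times as many transistors, while doubling every 18 months gives roughly 100 times.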

@dale said:

What I found most interesting, besides the heat limitation, is that there appear to be two other limits. Please correct me if I'm wrong.
The way that these transistors, and it seems even the way multi-cores, work is in parallel. So the workload is split in half.

Not completely correct but close enough unless you really want to study computer architecture and implementation technology. Which you probably don't.

@dale said:

However, in one of the articles I read, the author pointed out that most computer operations are sequential, and I think this really applies to rendering algorithms. His example is of a spreadsheet, in that a cell in a spreadsheet relies on the value of another cell, which in turn relies on the value of another cell.
Using this as an analogy for a shot photon in rendering, in that the result of the photon's action relies on the surface characteristics of the object it strikes, this quickly becomes sequential, and the calculations must be exponentially more complex.

Yup. Most algorithms are sequential and that's partly innate and partly a result of a long history of single, sequential CPUs. Why would people spend time on parallel algorithms if there are no parallel machines to run them on? It's also damn hard. There are lots of problems with interlocks - I need this result before I can do this bit, just like a very big and complex building project. Imagine planning the build of an entire nation in a single project. Every detail, every purchase, every single nut and bolt and dab of Loctite.
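
The "I need this result before I can do this bit" problem can be sketched in Python using the spreadsheet example quoted above. This is a minimal illustration, not how any real scheduler or spreadsheet works: the function name `parallel_stages` and the cell names are invented, and it assumes every dependency is itself listed as a task.

```python
def parallel_stages(depends_on):
    """Group tasks into stages; every task in a stage can run at the same time.

    depends_on maps each task to the tasks whose results it needs first.
    Assumption: every dependency also appears as a key of depends_on.
    """
    remaining = {task: set(deps) for task, deps in depends_on.items()}
    done = set()
    stages = []
    while remaining:
        # A task is ready once everything it waits for has finished.
        ready = sorted(task for task, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency")
        stages.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return stages


if __name__ == "__main__":
    # Spreadsheet-style chain: C1 needs B1, which needs A1.
    chain = {"A1": [], "B1": ["A1"], "C1": ["B1"]}
    # Independent cells: nothing waits on anything else.
    independent = {"A1": [], "A2": [], "A3": []}
    print(parallel_stages(chain))        # three stages of one task each
    print(parallel_stages(independent))  # one stage holding all three tasks
```

The chain needs three stages however many processors are available, while the independent cells finish in one. The number of stages, not the number of tasks, sets the minimum time.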

dale

So to kind of summarize, from a rendering Archvis perspective: We will be relying on what the individual software takes advantage of, or the "trend" their developers buy in on, CPU or GPU or a Hybrid. This will probably be dependent on the development time and money the hardware developers throw at their systems, as the software developers won't expend a lot of their time and energy until the hardware is mature enough to warrant their time.
Since in terms of rendering the real difference is in the amount of time it takes to solve the algorithm, because they both will eventually arrive at the same solution, the real advantage of a GPU-based system would come from being able to preview and rework your scenes and lighting for better results in a timely manner (perhaps I'm simplifying here, as I have a feeling there may be other advantages if you are animating).
And as Tim pointed out, since the visualization community represents only a small percentage of computer use the development won't be spurred on by the revenues we will provide.
So there is really no way to anticipate how to prepare for the future?

tim

@dale said:

So there is really no way to anticipate how to prepare for the future?

Perfect straight line - there is an old aphorism relating to this that was coined by an old friend of mine by the name of Alan Kay (look him up - major, major contributor to the modern world of computing):
"The best way to predict the future is to create it"

And possibly the biggest single driver of the hardware world these days is gaming. Anything that will make games run faster will get researched and developed and sold cheaply. And porn. The entire point of building out a global broadband system was to distribute porn, it's just an inevitable result of any new technology. As soon as paint was invented, porn. Telegraph, porn. Cameras - porn.

So, I think that apps like SU will benefit massively from these two drivers, since games need ways to create the worlds and characters in them and porn needs... bandwidth (which means storage and memory and speed generally) and display quality and hands-free UI interaction.

dale

My sister sent me this this morning: the next-generation keyboard. I guess she's wrong about the music.

Picture 80.jpg

dale

Looked up Alan Kay. I guess among many other things, he's one of the Xerox team we can thank for the Mac (and in a sense, because of the success of that graphical interface, Windows also). I ran across something, I believe it was in "Wired", on his idea that potentially coding can be reduced from thousands of lines to a few and achieve the same results, and his interest in computers that learn. Not only is he a significant contributor to the computing field but a real visionary. You run in a good crowd. (think he'd be interested in a little Ruby work on the side)

tim

@dale said:

Looked up Alan Kay. I guess among many other things, he's one of the Xerox team we can thank for the Mac (and in a sense because of the success of that graphical interface, Windows also).

Oh and so much more. The whole field of Object Oriented Programming for example (yes I know about Simula and Lisp, but Smalltalk was the first fully OOP system and is still the only one good enough to be worth the effort of critiquing) and a non-trivial part of the entire portable computing idiom (his 1969 PhD thesis covered the design, building, programming and documenting of a portable personal computer. In 1969).

@dale said:

I ran across something, I believe it was in "Wired", on his idea that potentially coding can be reduced from thousands of lines to a few and achieve the same results, and his interest in computers that learn. Not only is he a significant contributor to the computing field but a real visionary.

That's part of the power of OOP, when done right. You only program the parts that are different to what you already have code for.

@dale said:

You run in a good crowd. (think he'd be interested in a little Ruby work on the side )

Don't think so; Ruby has its place, but it isn't in the same league as Smalltalk. Not even close. Take a look at http://www.vpri.org/index.html for what he's up to right now.

dale

@dale said:

You run in a good crowd. (think he'd be interested in a little Ruby work on the side )

Don't think so; Ruby has its place, but it isn't in the same league as Smalltalk. Not even close. Take a look at http://www.vpri.org/index.html for what he's up to right now.

This man really uses his talent well.
Interesting group of cohorts. If I'm not mistaken, the first name on the Board of Advisors, John Perry Barlow, is an old Timothy Leary apostle who ended up as a lyricist for the Grateful Dead.

dale

A really inspirational read: this paper written by Alan Kay, entitled "The Real Computer Revolution Hasn't Happened Yet".
Yes it is a little off topic, but what the hell.

m2007007a_revolution.pdf