Plan for Brett Fattori

by Brett Fattori · 08/24/2005 (1:19 pm) · 8 comments

The development of dRacer progresses. We have a milestones list that continues to get shorter every week. Some new tasks have been added to it, but overall we're getting closer to moving out of pre-production and into an early production phase. Each task on our list is a little chunk of what we think is necessary to grow the foundation into a more solid, core codebase.

What people saw at IGC'04 and what we have today are two very different beasts, internally. Many of the important systems have been rewritten or so heavily modified that they no longer resemble their original selves. dRacer contains a rewritten physics system, which Clark detailed here. It also has major improvements for multiplayer, a dynamic decaling system, and a heavily reworked system for racing management. Most recently, however, I've been working on improving the rendering pipeline for the racetrack.

A little history to start this off... Back in early 2004, I wrote a simple demo to prove to myself that creating a track, similar to one in Wipeout for the PlayStation, wouldn't be that difficult. After getting a simple ribbon of polygons, represented by a triangle strip, to render, I was quite pleased. A little while later I added walls, an underside, and the ability to cap sections to make tunnels. Altogether this wasn't a bad thing, except that rendering had slowed down and was becoming troublesome.

The first iteration of speedups on the track was done by segmenting the track into tri-strips of no more than 30 polys. The number 30 wasn't very magical; it was just a number I grabbed out of the air. For context, the track was (at that time) being rendered as 12 tri-strips, all as one continuous segment. Breaking the track down into segments fixed a number of problems I was having with visibility. Previously the entire track was visible at one time, which meant a lot of faces were being pushed onto the graphics card only to be rejected anyway. With the track broken into smaller chunks, I could use the pre-existing visibility tests to limit which parts of the track were being generated. So instead of passing, say, 24,000 polys to the graphics card each frame, I was only passing visible segments consisting of up to 360 polys each (12 strips of up to 30 polys).
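A rough sketch of the segmenting idea, in Python rather than the engine's own code. The function name `split_strip` and the flat list-of-vertices representation are assumptions for illustration; only the 30-poly limit comes from the text. It relies on the fact that a triangle strip of n triangles uses n + 2 vertices, so neighbouring segments share two vertices.

```python
def split_strip(vertices, max_tris=30):
    """Split one triangle strip into sub-strips of at most max_tris triangles.

    `vertices` is the strip's vertex list in strip order (hypothetical layout).
    """
    if max_tris <= 0 or max_tris % 2:
        # Cutting at an even triangle offset keeps the strip's winding order.
        raise ValueError("max_tris must be a positive even number")
    total_tris = len(vertices) - 2
    segments = []
    for first_tri in range(0, total_tris, max_tris):
        tris = min(max_tris, total_tris - first_tri)
        # n triangles need n + 2 vertices; the last two are reused by the
        # next segment so the pieces still join up.
        segments.append(vertices[first_tri:first_tri + tris + 2])
    return segments


# A strip of 100 triangles (102 vertices) becomes segments of 30, 30, 30, 10.
strip = list(range(102))
print([len(seg) - 2 for seg in split_strip(strip)])
```

Each segment can then be handed to the visibility test on its own, so only the pieces that pass get submitted for rendering.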

I had even implemented a simple method of reducing the number of polys that were being passed based on the distance of a segment from the camera. It was a kind of dynamic level of detail for the track. I'm not even sure if that gained me any extra time because I didn't do any profiling at that point. I just guessed that fewer polys equals less render time. And by passing half as many polys when the segment was half the distance to the world visible limit, and then half as many again at 3/4 of the way to the visible distance, I figured the speedups would be huge.
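The distance rule above might look something like this minimal sketch. The names `lod_stride` and `decimate`, the "keep every n-th cross-section" approach, and the inclusive thresholds are all assumptions; the text only gives the halving at 1/2 and 3/4 of the visible distance.

```python
def lod_stride(distance, visible_limit):
    """Return 1, 2 or 4: keep every n-th cross-section of a track segment."""
    if distance >= 0.75 * visible_limit:
        return 4  # a quarter of the polys
    if distance >= 0.5 * visible_limit:
        return 2  # half of the polys
    return 1      # full detail


def decimate(cross_sections, stride):
    """Keep every stride-th cross-section, always keeping the last one
    so that neighbouring segments still meet at the same edge."""
    if not cross_sections:
        return []
    kept = cross_sections[::stride]
    if (len(cross_sections) - 1) % stride:
        kept.append(cross_sections[-1])
    return kept


print(lod_stride(100, 1000), lod_stride(500, 1000), lod_stride(800, 1000))
print(decimate(list(range(9)), 2))
```

Whether a scheme like this pays off is exactly the kind of thing only profiling can confirm.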

Well, with the current round of speedups, Clark insisted that I come up with some hard numbers to justify the work I was going to do. He guessed (from experience) that changing the glArrayElement calls to a glDrawElements call would speed things up. We were already using arrays of verts, texels, norms, colors, and so on. Changing from issuing one call per element to passing the arrays themselves in a single call would most certainly be a speed gain. Additionally, I had noted that the current render routines were doing a lot of client state initialization and resetting of that state, only to do it all over again. Maybe there were gains to be made by doing less state switching?

I should let everyone in on a little secret of mine... I'm not a math guy. I get the usage of matrices and vector math, but unfortunately I couldn't prove any of it to save my life. Clark, on the other hand, is quite adept at mathematics, and for that reason he offered to derive the formulas necessary to show what kind of speed gains the profiling numbers revealed for our changes. I set out to modify the code so I could isolate the two optimizations I was going to perform, and built four versions of the engine. The first was built with no optimizations, the second with just the client state opts, the third with just the glDrawElements opts, and the last with both optimizations. I then recorded a journal so I could use the same setup for each version of the engine. Here is my setup:

Journal: One lap around Black Forest (54.652 sec)
Processor: Pentium 4 3.0 GHz
System Memory: 1GB
Video Card: PCI GeForce 6800 GT 256MB (no overclocking)
Video Drivers: 77.77 WHQL
Video Features: No Antialiasing, no anisotropic filtering, no vert sync

The results from the profiler follow. Each percentage is the track's share of the total time recorded in the profiler for all processes being profiled. I was only concerned with the time taken to generate the track.

No optimizations: 6.776%
Client state opts: 6.349%
glDrawElements opts: 4.331%
All optimizations: 4.047%

Now it might not seem to be very much when you consider that the change went from almost 7% of total time to 4% of total time in the profile. However, from this we can determine the overall speedup of rendering from before any changes to after each type of change. Overall, it was a positive gain for each of the steps:

Client state opts: 7% speedup
glDrawElements opts: 37% speedup
All optimizations: 42% speedup
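Clark's formulas aren't reproduced in this plan, but here is one derivation that lands on roughly the same figures. It assumes (and this is only an assumption) that everything other than the track takes the same time in every build, so only the track's cost changes. If the track's share of the profile goes from p0 to p1, the track's own time shrinks by 1 - p1(1 - p0) / (p0(1 - p1)). The function name is made up for the example.

```python
def track_time_reduction(share_before, share_after):
    """Fractional drop in track render time, given the track's share of the
    profile (0..1) before and after, assuming all other work is unchanged."""
    remaining = (share_after * (1 - share_before)) / (
        share_before * (1 - share_after)
    )
    return 1 - remaining


baseline = 0.06776  # no optimizations
builds = [
    ("Client state opts", 0.06349),
    ("glDrawElements opts", 0.04331),
    ("All optimizations", 0.04047),
]
for name, share in builds:
    print(f"{name}: {track_time_reduction(baseline, share):.1%}")
# Client state opts: 6.7%
# glDrawElements opts: 37.7%
# All optimizations: 42.0%
```

Simply comparing the raw percentages (6.776 against 4.047) would understate the gain, because the total being divided into also shrinks once the track gets cheaper.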

As a side-effect of making it easy to switch state on and off, I split out the calls for this into separate functions. I figured there were maybe even a few more cycles I could squeeze out of the renderer by making those calls inline. Interestingly, I noticed a slowdown rather than a speedup -- and I checked repeatedly. Overall, the speedup dropped to 39% with inline functions.

Overall, I'm happy that we were able to make these kinds of gains. Each time that we reduce the amount of time taken by one process, we dump a little time back into a pool to spend on other things. Keeping track of the gains by profiling is also a good thing. A lot of times people rely solely on an FPS gain, which one might not even really see. Being able to quantify the work you've done by showing actual gains is invaluable. And as Clark pointed out, on a machine where rendering is more of a bottleneck, these changes will matter more.

Until the next time...
- Brett


(reposted due to deletion...)

#1
08/24/2005 (2:19 pm)
Wow nice base hardware... I like where it's going. Tired of old machines holding people back.
#2
08/24/2005 (2:34 pm)
You know this post almost made me go insane..

I was *SURE* I had replied to your plan about this before... until I saw that last line.. :) thank god I'm not going completely crazy then!

Nice one Brett!
#3
08/24/2005 (3:07 pm)
Would be funny if the last line was deleted, then you could ponder if there was ever any change, and you are losing it.. Wait a minute, I just spoiled that chance with this post...
#4
08/24/2005 (6:26 pm)
Fantastic plan, Brett. Thank you!
#5
08/24/2005 (8:01 pm)
(FYI someone forgot to update your profile from Associate to Employee so it shows up in the employee blogs)
#6
08/24/2005 (8:04 pm)
Nice plan. Always looking forward to your plans, hearing the progress of dRacer. Keep us updated! (PS - does dRacer have a website?)
#7
08/25/2005 (12:07 am)
@Ben: Was that sarcasm my boy?

@Joshua: I'm not an employee, as far as I know... :-P

- Brett
#8
08/25/2005 (2:05 am)
Quote: I'm not an employee, as far as I know... :-P

guilt by association .. :-D