Monthly Archives: March 2015

A couple of years into my work on OpenGL, I was given responsibility for running the OpenGL team as the dev lead. The team – like many of Microsoft’s development efforts back then – was small enough that I could keep a hands-on role in coding while also managing it. Although I knew that I would have less time to code, I was comforted knowing that I wouldn’t have to leave it behind completely. I was excited to have the opportunity to lead the OpenGL development effort but also apprehensive about managing people, especially people who had been my peers.

By this time, I had turned my focus to enabling graphics hardware to accelerate OpenGL performance. Although OpenGL on Windows already had a hardware acceleration model – the Installable Client Driver, or ICD – it required the hardware vendor to license the OpenGL technology from SGI. Beyond that obstacle, developing an ICD was complex: it required implementing the entire OpenGL API stack rather than just the parts that the hardware could accelerate. How to accelerate the parts that made sense for a given piece of hardware was entirely up to the vendor to design and implement from scratch. There was no common template, protocol, or framework to follow. This approach provided maximum flexibility, but at a very high implementation and maintenance cost. An ICD vendor wasn’t simply maintaining a device driver; they were maintaining an entirely separate implementation of all of OpenGL. OpenGL ICDs and the Windows OS kernel had a mechanism to exchange chunks of generic data, which allowed the hardware portion of the ICD to communicate with the rest of the ICD implementation. Again, this was very flexible, but it also meant that every vendor had to come up with its own custom approach to structuring the communications between client mode (where regular programs ran) and kernel mode (the trusted execution environment where the OS and hardware-level drivers executed).

At the time, ICDs were fine for their intended target of high-end workstations. But my passion was to continue to push 3D graphics into the mainstream. A number of hardware vendors had become interested in providing dedicated 3D-acceleration hardware at lower cost, and given Moore’s Law and the volume-based economics of the PC business, hardware-accelerated 3D graphics was well positioned for mass adoption. To help move things forward, I wanted to make it much easier for hardware vendors who were new to 3D to bring their products to market. The difficulty and investment required to write an ICD driver was a significant obstacle to an emerging ecosystem of commodity 3D hardware.

Having done a bunch of driver work in the past, I set out to architect an OpenGL driver model that would provide a standardized interface to lower-cost hardware and remove as much of the software complexity of ICDs as possible. I focused on exposing only the functionality that lower-cost hardware could reasonably support, namely, the rasterizing of 3D primitives near the bottom of a complex 3D graphics pipeline. For example, 3D transformations and lighting operations would be done on behalf of the driver; the driver just had to render the computed 3D primitives on the screen. This division of labor could provide massive increases in overall graphics performance.

I called the driver model the Mini Client Driver, or “MCD,” since it was similar in flow to an ICD, but the vendor only had to implement the rendering-specific part of the OpenGL stack.

I wrote a corresponding sample driver (if I remember correctly, it used S3’s Virge hardware), and with the help of the OpenGL team, got the sample code and the corresponding MCD documentation into the next releases of the Windows NT DDK (Driver Development Kit).
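To make the division of labor concrete, here is a minimal sketch of the idea in C: a generic layer does the transform and lighting work, and the vendor supplies only a rasterization entry point through a table of function pointers. Every name here (`ScreenVertex`, `MiniDriver`, `submit_triangle`, and the mock driver) is hypothetical; this is not the actual MCD interface from the DDK, and transform and lighting are reduced to a viewport mapping and a constant color.

```c
#include <stddef.h>

/* Hypothetical: a vertex the generic layer has already transformed and lit. */
typedef struct {
    float x, y, z;      /* window coordinates and depth */
    float r, g, b, a;   /* final color */
} ScreenVertex;

/* Hypothetical mini-driver table: the vendor supplies only rasterization. */
typedef struct {
    /* Returns nonzero if the hardware rendered the triangle. */
    int (*draw_triangle)(void *hw, const ScreenVertex v[3]);
} MiniDriver;

/* Generic layer. Transform and lighting are reduced to a viewport mapping
   and a constant color; a real pipeline would do much more here. */
int submit_triangle(const MiniDriver *drv, void *hw,
                    const float ndc[3][3], const float rgba[4],
                    int width, int height)
{
    ScreenVertex sv[3];
    for (int i = 0; i < 3; i++) {
        sv[i].x = (ndc[i][0] + 1.0f) * 0.5f * (float)width;
        sv[i].y = (1.0f - ndc[i][1]) * 0.5f * (float)height; /* y grows downward */
        sv[i].z = (ndc[i][2] + 1.0f) * 0.5f;
        sv[i].r = rgba[0];
        sv[i].g = rgba[1];
        sv[i].b = rgba[2];
        sv[i].a = rgba[3];
    }
    if (drv != NULL && drv->draw_triangle != NULL && drv->draw_triangle(hw, sv))
        return 1;   /* accelerated by the driver */
    return 0;       /* caller falls back to the software rasterizer */
}

/* Mock vendor driver that records what it was asked to draw. */
typedef struct {
    int triangles;
    float first_x, first_y;
} MockHw;

int mock_draw_triangle(void *hw, const ScreenVertex v[3])
{
    MockHw *m = (MockHw *)hw;
    m->triangles++;
    m->first_x = v[0].x;
    m->first_y = v[0].y;
    return 1;
}
```

A driver that returns 0, or leaves the entry point empty, hands the triangle back to the software path, which is one plausible way a mini-driver design can accelerate only what a given card supports.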

It’s worth making a few comments on driver development in general. Writing driver code can be one of the most satisfying and frustrating experiences possible as a developer. It’s incredibly exciting when a new driver you’re building actually does something useful with a piece of hardware for the first time (for example, rendering a test triangle on the screen). But drivers run as part of the operating system, so bugs and driver crashes can take down the whole OS. And with graphics drivers in particular, you always risk screwing up the thing you rely on most to program and interact with the machine – the display. Add to this the fact that hardware doesn’t always work as documented, and that it’s very easy to miss setting a needed bit in some register, or to let an off-by-one or some other error send the hardware into oblivion.

With enough persistence, lots of reboots, and the occasional debug print when all else fails, a robust driver will eventually emerge. And with any luck, you will never hear about your device driver because the only time that you do is when it’s NOT working. As with so many jobs in technology, writing drivers can be a thankless and invisible job despite being critical to making the technology we take for granted actually work.

Back to the main story: with MCD released for Windows NT, any graphics card vendor could now quickly and relatively easily implement OpenGL hardware acceleration using a standard driver model. Since the driver model itself was largely OS-agnostic, I then shifted our focus to providing a Windows 95 version of MCD to satisfy growing developer and hardware vendor interest in OpenGL and 3D graphics. Windows NT had a growing but still relatively small share of the market compared to Windows 95, and I wanted to see OpenGL fully enabled on both operating systems. We engaged the hardware community around making OpenGL MCD drivers available on Windows 95, got the Windows 95 version of the code up and running quickly, and everyone expected the DDK update to be released very soon.

And then, I was asked to do something that would change everything.


My wife has been pestering me to write a post or two about some of my early years at Microsoft. Thinking about what to write took me back to a time when my entire focus at work was writing the best code possible. In those days, I would sometimes even dream about code. This post talks about software implementation details and may leave some readers behind in a few spots. Bear with me; future posts will be less technical.

For many years, one of my main pursuits as a developer was software-based graphics acceleration. This meant using the CPU to render graphics on the screen as fast as possible by carefully tuning software and algorithms. The goal was to extract every last bit of performance from the CPU. One of the key reasons I joined Microsoft in 1993 was the opportunity to work for Mike Abrash, who even then was renowned for his mastery of CPU-based performance optimization and who was aware of some of my work. He was running the Windows NT GDI (Graphics Device Interface) team at the time, and I had come on board to focus on optimizing GDI performance in the first version of Windows NT (somewhat confusingly named “Windows NT 3.1”).

GDI was a 2D graphics framework doing the important work of rendering fonts, lines, rectangles, and all of the primitives that rendered the Windows desktop UI and programs written for it. But being strictly 2D-based, GDI was never going to include 3D graphics, and 3D was as fascinating to me as VR is to some people today (including Mike, who is now Oculus’ chief scientist).

One of NT’s goals was to take a share of the workstation market, and Microsoft had licensed the OpenGL graphics API from Silicon Graphics (SGI). SGI’s business was based on selling expensive high-end hardware, and they had put little effort into the performance of their reference software implementation of the API. Without very expensive hardware, OpenGL was pretty useless.

I had fallen in love with 3D graphics in the late 80’s, and I joined the recently formed OpenGL team soon after Windows NT shipped. Even though speeding up OpenGL wasn’t part of my job when I joined the team, I was excited to be part of an effort that would broaden access to 3D graphics capabilities. My dream was to have 3D graphics be standard on every PC. My job was to help integrate OpenGL into the Windows operating system.

But official responsibilities aside, I couldn’t resist the temptation and challenge of speeding up OpenGL to make it useful without requiring very expensive hardware. I immediately started tinkering with the OpenGL stack when I had free time. This was the mid-90’s – the era of Intel’s new Pentium processor line. This family of CPUs allowed floating-point and integer instructions to overlap, a primitive form of parallel processing. A floating-point instruction could be started, and integer instructions could continue to execute while the much lengthier floating-point instruction was processed. This mix of floating-point and integer operations was perfect for speeding up 3D rendering, which could be broken into floating-point setup and fixed-point, integer-based scanline fragment processing.
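The split between floating-point setup and integer span work can be sketched as follows, assuming Gouraud shading of a single 8-bit color channel and 16.16 fixed point (both choices made for illustration; `FixedInterp`, `setup_span`, and `shade_span` are hypothetical names). Plain C leaves instruction scheduling to the compiler, so this shows only the division of work, not the hand-interleaved Pentium instruction overlap described above.

```c
#include <stdint.h>

/* 16.16 fixed-point interpolator for one color channel across a span. */
typedef struct {
    int32_t value;  /* current value, 16.16 */
    int32_t step;   /* per-pixel increment, 16.16 */
} FixedInterp;

/* Floating-point setup: done once per span. */
FixedInterp setup_span(float c_left, float c_right, int width)
{
    FixedInterp f;
    float dcdx = (width > 1) ? (c_right - c_left) / (float)(width - 1) : 0.0f;
    f.value = (int32_t)(c_left * 65536.0f);
    f.step = (int32_t)(dcdx * 65536.0f);
    return f;
}

/* Integer-only inner loop: one shift, one store, and one add per pixel.
   Assumes channel values stay within 0..255. */
void shade_span(uint8_t *dst, int width, FixedInterp f)
{
    for (int x = 0; x < width; x++) {
        dst[x] = (uint8_t)(f.value >> 16);
        f.value += f.step;
    }
}
```

The divide happens once per span in floating point; every pixel after that costs only integer operations.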

A complicating factor in speeding up OpenGL was the complexity of its state machine. There were many possible combinations of rendering modes based on attributes such as color depth, z-buffering, shading model, transparency, texture-mapping, etc. An early (Windows 3.1 or Windows 95) solution for optimizing GDI rendering was to build the rasterizer on the fly on the stack based on the GDI state (I believe this was all or mostly Todd Laney’s handiwork). But I was working on Windows NT, and such clever hackery was not allowed in a next-generation operating system. After considering my options, I determined that my best bet was to pre-compile a set of renderers that represented the most common cases for rendering (for example, Gouraud-shaded, 16-bit color, 16-bit z-buffered triangles). I did this by building rendering functions that consisted of groupings of macro statements that themselves were chunks of hand-crafted inline assembly code. These macros could then be grouped together to perform unique sets of rendering operations. For this approach to work, I constructed a framework that let the chunks of assembly code in the macro blocks access variables and registers in a common and consistent way, so that they could interoperate predictably and efficiently.
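Here is a rough plain-C approximation of the pre-compiled renderer idea: small macro chunks that share one agreed-upon state layout are stitched together into specialized span functions, and the current rendering state selects one from a table. The original chunks were hand-written inline assembly; the macro names, the `Span` layout, the two state bits (depth test and smooth shading), and the RGB565 output format are all hypothetical choices for this sketch.

```c
#include <stdint.h>

/* Shared span state: every chunk reads and writes the same fields. */
typedef struct {
    uint16_t *color;        /* destination pixels, RGB565 */
    uint16_t *depth;        /* 16-bit z-buffer */
    int count;              /* pixels in the span */
    uint32_t z;             /* depth, 16.16 fixed point */
    int32_t dz;             /* depth step per pixel */
    int32_t r, g, b;        /* color, 16.16 fixed point, 0..255 range */
    int32_t dr, dg, db;     /* color steps per pixel */
} Span;

static inline uint16_t pack565(int32_t r, int32_t g, int32_t b)
{
    return (uint16_t)((((r >> 3) & 31) << 11) | (((g >> 2) & 63) << 5) | ((b >> 3) & 31));
}

/* Chunk: less-than depth test. On failure, skip the color write. */
#define ZTEST_ON(s, i)                                  \
    {                                                   \
        uint16_t z16 = (uint16_t)((s)->z >> 16);        \
        if (z16 >= (s)->depth[i]) goto next_pixel;      \
        (s)->depth[i] = z16;                            \
    }
#define ZTEST_OFF(s, i)     /* no depth test */

/* Chunk: advance color. Flat shading keeps it constant. */
#define STEP_SMOOTH(s)  ((s)->r += (s)->dr, (s)->g += (s)->dg, (s)->b += (s)->db)
#define STEP_FLAT(s)        /* color is constant across the span */

/* Stitch chunks into one specialized span function with no state checks. */
#define DEFINE_SPAN(name, ZTEST, STEP_COLOR)                              \
    static void name(Span *s)                                             \
    {                                                                     \
        for (int i = 0; i < s->count; i++) {                              \
            ZTEST(s, i);                                                  \
            s->color[i] = pack565(s->r >> 16, s->g >> 16, s->b >> 16);    \
        next_pixel:                                                       \
            s->z += (uint32_t)s->dz;                                      \
            STEP_COLOR(s);                                                \
        }                                                                 \
    }

DEFINE_SPAN(span_flat,     ZTEST_OFF, STEP_FLAT)
DEFINE_SPAN(span_flat_z,   ZTEST_ON,  STEP_FLAT)
DEFINE_SPAN(span_smooth,   ZTEST_OFF, STEP_SMOOTH)
DEFINE_SPAN(span_smooth_z, ZTEST_ON,  STEP_SMOOTH)

enum { STATE_ZTEST = 1, STATE_SMOOTH = 2 };
typedef void (*SpanFn)(Span *);

/* Select a pre-compiled renderer from the current state bits. */
SpanFn pick_span_renderer(unsigned state)
{
    static const SpanFn table[4] = { span_flat, span_flat_z, span_smooth, span_smooth_z };
    return table[state & 3u];
}
```

Because each combination is compiled ahead of time, the per-pixel loop carries no mode checks; the price is one function per supported combination, which is why only the most common cases would get a dedicated renderer.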

This was definitely not how the C language was intended to be used. It wasn’t pretty – I believe I even used “goto” statements out of necessity – but the code was highly effective. The approach also entailed risk because things could go horribly wrong with unanticipated edge cases. I remember one embarrassing bug that, in certain cases, failed to return the floating-point control register to its previous state, which affected floating-point operations in the rest of the operating system. I quickly found and fixed the issue, but it was a reminder that there was no safety net with what I was doing.

It’s interesting how thinking about past work jogs the mind. In the middle of writing this post, I had a dream about how I may have implemented dithering. Dithering is an old technique going back to the print business that makes gradations in tone and color appear smoother by breaking up transitions using patterns of dots or pixels at different densities. For example, an area halfway between one color and another would have half the pixels in that area set to one color, and the remaining half set to the other color. Today, even our phones are capable of producing millions of colors, eliminating the need for such techniques, but in the mid-90’s, the capabilities of most PCs were far more limited. In my dream, I implemented dithering look-up tables (LUTs) for red, green, and blue values so that I could construct the right 15- or 16-bit RGB value using three highly-cached indexed memory operations and two OR instructions. I probably did something along those lines, but who knows…I’d love to have access to that old code (of no use to anyone at this point) just to remind myself how I did what I did.
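The table-driven idea from that dream can be sketched like this, assuming ordered dithering with a 4x4 Bayer matrix and 16-bit RGB565 output (the post does not say which dither pattern was used; `init_dither_luts` and `dither565` are hypothetical names). Each table entry already holds its channel's bits shifted into place, so a pixel costs three indexed loads and two ORs.

```c
#include <stdint.h>

/* 4x4 Bayer ordered-dither thresholds, 0..15. */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

/* One table per channel, indexed by [dither cell][8-bit value]; each entry
   holds that channel's RGB565 bits already shifted into position. */
static uint16_t r_lut[16][256], g_lut[16][256], b_lut[16][256];

void init_dither_luts(void)
{
    for (int cell = 0; cell < 16; cell++) {
        int t = bayer4[cell >> 2][cell & 3];
        for (int v = 0; v < 256; v++) {
            int c5 = (v + (t >> 1)) >> 3;   /* 5 bits: threshold bias 0..7 */
            int c6 = (v + (t >> 2)) >> 2;   /* 6 bits: threshold bias 0..3 */
            if (c5 > 31) c5 = 31;
            if (c6 > 63) c6 = 63;
            r_lut[cell][v] = (uint16_t)(c5 << 11);
            g_lut[cell][v] = (uint16_t)(c6 << 5);
            b_lut[cell][v] = (uint16_t)c5;
        }
    }
}

/* Per pixel: three table loads and two ORs. */
uint16_t dither565(int x, int y, uint8_t r, uint8_t g, uint8_t b)
{
    int cell = ((y & 3) << 2) | (x & 3);
    return (uint16_t)(r_lut[cell][r] | g_lut[cell][g] | b_lut[cell][b]);
}
```

With an 8-bit red value of 4, exactly 8 of the 16 cells in a 4x4 block round up to the next 5-bit level, which is the halfway case described above.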

My initial performance improvements were compelling enough to justify making the software acceleration work a full-time endeavor. I added functionality over time, culminating in real-time texture-mapping using quadratic subdivision. Even though it was still part of a fully compliant general-purpose OpenGL implementation, textured rendering throughput got reasonably close to that of Doom, the texture-mapping speed champion at the time. In fact, my software-accelerated pipeline became competitive with high-end hardware graphics cards and beat them in some instances (anti-aliased lines come to mind).
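The post doesn't spell out what quadratic subdivision meant in that code. One well-known approach, shown here as an assumption rather than a reconstruction, replaces the per-pixel divide of perspective-correct texture mapping with a quadratic: compute the exact coordinate at the start, middle, and end of a span, fit a parabola through those three points, and step it with forward differences (two adds per pixel). Strong perspective makes a single quadratic inaccurate, which is where splitting long spans into shorter pieces would come in. `PerspEnd`, `perspective_u`, and `quadratic_u_span` are hypothetical names.

```c
/* Hypothetical span endpoint: u/w and 1/w both interpolate linearly
   in screen space, unlike u itself. */
typedef struct {
    float u_over_w;
    float one_over_w;
} PerspEnd;

/* Exact perspective-correct u at t in [0,1]: one divide per sample. */
float perspective_u(PerspEnd a, PerspEnd b, float t)
{
    float uw = a.u_over_w + t * (b.u_over_w - a.u_over_w);
    float iw = a.one_over_w + t * (b.one_over_w - a.one_over_w);
    return uw / iw;
}

/* Approximate u across n pixels with a quadratic through the exact values
   at the start, middle, and end, stepped with forward differences. */
void quadratic_u_span(PerspEnd a, PerspEnd b, int n, float *out)
{
    if (n <= 0)
        return;
    float u0 = perspective_u(a, b, 0.0f);
    if (n == 1) {
        out[0] = u0;
        return;
    }
    float um = perspective_u(a, b, 0.5f);
    float u1 = perspective_u(a, b, 1.0f);

    /* u(t) = c0 + c1*t + c2*t^2 passing through (0,u0), (0.5,um), (1,u1). */
    float c0 = u0;
    float c1 = 4.0f * um - 3.0f * u0 - u1;
    float c2 = 2.0f * u0 - 4.0f * um + 2.0f * u1;

    float h = 1.0f / (float)(n - 1);     /* t step per pixel */
    float f = c0;
    float d1 = c1 * h + c2 * h * h;      /* first forward difference */
    float d2 = 2.0f * c2 * h * h;        /* constant second difference */
    for (int i = 0; i < n; i++) {
        out[i] = f;                      /* no divide inside the loop */
        f += d1;
        d1 += d2;
    }
}
```

Three divides per span replace one divide per pixel; the approximation error grows with the ratio of the endpoint depths, so shorter spans keep it small.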

I had taken rendering performance from being measured in seconds per frame to dozens of frames per second; many operations were over 100 times faster than the original versions. Ah, the power of highly-tuned, efficient coding! I often worry that Moore’s Law is being buried under so many layers of frameworks, objects, interpreters, and interfaces that it’s hard to tell what the hardware is actually doing at the bottom of the pile, beneath multiple layers of indirection. Then again, even CPUs now hide some of their internal operation with sophisticated out-of-order instruction execution engines. Sadly, the last time I tried to out-optimize a modern C compiler with an integer-only routine, the results were a draw.

Our OpenGL implementation was incorporated into both the Windows NT and Windows 9x code bases and was starting to get significant traction. Being part of the operating system and eliminating the need for hardware acceleration meant that an OpenGL application could now target a very wide audience. As a means of promoting our OpenGL implementation’s capabilities, I wrote the first set of 3D screen savers (“Flying Objects”), which proved popular and inspired other people on the team to write their own as well (such as “Pipes” and “Maze” pictured here). For many years afterward, I would get a chuckle when I saw one of our screen savers running in the background of a TV or movie set. Our efforts had taken what had been an expensive workstation technology and made it readily available on millions of desktops.

Despite the effort’s success, the writing was already on the wall for the future of software-accelerated computer graphics. Hardware acceleration and the rise of the GPU were just around the corner.