software | Otto Berkes' weblog

Category Archives: software

OpenGL

March 8, 2015 – 5:11 pm
Posted in computer graphics, computer history, OS, software, software development, technology

My wife has been pestering me to write a post or two about some of my early years at Microsoft. Thinking about what to write took me back to a time when my entire focus at work was writing the best code possible. In those days, I would even sometimes dream about code. This post talks about software implementation details and may leave some readers behind in a few spots. Bear with me; future posts will be less technical.

For many years, one of my main pursuits as a developer was software-based graphics acceleration. This meant using the CPU to render graphics on the screen as fast as possible by carefully tuning software and algorithms. The goal was to extract every last bit of performance from the CPU. One of the key reasons I joined Microsoft in 1993 was the opportunity to work for Mike Abrash, who even then was renowned for his mastery of CPU-based performance optimization, and he was aware of some of my work. He was running the Windows NT GDI (Graphics Device Interface) team at the time, and I had come on board to focus on optimizing GDI performance in the first version of Windows NT (somewhat confusingly named “Windows NT 3.1”).

GDI was a 2D graphics framework doing the important work of rendering fonts, lines, rectangles, and all of the primitives that rendered the Windows desktop UI and programs written for it. But being strictly 2D-based, GDI was never going to include 3D graphics, and 3D was as fascinating to me as VR is to some people today (including Mike, who is now Oculus’ chief scientist).

One of NT’s goals was to take a share of the workstation market, and Microsoft had licensed the OpenGL graphics API from Silicon Graphics (SGI). SGI’s business was based on selling expensive high-end hardware, and they had put little effort into the performance of their reference software implementation of the API. Without very expensive hardware, OpenGL was pretty useless.

I had fallen in love with 3D graphics in the late 80’s, and I joined the recently-formed OpenGL team soon after Windows NT shipped. Even though speeding up OpenGL wasn’t part of my job when I joined the team, I was excited to be part of an effort that would broaden access to 3D graphics capabilities. My dream was to have 3D graphics be standard on every PC. My job was to help integrate OpenGL into the Windows operating system.

But official responsibilities aside, I couldn’t resist the temptation and challenge of speeding up OpenGL to make it useful without requiring very expensive hardware. I immediately started tinkering with the OpenGL stack when I had free time. This was the mid-90’s – the era of Intel’s new Pentium processor line. This family of CPUs allowed overlapping of floating-point and integer instructions – a primitive form of parallel processing. A floating-point instruction could be started, and then integer instructions could continue to be executed while the much lengthier floating-point instruction was processed. This mixing of floating-point and integer operations was perfect for speeding up 3D rendering operations, which could be broken into floating-point setup and fixed-point, integer-based scanline fragment processing.
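To make that split concrete, here is a minimal C sketch of the general technique. It is not the original code: the 16.16 fixed-point format and the names `float_to_fixed` and `shade_span` are assumptions chosen for illustration. The per-span setup does its one divide in floating point, and the per-pixel loop then uses only integer adds and shifts.

```c
#include <stdint.h>

/* 16.16 fixed point: upper 16 bits are the integer part, lower 16 the fraction. */
typedef int32_t fixed16_16;

#define FIXED_ONE 65536

/* Floating-point setup step: convert a float to 16.16 fixed point. */
fixed16_16 float_to_fixed(float v)
{
    return (fixed16_16)(v * (float)FIXED_ONE);
}

/*
 * Fill one scanline span with a linearly interpolated 8-bit intensity.
 * Assumes start and end are both in the range 0..255.
 * Setup (one divide) is done in floating point; the inner loop is
 * integer-only, which is the part that runs once per pixel.
 */
void shade_span(uint8_t *dst, int width, float start, float end)
{
    if (width <= 0)
        return;

    float step_f = (width > 1) ? (end - start) / (float)(width - 1) : 0.0f;
    fixed16_16 value = float_to_fixed(start);
    fixed16_16 step  = float_to_fixed(step_f);

    for (int x = 0; x < width; ++x) {
        dst[x] = (uint8_t)(value >> 16);  /* keep the integer part */
        value += step;                    /* one integer add per pixel */
    }
}
```

Calling `shade_span(buf, 5, 0.0f, 64.0f)` fills `buf` with 0, 16, 32, 48, 64. A real rasterizer would interpolate several attributes at once (color channels, depth, texture coordinates), each with its own fixed-point accumulator.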

A complicating factor in speeding up OpenGL was the complexity of its state machine. There were many possible combinations of rendering modes based on attributes such as color depth, z-buffering, shading model, transparency, texture-mapping, etc. An early (Windows 3.1 or Windows 95) solution for optimizing GDI rendering was to build the rasterizer on the fly on the stack based on the GDI state (I believe this was all or mostly Todd Laney’s handiwork). But I was working on Windows NT, and such clever hackery was not allowed in a next-generation operating system. After considering my options, I determined that my best bet was to pre-compile a set of renderers that represented the most common cases for rendering (for example, Gouraud-shaded, 16-bit color, 16-bit z-buffered triangles). I did this by building rendering functions that consisted of groupings of macro statements that themselves were chunks of hand-crafted inline assembly code. These macros could then be grouped together to perform unique sets of rendering operations. In order for this approach to work, I constructed a framework for the chunks of assembly code in the macro blocks to be able to access variables and registers in a common and consistent way so that they could interoperate predictably and efficiently.
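The general shape of that approach can be sketched in plain C, with ordinary statements standing in for the inline assembly. Everything named here (`Span`, `DEFINE_SPAN_RENDERER`, `pick_renderer`, the mode flags) is hypothetical and only illustrates the idea: small macro fragments agree on shared local names (`s` and `x`), and a generator macro stitches them into one specialized function per rendering mode at compile time.

```c
#include <stdint.h>

typedef struct {
    uint16_t *color;   /* 16-bit color buffer for this scanline   */
    uint16_t *depth;   /* 16-bit z-buffer for this scanline       */
    int       width;   /* number of pixels in the span            */
    int32_t   z, dz;   /* 16.16 fixed-point depth and its step    */
    int32_t   c, dc;   /* 16.16 fixed-point color and its step    */
} Span;

/* Building blocks. Each fragment assumes locals named `s` and `x`. */
#define Z_TEST_BEGIN  if ((uint16_t)(s->z >> 16) < s->depth[x]) { \
                          s->depth[x] = (uint16_t)(s->z >> 16);
#define Z_TEST_END    }
#define NO_Z_BEGIN
#define NO_Z_END
#define STEP_NONE
#define STEP_COLOR    s->c += s->dc;
#define STEP_Z        s->z += s->dz;

/* Stitch fragments into one specialized span renderer. */
#define DEFINE_SPAN_RENDERER(name, TEST_BEGIN, TEST_END, STEP) \
    static void name(Span *s)                                  \
    {                                                          \
        for (int x = 0; x < s->width; ++x) {                   \
            TEST_BEGIN                                         \
                s->color[x] = (uint16_t)(s->c >> 16);          \
            TEST_END                                           \
            STEP                                               \
        }                                                      \
    }

DEFINE_SPAN_RENDERER(span_flat,         NO_Z_BEGIN,   NO_Z_END,   STEP_NONE)
DEFINE_SPAN_RENDERER(span_smooth,       NO_Z_BEGIN,   NO_Z_END,   STEP_COLOR)
DEFINE_SPAN_RENDERER(span_flat_ztest,   Z_TEST_BEGIN, Z_TEST_END, STEP_Z)
DEFINE_SPAN_RENDERER(span_smooth_ztest, Z_TEST_BEGIN, Z_TEST_END, STEP_COLOR STEP_Z)

typedef void (*SpanFn)(Span *);

enum { MODE_SMOOTH = 1, MODE_ZTEST = 2 };

/* Map the current render state to a pre-compiled renderer. */
SpanFn pick_renderer(unsigned mode)
{
    static const SpanFn table[4] = {
        span_flat, span_smooth, span_flat_ztest, span_smooth_ztest
    };
    return table[mode & 3u];
}
```

The state lookup happens once per primitive, so the inner loops carry no per-pixel mode checks. The cost is code size: every supported combination becomes its own function, which is why only the most common cases would be worth pre-compiling.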

This was definitely not how the C language was intended to be used. It wasn’t pretty – I believe I even used “goto” statements out of necessity – but the code was highly effective. The approach also entailed risk because things could go horribly wrong with unanticipated edge cases. I remember one embarrassing bug that in certain cases failed to return the floating-point control register to its previous state, which affected floating-point operations in the rest of the operating system. I quickly found and fixed the issue, but it was a reminder that there was no safety net with what I was doing.

It’s interesting how thinking about past work jogs the mind. In the middle of writing this post, I had a dream about how I may have implemented dithering. Dithering is an old technique going back to the print business that allows gradations in tone and color to appear smoother by breaking up transitions using patterns of dots or pixels at different densities. For example, an area halfway between one color and another would have half the pixels in that area set to one color, and the remaining half set to the other color. Today, even our phones are capable of producing millions of colors eliminating the need for such techniques, but in the mid-90’s, the capabilities of most PCs were far more limited. In my dream, I implemented dithering look-up-tables (LUTs) for red, green, and blue values so that I could construct the right 15 or 16-bit RGB value using three highly-cached indexed memory operations and two OR instructions. I probably did something along those lines, but who knows…I’d love to have access to that old code (of no use to anyone at this point) just to remind myself how I did what I did.
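Here is one way the dreamed-up scheme could look in C. It is a sketch under stated assumptions, not a reconstruction of the original code: it assumes a 5-6-5 16-bit pixel layout, a 4x4 Bayer ordered-dither matrix, and hypothetical names (`build_dither_luts`, `dither_rgb565`). Each per-channel table already holds the dithered level shifted into its final bit position, so the per-pixel work is three indexed loads and two ORs.

```c
#include <stdint.h>

/* 4x4 Bayer ordered-dither matrix, thresholds 0..15. */
static const uint8_t bayer4[16] = {
     0,  8,  2, 10,
    12,  4, 14,  6,
     3, 11,  1,  9,
    15,  7, 13,  5
};

/* One table per channel: [dither cell][8-bit input] -> pre-shifted field. */
static uint16_t red_lut[16][256];
static uint16_t green_lut[16][256];
static uint16_t blue_lut[16][256];

static void build_channel(uint16_t lut[16][256], int bits, int shift)
{
    const int max_level = (1 << bits) - 1;

    for (int cell = 0; cell < 16; ++cell) {
        for (int v = 0; v < 256; ++v) {
            int scaled = v * max_level;
            int level  = scaled / 255;   /* quantized level, rounded down */
            int frac   = scaled % 255;   /* what the quantizer threw away */

            /* Round up in the cells whose threshold the remainder exceeds. */
            if (frac * 32 > (2 * bayer4[cell] + 1) * 255)
                ++level;

            lut[cell][v] = (uint16_t)(level << shift);
        }
    }
}

/* Call once before drawing. */
void build_dither_luts(void)
{
    build_channel(red_lut,   5, 11);
    build_channel(green_lut, 6, 5);
    build_channel(blue_lut,  5, 0);
}

/* Per pixel: three table lookups and two ORs. */
uint16_t dither_rgb565(int x, int y, uint8_t r, uint8_t g, uint8_t b)
{
    int cell = ((y & 3) << 2) | (x & 3);
    return (uint16_t)(red_lut[cell][r] | green_lut[cell][g] | blue_lut[cell][b]);
}
```

An input that falls about halfway between two output levels gets rounded up in half of the 16 cells of each 4x4 tile, which is the half-and-half pattern described above. These tables total 24 KB; a 2x2 matrix would shrink them to 6 KB at the cost of coarser patterns.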

My initial performance improvements were compelling enough to justify making the software acceleration work a full-time endeavor. I added functionality over time, culminating with real-time texture-mapping using quadratic subdivision. Even though it was still part of a fully compliant general-purpose OpenGL implementation, textured rendering throughput got reasonably close to that of Doom, the texture-mapping speed champion at the time. In fact, my software-accelerated pipeline became competitive with high-end hardware graphics cards and beat them in some instances (anti-aliased lines come to mind).

I had taken rendering performance from being measured in seconds per frame to dozens of frames per second; many operations were over 100 times faster than the original versions. Ah, the power of highly-tuned, efficient coding! I often worry that Moore’s Law is being buried under so many layers of frameworks, objects, interpreters, and interfaces that it’s hard to tell what the hardware is actually doing at the bottom of the pile, beneath multiple layers of indirection. Then again, even CPUs now hide some of their internal operation with sophisticated out-of-order instruction execution engines. Sadly, the last time I tried to out-optimize a modern C compiler with an integer-only routine, the results were a draw.

Our OpenGL implementation was incorporated into both Windows NT and Windows 9x code bases and was starting to get significant traction. Being part of the operating system and eliminating the need for hardware acceleration meant that an OpenGL application could now target a very wide audience. As a means of promoting our OpenGL implementation’s capabilities, I wrote the first set of 3D screen savers (“Flying Objects”) which proved popular and inspired other people on the team to write their own as well (such as “Pipes” and “Maze” pictured here). For many years afterward, I would get a chuckle when I saw one of our screen savers running in the background of a TV or movie set. Our efforts had taken what had been an expensive workstation technology and made it readily available on millions of desktops.

Despite the effort’s success, the writing was already on the wall for the future of software-accelerated computer graphics. Hardware acceleration and the rise of the GPU were just around the corner.

Turbo Pascal

March 2, 2014 – 6:33 am
Posted in computer history, software, technology
Tagged Otto Berkes, programming, Turbo Pascal

I’ve tried to keep the number of obsolete reference manuals and technical books I have to a minimum over the years. That stuff has been getting outdated at the same rapid rate as the evolution of the technology industry. And with on-line references available for all things technology-related, there is almost no need to keep paper copies of anything.

Despite best intentions, however, possessions tend to accumulate, and when we moved from Seattle to New York a few years ago after being in the same house for close to two decades, it was necessary to do some significant culling. If I had a book or manual that didn’t pass the “will you ever use this again” question, it went into the donation pile. The Friends of the Seattle Public Library organizes book sales every year to support the library, and this made saying goodbye to about thirty boxes of books our family assembled much easier. In this process, I did make allowances for sentimental reasons.

One of the exceptions I made was to hang on to my original copies of Borland Turbo Pascal. It came on a single 5.25” floppy disc along with a paperback reference manual. This is a picture of the original 1.0 and 2.0 versions that I’ve kept:

I credit this product as much as any other for taking me down a path that would lead me to become a professional software developer.

I was an undergraduate at Middlebury College when I bought it. Much of the software development I was doing was self-taught using one of the earliest IBM PC clones available – a Sanyo MBC-555. The Sanyo was not a very good machine and had lots of problems with compatibility, but it was the cheapest PC I could convince my parents to buy.

I had reached the limits of what I could do with Basic, and let’s face it – a real program was a compiled, self-contained executable package (a proper “app” for all the young readers out there), not some Basic file that you had to run through a slow interpreter. Also, I had been involved with assembly-level programming since the beginning of my interest in computers, and wanted a tool that allowed access to BIOS- and hardware-level functionality, even if it meant hand-compiling the opcodes using the 8086 CPU reference manual.

Turbo Pascal would let me do all of this, and at a price that a college student could justify to his parents: $49.95. This was a bargain compared to the high cost of any of the Microsoft tools available then; Microsoft’s Pascal compiler was $400. That was a lot of money back in the early 80’s, and a $400 compiler for a student was out of the question. At the time, I couldn’t have imagined that I would eventually go work for the Microsoft that wanted so much money for a software development tool.

I bought Turbo Pascal mail order, sight unseen. There was no Internet as we know it today, no Amazon, no on-line reviews, and my connectivity consisted of a 300 baud modem (that translates to 0.00029 megabits per second). Everything I knew about the product was contained in a glossy advertisement in Byte Magazine. I realize how quaint that all sounds, but when I got the package with the small paperback reference manual and the floppy, I was in programming heaven. The compiler was incredibly fast even by today’s standards, and produced real executable programs even if they were limited to the smaller .com variant rather than .exe files. And the fact that Middlebury’s math department taught a few Pascal classes (the college did not have a computer science department back then) was a big plus.

I would remain a big Turbo Pascal fan for a number of years until I fell in love with the C programming language, but that’s another story that also involves a thin paperback that I have also kept to this day.

New Microsoft Store!

November 6, 2010 – 8:56 pm
Posted in consumer devices, OS, software, technology

I had the opportunity to be part of the opening of the Microsoft Store at the Mall of America in Minneapolis today. The energy and the excitement were amazing, and the store itself is gorgeous.

The Mall of America has an amusement park in the middle including a roller coaster.

The day before…the store was still under wraps.

At a separate Kinect experience demo in the mall, people were lining up and having fun trying out the controller and some of the new games.

The dancing title was very popular but my colleague and I chose table tennis. I lost on match point.

Next day…ready for the opening. A very large crowd showed up.

Microsoft cut some big checks in support of community groups like this $300,000 gift to the High School Technology Program.

The curtain was finally removed and the store was revealed and officially opened…

…and people who had invested a lot of time in line started to make their way into the store.

The store staff greeted everyone coming in with high-fives.

This view shows the competition directly across from the store. They were looking rather empty today.

Where has all the power gone?

As I sit here writing my first entry, the operating system of my computer has decided that it needs to update itself. It’s been going at it for a while…certainly a lot longer than I had hoped, and as usual has made using my PC almost impossible: the process brings everything to a crawl. I really hate having my time wasted.

Which brings me to my current topic – performance and responsiveness.

I think personal computers were supposed to perform repetitious tasks quickly and to make our lives more wonderful because they were going to save time. There have been many generations of improvements in the PC since I bought my first IBM in 1983, but I feel like I am still waiting. Waiting for the machine to boot up, waiting for the machine to start, waiting for something to update, waiting for technology to be more amazing. When I turn on my PC, it’s a short eternity until the desktop shows up. And once the desktop shows up, it’s just a tease. A dozen other programs and processes then need to load and it’s another yawn of impatience until I can actually do anything.

And after the machine is actually ready, still nothing happens quickly. Simple tasks such as opening a document, or opening a new browser window, or viewing some photographs are a chore. I’m staring at the hourglass, or the screen is simply frozen indefinitely. I click on something, and nothing happens. The hard drive churns away. I am known to ask the obvious question aloud: what is it doing?

Ok, on some level I know what it’s doing: I can look at the list of low-level processes and get a pretty good idea of who is doing what.

But what’s missing is a good reason for all the waiting. With the equivalent of a supercomputer on my desk or laptop, I shouldn’t have to wait to accomplish the simplest chores. I can understand needing some time to compute tomorrow’s weather forecast. But checking my email?

Of course, some delays are inevitable–particularly operations that rely on the network or other slower peripheral devices. The fact is that performance and responsiveness are afterthoughts in the design of technology products where the almighty feature list rules the day. That’s unfortunate because performance and responsiveness are the cornerstones of user experience: how fast something works defines how well it seems to work. The effectiveness of even the most advanced user interface or features is lost if responsiveness is compromised.

Remember how windowing in the operating system was supposed to improve usability by being able to perform tasks in other windows while the one that was busy finished doing its thing? Right?

I’ve lost track of how many times I’ve lost the bet “I can look up the number/address in the phone book faster than you can on your laptop”. It would be refreshing to see performance and responsiveness trumping long feature lists. Do less, but do it better. And faster.

We have a long way to go! And I’m still updating…
