The texture generator is working pretty well now. It generates random 3D fields of rocks in various patterns and sizes, textures them with the user's images and normal maps if available, renders the 3D models to 2D diffuse, normal, and height map textures, displays those textures with a demonstration shader and moving lighting, and writes them to disk on demand. Now I have to come up with the rest of an actual bump-mapped texture-generating utility with this at its core.

These are the first randomly generated rock wall textures that got written to disk. I was surprised when I saw them because I hadn't tested things like the orthogonal projection and wasn't sure about the camera position. It worked. On to height maps and normals.

Just some randomly generated patio stones. I found the bump- and height-mapped stone texture on the web and used it as a testbed for learning how to write shaders, but now I want to be able to generate my own textures.

This was a biggie for me. A week of struggling to learn how to write shaders in GLSL was rewarded with the addition of parallax height- and bump-mapped shaders, with handling of diffuse and specular light sources, to my bag of tricks. This is a 2D texture painted on two triangles, but it appears to have depth when you move around on it. After I got the height mapping working, I tackled the lighting and was able to put a point light source floating back and forth over the texture and see detail shadows and light shining off the surfaces in a Blinn-Phongy kinda way.

Here is the vertex/fragment shader set that did the above. I wouldn't have had a chance without the kind work and documentation and project blogs posted by many others who have already done this, but it was still surprisingly difficult to debug. There's no way to step into the shader code and check what the values in a matrix are. Various tricks like writing values into pixel colors are possible but very time-consuming, especially when the values you're after are an additional transform away. After several days of wanting to punch the monitor, sleeping on it, and wanting to punch it again, I was eventually able to start to intuit where things were probably getting screwed up when pixels would skew 90 or 180 degrees off or a light would dim when I faced the camera a particular direction. There appears to be more of a need to find one's center, breathe, and tap into The Force with shader programming. Fortunately, there's also a need to keep shader code paths really short, especially in fragment shaders, so that approach isn't entirely out of the question. Seriously though, GPUs could really use some debugging tools, even if they were nothing but random snapshots of values from single vertex or pixel shader runs left in a buffer somewhere. Ideally, they'd be obtainable by a very lightweight standalone app receiving data from a manufacturer's driver rather than via EXT/ARB API additions, so that they'd be useful independent of OpenGL or Direct3D platforms and versions.

   vshader = "\
attribute vec3 tangent;\
varying vec3 eyeVec;\
varying vec3 lightVec;\
varying vec3 halfVector;\
void main() {\
  gl_TexCoord[0] = gl_MultiTexCoord0;\
  vec3 n = normalize( gl_NormalMatrix * gl_Normal );\
  vec3 t = normalize( gl_NormalMatrix * tangent.xyz );\
  vec3 b = -cross(n, t);\
  vec3 vertexWorld = vec3(gl_ModelViewMatrix * gl_Vertex);\
  vec3 lightWorld = gl_LightSource[0].position.xyz;\
  vec3 v = normalize(lightWorld - vertexWorld);\
  lightVec.x = dot(v, t);\
  lightVec.y = dot(v, b);\
  lightVec.z = dot(v, n);\
  v.x = dot (vertexWorld, t);\
  v.y = dot (vertexWorld, b);\
  v.z = dot (vertexWorld, n);\
  eyeVec = normalize(-v);\
  halfVector = eyeVec + lightVec;\
  gl_Position = ftransform();\
}";

   fshader = "\
uniform sampler2D color_texture;\
uniform sampler2D height_texture;\
uniform sampler2D normal_texture;\
varying vec3 eyeVec;\
varying vec3 lightVec;\
varying vec3 halfVector;\
void main() {\
  vec2 texUV, srcUV = gl_TexCoord[0].xy;\
  float height = texture2D(height_texture, srcUV).r;\
  float v = height * 0.02 - 0.01;\
  vec3 eye = normalize(eyeVec);\
  texUV = srcUV + (eye.xy * v);\
  vec3 rgb = texture2D(color_texture, texUV).rgb;\
  vec3 light = normalize(lightVec);\
  vec3 halfV = normalize(halfVector);\
  vec3 n = normalize(texture2D(normal_texture, texUV).rgb * 2.0 - 1.0);\
  float nDotL = max(0.0, dot(n, light));\
  float nDotH = max(0.0, dot(n, halfV));\
  float power = (nDotL == 0.0) ? 0.0 : pow(nDotH, gl_FrontMaterial.shininess);\
  vec4 ambient = gl_FrontLightProduct[0].ambient;\
  vec4 diffuse = gl_FrontLightProduct[0].diffuse * nDotL;\
  vec4 specular = gl_FrontLightProduct[0].specular * power;\
  vec4 color = gl_FrontLightModelProduct.sceneColor + ambient + diffuse + specular;\
  gl_FragColor = color * vec4(rgb * max(1.0, height * 2.0), 1.0);\
}";

I know a lot of that stuff is "deprecated", but I tend to replace that word with "stable" in my mind because I want to be able to run on everything from Windows 2000 to Windows 10. I've been frustrated by utilities written by people who interpreted "deprecated" to mean they shouldn't use it, with the result that their software fails to run on anything below Win7 and OpenGL 3.3 without actually making use of any functionality that didn't exist in OpenGL 2.1.

Here's how I do FPS-aiming style mouse movement. First, the engine's main struct, app, knows the screen width and height, so app->width >> 1 and app->height >> 1 are the center of the screen. Second, the cursor is hidden. My main window proc message handler gets updates to the mouse cursor with this:

    case WM_MOUSEMOVE:
      app->win->xpos = (unsigned short)(lParam & 0x0000ffff);
      app->win->ypos = (unsigned short)(lParam >> 16);
      break;
My game's main loop applies those values to the player's pitch and yaw angles and then recenters the cursor position like this:
      actor->angles.yaw += (app->win->xpos - (app->width >> 1)) * 0.1f;
      actor->angles.pitch += ((app->height >> 1) - app->win->ypos) * 0.1f;
      app->win->xpos = app->width >> 1;
      app->win->ypos = app->height >> 1;
      SetCursorPos(app->width >> 1, app->height >> 1);
What is cool about that is that it works fine in conjunction with keyboard or game controller button handlers doing acceleration and speed-based yaw and pitch control. I've played some console ports that would've been much better if they'd dropped this in instead of whatever convert-mouse-to-speed-capped-turn-rate thing they did. The 0.1f can be made configurable to adjust sensitivity, and the axes can be made invertible by allowing the subtraction order to be changed. I thought it wasn't too bad as-is for a few lines of test code that worked on the first compile... at least in fullscreen mode.

It didn't work so hot in windowed mode. The problem was that the WM_MOUSEMOVE messages deliver the cursor position relative to the application window, but Windows treats SetCursorPos() coordinates as being relative to the screen even when the application is windowed. Go figure. I don't know if I'll keep doing it this way or not, but here are additions that made it work in either fullscreen or windowed modes:

   case WM_MOVE:
      // Use these offsets to make WM_MOUSEMOVE always screen-relative
      app->win->xoff = (unsigned short)(lParam & 0x0000ffff);
      app->win->yoff = (unsigned short)(lParam >> 16);
      break;
   case WM_MOUSEMOVE:
      app->win->xpos = (unsigned short)(lParam & 0x0000ffff) + app->win->xoff;
      app->win->ypos = (unsigned short)(lParam >> 16) + app->win->yoff;
      break;
   case WM_SETFOCUS:
      app->win->flags |= WIN_HAVE_FOCUS;
      break;
   case WM_KILLFOCUS:
      app->win->flags &= ~WIN_HAVE_FOCUS;
      break;
And then I avoid calling SetCursorPos() when I don't have focus:
      if (app->win->flags & WIN_HAVE_FOCUS) {
         actor->angles.yaw += (app->win->xpos - (app->width >> 1)) * 0.1f;
         actor->angles.pitch += ((app->height >> 1) - app->win->ypos) * 0.1f;
         app->win->xpos = app->width >> 1;
         app->win->ypos = app->height >> 1;
         SetCursorPos(app->width >> 1, app->height >> 1);
      }

I finally got my first interesting thing on the screen. It's an RGB-painted skydome with geometry and interpolated color that comes from the .lot resource loading system and gets rendered using VBOs, VARs, or a glBegin/glEnd loop depending on the capabilities of the host. I'm pretty happy with how it turned out and what it can do with sky background colors. The emesh.csv entries to construct it look like this:

"COLR",2,"U","L","skynight","0x101c33ff0207243f702037661e335c1c2f55182a4e14233e121f38a ..."
The DOME section 0x4000 scales the hemisphere to 64 OpenGL units, the 10 sets the perimeter to 16 verts, the 07 sets the skydome "rib" height to 7 verts, and then there's a list of 7 angles used to place the rib verts. The top center is implied, so that particular dome winds up with 113 verts. The color lists are an arbitrary number of strips that go up the ribs, with the engine interpolating colors as needed. The idea is to be able to morph between dome color sets in game time, so sunrise colors can appear on the east horizon and change to daytime, then interpolate to sunset colors on the west in the evening.

Not long after completing Robokrusher, I find myself writing some more OpenGL code. Robokrusher was intentionally light on polygons because I wanted it to run on the same kind of hardware someone might be running MAME on, i.e., low-end PCs. I didn't make use of shaders or VBOs or display lists or even vertex arrays to implement it, but now I'm tinkering on another game, or more specifically, a general-purpose engine to base another game on, and I'm thinking about something that's able to push a lot more polygons around.

While looking into display lists, I ran across Song Ho Ahn's display list demo and the results were interesting. On a 1.6 GHz T43 Thinkpad with an ATI Mobility Radeon 300, enabling display lists made it run slightly slower than when they were disabled. Huh? That wasn't what I expected. I tried it again on a 4.1 GHz AMD 6800K A10 with an 8670D, with the same results. Whaa? Baffled, I tried it yet again on an older 1.8 GHz machine with an ATI X1650 and got 150 fps without and 650 fps with display lists enabled. The little cartoon lightbulb finally flickered over my head. The T43's and the A10's graphics hardware shares system memory. It takes roughly the same time to copy vertex data from one place in system memory every frame as it does to copy it from another. The X1650, however, is a separate card with its own memory. Without display lists, the vertex data has to be copied from system memory to the card's memory over the PCI bus every frame. With display lists, the card is able to get the vertex data directly from its own memory every frame. The T43 and A10 implementations are impressive, but a shared-memory design makes it impossible to take advantage of the potential performance boosts that could be had by hosting data that doesn't necessarily change between frames in the graphics hardware's memory. I think I'll continue down the display list path, especially since it seems to work great on everything from X1650s circa 2006 to GTX 270s circa now. (I wound up switching to a VBO, VAR, or glBegin-glEnd last-resort approach, but nothing is cast in stone.)

I've got an asset loading system working that I'm hoping will be really fast as well as mod friendly. Resource files, called lots, are used to contain digital assets like textures, meshes, sounds, fonts, animations, etc., all packed together and optionally zlib-compressed. People familiar with Bethesda modding will see some similarities: a plugins file lists one or more .lot files in the order that the game will index on startup. .lot files containing resources that conflict with those in previous .lot files take precedence. Loose files in a data directory with names that conflict with records in the .lot files take precedence in turn. .lot files are constructed by an external resource compiler that takes .lot definitions in the form of .csv files like the emesh.csv example above.

If a modder wanted to replace the console font, for example, they would supply a .lot file that contained the resource named "cfont". DMXY contains how many characters across and how many down are in the font texture file, and the third byte is which ASCII character the font texture begins with. A font file with ten rows of ten characters beginning with space in the upper left corner would be DMXY 0x0A0A20.