Since I converted the crazy clock to straight AVR GCC, I've been learning more and more about the AVR libc.
Before, I had written my own delay_ms() method for the ticking, but in looking into the library-supplied _delay_ms(), I discovered enough that I've switched entirely over to using it.
In particular, as long as you call _delay_ms() with a value that's constant at compile-time, the actual assembler code that will be generated will be nothing more than the assembler equivalent of "for(i = 0; i < magic_value; i++) ;" The compiler will select the "magic_value" based on the F_CPU macro (which is the frequency of the clock in Hz) and the actual number of clock cycles that the compiler knows each instruction will require. In my case, the code actually shrank by a few bytes compared to my old code which watched the timer0 count (the old code was also problematic in that it would lock up if the counter overflowed during a delay). This is all despite the fact that the ostensible argument to _delay_ms() is a double. But as long as the value is a compile-time constant, the compiler will optimize away all of the floating point requirements and will figure out that it needs to delay a particular number of cycles, and will generate assembly code to do exactly that. Brilliant!
And if you just need to waste a certain number of cycles, then __builtin_avr_delay_cycles() will generate exactly the correct code. I believe this might be useful to simplify some of the bit-bang USB code in the usbtinyisp. That will need careful consideration...