Optimizing a 6502 image decoder – part II: assembly

Summary

In the first part of this article, I focused on the algorithm itself: removing parts I wouldn’t use (like color), making it simpler, less loopy, etc. This yielded a tenfold improvement on a modern architecture, but the result would still be very slow if simply translated to assembly by a compiler like cc65.

In this part, I’ll focus on things one can do to get “speed” out of a 1MHz 8-bit processor. It will include a number of “tricks”. Some of them are largely accepted as “good code”, like predictive branching or lookup tables; others are now regarded as “ugly”, “dangerous” or “bad practice”, like self-modifying code, global variables, absent bounds checking, etc. Taking shortcuts was very often necessary 30 years ago, and caring about contracts, scopes, separation and so on was… less of a thing.

As a reminder, here is the reference C algorithm.

Aligning your buffers

Let’s get that out of the way right from the start: you don’t want the extra page-crossing penalty. Align the buffers on page boundaries (stuff the holes with something that fits if memory is an issue), and be done with it. Bonus: patching addresses will only require patching the page byte.

Sticking to 8 bits

Anything involving 16 bits will be at least twice as slow, probably three times as slow. When presented with an algorithm using int types, verify every variable. Is it really an int? Does it fit in 8 bits? Anything that fits in 8 bits should be 8 bits. Sometimes it can look like a uint8 could overflow, but is it by more than one bit? r = (a + b) >> 1 with 8-bit a and b might look like it would overflow, but thanks to the carry bit, it does not. In this decoder, a lot of variables were not 8 bits initially but ended up that way, and this helped a lot.

The bitbuffer was 32 bits initially. I got it down to 16 bits easily, and it did seem like a good idea to have a second byte ready so the first one would always be complete and allow for direct matching… But shifting 16 bits was much more intensive than shiftin...
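To illustrate the earlier remark about r = (a + b) >> 1 staying within 8 bits, here is a minimal C sketch of what the carry bit does. It assumes the 6502 sequence would be a CLC, an ADC and a ROR on the accumulator; the function name avg_u8 is hypothetical and is not taken from the decoder's source.

```c
#include <stdint.h>

/* Hypothetical helper: average two bytes using only 8-bit values,
   mimicking CLC / ADC / ROR A on a 6502. The ninth bit of the sum
   ends up in the carry flag, and ROR rotates it back in as bit 7. */
uint8_t avg_u8(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);   /* low 8 bits, as left in the accumulator */
    uint8_t carry = (sum < a);        /* 1 if the 8-bit addition wrapped around */
    return (uint8_t)((sum >> 1) | (carry << 7));
}
```

For example, avg_u8(200, 100) gives 150 even though 200 + 100 does not fit in a byte: no 16-bit intermediate is needed because the overflow is never more than one bit.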

First seen: 2025-10-04 06:56

Last seen: 2025-10-04 09:57