This time we are going to implement line drawing in three different ways. All of them will be in assembly only, since we already had enough fun competing with the C compiler in the last two exercises.
The Simple Way
The simplest way of drawing a line is to walk along it step by step. The step sizes, dx and dy, one for each direction, are calculated before the walk begins. Let D be the larger of the distances in the x and y directions, D = max(|x1-x0|, |y1-y0|); then we have dx = (x1-x0)/D and dy = (y1-y0)/D. Translation to assembly is straightforward:
There are 24 instructions in the code above.
It takes about 3.3s to draw 1 million random lines within a 512x512 canvas on Raspberry Pi 3B+ running at 600MHz.
To simplify things, the canvas is just a byte array pointed to by a global register.
Note that the time also includes generating 4 random numbers for each line.
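For readers who want to check the stepping without reading assembly, here is a minimal C sketch of the same walk. It is not the assembly listing from this post: the names `canvas`, `plot` and `line_simple` are made up for illustration, and the 16.16 fixed-point representation of dx and dy is an assumption of the sketch, chosen only to keep it integer-only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CANVAS_W 512
#define CANVAS_H 512

/* Hypothetical canvas: one byte per pixel, row-major. */
static uint8_t canvas[CANVAS_W * CANVAS_H];

void plot(int x, int y)
{
    assert(x >= 0 && x < CANVAS_W && y >= 0 && y < CANVAS_H);
    canvas[y * CANVAS_W + x] = 1;
}

/* Walk from (x0, y0) to (x1, y1) in D steps, D = max(|x1-x0|, |y1-y0|). */
void line_simple(int x0, int y0, int x1, int y1)
{
    int adx = abs(x1 - x0);
    int ady = abs(y1 - y0);
    int d = adx > ady ? adx : ady;

    if (d == 0) {               /* degenerate line: a single dot */
        plot(x0, y0);
        return;
    }

    /* Per-step increments in 16.16 fixed point (assumption of this sketch). */
    int32_t dx = (x1 - x0) * 65536 / d;
    int32_t dy = (y1 - y0) * 65536 / d;

    /* Start at the pixel centre so that truncation rounds to nearest. */
    int32_t x = x0 * 65536 + 32768;
    int32_t y = y0 * 65536 + 32768;

    for (int i = 0; i <= d; i++) {
        plot(x >> 16, y >> 16);
        x += dx;
        y += dy;
    }
}
```

Calling `line_simple(0, 0, 4, 2)` takes D = 4 steps, with dx equal to one pixel and dy equal to half a pixel per step. The increment along the longer axis is always exactly one pixel, so the walk never skips a column (or row) of the line.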
The Bresenham Way
Bresenham's algorithm is a rather clever way of drawing lines using only integer arithmetic. An in-depth explanation of the algorithm can be found in any textbook on computer graphics. It can be summarized as below:
The algorithm is implemented in 32 instructions. It takes 2.8s to draw 1 million random lines, about a 12% speedup over the simple version.
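As a reference point, the integer-only idea can be sketched in C as follows. This is the commonly used all-octant formulation with a single error term, not necessarily the exact variant behind the 32-instruction version in this post; `canvas`, `plot` and `line_bresenham` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CANVAS_W 512
#define CANVAS_H 512

/* Hypothetical canvas: one byte per pixel, row-major. */
static uint8_t canvas[CANVAS_W * CANVAS_H];

void plot(int x, int y)
{
    assert(x >= 0 && x < CANVAS_W && y >= 0 && y < CANVAS_H);
    canvas[y * CANVAS_W + x] = 1;
}

/* Bresenham-style line using only integer additions and comparisons. */
void line_bresenham(int x0, int y0, int x1, int y1)
{
    int dx = abs(x1 - x0);
    int dy = -abs(y1 - y0);         /* kept negative on purpose */
    int sx = x0 < x1 ? 1 : -1;      /* step direction in x */
    int sy = y0 < y1 ? 1 : -1;      /* step direction in y */
    int err = dx + dy;              /* running error term */

    for (;;) {
        plot(x0, y0);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) {             /* error says: move in x */
            err += dy;
            x0 += sx;
        }
        if (e2 <= dx) {             /* error says: move in y */
            err += dx;
            y0 += sy;
        }
    }
}
```

For the line from (0, 0) to (4, 2) this visits (0, 0), (1, 1), (2, 1), (3, 2) and (4, 2). The error term alone decides when to step along the shorter axis, so no division or fractional arithmetic is needed.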
The SIMD Way
In the previous two approaches, the dots on the line are calculated one after another. It doesn't have to be sequential though, since the position of each dot can be calculated independently. In fact, using the vector instructions of ARM64, also known as SIMD, we can calculate 4 dots at a time.
The program is about twice the size of the simple version. It may look quite complicated with all those vector instructions, but it's essentially doing the same stepping, except doing it 4 at a time. The loop at the bottom should give you a rough idea of how it works.
The SIMD implementation draws 40% faster than the simple version, and 30% faster than the Bresenham algorithm. However, if I had to choose one of these to draw lines, I would probably use the slowest yet simplest way. Cleverness is like entropy: the less of it in a system, the better off we are.
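To make the "4 at a time" idea concrete without vector instructions, here is a plain C sketch that emulates the four lanes with small arrays. It is not the SIMD listing from this post and uses no NEON instructions or intrinsics; the names (`line_4wide`, `LANES`, `canvas`, `plot`), the 16.16 fixed-point format, and the masking of the last partial group are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CANVAS_W 512
#define CANVAS_H 512
#define LANES 4                 /* dots computed per iteration */

/* Hypothetical canvas: one byte per pixel, row-major. */
static uint8_t canvas[CANVAS_W * CANVAS_H];

void plot(int x, int y)
{
    assert(x >= 0 && x < CANVAS_W && y >= 0 && y < CANVAS_H);
    canvas[y * CANVAS_W + x] = 1;
}

/* Same stepping as the simple way, but four dots per loop iteration. */
void line_4wide(int x0, int y0, int x1, int y1)
{
    int adx = abs(x1 - x0);
    int ady = abs(y1 - y0);
    int d = adx > ady ? adx : ady;

    /* Per-step increments in 16.16 fixed point; zero for a single dot. */
    int32_t dx = d ? (x1 - x0) * 65536 / d : 0;
    int32_t dy = d ? (y1 - y0) * 65536 / d : 0;

    /* Lane l starts l steps into the line. */
    int32_t x[LANES], y[LANES];
    for (int l = 0; l < LANES; l++) {
        x[l] = x0 * 65536 + 32768 + l * dx;
        y[l] = y0 * 65536 + 32768 + l * dy;
    }

    for (int i = 0; i <= d; i += LANES) {
        /* The inner loop stands in for one vector operation per line. */
        for (int l = 0; l < LANES; l++) {
            if (i + l <= d)     /* mask off lanes past the end of the line */
                plot(x[l] >> 16, y[l] >> 16);
            x[l] += LANES * dx; /* every lane jumps ahead by 4 steps */
            y[l] += LANES * dy;
        }
    }
}
```

Each lane only ever adds 4*dx and 4*dy to its own position, so the lanes never depend on one another. That independence is what lets a real vector unit do the four additions in a single instruction.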
Some other words
You may have noticed that in these implementations we didn't touch the stack at all. ARM64 has a lot of registers. The abundance of registers allows us to store a fair amount of program state directly in the CPU. Unfortunately, as register usage grows, it quickly becomes too much of a headache for an assembly programmer to remember which register is for what.
In the code listings above, .greg directives come to the rescue. They are borrowed from MMIXAL, specifically for allocating and naming registers. I had to write a simple script to translate the source to the standard GAS format, but it makes life so much easier that I can't really complain about the extra step.