16 November 2018

Anatomy of a procedure (1)

Only recently the IDE features ‘Proc Disassembly’, an option available under the Edit | Proc menuitem. This is a valuable resource if you want to get a better understanding of the code generated by the compiler. Once you understand the disassembly of a proc you can use the information to your advantage, especially when it comes to optimizing procedures.

Bare minimum: Naked
Let’s start with a Naked procedure. A Naked procedure is fully optimized, both in size as in performance. This comes with a penalty though, a Naked procedure lacks support for dynamic variables, structured exception handling, and runtime debugging (Tron, Trace). The Naked attribute forces the compiler to produce code much like if it would be done in a pure assembly program. The assembly code of a procedure has great similarity with textbook samples. It’s not hard to understand the procedure flow when it is compared to the theory in the assembly books. Therefor, I start this series on the anatomy of procedures with these bare minimum procs. Examining naked procedures allow us to understand how a proc is constructed and this knowledge can later be used to examine regular procedures.

The following sample shows a Naked proc taking two parameter of a simple type (Long). For now, we’ll omit the use of dynamic datatypes like String, Variant, Object, etc. The procedure contains the local variable tmp, also of a simple datatype, and assigns the product of x and y to tmp. This is the entire program:

TestMul(2, 3)
Proc TestMul(x As Int, y As Int) Naked
  Local Int tmp
  tmp = x * y
EndProc

Now put the caret inside the procedure TestMul and select Proc | Disassembly, it produces the following listing in the debug output window:

--------  Disassembly -----------------------------------
1 Proc TestMul(x As Int, y As Int) Naked (Lines=5)
042704D0: 53                             push    ebx
042704D1: 55                             push    ebp
042704D2: 8B EC                          mov     ebp,esp
042704D4: 8D 5D 80                       lea     ebx,-128[ebp]
042704D7: 2B C0                          sub     eax,eax
042704D9: 50                             push    eax
042704DA: DB 45 0C                       fild    dpt 12[ebp]
042704DD: DA 4D 10                       fimul   dpt 16[ebp]
042704E0: DB 5B 7C                       fistp   dpt 124[ebx]
042704E3: 8B E5                          mov     esp,ebp
042704E5: 5D                             pop     ebp
042704E6: 5B                             pop     ebx
042704E7: C2 08 00                       ret     8

The first line specifies the line number of the procedure (1), its entire prototype, and the number of lines (here 5, but might be more if the procedure includes any trailing empty lines).
The numbers at the start of each line show the memory address of the instructions, which might be different from your result. Consequently, in this case, the function ProcAddr(TestMul) would return the address of the first byte of the procedure: 0x042704D0.
After the memory address follow the opcodes for the assembly instruction. For instance, the opcode with value 0x53 corresponds to the push ebx assembly command. Some instructions  require a one byte opcode only, others require multiple opcodes.

The first 6 lines make up the the procedure’s entry code (sometimes called prologue). The last 4 lines are the procedure’s exit code (or epilogue). The lines in between represent the actual functionality of the procedure.

Entry code
The procedure’s entry code prepares the procedure’s code to handle parameters and local variables:

push    ebx            ; save ebx
push    ebp            ; save ebp
mov     ebp,esp        ; establish stackframe
lea     ebx,-128[ebp]  ; let ebx reference local vars
sub     eax,eax        ; eax = 0
push    eax            ; clear first local var

Whenever a procedure takes a parameter or declares a local variable you’ll always find the same three instructions at the start of each procedure: push ebx / push ebp / mov ebp, esp. If the procedure also contains local variables the fourth line lea ebx, –128[ebp] is present as well. Following this line you’ll find the code that initializes the local variable; all local variables are initialized to zero.

Local variables, the purpose of ebx
In GFA-BASIC 32 the ebx register has a special purpose and thus ebx cannot be used as a general purpose register. It is used as a fixed reference point to address the local variables.

Note - According to the documentation it allows to layout variables that require more than 4 bytes (Double, Date, Large, Currency) on 8-byte borders increasing performance when accessed.

The ebx value points to an address 128 bytes down on the stack relative to the value in ebp, the stackframe. Although the first local variable is actually located at ebp – 4 , it will be referenced using the value in ebx. The location of the local variable is +124 bytes relative to the value in ebx, in assembly syntax the tmp variable is located at 124[ebx].

The value stored at that position is obtained using dword ptr 124[ebx]. This is illustrated by the next three lines of code where the parameters are multiplied by the fpu (floating-point processor) and where the result is assigned to the local variable tmp.

042704DA: fild    dpt 12[ebp]  ; load value of param x into fpu
042704DD: fimul   dpt 16[ebp]  ; multiply by value in param y
042704E0: fistp   dpt 124[ebx] ; store result in tmp

The parameters x and y are accessed using the value in ebp as we will see.

Stack structure
When the procedure is called, the caller puts the parameters y and x on the stack, in reversed order. GB subroutines conform to the stdcall convention, which means that the parameters are pushed from right to left and that the subroutine corrects the stack before returning. Since y is the most right parameter it is pushed first, followed by the parameter at the left (here x). Then the CPU adds the return address on the stack and executes the subroutine.

From this point on the stack is prepared according the procedure’s entry code discussed above. The result can be viewed in the next picture:

The entry code saves the current values from the ebx and ebp registers on the stack. Then ebp is assigned the new esp value. Now ebp is used to address the parameters: parameter x is located at a positive offset of 12 bytes from the value in ebp; in assembly code 12[ebp]. Parameter y is 16 bytes up the stack relative to the value in ebp, in assembly code 16[ebp].

To address the parameters throughout the procedure ebp needs to remain constant during the execution of the procedure. The same is true for ebx that is used to address the local variables. We cannot use esp to reference both parameters and local variables because esp changes automatically during the execution of the procedure. (Although C/C++ compilers sometimes keep track of esp and address all stack variables using an offset to esp.)

Allocating and initializing
After the stackframe is established (mov ebp, esp) the next step requires the reservation of stackspace for local variables, see listing. The general idea, and described in most textbooks, is to subtract the required number of bytes from esp and then clear that piece of memory. In our sample esp would have to be decreased by 4 bytes (for the Long variable tmp) and then cleared by zero. Although GB produces the same effect, it proceeds a bit different.

Note that the first byte of the 32-bits local variable tmp is located at ebp-4. After creating the stackframe by mov ebp, esp the registers esp and ebp point to the same stack address. To reserve and initialize the 4 bytes below esp GB uses the instructions sub eax, eax / push eax.

Subtracting a register by itself results in zero. By pushing zero, now the value in eax, GB both reserves and initializes the local variable in one step. It prevents the additional step to first decrease esp explicitly. The technique to use push to reserve and initialize is typical for GB. The push eax can be repeated to clear and reserve all stack memory necessary for local variables. Thus, if the procedure would have contained two local variables of type long it would have had two push eax instructions.

Exit code
Before leaving the procedure the stack must be returned to the state it was when the procedure was entered. In addition, because of the stdcall convention, the procedure must remove the bytes necessary for the parameters (2 * 4 bytes for two long parameters). This is how its done:

042704E3:   mov     esp,ebp ; restore esp
042704E5:   pop     ebp     ; restore ebp
042704E6:   pop     ebx     ; restore ebx
042704E7:   ret     8       ; return, discarding parameters

When the program returned to the caller the registers that matter and must remain constant are restored. This makes sure the caller can use the correct ebp value to access its parameters and that ebx can be used to access its local variables.

Optimize using disassembly
Inspecting a procedure’s disassembly is useful to get an idea what’s going on underneath the GFA-BASIC statements. The example presented in this blog proves why. The example performs a multiplication of two integer parameters and stores the result in another integer. As you can see, the compiler generates floating-point assembly instructions to perform the math. Since all variables are of type Long, the compiler could have generated more efficient code using the integer multiplication instruction imul. However, the compiler generates integer instruction only for addition and subtraction operators. Now its up to the programmer to optimize this procedure by replacing the multiplication operator * by the Mul operator. The optimized procedure then becomes:

Proc TestMul(x As Int, y As Int) Naked
  Local Int tmp
  tmp = x Mul y
EndProc

Now compile the code and inspect the disassembly. As you can see the floating point instructions are vanished and replaced by the imul instruction.

Conclusion
Inspecting a procedure’s disassembly requires knowledge to identify the three parts  of a procedure; the entry code, the actual code, and the exit code, We discussed how to identify parameters and local variables and saw how GB uses a specific technique to reserve and initialize local variables.

In coming blog posts we’ll discuss non-naked procedures and how you can tell a procedure is a good candidate to be naked.