GFA-BASIC 32 for Windows

01 October 2024

DEP and GB32

"Data Execution Prevention (DEP) is a system-level memory protection feature that is built into the operating system starting with Windows XP and Windows Server 2003. DEP enables the system to mark one or more pages of memory as non-executable. Marking memory regions as non-executable means that code cannot be run from that region of memory, which makes it harder for the exploitation of buffer overruns."

I underlined the sentence that is important to GB32 developers. When a program is RUN (F5), the GB32 inline compiler creates an executable in memory and runs it from there. The DEP setting of your system can prevent the running of the code. This happened to a user who bought a new PC with a factory DEP setting of 3. Not only couldn't GB32 run the code, other nasty things happened as well, for instance programs couldn't be saved anymore.

To obtain the your system's DEP setting, you will need to follow the following steps:

Go to This PC -> right click and select Properties.Then select Advanced Settings and choose the Advanced tab, now click the Settings button of the Performance section. Here you can select the Data Execution Prevention tab. Normally, the option to protect Windows programs and services is selected. This conforms to DEP setting = 2.

You could also try this small GB32 program to obtain the system's DEP setting:

$Library "gfawinx"
$Library "UpdateRT"
UpdateRuntime      ' Patches GfaWin23.Ocx

Declare Function GetSystemDEPPolicy Lib "kernel32" ()
Debug "DEP-Setting: "; GetSystemDEPPolicy()

The fact that you can Run the code tells you that the DEP setting isn't 3. A setting of 3 wouldn't allow the execution of the code stored in memory by the GB32 compiler.

It isn't a problem solely for the GB32 developer, but the final EXEs created with GB32 will also suffer from a DEP setting of 3. The EXE is a stand-alone program and its code can be executed, but the UpdateRuntime function 'hacks' the GfaWin23.ocx runtime and reroute some code to new code in memory, and executing new code in memory is not allowed with DEP = 3.

Conclusion
To be able to execute a GB32 produced EXE the DEP system setting of the user must be 2 or lower.

03 July 2024

Function returning UDT

Last time we discussed why you better not use a Variant in a user-defined type. This time we look at functions that return a user-defined type. Let's look at a function - called MakeRect - that creates and returns a RECT (a built-in type). Here is non-optimized, but often used, version:

Dim rc As RECT
rc = MakeRect(0, 0, 100, 200)
Function MakeRect(ByVal x%, ByVal y%, ByVal w%, ByVal h%) As RECT
  Local rc As RECT
  rc.Left = x%
  rc.Top = y%
  rc.Right = x% + w%
  rc.Bottom = y% + h%
  Return rc
EndFunc

The function declares a temporary local variable rc and fills it's members with the passed values. Before the function ends the Return statement returns the local rc RECT variable to the caller. What actually happens is: the Return statement copies the contents of rc to a special local variable before it exits the function. Functions use a temporary local variable with the same name and type as the function declaration. In this case, the local variable is called MakeRect. Normally, the contents of the function-local variable is returned through the stack, meaning it is copied from the function-local variable to the stack.

In short, a Return value statement first copies the value to the function-local variable, it is then copied to the stack, and then the function returns. A two step process. Therefor, it is advised not to use the Return statement, but instead assign the return value to the function-local variable directly. When we change MakeRect function, it looks like this:

Function MakeRect(ByVal x%, ByVal y%, ByVal w%, ByVal h%) As RECT
  MakeRect.Left = x%
  MakeRect.Top = y%
  MakeRect.Right = x% + w%
  MakeRect.Bottom = y% + h%
EndFunc

As you can see, there is no local variable rc anymore, and no Return statement. The function is optimized for speed (and size, because there is less copying to do).

Last, but not least, GFA-BASIC 32 has another performance optimization. In case of UDT return values, GB32 passes the UDT variable as a byref parameter to the function. So, MakeRect actually receives 5 parameters. When GB32 creates the function-local variable MakeRect, it sets a reference to the hidden byref UDT parameter. So, when assigning values to the MakeRect function-local variable, we are writing directly to the global variable rc that is secretly passed to the function.

03 April 2024

Variant in an UDT?

Can you put a Variant in an user-defined type? It depends.

User defined type
A user-defined type is a data structure that can store multiple related variables of different types. To create a user-defined type, use the Type…End Type statement. Within the Type…End Type statement, list each element that is to contain a value, along with its data type.

Type SomeType
  Left As Long
  FirstName As String * 5
  Ages (1 To 5) As Byte
End Type

This code fragment shows how to define an user-defined type, it contains a 32-bit integer, a fixed string occupying 5 characters (bytes) and a 1-dimensional array of type Byte, also occupying 5 bytes. In total (theoretically) the user-defined type occupies 14 bytes. To get the actual size of the UDT you would use the SizeOf(SomeType) function, which reports a size of 16 bytes. The member Left is located at offset 0, FirstName at offset 4, and Ages starts at offset 9. Because the size of the structure is a multiple of 4, due to the default Packed 4 setting, the type is padded with empty bytes to fill it up until it is a multiple of 4. (A Packed 1 setting would return a size of 17.)

It is important to note that all data is stored inside the UDT. GB32 does not allow data to be stored outside the boundaries of the UDT. So, any Ocx type, Object type, or String type are forbidden, because they are pointer types that store their data somewhere else. (If it would be allowed, a Type with only one string member would have a size of 4, because only the 32-bits pointer to the string data would be reserved.) Because pointer types aren't allowed as Type members, GB32 does not call an 'UDT-destructor' when the UDT variable goes out of scope as it does with a String or Object variable as showed in the following sample:

Proc LocalStr()
  Local String s = "Hello"
  // ...
EndProc         // Destroys the string s

Now, suppose a dynamic String would be allowed in an UDT. When the UDT goes out of scope, GB32 would need to free the memory allocated by the string, which it cannot. In fact, GB32 does nothing at all when an UDT goes out of scope. A local UDT variable is located on the stack and GB32 simply restores the stack to its state it was before invoking the procedure.

Variant in UDT
Because of these restrictions, people are tempted to store a string in a Variant Type member, simply because GB32 does allow this. Then, when the Variant is then assigned a string (or an object) GB32 does not complain. However, when you assign a string to a Variant, the string is converted to a BSTR type and stored somewhere in COM-system memory (using SysAllocString*() APIs). Because GB32 does not release the Variant in the UDT, this memory is never freed. In addition, since the string memory isn't located on the program's heap, it won't even be released when the program is ended. It remains in COM-memory forever.

Conclusion
You can use a Variant in and UDT, but only to store simple numeric types. Dynamic data types (String and Object) are never released.

09 February 2024

What is the purpose of New?

The New keyword is used to create a COM object instance that cannot be created otherwise. In GB32 the most objects are created using the Ocx command, however not all COM types (Collection, Font, DisAsm, StdFont, and StdPicture) can be created this way.

The Ocx command is defined as:

Ocx comtype name[(idx)] [[= text$] [,ID][, x, y, w, h]

It performs two things:

Declares a global variable name of the specified and predefined comtype
Creates an object instance and assigns it to name.

The comtype is one of the GB32 provided COM types: a window based objecttype like Form, Command, but also Timer. Creating concrete objects of these types require initialization parameters, sometimes optionally, but the syntax must be able to accept additional parameters.

To create objects of non-window based types GB32 provides the New keyword to be used in a variable declaration statement. Here also, the two step process must be followed: declare a variable and assign the object instance.

Dim name As [New] comtype

Without the New keyword only a variable is declared and initialized to Nothing (0). New forces GB32 to create an instance (allocate memory for it) and assign that memory address to the variable name. In contrast with the Ocx command which creates global variables, these objecttypes can be declared and created locally. When the procedure is left the object is released from memory.

When declaring a COM type (Ocx or Dim) GB32 only reserves a 32-bits integer for the variable. The variable is a pointer to the allocated memory of the object and initially zero (Nothing). For instance:

Dim c As Collection

The variable c is a 32-bits pointer that can be assigned another instance of Collection by using the Set command.

Passing a COM object
An often asked question is whether an object variable should be passed ByRef or ByVal to a procedure. Since the variable is a pointer a ByVal parameter accepts a copy of the pointer and ByRef the address of the 32-bits pointer variable. In both cases GB32 knows how to handle the passed value, although a ByVal passed pointer outperforms a ByRef parameter. To access a ByRef parameter there is always an extra level of indirection.

Passing an object does not increment the reference count on the object.

20 October 2023

File Creation Date and Time

To read the file-time of a file GB32 offers the following functions:

Dim ft As Date
ft = FileDateTime(file$)        // last write time
ft = FileDateTimeAccess(file$)  // last access time
ft = FileDateTimeCreate(file$)  // creation time

They return a Date datatype with resp. the last write time. last access time, and the creation time.

A Date stores a date/time value as an 8-byte real value (Double), representing a date between January 1, 100 and December 31, 9999, inclusive.

The integer part of the Double represents the day:
- The value 2.0 represents January 1, 1900
- 3.0 represents January 2, 1900, and so on.
Adding 1 to the value increments the date by a day.

The fractional part of the value represents the time of day. Therefore, 2.5 represents noon on January 1, 1900; 3.25 represents 6:00 A.M. on January 2, 1900, and so on.
Negative numbers represent dates prior to December 30, 1899.

You can display the Double representation of the current time like this:

Dim dt As Date = Now
Debug CDbl(dt)

The FileDateTime* functions are wrappers around Windows API functions and from their name you expect the date of the last write, last access, or the creation time. For instance, the FileDateTimeCreate() should return the time at which a file is created, thus - for instance - the time a program executed the Open command to create a new file. (BTW the time of the file is not set before the Close command is invoked.) However, if the same file was created earlier, the date/time of the earlier creation is returned!
According to the MS documentation:

"If you rename or delete a file, then restore it shortly thereafter, Windows searches the cache for file information to restore. Cached information includes its short/long name pair and creation time."

Ok, that is a little disappointment, we cannot rely on FileDateTimeCreate to return the file's creation time. What's left is the most obvious function FileDateTime(), which is a kind of default function for a file's time. This first choice function returns the last write time, which seems to be the best thing to get the last time the file was created. The last write time returns the time the file is recreated, last updated or written to. So, you can use FileDateTime to obtain the 'last creation time', because FileDateTimeCreate returns the first creation time. To be sure I tested this by recreating an existing file created first at 3 March 2023:

Open "c:\tmp\test.txt" for Output As # 1
Print # 1; "Text"
Close # 1

After executing these codelines and subsequently choosing the file's properties in the Windows Explorer it indeed still shows a creation date of 3 March 2023. To force the file to a new creation date I changed the code with a Touch command:

Open "c:\tmp\test.txt" for Output As # 1
Print # 1; "Text"
Touch # 1
Close # 1

Now the Windows Explorer properties show the current date as the creation date. Using the Touch command will change all three filedates to the current time. To set each file date separately use one of the SetFileDateTime* functions after closing the file.

03 August 2023

File I/O using Get and Put

This time I want to put the focus on the Get and Put commands that were first introduced in GFA-BASIC 32. These are high-performance commands to save and load the contents of a variable in a binary format. The syntax for these commands:

Get #n, [index%], variable
Put #n, [index%], variable

The index% parameter specifies the record number, but it is only required for a Random mode file. A Random mode file needs a Len value when opened, which is the last parameter in the Open command:

Open pathname [For mode] [Access access] [share] [Commit] [Based 0/1] As [#]filenumber [Len=reclength]

When the index% parameter is used, the file pointer is positioned at index% * Len. For all other files (mode is Output, Input, Binary, Update, or Append) the index% parameter should be omitted and:

Get reads from the current file position.
Put writes the variable at the current position.

Depending on the Based setting in the Open command, which is 1 by default, the file position is actually calculated as (index% - Based) * Len. If Len is omitted the record length is set to 1. Consequently, when index% is used with a Len setting of 1, the index% value is multiplied by 1. Therefor, it is important to omit the parameter in non-Random files, so most often these commands are used like this:

Put # 1, , a$
Get # 1, , a$

The binary format
The contents of the variable is not saved as a string representation of the value like Print # and Write # do. The contents of the variable is saved in the same format as it is stored in memory. For instance, a Float variable is stored in 4 bytes and Put only copies those 4 bytes to the opened file. Get reads the 4 bytes and moves them directly into the Float variable. Saving a the floating-point as a 4 byte binary prevents the rounding otherwise necessary for writing a string representation using Write # or Print #. (BTW Print # isn't a suitable file I/O command unless you are saving a string.)

If the variable being written is a numeric Variant, Put writes 2 bytes identifying the variable type of the Variant and then writes the binary representation of the variable. So, a Variant holding a Float is saved in 6 bytes: first 2 bytes identifying the Variant as basSingle (4) and followed by 4 bytes containing the data.
If the Variant holds a string, Put writes 2 bytes to identify the type (basVString) , then 2 bytes specifying the length of the string, followed by the string data.

A variable-length string is saved likewise, only it omits the 2 bytes that specify the datatype. Put writes the length in 2 bytes, directly followed by the string data. The disadvantage is the limitation of 65535 characters, because that's the maximum value that can be stored in 2 bytes (Card).

File mode
Since Put and Get are binary I/O commands you might think they are restricted to Random mode or Binary mode files, but that isn't true. Binary I/O can happen with files opened in all modes, including Output, Input, Update, and Append.These modes do not differ much; opening a file with a certain mode is more of a reminder to the developer than that it defines an I/O operation. It is the file I/O command that defines the input-output format. Print # always writes a string, and Out # always write a binary value, whatever the mode of the opened file. Making a file Random allows Get and Put to read/write to a record directly with out setting the file pointer, otherwise it is the same as Update.

Conclusion
The Get and Put commands are fast binary I/O commands and are an interesting addition to the file I/O commands from previous versions of GFA-BASIC.

29 May 2023

Sleep doesn't wait?

Once in a while you might have created a quick and dirty program that repeatedly prints some value from inside the Sleep message loop. However, there is something funny about that. Shouldn't Sleep wait until there is a message in the queue? So, how can you print something repeatedly? How can Sleep return from its wait state?

The issue
The next small program shows the issue. It opens a window and enters the message loop while printing a dot each time. (I wanted to print the number of the message (_Mess) that is retrieved from the queue, but that isn't possible. The _Mess variable is set in the window procedure and is not part of the Sleep (GetEvent or PeekEvent) command. In addition, GB32 filters the message and not all message numbers are stored in _Mess. So, instead, we print a dot each time the loop is executed.)

OpenW 1
PrintScroll = True
PrintWrap = True
Do
  Print ".";
  Sleep
Until Me Is Nothing

According to the documentation Sleep should 'sleep' until there is a message available. However, the Print command is executed frequently at a pretty high speed printing dots in the window. This shouldn't happen, should it?

Now, lets copy the code above into the IDE, save it and execute the code as a stand-alone EXE. For this choose the 'Launch EXE' button on the toolbar:

Now we see something completely different: in the stand-alone EXE the Print command is not executed repeatedly; now Sleep does wait.

Difference between IDE and standalone EXE
When a program is run from inside the IDE the Sleep command doesn't seem to wait, because it returns repeatedly. However, the clue here is the word 'seem', because Sleep does actually wait (as does GetEvent), but it receives WM_TIMER messages at a pretty high speed. These WM_TIMER messages are missing in the stand-alone EXE. So, obviously, the IDE must be responsible for the WM_TIMER messages and indeed it is.

The IDE is to blame
The IDE uses several timers for different purposes. Well known timers are Gfa_Minute and Gfa_Second, that fire every minute or second resp. These two timers are disabled before a program is run (F5), so these are not the cause of the issue. However, there are two more IDE timers that are used to update UI elements like the toolbar buttons, for instance. Every 200ms the clipboard is checked to see if there is text available. If so, the toolbar's Paste button is enabled. Another timer is used to update the sidebar, this one fires every 100ms.

As it happens, these timers are not disabled before a program is run from within the IDE.These timers keep firing their WM_TIMER messages to the thread's queue, which are read by our Sleep in the program's message loop. Our program is executed in the same thread as the IDE and thus takes over the retrieving and dispatching of messages. While running a program, the IDE's message loop isn't executed any longer and all the messages for the IDE are obtained by the program's message loop. After quitting the program the IDE returns to its main message loop and takes over again.

Conclusion
Be aware that the number of times Sleep and GetEvent return from their wait state depends on the timer resolution of the IDE's timers that are still active while running the program.

04 January 2023

The danger of And and Or

The previous blogpost discussed the difference between And vs &&, and Or vs || in conditional statements. It showed why && and || should be used instead of And and Or. The post also contained an example where it is legitimate to use the And operator:

Local Int x = %111      ' binary notation of 7
If x And %11 > 0        ' test if lower 2 bits are set
  ' do something
EndIf

However, this condition is true for the wrong reasons. The condition is also true when only bit one is set. To prevent this and only check for the two lower bits we could change it into:

If x And %11 == 3       ' test if lower 2 bits are set

This seem to work ok, but the condition is still true for the wrong reason and that's gone haunt you. Let's try something different, let's test if bit 1 (value is 2) is set using the same method:

If x And %11 == 2       ' test if bit 1 is set
  Message "Condition is True"
Else
  Message "Condition is False"
EndIf

When we run this code, the program displays:

As you can see, the statement does not recognize that bit 1 is set (binary %10 == 2), despite the fact that the bit is 1.

So, what is going on here? Can't we trust And anymore? Maybe it is a bug? It is not a bug; it works like this is in any language and to understand this behavior we need to look at the operator precedence table. The helpfile contains the table under the topic name Operator Hierarchy:

( )	parenthesis
+ - ~ !	unary plus, unary minus, bitwise NOT, logical NOT
$ &	explicit string addition
^	the power of
* /	multiply, divide (floating-point)
\ Div Mul	integer division, integer multiplication
% Mod Fmod	integer and the floating point modulo
+ - Add Sub	addition (the string addition, too) and subtraction
<< >> Shl Shr Rol Ror	all shift and rotate operators (also: Shl%, Rol\|, Sar8, etc.)
%&	bitwise And
%\|	the bitwise Or
= == < > <= >= !=	all comparisons (also: NEAR ...)
And	bitwise And
Or	bitwise Or
Xor Imp Eqv	bitwise exclusive Or, implication and equivalence
&&	logical And
\|\|	logical Or
^^	logical exclusive Or
Not	bitwise complement

When an expression is evaluated the rules of operator precedence are applied. For instance, multiplication and division go before addition and subtraction. The table shows this by placing the * and / operators before (or above) + and -. Now look at the And operator; it comes after the comparison operators. This means that if an expression contains a comparison operator, it is evaluated (computed/calculated) before the And operator is applied. In our case, the part %11 == 2 is evaluated before And is applied. Since %11 = 3, 3 is not equal to 2 (results in False) and x And 0 is always 0, the result is 0 (False). These are the steps involved in the evaluation:

(Step 1) If x And %11 == 2
(Step 2) If x And False
(Step 3) If x And 0
(Step 4) If 0

Can you now see why the original example form above results in True?

If x And %11 == 3       ' True for the wrong reason

The problem can be solved by using either parenthesis or %&:

If (x And %11) == 2     ' test if bit 1 is set

If x %& %11 == 2        ' GB way

In all other languages than GB the first solution is used. However GB defines an additional 'And' operator with a higher precedence that comes before the comparison operators: the %& operator. See the table for its location.

The Or operator suffers from the same problem, so GB defines an or-operator with higher precedence, the %| operator.

Is there a moral to this story? Try to avoid the use of And (&) and Or (|), unless - for instance - when translating C/C++ code to GB. (C/C++ does not support %& and %|.) Otherwise you might prefer the use of %& and %|.

16 October 2022

And / Or versus && / ||

Are you aware between the difference between And and && and Or and ||? And and Or are mathematical operators, while && and || are logical operators, so they are quite different. The only place where the && and || operators are used are in conditional statements like If, While, and Until. And and Or are used in mathematical expressions (calculations) and are generally not meant to be used in the conditional commands If, While, and Until.

The following example shows the difference:

Local Int x = 2, y = 3
' The inefficient way using And:
If x == 0 And y == 3
  ' do something
EndIf
' The efficient way using &&:
If x == 0 && y == 3
  ' do something
EndIf

The commands following the If statements aren't executed, but for different reasons.
The first If statement evaluates both expressions and then performs a mathematical operation on the result of both boolean expressions. The second If statement only evaluates the first expression and never bothers to check y == 3. Because And is a mathematical operator like +, -, *, / the expression x == 0 And y == 3 is completely calculated. First x == 0 is evaluated, which produces the boolean value False (0). Then y == 3 is evaluated, which returns the boolean value True(-1). After calculating both expressions the boolean results are combined by the And operator: 0 And -1 which then returns 0 (False), because only one expression is True. These are steps involved:

(1) x == 0 And y == 3
(2) FALSE And TRUE
(3) FALSE

The second If statement uses &&. Because && is not a mathematical operator the second expression is only evaluated if the first expression is True. Note that an AND operator requires both operands to be TRUE to result TRUE, as you can see from this AND table:

Value 1	Value 2	AND-Result
0	0	0
0	1	0
1	0	0
1	1	1

From the table you can see that if Value 1 equals 0 (FALSE) the result is always 0 (FALSE), despite Value 2. Only when Value 1 equals 1 (TRUE) Value 2 needs to be evaluated. The compiler produces code to do just that. It first evaluates the first expression x == 0 and because the result is FALSE the next expression y == 3 is never executed. This results in a considerable increase of performance, especially if the second expression calls a function. For instance, in the next example the function div3() is never called if x is 0 ( otherwise div3() would generate a division-by-zero exception):

Local Int x = 0
If x != 0 && div3(x) == 3
  ' do something
EndIf
Function div3(ByVal n As Int) As Int
  foo = 3 / n

The same is true for Or and ||. Look at this OR-table, the result is always TRUE if the first value is TRUE.

Value 1	Value 2	OR-Result
0	0	0
0	1	1
1	0	1
1	1	1

In the next example the second expression (y == 4) isn't evaluated because the first expression (x == 2) is already TRUE, making the result TRUE despite the result of the second expression.

Local Int x = 2, y = 3
If x == 2 || y == 4
  ' do something
EndIf

If we used the mathematical operator Or, first both expressions would be evaluated and then the boolean results would be or-ed, an unnecessary extra operation.

The And operator may have a purpose in a conditional statement, but it would be used as a mathematical operator in the expression, for instance to test if certain bits are set:

Local Int x = %111      ' binary notation of 7
If x And %11 > 0        ' test if lower 2 bits are set
  ' do something
EndIf

Here x (=7) is mathematically And-ed with 3 which results in 3, and 3 is larger than 0. The If statement will be executed.

Conclusion
Rather than using And and Or, use the logical operators && and || in conditional statements.

19 July 2022

Numeric/string conversions

Ever wondered how numbers are printed? Check out the following snippet:

Dim i As Int = 2345
Print i

Before the number is printed to the active window, the number is converted to a string using a hidden call of the Str(i) function. The same is true for printing a Date (numeric) value; the date-value is converted to a string before it is output in the Debug Output window:

Dim d As Date = Now
Debug.Print d
Debug Str(d)            // short for Debug.Print

The output is (at the time of writing this blog) is:

18-7-2022 17:03:14
18-7-2022 17:03:14

The snippet demonstrates that the date is converted using a hidden call of the Str() function. It is not possible to print any numeric value without converting it to a string first. GFA-BASIC 32 uses hidden calls to the Str() function to convert the numeric value(s). The need for a numeric-to-string conversion before printing to screen (or printer) is that the underlying output functions require strings. For instance, to put the value on the screen the Print command uses the API function TextOut(hDc, x, y, strAddr, strLen), which requires a string.

Another implicit to string conversion is demonstrated in this example:

Dim s As String, i As Int = 2345, d As Date = Now
s = i & " " & d
Debug.Print s   // shows:  2345 18-7-2022 17:12:44

The & string operator allows to concatenate numeric and string values without converting them to a string explicitly. Still, under the hood, everything is converted to a string first. A nasty habit of the Str() function is to insert a space in front of converted value (a VB compatible setting). To prevent the space insertion use Mode StrSpace "0" somewhere at the beginning of the program.

The Str() and CStr() functions
The Str() function converts the value argument using a classic-style method, the output of the function is not language dependent. The produced string does not contain locale info for punctuation to separate the thousands and decimals in the value. A floating point is converted to a string with a single dot as separator. If the value is too large or too small the exponent notation is used.

s = Str( 1 / 3)    // argument is a Double
Debug s            // 0.333333333333333
s = Str(_minDbl)   // a very small number
Debug s            // -1.79769313486232e+308

The output of Str() can not be manipulated. However, there is another function that converts the numeric value according the user's regional or language settings, the CStr() function.

Dim f As Float = 1000.2345
Debug.Print CStr(f)     // output: 1000,234

Here the output follows the regional setting for the Netherlands where the decimal part is separated with a comma. The CStr() function follows the rules defined by the OLE API function VariantChangeTypeEx() that handles coercions between fundamental data-types, including numeric-to-string and string-to-numeric coercions. One of the parameters of the API is the LCID, a value that uniquely identifies the language to use. GFA-BASIC 32 stores the LCID value at start-up, by querying the user's default LCID from the OS. The LCID value is stored in the Mode structure and cannot be changed directly. The only way to modify the internal LCID is by using the Mode Lang command. By specifying a 3 letter language abbreviation GB32 will change the internal LCID value accordingly.

Dim sMyLanguage As String = Mode(Lang)
Mode Lang "DEU"         // change to German, LCID is changed.

If you're not satisfied with the numeric format produced by CStr() you can switch to the Format() function. The Format() function not only allows language depending conversion, but you can fine tune the output for your particular needs. When your user inputs a value in a language dependent value, for instance by using a TextBox, your program must be able to convert the string to a numeric datatype in GFA-BASIC's internal format using one of the C*() conversion functions, like CInt($).

String to numeric
Converting from string to numeric, is done using the Val*() functions or the other C*()-conversion functions. The Val* functions accept strings containing numeric values in the classical format, ie. the only punctuation allowed is a dot in floating point values. To convert to a Double use either Val() or its synonym ValDbl():

Dim d As Double, sValue As String = "1000.234567"
d = Val(sValue)
d = ValDbl(sValue)

To convert to a 32-bit integer use ValInt() and for a 64-bit integer ValLarge().

The Val() functions are not suited for language dependent formatted values. In the next snippet, the number is formatted according the regional settings of the Netherlands and is then converted to a Double using CDbl().

Dim d As Double, sValue As String = "1.000,23"
d = CDbl(sValue)
Debug d         // output: 1000.23

CDbl() uses the same API function VariantChangeTypeEx() to process the string and change it into a floating-point value. To parse a language dependent formatted numeric string use any of the following functions:

Bool = CBool(); Byte = CByte(); Currency = CCur(); Date = CDate(); Double = CDbl(); Short = CShort(); Integer = CInt() or Long = CLong(); Handle = CHandle(); Large = CLarge(); Single = CSng() or Single = CFloat(); String = CStr(); Variant = CVar()

These function not only parse strings, but can also be used to convert any other data-type using an explicit cast. For instance:

Dim i As Int, b As Bool = True
i = CInt(b)     // i becomes -1, same as i = b

These explicit casts produce the same code as simply assigning one datatype to another.

Conclusion
CStr() produces a language dependent formatted string using the current LCID value. The C* conversion functions use the LCID language value for conversion to a numeric datatype. Format() offers more possibilities to format a numeric value. Str() and Val*() function use classical formatted values.

25 April 2022

Variables and parameters

Let’s discuss some basic issues of variables and parameters and explain auto-complete information about variables and parameters.

This blog post is a sequel to: Where are variables stored?

Variable declaration
Before a variable can be used it must have been declared explicitly. A declaration introduces the variable-name into the compiler’s database. The declaration requires a name for the variable and a datatype so that the compiler knows what to do with that variable; an integer variable is handled completely different than a string variable. Usually, the type is specified in a declaration statement. If the type is omitted the variable gets the default type Variant.

Global i As Int, v      ' an integer and a Variant

Not specifying a type introduces a Variant that might quickly cause confusion and problems. Variant-operations use different functions than – for instance – integer and floating-point operations. When a Variant is used to store a numeric value, simply incrementing it would require the call of a special Variant-Increment function in the runtime. Incrementing a simple data type as Int and Float is an (almost) atomic operation and requires only one CPU or FPU instruction. Therefor, it requires some attention when declaring variables. An error is quickly made as shown in the next line:

Global pic1, pic2 as Picture  ' probably not wanted

This line declares two variables. Two Picture variables are required, but pic1 is a Variant!
Instead use this:

Global Picture pic1, pic2

Before the introduction of auto-complete it was difficult to note these errors. Now auto-complete shows you the type of the variable. Here the type of pic1 variable that was wrongly declared:

Global, local and static
Other statements to declare variables are Local, Dim, and Static. The Local statement declares a variable that only exists in a procedure. When Dim is used inside a procedure it declares local variables, when it is used in the main part of the program it introduces global variables. (Note that the main part of the program can have local variables as well using the Local statement.) When a procedure returns the local variables go out of scope and the variables are removed from the stack or and the dynamic variables are released (their memory is deleted).
The variables declared with Static are global but only locally visible. They are not freed when they go out of scope, they keep their contents.

Initialization while declaring
Declaring a variable adds it to the compiler’s database, a declaration does not introduce any executable code! A common practice is to collect the declaration of global variables in a separate procedure often called Init or something like that. Note that such a procedure doesn’t need to be executed, i.e. called from the main-part of the program. The procedure would not contain any executable code.
However, this changes if the declaration is used to initialize the variable with a value:

Global s As String = "Hello"

Now the declaration statement contains executable code that needs to executed when the program is run. The statement introduces the variable s into the compiler’s database and produces code to copy “Hello” into the string variable. If the program uses a procedure to declare globals that also initialize the variables, the procedure should be executed when the program is run. The procedure must also be run if the contains array declarations.

A special case is the Static local-variable, which is usually initialized while being declared. The initialization code is executed only once: the first time the Static statement is executed. (This is accomplished by guarding the Static statement by a hidden global boolean variable. After executing the Static statement the hidden boolean is set to true and the statement is never executed again.) Here is an excerpt form gfawinx.lg32’s WinDpi function:

Function WinDpi(hWnd As Handle) As Long

  Static Long pfnGetDpiForWindow = GetProcAddress( _
    GetModuleHandle("user32.dll"), "GetDpiForWindow")

  If pfnGetDpiForWindow       ' works from Windows 8.1
    WinDpi = StdCall(pfnGetDpiForWindow)(hWnd)
  Else
    WinDpi = GetDeviceCaps(Screen.GetDC, LOGPIXELSX)
    Screen.ReleaseDC
  EndIf
EndFunc

The pfnGetDpiForWindow is only initialized once with the function pointer to GetDpiForWindow() API or null if it isn’t supported. If the WinDpi() function is executed again, the pfnGetDpiForWindow variable is still pointing to the API or it is still null. If the API isn’t supported by the Windows version, the DPI of the screen-device context is returned.

Simple datatype parameters
When declaring procedure parameters you need to decide whether to pass a value or variable by value (ByVal) or by reference (ByRef). In general, a parameter is passed by value unless the passed variable needs to be modified. A by value parameter is pushed on the stack by the caller and popped from the stack by the called procedure. Passing a 32-bit integer by value requires 4 bytes of stackspace, passing a Variant by value takes 16 bytes (the size of a Variant).
When a variable is passed by reference the storage location of the variable – a 32-bit memory address - is pushed on the stack. A Variant passed by reference takes only 4 bytes of stackspace. However, a by reference variable requires an additional step from the compiler: it needs to obtain the address of the variable before pushing it on the stack.

Dynamic datatype parameters
How about passing an array, hash, string, variant, or object (OCX) parameter? Well, an array is simple, it can only be passed by reference. A hash can not be passed without problems due to a bug in GFA-BASIC.

Passing a string by reference is faster than passing it by value. A by value string is first copied in the calling procedure and then the (hidden) copy is passed by reference. It isn’t possible to copy an entire string on the stack! Because the string is first copied, it takes a malloc to allocate the string memory and a memcpy to copy the string’s characters. So, it can be (much) faster to pass a string by reference, you only need to make sure you don’t change the contents of the by reference string parameter.
Auto-complete cannot differentiate between these types of string parameters and always presents a string parameter with the Ref clause.

A COM object variable is best passed by value, it only takes 4 bytes to pass the contents of a COM variable. A COM or Ocx variable is a 32-bits integer pointing to the actual COM object. The only need for a by reference COM parameter is when the object must be set to Nothing.

Passing a Variant by value may cause trouble and even a program crash if not handled properly. The rule of thumb is:

Don’t write to a by value Variant parameter (don’t use the by value variant parameter as a convenient extra local variable).

Explanation of variant parameter issue

Often a subroutine parameter is used as an extra local variable that can be written to. For instance, the ByVal s parameter in the procedure foo above can be used to temporarily store a string, s is a copy of the string passed to the procedure. Writing to s won’t affect the string in the caller. A variant containing a string that is to be passed to a procedure by value does not copy the string before invoking the procedure.

Dim vnt = "Hello"
foo(vnt)        ' by value
Trace vnt       ' wrongly displays Hello

Proc foo(ByVal v As Variant)
  v = "GFA BASIC GFA BASIC GFA BASIC"

This code sample produces problems. The vnt variable stores a pointer to an OLE string containing “Hello”. When passed by value the parameter v is a copy of vnt, a 16 bytes data-structure with type information (VT_BSTR) and a pointer to the OLE string “Hello” on the stack. Assigning a new string to v will release the OLE memory currently pointed to by v. The new OLE string’s memory address is stored in v, together with the new data type (again a VT_BSTR). When leaving a procedure parameters aren’t cleared, so the foo procedure does not free the new contents of v. The variant’s 16 bytes occupying the stack are simply popped off the stack, leaving the new OLE string unreferenced. After returning from calling foo there is nothing that holds a pointer to the new OLE string and the OLE memory will never be released, the program is leaking memory.

Now, why does Trace vnt display “Hello”? After executing foo, the vnt variable is still referencing the OLE memory allocated by the assignment of “Hello”. The OLE string was released in foo when the new string was assigned, but the original variable vnt is never updated. The variable vnt still references the memory bytes where Hello was stored, bytes that weren’t actually cleared when released. The variable vnt references released OLE memory. When vnt goes out of scope, at the end of the program, it is released by GFA-BASIC by calling the OLE system function VariantClear(). Since the variable vnt points to released memory, the program may crash.

The type of procedures and parameter-defaults
To declare a subroutine you can choose between a Procedure, Sub, Function, or FunctionVar. The rule of thumb here is to use a Procedure or Function, unless you explicitly need a Sub or FunctionVar. The Sub is needed for event procedures where the parameters are passed by reference, the default behavior for a Sub. However, using a Sub as a general procedure might cause problems due to a flaw in the default by reference behavior. See the link at the end of this post for more information. If you use a Sub for something else than event subs make sure to use an explicit ByRef or ByVal clause in the declaration of the parameters.

Default behavior of procedures and functions:

Type of subroutine	Default ByVal or ByRef	Default datatype	Default return datatype
Procedure	ByVal	Double	-
Sub	ByRef in event subs, otherwise flawed	Variant	-
Function	ByVal	Double	Ref Variant
FunctionVar	ByRef	Variant	Ref Variant

As you can see, a function’s default return type is a by reference Variant. This means that the caller of the function passes a Variant by reference, the variant is ‘owned’ by the caller. This is illustrated by the following example:

Proc foo()
  Dim v As Variant
  v = GetValue()        ' passes v by reference
EndProc
Function GetValue()     ' default is variant
  GetValue = 10         ' this references v from foo
EndFunc

The GetValue = 10 assignment writes to the v variable from foo() directly.

Here you see the auto-complete information of the function-variable GetValue:

If the function’s return type is String, Object (or other Ocx type) or a user defined type, the caller passes a by reference variable to the called function.
A function cannot return an array or Hash datatype.
Note - Each function automatically gets a ‘local’ variable with the function’s name and type. This function-variable can only be used to assign a value, it is not available as a real local variable that can be used to read from; consequently auto-complete won’t show the function variable in a ‘read-context’.

Finally
Pay attention when declaring variables to not introduce unwanted Variants. Explicitly use ByVal or ByRef when declaring subroutine parameters, and also explicitly specify the datatype (either by name or by postfix). Auto-complete always shows by reference for string parameters, even if they are passed by value. The default return type of a function is a by reference Variant. Don’t use a Variant parameter as a general local variable, ie. don’t write to the variant.

13 March 2022

Update 2.62 March 2022

A picture says more than 1000 words, therefor a screenshot with notes of improvements and fixes of the IDE (running on Windows 11):

Runtime Updates
Since last year virus-scanners report the patched runtime (GfaWin23.Ocx) as malicious - a false positive! Consequently, it is no longer possible to release a runtime binary with bug fixes. A new way of fixing bugs had to be introduced. Starting with this update version 2.62 the runtime bugs are fixed on the fly, they are patched when the program is run (F5). For this to happen each new program automatically inserts the following lines:

$Library "UpdateRT"
UpdateRuntime      ' Patches GfaWin23.Ocx

The $Library “UpdateRT” command loads the compiled code with the exported procedure UpdateRuntime. The UpdateRuntime should be the first statement executed, so that the runtime is fixed before the actual program is run. By taking this approach we can continue fixing runtime bugs!
You can inspect the source code in UpdateRT.g32, which is stored in the Include directory.
For older programs these lines have to be added manually, so that they benefit form the fixes as well.

Fixed: the Branch-Optimization compiler bug
This update sets an important step forward by fixing the compiler’s Branch-Optimizing bug. When Branch-Optimizations was set to any other value than zero the program could crash in one particular situation: when a Case or Otherwise statement was directly followed by an Exit command or one of its variants (Exit, Exit For/Do, Exit Proc, etc). The EXE crashed also if ‘Full Optimization for EXE’ was set. Now this problem is fixed, your program can benefit from full branch optimization, it reduces the size of the program by 10% and increases performance.

A couple of new IDE features.

The editor now shows format lines (toggle with Ctrl+F7) and you can use PeekView to hoover over format lines to see what statements are connected by a format line.
New projects can be started from a template, ranging from a simple Hello World app to a sophisticated dpi-aware application. The templates can be found in the File | New menu-item.
AC (auto-complete) comes with many improvements.
In dark-mode the proc-line color is changed and the auto-complete list box is displayed in dark mode as well.

See Readme262.rtf for more fixes and improvements.

New Commands/Functions in gfawinx.lg32

PrevInstance helps in starting a program only once.
DlgCenter centers all GB32 dialog boxes in the main thread on top of their owner (Me).
DlgCenterHook is used to center the dialog boxes in a additional threads.
WorkWidth & WorkHeight return the real size of the clientarea of a Form with a toolbar and/or statusbar.
OffsetXY sets the graphical offset using positive values for GB32 graphical commands.
VStrLen, VStrLenB, and VStrPtr return the length (in character or bytes) and the address of a string in a Variant.
AutoFreePtr returns a COM object for storage of handles and (memory) pointers that need proper releasing, even when the program halts with a runtime error or Stop/End. The COM object has a Ptr (read/write) property to read and update the resource after the object is created.

See the updated CHM help-file for detailed information about these new features.
Note – gfawinx.lg32 is loaded automatically in new projects. Optionally, this can be disabled in the Extra tab of GFA-BASIC 32 Properties.

The new App_Close event
Probably the most important new feature of this update is the introduction of an application close event: the App_Close event sub. This event sub is automatically called after the program is closed. It provides a way to free global resources that might never be released otherwise. This happens when developing a program and the program stops abruptly with a runtime error. This happens all the time. When a runtime error occurs the IDE shows a message box telling about the error and puts an arrow on the line that caused the error. Maybe you don’t always realize, but after execution stopped no more code is executed and allocated resources won’t be released. For instance:

OpenW 1
' Create global resources
Do
  Sleep
Until Me Is Nothing
' Delete resources: might never be reached!

The releasing of resources also fails when a program ends with the End or Stop statement, no more code is run after these commands. Any API handles, memory pointers, bitmaps, and other global resources are not released and the program is leaking memory. When the program is run again and then stops again in the middle of the execution the leakage accumulates and soon you might experience unexpected errors. Even more, there are situations where the program can’t be run again, because of locked handles (for instance mutex handles). By using App_Close this leaking is over:

OpenW 1
' Create global resources
Do
  Sleep
Until Me Is Nothing

Sub App_Close()
  ' Delete resources: guaranteed to be released!
EndSub

Now put the program-lines that free the resources in the App_Close sub and the global resources are guaranteed to be released in all situations.

The AutoFreePtr object
The implementation of App_Close is only possible through the introduction of the AutoFreePtr object. Instead of using the App_Close event sub a resource can also be assigned to an AutoFreePtr, which will call a predefined or custom procedure to free the resource. Because App_Close can only be used for globally stored resources, any local resource can be automatically released by using an AutoFreePtr. For instance, when a procedure temporarily allocates some memory that needs to be freed at the end of procedure:

Proc Something
  Local pmem As Long, Mem As Object
  pmem = mAlloc(100)            ' alloc some memory
  ' Allocate an AutoFreePtr object and
  ' let it call mFree() automatically
  Set Mem = AutoFreePtr(pmem, AfpMfree)
  ' use pmem or Mem.Ptr
  pmem = mShrink(pmem, 50)      ' resize memoryblock
  Mem.Ptr = pmem                ' assign new pointer
EndProc                         ' No mFree call necessary!

The AutoFreePtr object provides a Ptr property (Long - Get/Put) to read the assigned pointer or handle and to re-assign a new pointer/handle.
The AutoFreePtr can free several predefined pointer or handle types. By specifying an Afp* constant you can instruct the AutoFreePtr object to invoke that specific release function for you. See the helpfile for more info on which types can be freed automatically.
Instead of specifying a constant to execute a predefined freeing-function, you can set a custom procedure to call when the AutoFreePtr object goes out of scope.

Proc Something2
  Local pmem As Long, Mem As Object
  pmem = mAlloc(100)            ' alloc some memory
  Set Mem = AutoFreePtr(pmem, ProcAddr(FreeMem))
  ' use pmem or Mem.Ptr
EndProc                         ' mFree executed in FreeMem()
Proc FreeMem(ByVal ptr As Long)
  ~mFree(ptr)
EndProc

Start using App_Close and AutoFreePtr to develop without leaking memory and handles.

Download the new update: GFA-BASIC 32 for Windows: Download

12 November 2021

Code Performance Timing

This post discusses performance timing of a piece of code. Basically, there are two methods of timing, either by using the _RDTSC function or the Timer function. _RDTSC timing uses the CPU’s internal clock and Timer (QueryPerformanceCounter) uses the preferred, most reliable and portable timing method selected by Windows. Note that code using _RDTSC isn’t portable, it cannot be used in production code to perform timing, instead always use Timer (QueryPerformanceCounter).

Setting the scene
Recently I converted a piece of GFA-BASIC 32 code to assembler and wondered how much faster it was. I used the _RDTSC function to time the execution-speed of the code. The structure of the program was this:

Dim t As Large
$StepOff
. db 0x0F, 0xAE, 0xE8       // LFENCE opcode
t = _RDTSC
' execute GB32 code
. db 0x0F, 0xAE, 0xE8       // LFENCE opcode
t = _RDTSC - t : Debug "GB32-code:"; t
. db 0x0F, 0xAE, 0xE8       // LFENCE opcode
t = _RDTSC
' execute assembler code
. db 0x0F, 0xAE, 0xE8       // LFENCE opcode
t = _RDTSC - t : Debug "Asm-code:";t
$Step

The $StepOff command disables the inclusion of debugging code; the GB32-code following $StepOff is compiled without instructions to call a Tron proc. It also disables the possibility to break (or stop) the program. The compiler switch $Step(On) re-enables the inclusion of Tron calls (and the possibility to break). Usually, these compiler switches are only used temporary for a small piece of code that needs to run at optimal speed in the IDE. A compiled to EXE (LG32 or GLL) program does not include the $StepOn functionality.

The test program gave me the following results: the GB32-code executed at ~4500 ticks and the assembler at ~3600 ticks. But what do these values mean? Are these the real CPU clock cycles necessary to execute the code?

The _RDTSC function
The answer to this question is no, the returned timing values did not represent the real number of assembler code clock-cycles as is advertised. The test-values simply returned the passed time between both _RDTSC calls. _RDTSC reads the time-stamp counter (TSC) of the CPU-cores. The value gives the number of elapsed ticks of the CPU’s internal clock. Maybe, in the past, in the era of one-core processors that only executed one app a time (MSDOS), the value might have represented the actual number of assembler clock-cycles. But not anymore. On a multi-core CPU the task is most likely split to run on multiple cores and the code is executed much faster than it would have done on one core. A consequence of splitting the code on multiple cores is that the _RDTSC (assembler) instructions could be executed on different cores as well, and might return different values. Only when the cores synchronize their time-stamp clocks (TSC) the _RDTSC values are reliable. You can test if the CPU supports the _RDTSC instruction by testing bit 4 of the value in _CPUIDD:

Debug Btst(_CPUIDD, 4)  ' must give -1 (True)
Debug InvariantTSC()    ' must return True

To check if the TSC is invariant use:

Function InvariantTSC() As Bool Naked
  _EAX = 0
  . push ebx            ; always save ebx
  . mov eax, $80000007  ; Advanced Power Management Information
  . CPuid               ; invoke cpuid
  GetRegs               ' store the register contents
  . pop ebx             ; restore ebx to access local vars
  InvariantTSC = Btst(_EDX, 8)
EndFunc

If all is well you can use the _RDTSC for performance timing, only, what are we measuring exactly? The timing values I measured were always different and sometimes ten times more than average… The reason for these irregular values is for instance multitasking, memory latency, and CPU power-saving. The test code will probably be interrupted by the OS to process other applications, writing and reading to memory costs time, and due to power-saving the CPU won’t execute the code at its highest clock frequency. These issues will remain regardless of the timing method you choose. I take closer at these issues at the end of this post.

Flush the CPU before using _RDTSC
There is one other thing related to the use of _RDTSC. The CPU does not guarantee that the machine code is executed in the order it is compiled. In other words, the _RDTSC command could be executed when the code to test has already started executing. To overcome this problem the CPU must be flushed before invoking the rdtsc assembler instruction. This is either done using the cpuid instruction or the lfence assembler instruction. We’ll use the lfence instruction, because the cpuid destroys the registers eax, ebx, ecx, and edx, where ebx is essential to GB32 and destroying it might give access violation errors.

For a proper working of TSC timing insert the lfence instruction before each rdtsc instruction:

. db 0x0F, 0xAE, 0xE8       // LFENCE opcode
t = _RDTSC

The lfence instruction is not part of the GB32 built-in assembler, so we must insert the opcode bytes of lfence into the code stream manually. (You could use cpuid, but first save ebx (push ebx) and restore it afterwards (pop ebx), see the InvariantTSC() code above for an example.)

Convert the TSC values to micro-seconds
To be able to compare timing results among different PCs we need to convert the timing values to a standard unit, seconds or micro-seconds. To convert 4500 clock-cycles to micro-second, the number must be divided by the CPU’s clock frequency. On my PC the CPU’s internal clock has a base frequency of 1.51 GHz, as reported by the Windows Settings. However, I can obtain the frequency (in MHz) from the CPU as well:

Function GetCPUMhz() As Long Naked
  _EAX = 0              ' clear global variable
  . push ebx            ' save, ebx is used for accessing local vars
  . mov eax, 0          ' get CPUID level
  . CPuid
  . cmp eax, $16        ' support level $16?
  . jnz .end            ' no, return 0
  . mov eax, $16        ' get CPUID level $16
  . CPuid
  GetRegs               ' copy registers (_EAX, ..)
  .end:
  . pop ebx             ' restore ebx to access local vars
  GetCPUMhz = _EAX      ' set the return value
EndFunc

The GetCPUMhz function returned 1500 MHz (1.5 GHz). So, my ~4500 clock ticks take ~4500 / 1500000000 seconds, or, by multiplying the result with 10E6 to convert to micro-seconds, gives ~30 µs. The assembler version of my code took ~24 µs. (The values are an average after running the test code multiple times.)

Should you use _RDTSC?
Microsoft strongly discourages using the RDTSC processor instruction, because “you won’t get reliable results on some versions of Windows”, see Acquiring high-resolution time stamps - Win32 apps | Microsoft Docs. Instead, Microsoft encourages you to use the QueryPerformanceCounter() API. However, these warnings apply to using RDTSC in production code, not to a temporary test situation on one PC.

Using QueryPerformanceCounter
The alternative to RDTSC is the portable QueryPerformanceCounter() API together with the QueryPerformanceFrequency() API. Well, that’s easy in GFA-BASIC 32, because these APIs are implemented by the the _TimerCount and _TimerFreq functions resp. Instead of using _RDTSC you could use _TimerCount, for instance:

$StepOff
t = _TimerCount
' execute code
t = _TimerCount - t : Debug "Ticks:"; t

For my test code the intervals with _TimerCount were ~30 and ~24 ticks, which are the same values as the reported _RDTSC intervals in micro-seconds. So, let’s see what these timer count values mean. The values tells us that the clock used by QPC was incremented by 30 and 24 ticks resp. As it turns out, QPC uses a different clock than RDTSC. The frequency of the QPC clock is returned by _TimerFreq and is - on my Surface 4 - 1000000 Hz (1 MHz) . So, the 30 timer count ticks correspond to 30 µs (Interval / _TimerFreq * 10E6), and the 24 ticks to 24 µs. So, as might be expected, both methods return (approximately) the same timings.

At start-up Windows decides how to implement QPC. It might use the CPU’s TSC for this purpose, but it might also use another clock provided by the PC’s hardware. Windows selects the most reliable method to obtain timing intervals. As you can see, on my PC Windows selects an alternative time source for QPC, the 1MHz clock.

GB32 has the right function for you: Timer
Now we know how to time intervals, we can use the GFA-BASIC 32 Timer function rather than divide the _TimerCount interval by _TimerFreq. The Timer function returns the current time in seconds as a Double:

Timer <=> _TimerCount / _TimerFreq

To convert to micro-seconds multiply the interval from two Timer calls with 10E6.

t# = Timer
' execute code
t# = Timer - t# : Debug "Time in µs:"; Round (t# * 10E6)

After changing my code to use Timer the results were again 30 µs and 24 µs.
How fast are my test-results, or how fast is 1 µs? On my PC 1 µs takes 1500 TSC (CPU) ticks and 1 QPC tick. Code that operates below 1500 TSC ticks or 1 µs can not be measured with QPC (at least on my computer). You can calculate your Timer’s resolution as follows:

Debug "Timer-resolution:"; 1 / _TimerFreq * 10e6;" µs"

Other timing issues
Whether you use _RDTSC (with lfence) or Timer (QPC) the timing results simply return the elapsed time as measured by one of the internal hardware clocks. The interval is not the exact time necessary to accomplish the task. Because Windows is a multi-tasking OS the code is split over multiple cores and these cores might be interrupted by the Windows scheduler to execute other running processes. These issues can be addressed by limiting the test code to one core by using SetThreadAffinityMask() API and reducing the chance that that core is interrupted by using SetThreadPriority() API. The structure of such a program could look like this:

Dim Mask As Long = SetThreadAffinityMask(GetCurrentThread(), 1)
~SetThreadPriority(GetCurrentThread(), 2)   ' THREAD_PRIORITY_HIGHEST
Try
  $StepOff
  Dim t# = Timer
  ' execute code
  t# = Timer - t# : Debug "Time in µs:"; Round (t# * 10E6)
  $StepOn
Catch
  MsgBox ErrStr("Timing code")
EndCatch
~SetThreadAffinityMask(GetCurrentThread(), Mask)
~SetThreadPriority(GetCurrentThread(), 0)   ' THREAD_PRIORITY_NORMAL

This might give more reliable results if you have multiple cores. (Again, you won’t do this in production code. It will affect your application's performance by restricting processing to one core or by creating a bottleneck on a single core if multiple threads set their affinity to the same core when calling QueryPerformanceCounter.)

Making the test code run on one core still doesn’t return exact timing values. The executed code must be loaded into the CPU and reading the code from RAM memory might be slow(er) or fail (page hit faults) and the code might have to be re-read from memory again taking some extra time. In addition, the tested code might load and save values to memory that suffer from memory latency as well. Finally, on a laptop the actual CPU frequency might be lower than it’s base frequency, due to CPU power savings. Unfortunately, timing is not an exact science.

See also: CPUID on Wikipedia