27 June 2014

Passing default ByRef parameters to a Sub, a flaw?

In my opinion GFA-BASIC’s Sub procedure type is an anomaly. When possible you should use Procedure instead. Oh, and when you are too lazy to type Procedure, then just use its abbreviation Proc. A Procedure takes all arguments by value by default and is completely backwards compatible with previous GFA-BASIC versions.

On the other hand, if you are stubborn enough and want to use Sub anyway, make sure you use ByRef and ByVal explicitly with all parameters. Do not expect the default ByRef passing to work, because it won’t. We discussed this earlier in The Sub-ByRef flaw. That posting ended with the quote “Why this is happening is unclear, but it is easily repaired by using ByRef explicitly in Sub headings.” I was never convinced this was a real bug and I always assumed a reasonable explanation would come up some day. And it did ….

VB is the flaw; ByRef turns into ByVal
Let us consider the default Sub by reference implementation in VB, VBA, and VBScript. An argument is passed by reference only when the caller uses a very specific syntax. Only when the program calls the subroutine without parentheses does the sub get a reference to the actual argument. When the programmer puts the arguments between parentheses, the arguments are passed by value. The next sample demonstrates how and when VB(A/Script) executes the default:

' Normal VB call of a Sub
' passing a reference to Hello$
SPrintStr Hello$
Sub SPrintStr(sArg As String)

When VB encounters a call with arguments inside parentheses, the arguments are evaluated first. In other words, before anything is put on the stack, the expression inside the parentheses is evaluated (executed, calculated, etc.) and the outcome of the evaluation is assigned to a local copy of the variable. The local copy is passed by reference to the subroutine, which then amounts to passing the original parameter by value. Using the parentheses syntax in the next example will pass the address of a local variable to the Sub SPrintStr.

' VB call of a Sub using ()
' making and passing a copy of Hello$
SPrintStr(Hello$)
Sub SPrintStr(sArg As String)

Conclusion VB
When VB calls the subroutine with parentheses, the sub gets a reference to a copy of the argument and the default ByRef situation suddenly turns into ByVal passing. The VB Sub, with its implicit default by-reference passing, introduces performance loss and unnecessary memory consumption! Remember this when you port VB code to GB32.
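The effect can be modelled outside VB. The following is only a sketch in Python, not VB or GB32 code: a one-element list stands in for a variable's storage, so handing over the list plays the role of passing an address. The function names are hypothetical and exist only for this illustration.

```python
# Hypothetical model: a one-element list acts as a variable's storage,
# so passing the list is "passing the variable's address".

def s_print_str(s_arg):
    """Stand-in for a Sub with a default ByRef parameter: writes through the reference."""
    s_arg[0] = s_arg[0] + " (changed by Sub)"


def call_without_parens(var):
    # Models: SPrintStr Hello$
    # The Sub receives the caller's own storage.
    s_print_str(var)


def call_with_parens(var):
    # Models: SPrintStr(Hello$)
    # The expression is evaluated into a temporary first,
    # and the Sub receives a reference to that temporary.
    temp = [var[0]]
    s_print_str(temp)


hello = ["Hello"]

call_with_parens(hello)
after_parens = hello[0]       # the original is untouched, only the temporary changed

call_without_parens(hello)
after_no_parens = hello[0]    # the original itself was modified
```

In this model the Sub is identical in both cases; only the call site decides whether the write reaches the caller's variable.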

You’ll find the whole story on http://blogs.msdn.com/b/ericlippert/archive/2003/09/15/52996.aspx

GFA-BASIC supports the call with parentheses
You can almost feel the agony FO must have felt when he found out about this VB quirk. How to maintain backward compatibility with both previous GFA-BASIC versions and VB? In GB we are used to placing parameters between parentheses. Should we suffer from this VB anomaly as well? Either way, FO decided to support the Sub implementation with parentheses only. Passing arguments without parentheses is equal to passing them with parentheses.

Although the Sub-ByRef problem is much clearer now, the GB32 implementation is still a bit ambiguous. Because GB32 mimics only the call with parentheses, the subroutine always receives a by-value argument for a default ByRef parameter. You may omit the parentheses in GB32, but it still executes the version with parentheses. Only when a global variable is passed to a default by-reference parameter is the actual variable address put on the stack.

Conclusion GB32
Despite the ambiguity between global and local parameters, you should, as a general rule, no longer consider a default by reference parameter as [in/out] when you omit ByRef in a Sub declaration. Inside the Sub you don’t know if you are dealing with the actual variable [in/out] or with a copy [in]. Always use ByVal and ByRef explicitly in a Sub declaration.

31 May 2014

Problem with local Hash variables

GB32 does not support the destruction of local Hash variables. This is by design. Global Hash variables are released when the program terminates. This does not mean you can’t use local Hash variables; you just have to add the termination code yourself.

This behavior might be the cause of many reported memory leak problems.

Let us look at some examples. First suppose you have a subroutine like this where a local hash variable is used to store the results of the Split regular expression command:

Proc Split_Local(ByRef t$, sep$)
  ' Declare a local Hash variable.
  Dim hs As Hash String

  ' Split creates a new hash table of String,
  ' and destroys the hash allocated memory first.
  Split hs[] = t$, sep$

  ' Explicitly erase Hash variable, because
  ' memory allocated by hs[] is not destroyed
  ' automatically when going out of scope.
  Hash Erase hs[]
EndProc

The Dim command declares a Hash String variable hs. The variable is put on the stack. Stack memory is temporary space limited to the scope of the procedure call. A hash variable occupies 8 bytes divided into two Long types. The first Long is a pointer to heap memory. This pointer is set to the hash-table descriptor once a value is added to the table. Initially, the pointer is Null. The second Long contains the code for the type of data the hash is going to store. Here the type indicates a String, so the hash is used to store String values.

When the hash variable goes out of scope, at the end of the procedure, the stack is cleared or reset to the value it had at the point where the procedure started executing. The 8 bytes reserved for the hash variable are simply discarded without freeing the allocated memory first.

A hash table allocates memory dynamically. The hash grows and shrinks automatically, allocating and freeing heap memory on demand. When the hash variable goes out of scope, the allocated memory is no longer referenced by any GB variable or GB garbage collector. This memory is freed only after the application has ended and all of the application’s memory is released to the OS.

  • A local hash variable must be freed explicitly using Hash Erase on the variable name.
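The leak can be illustrated with a small model. This is only a sketch in Python under stated assumptions: a dictionary named heap plays the role of heap memory, a hash variable is a pair of pointer and type code, and the type code value 1 is made up. None of these names are GB32 functions; hash_erase and split_into merely imitate what the text says Hash Erase and Split do.

```python
# Hypothetical model of heap memory and an 8-byte hash variable
# (pointer + type code). Nothing here is a GB32 API.

HASH_TYPE_STRING = 1      # assumed type code, for illustration only

heap = {}                 # handle -> allocated hash table
_next_handle = [1]


def heap_alloc():
    """Allocate an empty table on the model heap and return its handle."""
    handle = _next_handle[0]
    _next_handle[0] += 1
    heap[handle] = {}
    return handle


def hash_erase(var):
    """Free the table the variable points to and reset the pointer to Null."""
    if var["ptr"] is not None:
        del heap[var["ptr"]]
        var["ptr"] = None


def split_into(var, text, sep):
    """Imitates Split: erase any old table first, then fill a new one."""
    hash_erase(var)
    var["ptr"] = heap_alloc()
    for index, token in enumerate(text.split(sep)):
        heap[var["ptr"]][index] = token


def proc_without_erase(text, sep):
    hs = {"ptr": None, "type": HASH_TYPE_STRING}   # lives on the "stack"
    split_into(hs, text, sep)
    # Returning discards hs, but nothing frees the heap entry.


def proc_with_erase(text, sep):
    hs = {"ptr": None, "type": HASH_TYPE_STRING}
    split_into(hs, text, sep)
    hash_erase(hs)                                 # the explicit Hash Erase


proc_with_erase("a,b,c", ",")
tables_after_erase = len(heap)        # nothing left behind

proc_without_erase("a,b,c", ",")
proc_without_erase("d,e", ",")
tables_without_erase = len(heap)      # one orphaned table per call
```

Each call that skips the erase step leaves one table on the model heap with no variable pointing to it, which is the situation described above.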

The Split command clears the Hash table as well, that is, when the first Long of the hash variable isn’t Null. Before Split starts splitting the string into tokens that are to be stored in the Hash String variable, it completely erases the Hash. In fact, the Split command invokes the runtime function HASHERASE, which is also called by the Hash Erase command.

Static Hash
Now suppose you would like to use Static on a local hash variable. That would prevent unnecessary memory de-allocations in your procedure and would improve performance when the procedure is executed, wouldn’t it? Wrong. Look at the next example:

Proc Split_Static(ByRef t$, sep$)
  ' Static declares a global Hash variable
  ' with local scope.
  Static hs As Hash String
  Split hs[] = t$, sep$, 10
EndProc

Although the hash variable is declared local, a Static variable is actually a global variable. It is treated the same as any other global variable; only its visibility is limited to the procedure in which it is declared.

Note: When asked for the variable address using VarPtr(hs), the address returned is located in the global data section of the program. VarPtr() does not return a stack address; the Static variable is not stored on the stack. For the duration of the program execution the static/global hash variable hs[] cannot be changed by any code other than the procedure it is declared in. Because a static variable is assigned to the program’s data section, the GB32 application will release the memory it allocated. The static hs[] variable is destroyed when the application quits. When a program is executed from within the IDE, all globals will be destroyed and so will the static hs[] hash variable.

Let us consider the example where the static/global hash variable is passed to Split. Upon entry Split will destroy any entries the hash variable references; the hash table is destroyed. Then a new hash table is allocated and its new pointer is stored at VarPtr(hs) + 0. The contents of VarPtr(hs) + 4 remain unchanged, because they specify the hash data type. When Split finishes, the global/static variable hs[] points to a new hash table. Access to the elements in hs[] is now limited to the code between the Split command and EndProc. Since the hash isn’t destroyed when the procedure returns, the program now carries with it allocated memory that cannot be accessed until the next time the procedure is called. And when the procedure is executed again, the Split command immediately destroys the hash table. Instead, you could insert a Hash Erase at the end of the procedure, but what does that give you? A local hash variable that is stored as a global variable.

If you would like to read more on hash variables, go to Passing a hash variable to a subroutine

More on local scope
Automatic destruction of variables with local scope is only supported for String, Variant, Object, BSTR and arrays. When a local array is used, the compiler inserts a call to CLEARARR(). The code responsible for clearing the other types of local variables, a function called CLEARMULTI located in the runtime GfaWin23.Ocx, accepts a Long integer (4 bytes) where each byte contains the number of local variables of one specific type. The Long is coded like this: BBVVOOSS. The low byte contains the number of String variables to clean and the high byte of the lo-word contains the number of COM objects to release. The hi-word contains the number of Variants and BSTR types.
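As an illustration of the BBVVOOSS coding, here is a small Python sketch that packs and unpacks four byte-sized counts. It assumes that BB (the highest byte) holds the BSTR count and VV the Variant count, which is one reading of the layout above; pack_counts and unpack_counts are hypothetical helper names, not runtime functions.

```python
def pack_counts(strings, objects, variants, bstrs):
    """Pack four counts into one 32-bit value laid out as BBVVOOSS (assumed order)."""
    for count in (strings, objects, variants, bstrs):
        if not 0 <= count <= 0xFF:
            raise ValueError("each count must fit in one byte")
    return (bstrs << 24) | (variants << 16) | (objects << 8) | strings


def unpack_counts(value):
    """Split a BBVVOOSS value back into its four byte-sized counts."""
    return {
        "strings": value & 0xFF,           # SS: low byte
        "objects": (value >> 8) & 0xFF,    # OO: high byte of the lo-word
        "variants": (value >> 16) & 0xFF,  # VV: low byte of the hi-word
        "bstrs": (value >> 24) & 0xFF,     # BB: high byte of the hi-word
    }


# A procedure with 3 local Strings, 1 COM object and 2 Variants.
packed = pack_counts(strings=3, objects=1, variants=2, bstrs=0)
counts = unpack_counts(packed)
```

With these counts the packed value is 0x00020103, which shows why a single Long argument is enough to tell the cleanup routine how many locals of each type to release.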

Yes, BSTR types. Unfortunately GB does not provide a BSTR data type we can use to store UNICODE strings. However, all COM objects that require a string use BSTR as a parameter type. When the compiler creates code to invoke a method or property taking a COM string as an argument, the ANSI string is converted to a BSTR before the method or property is called. When necessary, the BSTR is stored in a hidden local variable that must later be destroyed.