26 August 2017

A little array magic

Without going into a formal description, an array stores multiple values of the same type in contiguous memory. In code, an array is recognized by a variable name followed by parentheses, with or without indices. Like any other variable, an array should be declared before it can be used. (Declaring a variable introduces it to the compiler.) Generally, a declaration specifies a name and a type. For an array, the declaration may also include lower and upper bounds for up to 7 dimensions.
Array declaration and dimensioning
An array declaration always results in the creation of an array-descriptor. For a global array the descriptor is added to the program’s global data section and for a local array the compiler inserts code to allocate an array descriptor dynamically.
' Declaration and allocation separated:
Global Dim a() As Long  ' adds descriptor to data
ReDim a(6)              ' code to allocate memory

' Declaration and allocation at once:
Global Dim b(3, 1) As String
The second declaration makes the compiler both add a descriptor to the global data and generate code to allocate memory. It is exactly the same as Global Dim b() As String : ReDim b(3,1).
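The equivalence of the two forms can be checked with the inquiry functions described further down in this post. This is only a sketch: the array names c() and d() are made up for illustration, Debug.Print is assumed as the output statement, and the default Option Base 0 is assumed so that Dim and ReDim agree on the lower bound.

```basic
' Two-step form: empty descriptor first, memory later
Global Dim c() As String
ReDim c(3, 1)

' One-step form: descriptor and memory in a single statement
Global Dim d(3, 1) As String

' Both arrays should report the same number of dimensions ...
Debug.Print IndexCount(c())
Debug.Print IndexCount(d())

' ... and the same number of elements
Debug.Print Dim?(c())
Debug.Print Dim?(d())
```

If the two pairs of values differ, the Option Base setting is the first thing to check.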
A local array declaration is handled differently from a global one. First of all, the compiler does not assign the array a static descriptor. Instead, the declaration makes the compiler insert code that obtains (allocates) an array descriptor dynamically when the procedure is executed. The pointer to the descriptor is stored in a hidden local memory location on the procedure’s stack. The address of the descriptor is then passed to the same ReDim code to allocate memory for the array elements.
Proc LocalArr()  ' Naked forbidden
  Dim dum$       ' prevent compiler bug
  Dim h()        ' allocates a descriptor
  ReDim h(4, 5)  ' allocates memory for the elements
  Dim v(3)       ' 1-step: descriptor + memory
EndProc          ' destruction for h() and v() and dum$

Local arrays have the same anatomy, but they have no descriptors in the global data section. Both the descriptor and the data memory are allocated from the heap when the subroutine is executed. Room is reserved on the procedure stack for a hidden pointer that stores the address of the descriptor. This pointer is needed later, when the subroutine is left, to clean up the local stack and call the array-destruction code.

Local Array Destruction
Local array destruction is part of the procedure’s termination handler, if it has one. A Naked procedure doesn’t include a termination handler; the developer has to clear pointer variables manually. However, a local array cannot be destroyed manually because there is no statement to do so. The obvious candidate, Erase, would only release the data memory, not the array descriptor, leaving it on the stack. Eventually, the stack might overflow when the procedure is executed repeatedly.

  • An array cannot be destroyed explicitly; Clr doesn’t work with arrays (or hashes).
  • Local arrays are not allowed in Naked procedures. Naked prevents the compiler from inserting destruction code for all pointer variables (String, Variant, Object).

Be aware of two possible problems
When a subroutine’s only local variables are arrays, with no other local variables of a pointer type (String, Variant, Object), the compiler ‘forgets’ to insert the array-destruction code altogether. This is a bug. In this specific situation the compiler must be forced to add the array-destruction code, which requires introducing another dynamic data type that needs destruction code. A local String is the easiest solution, as demonstrated in the example above. (The bug is still waiting to be resolved.)

The other problem involves ReDim, which - unlike Dim - does not default to the Option Base setting. Instead, ReDim always uses 0 as the lower bound of the array. When Option Base 1 is the default setting for your application, you need to use ReDim ar(1 .. x) explicitly, rather than ReDim ar(x).
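The difference can be made visible by printing the lower bound after each form of ReDim. The snippet is a sketch that relies on the behaviour described above; the array names and the use of Debug.Print for output are assumptions for illustration.

```basic
Option Base 1                 ' application-wide default lower bound of 1

Global Dim ar1() As Long
Global Dim ar2() As Long

ReDim ar1(10)                 ' per the text: lower bound is 0 here, not 1
Debug.Print LBound(ar1())

ReDim ar2(1 .. 10)            ' explicit range keeps the lower bound at 1
Debug.Print LBound(ar2())
```

Writing the range explicitly in every ReDim keeps the code independent of the Option Base setting.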

Important note on a Hash
A local Hash isn’t destroyed automatically either (Naked or otherwise). Clr cannot be used, and there is no way to force the compiler to insert Hash destruction code. All local Hash variables must be released manually using Hash Erase. You might want to use Static Hash for local variables instead. A Hash is a relatively time-consuming type to destroy because all its entries are released one by one. Static preserves the contents and avoids the time-consuming destruction.

Global Array destruction
GB implements hidden destruction for releasing arrays. A local array is destroyed on exit from its procedure, and a global array when the program terminates. For a global array the descriptor is static: it is part of the global data section and so an inherent part of the program. After a program exits (either as an EXE or in the IDE), the global data section simply disappears and the descriptors with it. In the case of an EXE process all memory is released to the OS, and in the IDE the global data is destroyed after the program (RUN) ends. Global arrays therefore give no cause for memory leaks.

Anatomy of an array
In GB32 an array is described by a variable name, a descriptor, and a piece of contiguous memory that stores the array data. When the compiler encounters a global array declaration, it creates a mapping between the variable name and an array descriptor stored in the global data section. This holds both for in-memory compiling and when an EXE is created. A local declaration introduces a mapping between the name and a hidden local pointer variable (a 32-bit pointer). The hidden variable stores the pointer to the dynamically allocated array descriptor.
An array descriptor is a structure defining the attributes of an array. The ArrayDesc structure is defined like this (note how the last member reserves LBound/UBound information for a maximum of 7 dimensions):

Type ArrayDesc
  -Int    Magic         ' "arry" or "ArrY"
  -Int    ptype         ' vtType (internal const)
  -Int    size          ' size of datatype
  -Int    dimCnt        ' number of dimensions
  -Int    dimCnt2       ' # of dimensions     == IndexCount
  -Int    paddr         ' address of data     == ArrayAddr()
  -Int    corr          ' correction value
  -Int    paddrCorr     ' void* addrCorr;
  -Int    anzElem       ' number of elements  == Dim?()
  -Int    sizeArr       ' size in bytes       == ArraySize()
  -Int    Idx(7 * 3)    ' == LBound()/UBound()
EndType

For global and static arrays an instance of this structure is stored in the global data section. For local arrays the structure is allocated dynamically. It is important to realize that every declaration of an array (Dim/Global/Local/Static) immediately results in an array descriptor, dimmed or un-dimmed. The values of the structure members determine the status of the array. The Magic member is for internal use, although it indicates perfectly well whether an array is empty, either after Erase or after an empty declaration. Other members can be retrieved using the following functions.

Function | Member ArrayDesc | Description
Dim?(a()) | anzElem (element count) | Returns the number of elements in the array. Erase clears this value (sets to 0). One way to determine if an array has been ‘dimmed’.
IndexCount(a()) | dimCnt2 | Returns the number of dimensions. Returns 0 when not ‘dimmed’. Another way to determine if an array is empty.
ArrayAddr(a()) | paddr | Returns the memory address of the first element of the array. Returns 0 if erased or not ‘dimmed’. Can be used to determine if an array is empty.
ArraySize(a()) | sizeArr | Returns the size of all elements in bytes. Returns 0 if array is empty.
LBound(a()[,i=1]) | Idx[] | Returns the lower bound for a dimension (default is 1). Raises an error when an array is empty.
UBound(a()[,i=1]) | Idx[] | Returns the upper bound for a dimension (default is 1). Raises an error when an array is empty.
  • LBound and UBound are the only functions in this list that cannot be used to query an ‘un-dimmed’ array.
  • For the special case OLE Automation array-type ParamArray, LBound and UBound return 0 and -1 respectively; these functions do not raise an error (VB compatibility). The ParamArray datatype is in fact nothing more than a Variant containing an OLE/COM SafeArray.
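Since LBound and UBound raise an error on an empty array, it makes sense to test the array with one of the other functions first. The following is a minimal sketch of such a guard; the array name q() is hypothetical and Debug.Print is assumed as the output statement.

```basic
Global Dim q() As Long          ' declared only: descriptor exists, no data

' Guard the bounds functions: they raise an error on an empty array
If Dim?(q()) > 0
  Debug.Print LBound(q())
  Debug.Print UBound(q())
Else
  Debug.Print "q() is not dimmed"
EndIf

ReDim q(6)                      ' now the descriptor describes real data
If Dim?(q()) > 0
  Debug.Print LBound(q())
  Debug.Print UBound(q())
EndIf

Erase q()                       ' back to the empty state
Debug.Print ArrayAddr(q())      ' per the table above: 0 after Erase
```

IndexCount() or ArrayAddr() would serve equally well as the guard condition; Dim?() is used here only because it reads most naturally.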
Functions and statements that do not apply to arrays
An array variable is treated differently from any other variable type. The array’s variable name cannot be used in GB32 functions and statements the way other variable names can. For instance, the statement Clr a() is forbidden, and the ArrPtr() function does not return the location of the array’s variable but the location of the array descriptor instead. You cannot use Pointer to redirect an array variable name to another descriptor, and TypeName(ar()) cannot be used to obtain the data type of the array.
  • Generally, all GB-functions and statements that use a variable name as an argument are forbidden for arrays.
When the compiler refers to an array, it refers to the descriptor directly. The compiler doesn’t preserve a mapping between the array’s variable name and the location of a pointer, as it does with Strings for instance. The generated code simply doesn’t ‘know’ the array name anymore, only the location of the descriptor.
There is only one runtime function that accepts the address of a (hidden local) pointer containing the address of a descriptor: CLEARARR(), the local array destructor. This function cannot be invoked manually, not even from assembler, because the address of the hidden variable is unknown. Asm lea eax, ar will not work; it still returns the address of the descriptor.
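Because ArrPtr() yields the address of the descriptor, the ArrayDesc members can in principle be read directly and compared with the inquiry functions. The sketch below rests on several assumptions that the text does not confirm: that every -Int member occupies 4 bytes, that the members are laid out in declaration order without padding, and that LPeek reads a 32-bit value from an address. The names t() and desc are hypothetical, and the code only reads from the descriptor.

```basic
Global Dim t(9) As Long
Global desc As Long

desc = ArrPtr(t())              ' address of the descriptor, not of a variable

' Assumed offsets: 4 bytes per member, in declaration order
Debug.Print LPeek(desc + 20)    ' paddr   (6th member)
Debug.Print ArrayAddr(t())      ' should show the same address

Debug.Print LPeek(desc + 32)    ' anzElem (9th member)
Debug.Print Dim?(t())           ' should show the same count
```

If the pairs of values do not match, the layout assumptions above are wrong for your version and the offsets should not be relied on.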
