
07 December 2024

CreateObject peculiarities (2)

Lately I received some COM-automation related questions, which I answered by referring to this blog post: CreateObject Peculiarities (part 1). It advised using the CoCreateInstance() API with the IID_IUnknown parameter to connect to an automation server. However, after re-reading it I realized I had not demonstrated how to use CoCreateInstance. Below is an example that replaces the GB32 CreateObject() function with a custom-made CreateObject2() function. It provides the same functionality as CreateObject and more. Where CreateObject only returns an object if it supports the IDispatch interface, CreateObject2() has an optional parameter that takes any interface you want to request from the server. By default, it returns an object that supports the IDispatch interface, and if that fails it returns the IUnknown interface.

The code is heavily commented, so I hope it is easy to follow. As an example, CreateObject2 is used to obtain the IDispatch interface of the Scripting runtime's FileSystemObject.

$Library "gfawinx"
$Library "UpdateRT"
UpdateRuntime      ' Patches GfaWin23.Ocx

Dim FSO As Object
' CreateObject2(ClsID [, IID]) replaces CreateObject(ClsID).
Set FSO = CreateObject2("Scripting.FileSystemObject", IID_IDispatch)
MsgBox0("Successfully created object.")
' FSO object is released when it goes out-of-scope.

Function CreateObject2(ClassID As String, Optional IID_Interface As Long) As Object
  '-------------------------------------------------------------------
  '  Like CreateObject creates an Instance of an OLE Server.
  '  The ClassID argument creates a class & specifies the OLE Server.
  '  The IID_Interface argument specifies the interface to create.
  '  There are two formats for the ClassID class argument:
  '   1. PROGID: "Excel.Application"
  '   2. CLSID: "{00000010-0000-0010-8000-00AA006D2EA4}"
  ' If a ProgID is used, the client's registry is used to get the CLSID
  ' If the optional IID_Interface parameter isn't used, a reference
  ' to the IDispatch interface is created. If that fails, a reference
  ' to the IUnknown interface is created.
  ' The IID_Interface is a pointer to a GUID, created by the GUID command.
  '-------------------------------------------------------------------

  Dim wProgId As Variant = ClassID          ' to Unicode
  Dim Ptr_ProgID As Long = {V:wProgId + 8}  ' string address
  Dim HResult As Long           ' COM error code
  Dim ClsID_ProgID As GUID      ' GUID is built-in type
  Dim fAskDispatch As Bool      ' set if IID_IDispatch is asked

  ' Get CLSID (GUID) type from either GUID$ or ProgID$
  If Left(ClassID) = "{" && Right(ClassID) = "}" && Len(ClassID) = 38
    HResult = CLSIDFromString(Ptr_ProgID, ClsID_ProgID)
  Else
    HResult = CLSIDFromProgID(Ptr_ProgID, ClsID_ProgID)
  End If
  If HResult != S_OK Then _
    Err.Raise HResult, "CreateObject2", "Wrong CLSID"

  ' The default is to ask for IDispatch (like CreateObject)
  If IID_Interface = 0 Then IID_Interface = IID_IDispatch
  ' Remember whether IID_IDispatch is asked
  fAskDispatch = IsEqualGUID(IID_Interface, IID_IDispatch)

  ' Create a single instance of an object (ClsID_ProgID) on
  ' the local machine that supports the requested interface.
  ' Store the instance in the local function-return variable.
  HResult = CoCreateInstance(ClsID_ProgID, Null, CLSCTX_ALL, _
    IID_Interface, V:CreateObject2)

  // Only if asked for IID_IDispatch and it failed, try IID_IUnknown
  If HResult == E_NOINTERFACE && fAskDispatch
    IID_Interface = IID_IUnknown
    HResult = CoCreateInstance(ClsID_ProgID, Null, CLSCTX_ALL, _
      IID_Interface, V:CreateObject2)
  EndIf

  // If all requests for an interface have failed raise error
  If HResult != S_OK Then _
    Err.Raise HResult, "CreateObject2", "No interface"

  ' ------------------------------------------------------
  ' Global declarations section
  ' ------------------------------------------------------
  Global Const E_NOTIMPL = 0x80004001
  Global Const E_NOINTERFACE = 0x80004002

  ' -------------------------------------------------------
  ' GUID Identifier = value    (global declaration command)
  ' Generates a pointer to a 128 bit memory block containing
  ' a GUID value. So, IID_IUnknown is a Long holding the
  ' address of the GUID value.
  ' -------------------------------------------------------
  GUID IID_IUnknown  = 00000000-0000-0000-c000-000000000046
  GUID IID_IDispatch = 00020400-0000-0000-c000-000000000046

  Global Enum CLSCTX, CLSCTX_INPROC_SERVER, CLSCTX_INPROC_HANDLER = 2, _
    CLSCTX_LOCAL_SERVER = 4, CLSCTX_REMOTE_SERVER = 16, _
    CLSCTX_SERVER = CLSCTX_INPROC_SERVER + CLSCTX_LOCAL_SERVER + _
    CLSCTX_REMOTE_SERVER, _
    CLSCTX_ALL = CLSCTX_INPROC_SERVER + CLSCTX_INPROC_HANDLER + _
    CLSCTX_LOCAL_SERVER + CLSCTX_REMOTE_SERVER

  ' -------------------------------------------------------
  ' CoCreateInstance note.
  ' The GB GUID is a pointer to a GUID type in memory.
  ' To make it possible to use a GB-GUID, we should adjust
  ' the Declare to receive a Long holding the address.
  ' -------------------------------------------------------
  Declare Function CoCreateInstance Lib "OLE32" _
    (ByRef rclsid As GUID, ByVal pUnkOuter As Long, _
    ByVal dwContent As Long, ByVal pIID As Long, _
    ByVal ppv As Long) As Long
  Declare Function CLSIDFromString Lib "OLE32" _
    (ByVal lpszCLSID As Long, pclsid As GUID) As Long
  Declare Function CLSIDFromProgID Lib "OLE32" _
    (ByVal lpszProgID As Long, pclsid As GUID) As Long
  Declare Function IsEqualGUID Lib "ole32" (ByVal prguid1 As Long, ByVal prguid2 As Long) As Bool
EndFunc
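The two decisions CreateObject2 makes around CoCreateInstance (choosing between CLSIDFromString and CLSIDFromProgID, and falling back from IDispatch to IUnknown only on E_NOINTERFACE) can be modeled in portable C. This is a sketch, not a COM client: the `create_fn` callback, the `iface_t` enum and `no_dispatch_server` are hypothetical stand-ins for CoCreateInstance and real interface IDs; only the CLSCTX and HRESULT values are taken from the Windows SDK.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CLSCTX flags with the values used by the Windows SDK. */
enum {
    CLSCTX_INPROC_SERVER  = 0x1,
    CLSCTX_INPROC_HANDLER = 0x2,
    CLSCTX_LOCAL_SERVER   = 0x4,
    CLSCTX_REMOTE_SERVER  = 0x10,
    CLSCTX_SERVER = CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER | CLSCTX_REMOTE_SERVER,
    CLSCTX_ALL    = CLSCTX_INPROC_SERVER | CLSCTX_INPROC_HANDLER |
                    CLSCTX_LOCAL_SERVER | CLSCTX_REMOTE_SERVER
};

/* HRESULT values used by CreateObject2. */
typedef uint32_t hresult_t;
#define HR_S_OK          0x00000000u
#define HR_E_NOINTERFACE 0x80004002u

/* Hypothetical stand-ins for interface IDs. */
typedef enum { IFACE_IDISPATCH, IFACE_IUNKNOWN, IFACE_OTHER } iface_t;

/* Same test as CreateObject2: "{...}" with 38 characters is a CLSID string,
   anything else is treated as a ProgID. */
static bool is_clsid_string(const char *class_id)
{
    size_t n = strlen(class_id);
    return n == 38 && class_id[0] == '{' && class_id[n - 1] == '}';
}

/* The callback plays the role of CoCreateInstance for one interface. */
typedef hresult_t (*create_fn)(iface_t iface);

/* Request `wanted`; only if IDispatch was asked for and the server answers
   E_NOINTERFACE, retry with IUnknown. `got` reports the last interface requested. */
static hresult_t create_with_fallback(create_fn create, iface_t wanted, iface_t *got)
{
    hresult_t hr = create(wanted);
    *got = wanted;
    if (hr == HR_E_NOINTERFACE && wanted == IFACE_IDISPATCH) {
        *got = IFACE_IUNKNOWN;
        hr = create(IFACE_IUNKNOWN);
    }
    return hr;
}

/* Hypothetical server that implements IUnknown but not IDispatch. */
static hresult_t no_dispatch_server(iface_t iface)
{
    return iface == IFACE_IDISPATCH ? HR_E_NOINTERFACE : HR_S_OK;
}
```

With such a server, a default (IDispatch) request ends with S_OK on the IUnknown retry, just as CreateObject2 does.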

21 August 2019

Unicode controls

In past years I have frequently been asked for ‘Unicode support’ in GFA-BASIC 32. The issue is that GB is an ANSI programming language: the IDE accepts only characters in the range from 0 to 255, and the string functions assume one byte per character. All commands and functions that accept a string parameter take ANSI strings only. However, it is possible to create UNICODE controls, let the user input text in the user’s locale, and then retrieve the wide-character text from the controls. To process the retrieved text, the application will most likely use Windows API wide-string functions.

A few notes
An introduction to Unicode strings can be found in a previous post: Ansi and Unicode.

This blog post will discuss the use of Unicode (or wide character) controls on a GFA-BASIC 32 form, specifically on a Dialog form. The code is discussed in bits and pieces, but the code for the entire example program can be downloaded here.

Declaring wide API functions
When an application wants to use wide-character controls, the Ocx property and sub-event system can no longer be used. In addition, the dialog definition has to be set up in code, because the form editor cannot be used either. The controls have to be created and handled using the Windows W-API functions. Windows defines both ANSI and Unicode variants of API functions that take a string parameter. Many of the ANSI APIs are built into GFA-BASIC, but the W variants are missing and must be declared explicitly. Two wide-char APIs an application will definitely use are CreateWindowExW and SendMessageW. They have to be declared explicitly (abbreviated):

Declare Function CreateWindowExW Lib "user32" (ByVal dwExStyle As Long,
Declare Function SendMessageW Lib "user32" (ByVal hWnd As Handle, …

Other possible declares are lstrcmpW, lstrcmpiW, CharUpperW, and CharLowerW. To draw Unicode text on the screen, the application needs to declare TextOutW and/or DrawTextW, etc.

  • Recommended: A full set of wide-string functions can be found in the Shell Lightweight Utility functions (SHLWAPI) DLL. The Include library does not provide an include file with Wide function declarations though!

Defining controls
We cannot use any predefined control, and we cannot use the Control command to create a wide-character control. We can, however, create a procedure ControlW that allows an easy translation of Control statements to Unicode controls. Because we will use a W variant of the Control command, an easy way to add controls is to use a dialog box (which is a Form). This also allows us to use an external dialog box editor. The following piece of code was created using ResHacker, a GUI utility that can create a dialog box definition. After copying and pasting the definition into the GB editor, the command Control is replaced by ControlW. Note that ResHacker produces a dialog definition in dialog base units rather than pixels. Also, the dimensions of the controls may need some editing once the dialog is used in GB. The WS_CHILD | WS_VISIBLE styles can be removed as well, because ControlW adds them itself.

DlgBase Unit
Dlg 3D On
Dialog # 1, 0, 20, 261, 140, "Controls", WS_SYSMENU | WS_CAPTION
  ControlW "&OK", 1, "BUTTON", BS_DEFPUSHBUTTON | _
    WS_CHILD | WS_VISIBLE | WS_TABSTOP, 130, 97, 50, 11
  ControlW "&Cancel", 2, "BUTTON", BS_PUSHBUTTON | _
    WS_CHILD | WS_VISIBLE | WS_TABSTOP, 187, 97, 50, 11
  ControlW "Checkbox", 10, "BUTTON", BS_AUTOCHECKBOX | _
    WS_CHILD | WS_VISIBLE | WS_TABSTOP, 7, 8, 60, 14
  ControlW "Group", 0, "BUTTON", BS_GROUPBOX | _
    WS_CHILD | WS_VISIBLE, 7, 23, 59, 47, WS_EX_TRANSPARENT
  ControlW "Radio 1", 12, "BUTTON", BS_RADIOBUTTON | _
    WS_CHILD | WS_VISIBLE | WS_TABSTOP, 12, 36, 43, 14
  ControlW "Radio 2", 13, "BUTTON", BS_RADIOBUTTON | _
    WS_CHILD | WS_VISIBLE | WS_TABSTOP, 12, 51, 43, 14
  ControlW "Trackbar", 15, "msctls_trackbar32", TBS_HORZ | _
    WS_CHILD | WS_VISIBLE | WS_TABSTOP, 7, 77, 60, 18
  ControlW "Insert Text:", 0, "STATIC", SS_LEFT | _
    WS_CHILD | WS_VISIBLE | WS_GROUP, 82, 10, 45, 10
  ControlW "", 17, "EDIT", ES_LEFT | ES_MULTILINE | _
    WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 122, 7, 116, 14
EndDialog

The ControlW procedure creates the wide control. The string input parameters are ANSI strings that are converted to Unicode before CreateWindowExW is invoked. Since the dialog definition uses dialog box units, the DlgBase Unit command is added. This command initializes a few global variables in the runtime. The ControlW procedure tests whether DlgBase Unit is used and, if so, converts the coordinates from dialog box units to pixels.

Proc ControlW(text$, id%, class$, style%, x%, y%, w%, h%, Optional exstyle%)
  Local hWnd As Handle, pText As Long
  If Len(text$) Then text$ = Wide(text$) : pText = V:text$
  class$ = Wide(class$)
  style% |= WS_CHILD | WS_VISIBLE
  If {$180B5F70} %& 1    ' DlgBase Units?
    x% = MulDiv(x%, {$180B5E98}, 4)
    y% = MulDiv(y%, {$180B5E94}, 8)
    w% = MulDiv(w%, {$180B5E98}, 4)
    h% = MulDiv(h%, {$180B5E94}, 8)
  EndIf
  hWnd = CreateWindowExW(exstyle%, V:class$, pText, style%, x%, y%, w%, h%, Me.hWnd, id%, _INSTANCE, 0)
  If hWnd Then _
    SendMessageW(hWnd, WM_SETFONT, Me.Font._hFont, 1)
EndProc
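ControlW's conversion from dialog units to pixels is plain integer arithmetic, so it can be checked outside GFA-BASIC. The following C sketch uses hypothetical base units of 8 (horizontal) and 16 (vertical) in its usage; ControlW itself reads the real values from undocumented runtime addresses. `mul_div` is a simplified stand-in for the Win32 MulDiv() API, not a reimplementation of it.

```c
#include <stdint.h>

/* Simplified stand-in for Win32 MulDiv(): 64-bit intermediate product,
   rounded to the nearest integer. Only non-negative inputs are handled;
   the real API also deals with negative values and overflow. */
static int32_t mul_div(int32_t number, int32_t numerator, int32_t denominator)
{
    if (denominator <= 0 || number < 0 || numerator < 0)
        return -1;                              /* outside this sketch's scope */
    int64_t product = (int64_t)number * numerator;
    return (int32_t)((product + denominator / 2) / denominator);
}

typedef struct { int32_t x, y, w, h; } rect_t;

/* Dialog units to pixels, as in ControlW: horizontal values scale by
   base_x / 4, vertical values by base_y / 8. */
static rect_t dlu_to_pixels(rect_t r, int32_t base_x, int32_t base_y)
{
    rect_t p;
    p.x = mul_div(r.x, base_x, 4);
    p.y = mul_div(r.y, base_y, 8);
    p.w = mul_div(r.w, base_x, 4);
    p.h = mul_div(r.h, base_y, 8);
    return p;
}
```

With those assumed base units, the OK button's 130, 97, 50, 11 becomes 260, 194, 100, 22 pixels.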

The controls are created using the pure Windows API, which also means that they have to be initialized and modified by sending messages. You will need proper documentation to know which messages to send to the controls and how. The controls are not OCX controls and don’t report notifications through event subs. Their notification messages arrive in either a WM_COMMAND or a WM_NOTIFY message, and the application needs to process these messages the ‘API way’.

Notes on using a dialog box
An advantage of using a Dialog form is the presence of properties and event-subs. To respond to control-messages a Dlg_n_Message sub is all that is needed. A disadvantage of using a Dialog form is the lack of Unicode support for the title of the ANSI-based dialog box. One solution could be to add an informative picture along the top (caption) of the dialog form. This would require a simple LoadPicture and PaintPicture sequence of commands.
In addition, ANSI and Unicode controls cannot be used together; that would break the Tab-key navigation. Even worse, the navigation with Unicode controls differs from the navigation with ANSI controls. This means that commands like Sleep and PeekEvent will mess up the key navigation. A workaround is to trap the Tab and arrow keys in the Screen_KeyPreview event sub and call IsDialogMessage ourselves.

Sub Screen_KeyPreview(hWnd%, uMsg%, wParam%, lParam%, Cancel?)
  Dim msg As MSG
  If GetForegroundWindow() = Dlg_1.hWnd
    msg.hwnd = hWnd%
    msg.MessageVar = uMsg%
    msg.wParam = wParam%
    msg.lParam = lParam%
    Cancel? = IsDialogMessage(Dlg_1.hWnd, msg) == 1
  EndIf
EndSub

Cancel? is set to True when IsDialogMessage handled the key. This prevents the navigation key from also being handled by commands like Sleep and PeekEvent.
The GB runtime always calls IsDialogMessage as part of its message-handling commands, and IsDialogMessage processes the key whenever the form contains at least one control (which might be a toolbar or status bar). If you’re not aware of this behavior, it seems as if the GB application ‘eats’ the keypresses.

Another issue is the way the focus is handled in a form with controls. The application should always explicitly set the focus to a control before entering the main message loop. If it doesn’t, the focus might not be set correctly when a navigation key is pressed or when the application is reactivated.

Processing Unicode strings
The ControlW custom procedure takes an ANSI string for the control text. However, the program needs to set the control text in Unicode instead. Normally a program assigns hard-coded text to a control, but text in the IDE is limited to ANSI characters. Because it is (almost) impossible to specify Unicode strings in code directly, the strings have to be obtained from an external source. This is possible with an editor that can save Unicode text. For this example I used Notepad2, which can save Unicode text by setting the Encoding in the File menu to Unicode. In the GB code I defined constants holding the index of each string after the strings have been loaded into an array. These lines can be found at the start of the example program:

Dim T$$()   ' storage for UNICODE strings
Enum wsHello, wsGFABASIC
LoadWStrings("unicode.txt", T$$())

The procedure LoadWStrings loads the Unicode text lines into the array T$$(). The double $ is used to indicate that the string array variable contains wide character strings.

Now, after the dialog box has been created, but before it is displayed, the text of the wide controls can be modified using a string from T$$(). For this to happen the program includes a SetW procedure which assigns a Unicode string to a window. In the same style a GetW function returns a Unicode string from a window.

Function GetW(ByVal hwnd As Handle, Optional InclTerm As Bool = False) As String
  Local size As Long, sBuf As String
  size = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
  If size
    size++     ' also obtain the terminating null bytes
    sBuf = String(size Mul 2, #0)
    SendMessageW(hwnd, WM_GETTEXT, size, V:sBuf)
    GetW = InclTerm ? sBuf : Left(sBuf, Len(sBuf) - 2)
  EndIf
EndFunc

Proc SetW(ByVal hwnd As Handle, ByVal wTxt As String)
  If Right(wTxt, 2) != #0#0 Then wTxt += #0#0
  SendMessageW(hwnd, WM_SETTEXT, 0, V:wTxt)
EndProc

For instance, to set the text of the wide EDIT control in the dialog box to Hello:

SetW Dlg(1, 17), T$$(wsHello)

By default GetW returns a Unicode string without the two terminating null bytes. However, if a Unicode string is later passed to a Windows function, the string is expected to end with two terminating null bytes. So whether the string should include the terminating zeros depends on its purpose; set the InclTerm argument to True to have GetW return the terminating bytes as well.
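The buffer arithmetic in GetW (one extra character for the terminator, two bytes per character, optionally dropping the last two bytes) is easy to get wrong, so here it is as a C sketch. The window text is simulated with a UTF-16 array; `text_length` and `get_text` are hypothetical stand-ins for the WM_GETTEXTLENGTH and WM_GETTEXT messages, written to follow their documented semantics.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Like WM_GETTEXTLENGTH: number of UTF-16 units before the terminator. */
static size_t text_length(const uint16_t *text)
{
    size_t n = 0;
    while (text[n] != 0)
        n++;
    return n;
}

/* Like WM_GETTEXT: copies at most max_chars - 1 units plus a terminator and
   returns the number of units copied (terminator excluded). */
static size_t get_text(const uint16_t *text, uint16_t *buf, size_t max_chars)
{
    if (max_chars == 0)
        return 0;
    size_t n = text_length(text);
    if (n > max_chars - 1)
        n = max_chars - 1;
    memcpy(buf, text, n * sizeof *buf);
    buf[n] = 0;
    return n;
}

/* GetW's sizing: ask for length + 1 characters (2 bytes each) and return the
   number of bytes kept, with or without the 2-byte terminator.
   Returns 0 for empty text or a buffer that is too small. */
static size_t getw(const uint16_t *window_text, uint16_t *buf, size_t buf_units,
                   int incl_term)
{
    size_t size = text_length(window_text);
    if (size == 0 || size + 1 > buf_units)
        return 0;
    size++;                                   /* room for the terminating null */
    get_text(window_text, buf, size);         /* wParam = size in characters */
    return incl_term ? size * 2 : size * 2 - 2;
}
```

For "Hello" this keeps 10 bytes by default and 12 bytes when the terminator is requested.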
As an example the program also provides a way to compare Unicode strings using Windows API functions:

Function StrCmpW(ByVal str1 As String, ByVal str2 As String, _
  Optional ignorecase As Bool) As Bool
  If Right(str1, 2) != #0#0 Then str1 += #0#0
  If Right(str2, 2) != #0#0 Then str2 += #0#0
  If ignorecase
    StrCmpW = lstrcmpiW(str1, str2) == 0
  Else
    StrCmpW = lstrcmpW(str1, str2) == 0
  EndIf
EndFunc

The function StrCmpW is a wrapper around the declared lstrcmpiW and lstrcmpW APIs. Before these APIs are invoked, the strings are tested for the two terminating null bytes, and if these are missing they are appended. For example, to test whether the edit control holds the word ‘hello’, the following code might be used:

wTxt = GetW(Dlg(1, 17))
If StrCmpW(wTxt, T$$(wsHello), True)
  MsgBox "Edit control specified hello"
EndIf

Summary
A GFA-BASIC application can provide Form-based Unicode controls. The IDE does not allow Unicode literal strings so they must come from an external source. In addition, many other commands like MsgBox, Dlg Open/Save require ANSI strings, so the application must use the appropriate wide Windows API functions. To be fully Unicode the application should be created using wide character API functions entirely.

12 May 2017

ANSI, UNICODE, BSTR and converting

Update 28-06-2017 - Conversion from Ansi to Unicode: WStr() function.
For some reason the WStr() routine contained a stupid bug, which I have now fixed.

More info: the number of bytes to read from the BSTR address was wrong. GFA-BASIC always uses SysAllocStringLen(Null, Len(String)) when allocating COM string memory. The returned BSTR is preceded by a 32-bit value specifying the BSTR's number of bytes, not the number of characters! This is exactly the value needed when reading the BSTR bytes into a String data type using StrPeek(). So the function should have been StrPeek(BSTR, {BSTR-4}); see the updated function below.

Another point of confusion was the number of terminating null bytes that WStr() returns. The StrPeek() function in WStr() only copies the UNICODE characters from the BSTR to a String, without the two null bytes that secretly follow a BSTR string. As a result, the UNICODE characters copied to the String data type are followed by only one (1) null byte: the terminating null byte that each String secretly gets.
When a String of UNICODE characters is to be passed to a Wide API function, two null-bytes must be added 'manually'.
w$ = WStr("GFABASIC") + #0#0 ' assign two nullbytes

The post as it was:
In the previous post I discussed UNICODE versus ANSI in the ANSI-based GFA-BASIC. Basically, GB doesn’t support UNICODE because it expects 1-byte characters wherever strings are used. In UNICODE each character occupies 2 bytes, which allows for more than 256 characters. Converting ANSI to UNICODE is fine, but converting UNICODE to ANSI might lead to a loss of characters with a value above 255. But there is more: Variants and BSTRs.
The introduction of COM in GB required a new data type, the Variant. The Variant is a 16-byte data type that holds data plus a value that specifies the type of that data (LONG, CARD, DOUBLE, etc.). A Variant can also store safe arrays (a specific COM array type) and BSTRs (special UNICODE strings). So, to understand the String and the BSTR/Variant in detail…

How a String is stored
Because a BSTR is much like a GFA-BASIC String data type, I’ll first explain how a GB String is stored. You can skip this part if you already know.
Declaring (Dim) a String variable introduces a name for a location. The String variable itself requires four bytes to store a pointer to dynamically allocated memory for the characters. The declaration and the assignment of a location are handled by the compiler; the rest happens at runtime: assigning or initializing. When the String variable is initialized, a call to malloc() reserves memory for all its characters plus an additional 5 bytes. The first 4 bytes store the length of the string and the last byte holds the null byte (not included in the length value). After allocating and copying the characters, the address of the first character of the string, a 32-bit address or pointer, is stored at the variable’s location.
Global a$       ' 32-bits location(=0) in data or stack
a$ = "GFABASIC" ' assign pointer (address) to location
l = Len(a$)     ' address <> 0 return length {address-4}
Clr a$ : a$= "" ' free memory, set locations to 0
- String in memory: [xxxx|cccccc…c|0]
- Initially, the variable is a null pointer, the contents of the variable’s location is 0.
- The String variable points to the address of the first character c.
- Length is stored in position address – 4, and does not include the terminating zero.

- Obtaining the string’s length is a 2-step process. First the variable is tested for a non-null pointer, and then the value of the preceding 4 bytes (string address – 4) is returned.
- Clearing a string (or assigning an empty string “”) will free the allocated memory and reset the variable’s contents to 0.
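The String layout described above can be modeled in C to make the pointer arithmetic concrete. This is a sketch of the post's description ([4-byte length | characters | null], with the variable pointing at the first character), not GFA-BASIC's actual runtime code; the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocates [4-byte length | characters | 0] and returns a pointer to the
   first character: the value a GB String variable would hold. */
static char *gb_string_new(const char *chars, uint32_t len)
{
    unsigned char *block = malloc(len + 5);     /* 4 length bytes + chars + 1 null */
    if (block == NULL)
        return NULL;
    memcpy(block, &len, 4);                     /* length, excluding the null */
    memcpy(block + 4, chars, len);
    block[4 + len] = 0;                         /* terminating null byte */
    return (char *)(block + 4);
}

/* Len(): a null pointer means "", otherwise read the 4 bytes at address - 4. */
static uint32_t gb_string_len(const char *s)
{
    uint32_t len;
    if (s == NULL)
        return 0;
    memcpy(&len, s - 4, 4);
    return len;
}

/* Clr: free the whole block and reset the variable to a null pointer. */
static void gb_string_clear(char **s)
{
    if (*s != NULL)
        free(*s - 4);
    *s = NULL;
}
```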

BSTR in GB
GB does not provide a data type BSTR, but it provides limited support for hidden BSTRs to pass and obtain BSTR strings to and from COM objects. GB handles the conversion and memory allocation for BSTRs, but it does not provide string-manipulation functions for BSTRs, or even for BSTRs in Variants. More on this below.
BSTR variables are always temporary, hidden local variables used to communicate with COM properties/methods that take or return BSTR arguments. These hidden BSTR variables are always destroyed when leaving a subroutine. Even the Naked attribute won’t prevent the inclusion of the termination code.
BSTR strings are COM-based strings. They are allocated from COM memory, and consequently the memory can be managed by both the provider of the COM object and the client. That is the first difference. Next, a BSTR contains UTF-16 coded wide characters, which I discussed in ANSI and UNICODE. The way COM stores a BSTR is much the same as the way GB stores a String variable. In fact, a BSTR is a 32-bit location that stores a pointer to dynamically allocated memory with UNICODE-formatted characters. The length of the BSTR (in bytes, see the update above) is stored in front of the BSTR, again like GB’s String data type.
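A BSTR can be modeled the same way. In this C sketch, `bstr_alloc` is a hypothetical stand-in for SysAllocStringLen(NULL, n): real BSTRs come from the COM allocator, but the layout follows the description here and in the update above, a 4-byte prefix holding the byte count (not the character count), the UTF-16 characters, and a 2-byte null terminator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t wchar16;                 /* one UTF-16 code unit */

/* Layout: [4-byte byte count | n_chars * 2 bytes | 2-byte null].
   Returns a pointer to the first character, as a BSTR does. */
static wchar16 *bstr_alloc(uint32_t n_chars)
{
    uint32_t n_bytes = n_chars * 2;
    unsigned char *block = calloc(1, 4 + n_bytes + 2);   /* zeroed, incl. terminator */
    if (block == NULL)
        return NULL;
    memcpy(block, &n_bytes, 4);
    return (wchar16 *)(block + 4);
}

/* {BSTR - 4}: the byte count stored in front of the characters. */
static uint32_t bstr_byte_len(const wchar16 *b)
{
    uint32_t n;
    if (b == NULL)
        return 0;
    memcpy(&n, (const unsigned char *)b - 4, 4);
    return n;
}

/* Releases the whole block, prefix included. */
static void bstr_free(wchar16 *b)
{
    if (b != NULL)
        free((unsigned char *)b - 4);
}
```

The character count is the prefix divided by two, which is why reading {BSTR - 4} bytes copies exactly the characters.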

Use Variant for BSTR
Although GB provides hidden support for BSTRs, the only way to get access to a BSTR is by using a Variant. The following example assigns a GB String to a Variant. At runtime the code allocates a BSTR by calling SysAllocStringLen(0, Len(GB-String)), followed by copying the converted GB String to the returned address. The address of the BSTR together with its data type is stored in the Variant. When the Variant variable goes out of scope, the BSTR in the Variant is released through a call to SysFreeString(address).
Dim vnt1 As Variant = "Hello"
Now it gets interesting. After GB has invoked the SysAllocStringLen() COM API, it converts the ANSI string to UNICODE using a private conversion routine that intersperses zeros between the characters (see ANSI and UNICODE). GB does not turn to the MultiByte*() APIs Windows provides, because GB supports ANSI characters only. In the conversion to UNICODE no characters are lost, and the private function is extremely fast.
An optimized UNICODE conversion function
This knowledge makes it possible to obtain a UNICODE string (not a BSTR) from a String argument through our own optimized conversion routine. Note:
  • A UNICODE string is required if you want to use the Wide version APIs.
  • A UNICODE string does not have a length field in front of it; it is not a BSTR. Each character simply occupies 2 bytes.
  • Its memory is managed by the program through malloc() (no COM memory), and it ends with two null bytes (although it seems one is OK as well).
  • The converted ANSI argument is placed in a String only because it is a convenient data type to store consecutive data.
The function makes use of the BSTR allocation and conversion functionality of the Variant.
(The $Export is there because it comes from a .lg32 file).
Function WStr(ByVal vnt As Variant) As String Naked ' Return UNICODEd string
  $Export Function WStr "(AnsiString) As String-UNICODE Naked"
  Dim BSTR As Register Long
  BSTR = {V:vnt + 8} ' BSTR address at offset 8
  Return StrPeek(BSTR, {BSTR - 4}) ' <- updated 28-06-2017
EndFunc
1. This function is very well suited for the Naked attribute, because it has no local variables holding dynamically allocated memory that would otherwise require explicit release code.
2. The argument of the function is ByVal As Variant. This forces the caller (the calling code) to create a Variant and then pass it by value by pushing 16 bytes (4 DWords) on the stack. Whether the Variant is passed by value or by reference, the calling subroutine is responsible for freeing the BSTR stored in the Variant. However, ByVal is interesting because …
3. The GFA-BASIC compiler provides a hidden optimization when you pass a literal string to a ByVal As Variant parameter. A ByVal Variant requires 16 bytes to push on the stack, but the UNICODE characters the Variant points to are already converted at compile time. Therefore the following call is extremely efficient:
Dim t$ = WStr("GFABASIC")
The GFA-BASIC compiler stores the literal string “GFABASIC” as a UNICODE sequence of bytes (2 per character) and does not need to allocate (COM) memory and convert at runtime. This also relieves the caller from releasing the BSTR-COM-memory, so the calling function doesn’t need to execute Variant destruction code.
Assigning a UNICODE-formatted string this way is almost as efficient as initializing a String with an ANSI literal string. It only takes a few cycles to call and execute the WStr() function.
4. The caller provides the String variable that stores the return value of the function. That is, the function’s ‘local variable’ WStr is silently declared in the calling subroutine. The hidden string is passed as a ByRef variable to the function, and the return value (String) is assigned directly to the hidden variable. If an exception occurs in function WStr(), the termination code of the caller releases the hidden WStr string variable. (Therefore Naked is perfect for this function: it does not need to provide explicit release code.)
5. Inside the function you can see more optimizations. First, the local Long variable that stores the address of the BSTR is a register variable: no stack memory and no copying required. The other optimization, in the original version of the function, was the Shl 1 expression that multiplied the length of the BSTR by 2. This results in an integer asm add eax, eax instruction rather than a floating-point multiplication, a significant optimization. (The corrected version reads the byte count directly, so no multiplication is needed at all.)
6. Other arithmetic operations like V:vnt+8 and BSTR-4 are relative address operations and are properly compiled into indirect addressing instructions, so there is nothing to optimize here.
I went into some detail to explain the function, hoping you’ll find it useful. I hope to tell more about the way the compiler constructs subroutines and performs optimizations.
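What WStr() does can be retraced in C: widen the ANSI text into a BSTR-like block by zero-extension, then copy exactly the prefix's byte count (the {BSTR - 4} value) into a buffer that gets a single trailing null, as a GB String does. This is an illustrative sketch; `ansi_to_bstr` and `wstr` are hypothetical names, and plain malloc/calloc stand in for COM memory.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Widens ANSI text by adding a zero byte after each character and returns a
   pointer just past a 4-byte byte-count prefix, like a BSTR. */
static unsigned char *ansi_to_bstr(const char *ansi)
{
    uint32_t n = (uint32_t)strlen(ansi);
    uint32_t n_bytes = n * 2;
    unsigned char *block = calloc(1, 4 + n_bytes + 2);   /* prefix + chars + 2 nulls */
    if (block == NULL)
        return NULL;
    memcpy(block, &n_bytes, 4);
    for (uint32_t i = 0; i < n; i++) {
        block[4 + 2 * i] = (unsigned char)ansi[i];       /* low byte: the character */
        block[4 + 2 * i + 1] = 0;                        /* high byte: zero */
    }
    return block + 4;
}

/* StrPeek(BSTR, {BSTR - 4}): copy exactly the prefix's byte count into a new
   buffer that, like a GB String, gets ONE extra terminating null byte. */
static unsigned char *wstr(const unsigned char *bstr, uint32_t *out_len)
{
    uint32_t n_bytes;
    memcpy(&n_bytes, bstr - 4, 4);
    unsigned char *s = malloc(n_bytes + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, bstr, n_bytes);
    s[n_bytes] = 0;          /* only one null: append a second one for W APIs */
    *out_len = n_bytes;
    return s;
}
```

For "GFABASIC" the result holds 16 bytes followed by one null byte, which is why two null bytes still have to be appended before the string is passed to a wide API.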

10 May 2017

ANSI and UNICODE

Updated 21-05-2017: Sample code at the end of the post.

GFA-BASIC 32 only supports ANSI strings, not UNICODE… What exactly does that mean?

ANSI strings consist of a sequence of bytes (the characters of a string) where each byte represents a character. This allows for 256 different characters, because a byte can contain a value between 0 and 255. Restricting strings to bytes limits the number of possible characters, which is a problem mostly for non-Western languages. To allow for more characters, each character in a string must somehow occupy more than one byte. In Windows, each Unicode character is encoded using UTF-16 (where UTF is an acronym for Unicode Transformation Format). UTF-16 encodes each character as 2 bytes (16 bits); only characters outside the Basic Multilingual Plane, such as many emoji, need a pair of 16-bit units. UTF-16 is a compromise between saving space and providing ease of coding. It is used throughout Windows, including .NET and COM.

In UNICODE the lower 256 values represent the same characters as in ANSI, but they are stored as a sequence of 16-bit integers; additional characters are represented by values above 255. Because each character requires 2 bytes of storage, an ANSI string converted to UNICODE becomes twice the size of the ANSI string.
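The size doubling follows directly from widening each byte to a 16-bit unit with the same value. A minimal C sketch, assuming the input uses the 0-255 range as discussed here (a real code-page conversion such as MultiByteToWideChar can map some bytes, for example 0x80 in Windows-1252, to other code points):

```c
#include <stddef.h>
#include <stdint.h>

/* Widens n bytes to n 16-bit units with the same values and adds a 16-bit
   terminator; out must hold n + 1 units. Returns the UTF-16 size in bytes,
   terminator excluded: twice the ANSI size. */
static size_t widen(const unsigned char *ansi, size_t n, uint16_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = ansi[i];              /* same value, now 16 bits wide */
    out[n] = 0;
    return n * sizeof(uint16_t);
}
```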
Let’s see what this means from a GFA-BASIC perspective.

ANSI in a GB String
When you store a literal string like “GFABASIC” in a String (ANSI, 1-byte representation), the string is filled with 8 bytes of (hexadecimal) values 47 46 41 42 41 53 49 43.

a_t$ = "GFABASIC"   ' 47 46 41 42 41 53 49 43

The same string can be created by using Chr$() to populate these byte values. (A more general approach would be to use the Mk1$() function):

a_t$ = Chr($47, $46, $41, $42, $41, $53, $49, $43)
a_t$ = Mk1($47, $46, $41, $42, $41, $53, $49, $43)

GFA-BASIC’s string functions expect ANSI strings only, and by default GB only communicates with the ANSI version of the Windows API functions. With a little knowledge you can do more.

Windows APIs are UNICODE
Windows is a UNICODE system. When a Windows API takes a string as a parameter, Windows always provides two versions of the same API: one for ANSI strings and one for UNICODE strings. To differentiate between them, the name of the API function ends with an uppercase A for the ANSI version and with an uppercase W for the version that accepts or expects UNICODE. A typical example is the SetWindowText() API, which comes in two flavors: SetWindowTextA() and SetWindowTextW().
The GFA-BASIC’s built-in APIs are the ones that map to the functions that end with A. So the GB function SetWindowText() maps to the SetWindowTextA() function.

UNICODE in a GB String
By default, when you declare a literal string in your source code, the compiler turns the string's characters into an array of 8-bit data types, the String. You cannot declare a literal UNICODE string the same way. To assign a sequence of 2-byte characters you’ll need to use different methods, for instance by populating a String by hand. In the example above it only takes one change to create a UNICODE array of characters: simply change the Mk1() function to Mk2():

u_t$ = Mk2($47, $46, $41, $42, $41, $53, $49, $43) + #0

Now each character occupies 2 bytes and has become UNICODE formatted: each character is encoded using UTF-16, which for these characters means interspersing a zero byte after every ASCII character, like so:

u_t$ = Chr($47,0, $46,0, $41,0, $42,0, $41,0, $53,0, $49,0, $43,0) + #0

A GB String data type always adds a null-byte (only one) to zero-terminate the sequence of characters. Since the above assignments are GB controlled, the strings end with only one null-byte. UNICODE should end with two null-bytes. You should explicitly add an additional null at the end of the string to properly create a UNICODE string.

UNICODE is not BSTR
Note that we simply created a piece of memory to store characters in 2-bytes rather than in 1-byte. The String memory is allocated from the program’s global heap and this memory is only guarded by GB. Although the string contains UNICODE it is not a BSTR. A BSTR is a COM defined string type and is allocated from COM-memory. Both the client (a GB-program) and the provider/server have access to the same COM-memory.
When a string is assigned to a Variant, which supports BSTRs only, GB allocates COM string memory and converts the ANSI string to UNICODE.

Using pure UNICODE
The GFA-BASIC string-functions use a 1-byte character indexing system. However, you can overcome this limitation for 2-byte formatted strings and apply GB String-functions when you multiply the index and length parameters by 2. For instance:

u_t$ = Left(u_t$, ipos * 2) + #0                ' first ipos characters
u_t$ = Mid(u_t$, ipos * 2 - 1, nChars * 2) + #0 ' nChars characters from character ipos
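Because Left and Mid are 1-based byte functions, character ipos in a 2-byte buffer starts at byte ipos * 2 - 1, not at ipos * 2. The following C sketch, with a hypothetical `mid_bytes` that emulates BASIC's Mid() on raw bytes, shows the difference:

```c
#include <stddef.h>
#include <string.h>

/* Emulates BASIC's 1-based Mid(s, start, count) on a raw byte buffer;
   returns the number of bytes copied to out. */
static size_t mid_bytes(const unsigned char *s, size_t len, size_t start,
                        size_t count, unsigned char *out)
{
    size_t n = 0;
    if (start >= 1 && start <= len) {
        n = len - (start - 1);
        if (count < n)
            n = count;
        memcpy(out, s + start - 1, n);
    }
    return n;
}

/* 1-based byte position where 1-based character ipos starts in UTF-16 data. */
static size_t char_to_byte_pos(size_t ipos)
{
    return ipos * 2 - 1;
}
```

Starting at byte ipos * 2 instead would begin on the zero byte of the character and shift every following character by one byte.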

You can pass these UNICODE formatted strings to APIs that end with uppercase W. To introduce the wide character APIs to your code you must Declare them explicitly. For instance, this code displays u_t$ in the client area of a window.

Declare Function TextOutW Lib "gdi32.dll" Alias "TextOutW" ( _
  ByVal hdc As Handle,        // handle to DC _
  ByVal nXStart As Int,       // x-coordinate of starting position _
  ByVal nYStart As Int,       // y-coordinate of starting position _
  ByVal lpwString As Long,    // character string _
  ByVal cbString As Long      // number of characters _
  ) As Long

Form frm1
TextOutW(frm1.hDC, 1, 1, V:u_t$, Len(u_t$) / 2)
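The last TextOutW argument is a number of characters (16-bit units), not bytes. Because u_t$ carries the extra #0 byte, its byte length is odd (17 here), so Len(u_t$) / 2 is not a whole number. The Python sketch below shows two safe ways to get the count: integer division, or scanning for the 16-bit null the way wcslen() does. The helper `utf16_char_count` is hypothetical.

```python
def utf16_char_count(buf: bytes) -> int:
    """Count 16-bit units up to (not including) the first 16-bit null."""
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            return i // 2
    return len(buf) // 2   # no terminator found: whole 16-bit units only

u_t = "GFABASIC".encode("utf-16-le") + b"\x00"   # like u_t$ with its extra #0
print(len(u_t), utf16_char_count(u_t))           # 17 8
```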

Remember one thing: Windows uses UNICODE only, including for fonts. Whether you use TextOutW or TextOutA (as Text does), all output is rendered with UNICODE fonts. TextOutA first converts the text to UNICODE and then invokes TextOutW. Passing a UNICODE-formatted string to the W-version of the API merely skips the conversion from ANSI. See below for an example.

Obtaining UNICODE text from Windows APIs
Since XP, all Windows APIs taking or returning a string parameter are implemented in UNICODE only. The ANSI versions of these functions translate the ANSI strings to and from UNICODE format. GB, however, only handles ANSI strings: it passes ANSI strings to, and retrieves them from, the Windows APIs. What is the consequence of this restriction?

When an ANSI string is passed to the A-version of an API, Windows converts the string to UNICODE and then invokes the W-version of that API. No information is lost in this conversion. ASCII characters are converted by adding a zero byte, as explained above. The remaining ANSI characters are mapped through the current code page; in code page 1252, for instance, byte $80 becomes U+20AC, the euro sign. The string size doubles, but that's all.

The other way around is more problematic. A Windows API may return a UNICODE-formatted string containing characters that have no equivalent in the current ANSI code page, typically characters with a 16-bit value above 255. When the A-version of the API is used to retrieve the text, Windows performs the UNICODE-to-ANSI conversion on its behalf. Those characters are then lost: they are replaced by a default character, usually '?'.
This is not a problem as long as the ANSI-based GFA-BASIC program only handles text in Latin (e.g. English) alphabets. In other languages Windows accepts characters outside the ANSI code page, and such text cannot be returned properly in a GFA-BASIC String.
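Python's standard `cp1252` codec can approximate both conversions. The helpers `to_ansi` and `from_ansi` are hypothetical stand-ins for the work Windows does in WideCharToMultiByte and MultiByteToWideChar. The sketch assumes code page 1252 as the ANSI code page and uses '?' for unmappable characters. Note that Windows may apply "best fit" mappings for some characters, which this sketch does not model.

```python
def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Approximate the W-to-A direction: unmappable characters become '?'."""
    return text.encode(codepage, errors="replace")

def from_ansi(data: bytes, codepage: str = "cp1252") -> str:
    """Approximate the A-to-W direction through the same code page."""
    return data.decode(codepage)

latin = "Caf\u00e9 \u20ac5"   # 'Café €5': every character exists in code page 1252
mixed = "\u0416 \ue105"       # Cyrillic Zhe and the Segoe MDL2 Save glyph

print(from_ansi(to_ansi(latin)) == latin)   # True: the round trip is lossless
print(to_ansi(mixed))                       # b'? ?': both characters are lost
```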
When your program needs UNICODE input or uses UNICODE strings, you should explicitly declare all the required wide APIs. You might also need W replacements for the GDI text-output functions. And when you apply the GB string functions, remember to multiply or divide all integer arguments by 2.

Displaying UNICODE glyph characters (updated 21-05-2017)
Windows 10 includes and uses a new graphical font: Segoe MDL2 Assets. This sample shows how to obtain glyphs from this icon font for use in GB.
In the Special Characters accessory (Character Map), select Segoe MDL2 Assets and then select a graphical character. Write down the 16-bit value shown in the box at the bottom and assign it to a String. Here the value of the Save glyph is 0xE105.
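The value 0xE105 fits in a single 16-bit unit, so Mk2(0xE105) yields the two bytes $05, $E1. It also lies in the Unicode Private Use Area (U+E000 to U+F8FF), so its picture depends entirely on the selected font. The Python sketch below uses a hypothetical helper `glyph_to_mk2`. It shows the bytes involved and rejects values above U+FFFF, which UTF-16 stores as a surrogate pair of two units and a single Mk2() value cannot hold.

```python
def glyph_to_mk2(code_point: int) -> bytes:
    """Bytes a single Mk2(code_point) would hold (one 16-bit little-endian unit)."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("needs a UTF-16 surrogate pair, not a single Mk2() value")
    return code_point.to_bytes(2, "little")

save = glyph_to_mk2(0xE105)
print(save.hex())                     # 05e1
print(0xE000 <= 0xE105 <= 0xF8FF)     # True: a Private Use Area code point
```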


Form frm1
ScaleMode = basPixels   ' by default
SetFont "Segoe MDL2 Assets"
' Display UNICODE string "GFABASIC"
Dim u_t$ = Mk2($47, $46, $41, $42, $41, $53, $49, $43) + #0
TextW 1, 1, u_t$

' Get a Picture Object from a glyph.
' Char-value from the 'Special Characters' accessory
Dim hBmp As Handle, p As Picture
Dim size As SIZE
u_t$ = Mk2(0xE105)          ' the Save-glyph
TextW 1, 31, u_t$           ' show it
TextSizeW(Me.hDC, V:u_t$, Len(u_t$) / 2, size)
Get 1, 31, 1 + size.cx, 31 + size.cy, hBmp   ' a GDI-handle
Put 50, 1, hBmp             ' and test it
Set p = CreatePicture(hBmp, True)  ' into a Picture
PaintPicture p, 70, 1       ' and test it
Do
  Sleep
Until Me Is Nothing

Proc TextW(x As Int, y As Int, wstr As String)
  ' Assume Scalemode = basPixels, ScaleLeft=0, and ScaleTop=0
  TextOutW(Me.hDC, x, y, V:wstr, Len(wstr) / 2)
  ' If AutoRedraw == True draw on bitmap.
  If Me.hDC2 Then TextOutW(Me.hDC2, x, y, V:wstr, Len(wstr) / 2)
EndProc

Declare Function TextOutW Lib "gdi32.dll" Alias "TextOutW" ( _
  ByVal hdc As Handle,         // handle to DC _
  ByVal nXStart As Int,        // x-coordinate of starting position _
  ByVal nYStart As Int,        // y-coordinate of starting position _
  ByVal lpwString As Long,     // character string _
  ByVal cbString As Long       // number of characters _
  ) As Long
Declare Function TextSizeW Lib "gdi32.dll" Alias "GetTextExtentPoint32W" ( _
  ByVal hdc As Handle,        // handle to DC _
  ByVal lpString As Long,     // text string _
  ByVal cbString As Int,      // characters in string _
  ByRef lpSize As SIZE        // string size _
  ) As Long
Type SIZE
  - Long cx, cy
EndType

A few notes about this sample (compared to the previous version).

  1. The Segoe MDL2 Assets font is not a fixed-pitch font (the LOGFONT member lfPitchAndFamily does not specify FIXED_PITCH). However, the glyphs in the font all have the same size. To obtain the size of a glyph character we cannot use the ANSI GB functions TextWidth() and TextHeight(), since they cannot measure a 2-byte character. Hence the TextSizeW() declaration, an alias for GetTextExtentPoint32W().
  2. To conform to GB's scaling, a TextOutW wrapper such as TextW should take coordinates in the current ScaleMode and obey the ScaleLeft and ScaleTop settings. In this sample TextW simply draws in pixels, relative to (0,0) at the top-left of the client area. Note, however, that Get and Put do use the current scaling. Be sure to use the same units for GB commands and API functions. This holds as long as ScaleMode = basPixels, the default scaling in GFA-BASIC (VB uses twips; do not confuse the two).
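A scale-aware TextW would first map its coordinates to device pixels. The Python sketch below shows the arithmetic. `twips_to_pixels` and `to_device_x` are hypothetical helpers, not GB functions. The sketch assumes 96 DPI (a common default; the actual DPI depends on the display) and 1440 twips per inch, so one pixel equals 15 twips at that resolution.

```python
TWIPS_PER_INCH = 1440

def twips_to_pixels(twips: float, dpi: int = 96) -> float:
    """Convert twips (VB's default unit) to device pixels at the given DPI."""
    return twips * dpi / TWIPS_PER_INCH

def to_device_x(x: float, scale_left: float, pixels_per_unit: float) -> int:
    """Hypothetical mapping a TextW wrapper could apply before calling TextOutW."""
    return round((x - scale_left) * pixels_per_unit)

print(twips_to_pixels(1440))   # 96.0: one inch at 96 DPI
print(twips_to_pixels(15))     # 1.0
print(to_device_x(10, 0, 1))   # 10: basPixels with ScaleLeft = 0 needs no mapping
```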

Finally, ScaleLeft and ScaleTop return wrong values in all versions below Build 1200. I hope to update GfaWin23.ocx as soon as possible.