09 March 2009
The Collection Ocx (part 3)
In part three of this series I will continue with the performance of a Collection Ocx. I will also point out why you might prefer a Hash over a Collection. Not only for its performance, but also for its additional features. A Collection is a Hash Remember the Collection Ocx data type is a simple COM wrapper of the Hash Variant data type. The Collection COM object provides a very thin wrapper, because adding items to a Collection Ocx is as fast as adding them to a Hash Variant type. This is not true for iterating using the For Each construct (or For i = 1 to Count) to access one item a time using a number for the index (not a string). Given a Collection c and iterating using For i = 1 ... takes 50% more time than on a Hash Variant. Dim i%, e As Variant, t# = Timer For i% = 1 To c.Count e = c(i) Next t# = Timer - t# Print "For i = 1 To c.Count on Collection:"; t# Given a Hash Variant a: Dim a As Hash Variant ' add a lot of items ... and then: t# = Timer For i = 1 To a[%] e = a[% i] Next t# = Timer - t# Print "For i = 1 To a[%] on Hash:"; t# The same is true when iterating is performed using For Each. The Hash data type is 50% faster in accessing an item by index. Why? To return an item from the Collection the .Item method is invoked taking a Variant as a parameter. The index variable i is first converted to a Variant and then passed to the Collection.Item method, where it is decoded to see if the argument is a string or a number... Hence the performance difference. COM makes things slow because it uses Variants. Using strings as the index The Collection data type is most useful when used as a dictionary type, where data is connected to a word (name, key, or whatever). The Collection is used to store and locate data by a string, the key. What happens to the performance when a key string is used to access an item? Remember that the .Item method takes a Variant as an Index argument. This means that each index, be it a number or a string is to be converted to a Variant before the .Item method is called. In case you use a string, it is converted to the COM Automation data type BSTR and stored in the Variant parameter. This conversion is performed at the callers side, at the code line that holds the string. Now, if GFA-BASIC 32 would use the normal COM procedures to convert a string to a Variant, it would call the OLE automation Variant APIs. However these APIs (in OLEAUTO.DLL) are slow! So, GFA-BASIC 32 implements its own Variant conversion routines (these routines aren't actually functions, but inlined code) which are many times faster. In general: For each GFA-BASIC 32 Ocx COM object, GFA-BASIC inserts fast Variant-conversion code when that Ocx object uses a Variant property or method argument, resulting in 30%-40% speed increase. Now stay with me. The Collection Ocx object receives the index as a Variant argument of the .Item method, which either contains a number or a string (BSTR). When the COM object receives a string as a Variant, it needs to convert it back to ANSI to perform the underlying call to the Hash table. See the introduction of this post. On the Ocx side in GfaWin23.Ocx, GFA-BASIC could have used the OLE APIs to convert the Variant back to an ANSI string. But again, each Ocx method taking a Variant uses the fast inlined conversion code. The Variant (BSTR) conversion to an ANSI string is as fast as it could be. (I call this a typical Frank Ostrowski optimization, which he didn't implement before GFA-BASIC 32 version 1.05.) GFA-BASIC 32 complies to COM rules, but provides fast optimizations when possible. Then there was another optimization possible with literal string keys on any collection ... Hash is always better But first, for the record. These string conversions don't apply to the Hash data type. The GFA-BASIC 32 Hash commands and operators take simple ANSI strings for the key arguments and don't need any conversion. That is why the Hash Variant is so much faster. By the way, the Collection COM object is specified by MS to provide only five properties/methods (Add, Count, Item, Remove, and the hidden _NewEnum for For Each). MS didn't honour requests to include a sort method, save and load methods, and key-to-index and index-to-key conversions. Well, GFA-BASIC 32 complies to the COM standard, but through the Hash data type it does provide all these options. VB programmers don't have these features! An additional optimization with ! Back to another optimization. In VB(A) a collection item is often accessed using the ! operator. GFA-BASIC supports the ! operator as well (for all collections). It works exactly the same as when a literal string is specified as the argument: v = c("billy") v = c!billy The ! is always used with a literal string, for instance, to access the item named "billy" in a collection c. When the compiler encounters a literal string for the key argument, a special GFA-BASIC 32 function is used to obtain the data from the Collection Ocx without the BSTR conversion. Rather than calling the .Item method of the Collection method directly, the .Item call is redirected to a special GFA-BASIC 32 routine that bypasses the BSTR conversion and directly invokes the 'get-item-by-key' function of the underlying hash table. This speeds up the item access about 30%, because of the lack of BSTR conversion. The speed optimization is only for the GFA-BASIC 32 Collection object. However, the ! syntax also requires less machine code to execute. It therefore also improves execution speed by reducing the generated assembler code by the compiler, the generated code is less by 20 bytes per call. Here all collections benefit from the ! operator, not just the Collection Ocx. Performance comparison taken from the ReadMe_105.html: OLE optimization compared to Hash Variant v = c("a") - 65 cycles/64 bytes (before version 1.05) v = c("a") - 43 cycles/40 bytes v = c!a - 43 cycles/40 bytes v = h["a"] - 30 cycles/26 bytes Conclusion 1. The Collection Ocx is a fast COM object based on a Hash table. 2. GFA-BASIC 32 optimized item access with all GFA-BASIC Ocx collection objects. 3. With literal strings as keys, the ! operator generates less code. 4. The Hash data type is faster and offers more features. Keep tuned, the next time I discuss a special sort of collection; the ListItems collection.