::::: : the wood : davidrobins.net

Microsoft Word COM dispatch rewrite

News · Wednesday February 7, 2007 @ 20:20 EST

I came up with two major Office 14 M0 (milestone 0) projects: one to remove our use of C's setjmp and longjmp (which were used as a poor man's exception; Word was only converted to C++ early in version 12, aka 2007), and another to clean up the object model dispatch, including allowing direct dispatch to Word object methods. The setjmp cleanup could have been trivially done by changing the jumps to exceptions, but that's not the Word way; they were all laboriously changed to return codes, propagated as far up as needed.
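To make the setjmp conversion concrete, here is a minimal sketch of the same failure path written both ways. Every name in it (RunOld, HrRunNew, the hand-rolled HRESULT typedef) is made up for illustration; none of it is Word code, and the real call chains were much deeper.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstdint>

// Hypothetical stand-ins for the Windows definitions, so the sketch is portable.
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);
inline bool Failed(HRESULT hr) { return hr < 0; }

// Old style: a jump buffer acts as the "catch" site.
static std::jmp_buf g_env;

void ParseOld(int value) {
    if (value < 0) std::longjmp(g_env, 1);  // the poor man's "throw"
}

bool RunOld(int value) {
    if (setjmp(g_env) != 0) return false;   // we land here after longjmp
    ParseOld(value);
    return true;
}

// New style: every level returns a code and the caller checks it.
HRESULT HrParseNew(int value) {
    return value < 0 ? E_FAIL : S_OK;
}

HRESULT HrRunNew(int value) {
    HRESULT hr = HrParseNew(value);
    if (Failed(hr)) return hr;              // propagate as far up as needed
    return S_OK;
}
```

The return-code version is more typing at every level, but the control flow is visible at each call site, which is presumably the appeal.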

Word's object model begins with the GOD structure, or Generic Object Descriptor, a structure of pointers to useful information: first of all, the vtbl (virtual method table, containing pointers to all of the methods implemented by the interface), various other tables, and then the Generic, a structure that naturally became an object during the C++ conversion. The Generic implements the standard IUnknown COM methods AddRef, Release, and QueryInterface, as well as IDispatch, and handles various internal bookkeeping (such as keeping track of "advise lists" so that parents can be notified of child deletion and vice versa). It also stores a void pointer which user data can be hung off of, usually a reference to some internal Word structure. Care is taken so that when the item referenced goes away (for example, a document is closed), the object is notified and can mark itself deleted so that it doesn't try to use invalid pointers and crash.
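As a rough picture of that layout, here is a heavily simplified sketch. The field and method names are guesses for illustration only; the real structures carry far more (QueryInterface, IDispatch, advise lists), all of which is left out here.

```cpp
#include <cassert>

// Hypothetical sketch of the Generic; names are illustrative, not Word's.
struct Generic {
    unsigned long cRef = 1;   // COM-style reference count
    void *pvUser = nullptr;   // user data, e.g. some internal structure
    bool fDeleted = false;    // set once the referenced item has gone away

    unsigned long AddRef() { return ++cRef; }

    // Returns the new count; a real object would free itself at zero.
    unsigned long Release() {
        assert(cRef > 0);
        return --cRef;
    }

    // Called when the referenced item (say, a document) is closed.
    void MarkDeleted() {
        fDeleted = true;
        pvUser = nullptr;
    }

    // Hands back the user pointer, or nullptr once the target is gone,
    // so callers never touch a stale pointer.
    void *PvUser() const { return fDeleted ? nullptr : pvUser; }
};

// Hypothetical sketch of the descriptor: the vtbl pointer comes first.
struct GOD {
    const void *const *pvtbl; // virtual method table
    // ... various other tables elided ...
    Generic *pgen;            // the Generic described above
};
```

The point of PvUser is that every method can check for a deleted target in one place instead of trusting the raw pointer.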

There are several ways to implement COM interfaces; in fact, I wrote a work report about some of them for school, which received a Microsoft-sponsored award. The most common is probably to simply define the interface in an IDL (Interface Definition Language) file, and generate a C or C++ header with MIDL, the Microsoft IDL compiler. Word does have an IDL-type file, but it's not IDL, probably because it had an object model before IDL existed (I don't know, I wasn't there, but we do use our own tools), and we have our own compiler, OAC (probably Office Automation Compiler). In it, one can specify a handler for each method: either a specific function, or a generic handler, which is called in much the same manner as IDispatch::Invoke, i.e. with an array of parameters and flags, as well as method-specific parameters. Handlers are useful for forwarding interfaces to internal classes implemented elsewhere, or for wrapping commands (e.g. one class of handler saves the old selection, selects the object called on, runs a command specified by a handler parameter, and restores the selection).
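The command-wrapping handler is easier to see in code. This is only a sketch under invented assumptions: the handler signature, the integer object ids, the global selection, and the one-entry command table are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>

using HRESULT = std::int32_t;  // stand-in for the Windows typedef
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_INVALIDARG = static_cast<HRESULT>(0x80070057u);

// Hypothetical application state: the id of the selected object, and a
// record of which object the last command acted on.
int g_selection = 0;
int g_lastCommandTarget = 0;

enum Command { cmdBold = 0, cmdCount };

// Commands act on whatever is currently selected.
HRESULT HrRunCommand(int cmd) {
    if (cmd < 0 || cmd >= cmdCount) return E_INVALIDARG;
    g_lastCommandTarget = g_selection;
    return S_OK;
}

// Assumed shape of a generic handler: Invoke-style flags and argument
// array, plus one method-specific handler parameter.
using PfnHandler = HRESULT (*)(int objectId, unsigned flags,
                               const int *rgArg, int cArg, int handlerParam);

// Save the selection, select the target, run the command, restore.
HRESULT HrCommandHandler(int objectId, unsigned /*flags*/,
                         const int * /*rgArg*/, int /*cArg*/,
                         int handlerParam) {
    const int saved = g_selection;
    g_selection = objectId;
    const HRESULT hr = HrRunCommand(handlerParam);
    g_selection = saved;   // restored whether or not the command succeeded
    return hr;
}
```

One handler like this can serve any method that just runs a command against the object, with only the handler parameter differing per method.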

However, there's a lot of duplicated code in function handlers; many functions extract the previously mentioned void pointer from the generic and invoke a class method; the compiler should be able to generate this code. So, I added support for a class_handler directive to OAC, which would dispatch COM methods to a similarly-named method in a given class, automatically casting the generic's user pointer to an object of that class:

Old:
interface IDShape
{
...
   properties
   {
      VT_FLOAT Width
      dispid 999
      get proc HrOamShapeGetWidth
   }
}

...

STDMETHODIMPC HrOamShapeGetWidth(IDShape *pidsp, float *pflWidth)
{
   int fIDispatch = vrf.fIDispatch;
   int fObjModelCall = vfObjModelCall;

   Generic *pgen = PgenFromPunk(pidsp);
   OASO *poaso = (OASO *)LpvFromGenericInstanceData(

   HRESULT hr = poaso->HrGetWidth(pflWidth);

   vrf.fIDispatch = fIDispatch;
   vfObjModelCall = fObjModelCall;

   return HrRaiseExcepHr(pidsp, hr);
}
New:
interface IDShape
class_handler OASO
{
...
   properties
   {
      VT_FLOAT Width
      dispid 999
      read_only
   }
}
And both need:
HRESULT OASO::HrGetWidth(float *pflWidth)
{
   *pflWidth = PtFromEmu(PspInternal()->flWidth());
   return S_OK;
}
(Some code facts and names have been changed, but it's a representative sample.)

You won't believe how many times the save/restore of the object model flag code and call to HrRaiseExcepHr (or variants) are repeated. That's one of the things that this new dispatch solves, by going through a common function, the new HrOaDispatch, rather than dispatching directly to the function in question (which didn't always happen; the old code had wrappers too, to set up undo, for example; it was very crusty). True, it does add the overhead of an extra call, but the common entry point makes it easier to ensure that calls are safe (for example, avoiding reentrancy from event callbacks). But getting direct dispatch to C++ methods was decidedly nontrivial.
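The benefit of a single entry point can be sketched in a few lines. This is not the real HrOaDispatch; the guard class, the globals, and the reentrancy flag are hypothetical names chosen to mirror the save/restore pattern in the old sample.

```cpp
#include <cassert>
#include <cstdint>

using HRESULT = std::int32_t;  // stand-in for the Windows typedef
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// Hypothetical globals, loosely modelled on vfObjModelCall.
bool g_fObjModelCall = false;
bool g_fInEventCallback = false;

// Saves the flag on entry and restores it on exit, so no handler has to
// repeat the save/restore by hand.
class ObjModelCallGuard {
public:
    ObjModelCallGuard() : fSaved_(g_fObjModelCall) { g_fObjModelCall = true; }
    ~ObjModelCallGuard() { g_fObjModelCall = fSaved_; }
    ObjModelCallGuard(const ObjModelCallGuard &) = delete;
    ObjModelCallGuard &operator=(const ObjModelCallGuard &) = delete;
private:
    bool fSaved_;
};

// Common entry point: every object model call funnels through here.
template <typename Fn>
HRESULT HrDispatchCommon(Fn fn) {
    if (g_fInEventCallback) return E_FAIL;  // refuse reentrant calls
    ObjModelCallGuard guard;
    return fn();                            // flag restored on the way out
}
```

Error translation (the HrRaiseExcepHr step) would slot in around the call to fn in the same way; it is omitted here to keep the sketch short.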

Fortunately, I started with a fairly pathological class: the shape class, which has virtual bases (there are four combinations: old or new shapes, and inline or floating shapes, all inheriting from a common virtual base, and, for example, there is a "new shape" mid-hierarchy class that provides functionality for both inline and floating new shapes; a new floating shape inherits from the new shape and the floating shape class, both of which derive from a common base). If I can dispatch to this guy, everything else is easy.
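For anyone who hasn't met virtual bases, the shape of that hierarchy looks roughly like the sketch below. The class names are invented to match the description, and only one of the four leaf classes is shown.

```cpp
#include <cassert>

// Hypothetical names; only the inheritance shape follows the text.
struct ShapeBase {
    float flWidth = 0.0f;
    virtual ~ShapeBase() = default;
    float FlWidth() const { return flWidth; }
};

// Both mid-hierarchy classes share ONE ShapeBase via virtual inheritance.
struct NewShape : virtual ShapeBase {};
struct FloatingShape : virtual ShapeBase {};

// A new floating shape derives from both.
struct NewFloatingShape : NewShape, FloatingShape {};

// Calls a base-class method through a pointer to member. The compiler
// emits whatever adjustment is needed to locate the virtual base.
float FlCall(const ShapeBase &shape, float (ShapeBase::*pmf)() const) {
    return (shape.*pmf)();
}

// Both inheritance paths must reach the same base subobject.
bool FSharesOneBase(NewFloatingShape &shape) {
    ShapeBase *pViaNew = static_cast<NewShape *>(&shape);
    ShapeBase *pViaFloating = static_cast<FloatingShape *>(&shape);
    return pViaNew == pViaFloating;
}
```

The location of the shared base inside the object is only known through a lookup the compiler generates, which is what makes hand-written dispatch to such a class awkward.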

To start with, we have to know what method we're dispatching. The old code did this with #defined stubs, but now with C++ we can use templates. What exactly do we put into the object's vtbl? The functional case is easy: just store the function pointer, although to provide a little more information (such as the COM class and method index, for other lookups), a template function is used that accepts these all as template parameters and passes them on to HrOaDispatch. HrOaDispatch then uses the information to figure out how large the stack parameters are for the final function, copies them to the stack (yes, Virginia, in assembly), and calls the function, which, in usual stdcall fashion, cleans up the stack for us before it returns.
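A portable approximation of the template stub idea, with no assembly, might look like this. The table layout, the packed-argument convention, and the name HrOaDispatchSketch are all assumptions made for the sketch; the real dispatcher copies arguments onto the machine stack rather than into a buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using HRESULT = std::int32_t;  // stand-in for the Windows typedef
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// Hypothetical per-method record: argument size and final target.
struct MethodEntry {
    std::size_t cbArgs;
    HRESULT (*pfnTarget)(void *pvThis, const unsigned char *pbArgs);
};

struct Shape { float flWidth = 0.0f; };

// Example target: unpacks a single float argument.
HRESULT HrSetWidth(void *pvThis, const unsigned char *pbArgs) {
    float fl;
    std::memcpy(&fl, pbArgs, sizeof fl);
    static_cast<Shape *>(pvThis)->flWidth = fl;
    return S_OK;
}

constexpr int kclsShape = 0;   // hypothetical COM class index
const MethodEntry g_rgMethods[1][1] = {{{sizeof(float), HrSetWidth}}};

// Looks up the method, copies exactly cbArgs bytes, and calls the target.
HRESULT HrOaDispatchSketch(int cls, int imeth, void *pvThis,
                           const void *pvArgs) {
    const MethodEntry &me = g_rgMethods[cls][imeth];
    unsigned char rgb[64];
    if (me.cbArgs > sizeof rgb) return E_FAIL;
    std::memcpy(rgb, pvArgs, me.cbArgs);
    return me.pfnTarget(pvThis, rgb);
}

// One instantiation per (class, method): each has its own address, so it
// can sit in a vtbl slot while still telling the dispatcher what it is.
template <int cls, int imeth>
HRESULT HrStub(void *pvThis, const void *pvArgs) {
    return HrOaDispatchSketch(cls, imeth, pvThis, pvArgs);
}
```

The template parameters cost nothing at run time; they are just a way of baking the class and method index into the stub's address.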

However, I didn't know at first how complex object model calls can be, especially with virtual bases involved. My first plan was to store the method address and then set up the stack as for function dispatch, and somehow call the method with assembly. Bad idea; the assembly dispatch code generated by the compiler is varied and complex. Instead I wrote a template to do the call: HrOaCallMethod<C, HRESULT (C::*)()>(C* pC, void *pStack, size_t cbStack), and stored the address of that templated function to use to dispatch the call to the C++ method (letting the compiler do the work of setting up the call), using #pragma pointers_to_members to ensure all pointers to members were the same size, as well as turning off runtime stack checking for that code (the compiler gets upset when a method that says it takes no parameters has actually consumed parameters).
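The core trick, letting the compiler generate the member call inside a template and storing the template's address, can be shown portably if we give up the stack games. In this sketch the method signature is spelled out honestly instead of being cast to a no-argument type, and Oaso and HrCallMethod are invented names.

```cpp
#include <cassert>
#include <cstdint>

using HRESULT = std::int32_t;  // stand-in for the Windows typedef
constexpr HRESULT S_OK = 0;

// Hypothetical class reached through the Generic's user pointer.
struct Oaso {
    float flWidth = 4.0f;
    HRESULT HrGetWidth(float *pflWidth) {
        *pflWidth = flWidth;
        return S_OK;
    }
};

// The pointer to member is a template parameter, so each instantiation is
// an ordinary function whose body is a normal, compiler-generated member
// call, including any this-pointer adjustment the class needs.
template <typename C, HRESULT (C::*pmf)(float *)>
HRESULT HrCallMethod(void *pvUser, float *pflOut) {
    return (static_cast<C *>(pvUser)->*pmf)(pflOut);
}

// Every instantiation fits the same plain function pointer type.
using PfnCall = HRESULT (*)(void *pvUser, float *pflOut);
```

Storing &HrCallMethod<Oaso, &Oaso::HrGetWidth> in a table then gives a uniform slot type while the compiler still does the hard part of the call.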

This took a lot longer to figure out than it takes to write; one of the snags I faced was having to build a separate version of the precompiled headers (PCH) with the runtime checks disabled (disabling them for all of Word wouldn't be looked on favorably); figuring out a way to dispatch to a general method was tough too; there was a lot of casting and gnashing of teeth. But it's worth it to have cleaned up and improved some very old object model dispatch code.