This implementation is not W3C compliant because it lacks support for namespaces, entity references, DOCTYPE nodes, DTD default attribute values, and other peripheral functionality. The DOM_Node type and its associated operations should work well, however, because the functionality that is supported has been tested thoroughly.
The definitive information on the DOM is the collection of W3C recommendations, which can be found at the locations below:
To build a document from scratch, use the expression DOM_Implementation_createDocument(NULL, NULL, NULL) to create an empty document, then add new nodes using DOM_Document_createElement, DOM_Document_createComment, etc. with DOM_Node_appendChild, DOM_Node_insertBefore, or similar. See the DOM_Implementation and DOM_Node interface documentation for details.
Memory Management
The DOM_DocumentLS_load, DOM_DocumentLS_read, and DOM_Document_createXxx functions allocate memory that must at some point be freed with DOM_Document_destroyNode. The DOM_Document_destroyNode function may be used to release nodes of all types, such as DOM_Element, DOM_Text, DOM_Attr, and DOM_Document. All children of a node are freed when the parent is freed. An entire document may be freed with the expression DOM_Document_destroyNode(doc, doc). Beware that freeing a node that is still a descendant of another node leaves the tree with invalid pointers, and the program will crash when the tree frees that node again. There are only two other special cases to consider. First, the DOM_Document_destroyNodeList function must be called for each DOM_NodeList returned by DOM_Element_getElementsByTagName and DOM_Document_getElementsByTagName. Second, a DOM_DocumentFragment node cannot be a child of another node. When it is added to the tree, its children are actually moved into the target node, leaving an empty DOM_DocumentFragment. This empty node must be freed with DOM_Document_destroyNode if it will no longer be used. For completeness, the DOM_DocumentEvent_destroyEvent function must be called to free DOM_Event objects; however, that non-core API is not yet documented here.
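The ownership rule above can be illustrated with a small self-contained model. The Node type and the node_* helpers below are hypothetical stand-ins, not DOMC code, and they make one design choice that the text above does not attribute to DOMC: node_destroy unlinks the node from its parent before freeing it. With DOMC itself, the caller would need to remove the node from the tree first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DOM node; not the DOMC structure. */
typedef struct Node {
    struct Node *parentNode;
    struct Node *firstChild;
    struct Node *lastChild;
    struct Node *previousSibling;
    struct Node *nextSibling;
} Node;

static int live_nodes = 0; /* allocations that have not been freed yet */

Node *node_create(void)
{
    Node *n = calloc(1, sizeof *n);
    if (n != NULL)
        live_nodes++;
    return n;
}

/* Append child as the last child of parent. */
void node_append(Node *parent, Node *child)
{
    assert(child->parentNode == NULL);
    child->parentNode = parent;
    child->previousSibling = parent->lastChild;
    if (parent->lastChild != NULL)
        parent->lastChild->nextSibling = child;
    else
        parent->firstChild = child;
    parent->lastChild = child;
}

/* Unlink n from its parent so that no pointer to n remains in the tree. */
void node_detach(Node *n)
{
    Node *p = n->parentNode;
    if (p == NULL)
        return;
    if (n->previousSibling != NULL)
        n->previousSibling->nextSibling = n->nextSibling;
    else
        p->firstChild = n->nextSibling;
    if (n->nextSibling != NULL)
        n->nextSibling->previousSibling = n->previousSibling;
    else
        p->lastChild = n->previousSibling;
    n->parentNode = n->previousSibling = n->nextSibling = NULL;
}

/* Free n and, recursively, all of its descendants. */
void node_destroy(Node *n)
{
    Node *c, *next;
    if (n == NULL)
        return;
    node_detach(n);
    for (c = n->firstChild; c != NULL; c = next) {
        next = c->nextSibling;
        /* The parent is being freed, so skip relinking the siblings. */
        c->parentNode = c->previousSibling = c->nextSibling = NULL;
        node_destroy(c);
    }
    free(n);
    live_nodes--;
}
```

Skipping the node_detach step is what produces the invalid pointers described above: the parent would still list the freed node as a child and would free it a second time.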
Only the DOM_Element node type has attributes. All other node types have a NULL attributes member. Child nodes are accessible through the childNodes DOM_NodeList member and the firstChild, lastChild, previousSibling, and nextSibling members. Not all node types have child nodes.
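As an illustration of walking the tree through these link members, the sketch below counts the nodes of a given type in a subtree. WalkNode is a trimmed, hypothetical stand-in that only borrows the member names firstChild and nextSibling; the node type codes 1 and 3 are the W3C values for element and text nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-in using the same link member names as DOM_Node. */
typedef struct WalkNode {
    unsigned short nodeType;
    struct WalkNode *firstChild;
    struct WalkNode *nextSibling;
} WalkNode;

enum { ELEMENT_NODE = 1, TEXT_NODE = 3 }; /* W3C DOM node type codes */

/* Count nodes of the given type in the subtree rooted at node,
 * the root itself included. */
int count_nodes(const WalkNode *node, unsigned short type)
{
    const WalkNode *child;
    int count;

    assert(node != NULL);
    count = (node->nodeType == type) ? 1 : 0;
    for (child = node->firstChild; child != NULL; child = child->nextSibling)
        count += count_nodes(child, type);
    return count;
}
```

The same loop shape should carry over to a real DOM_Node, since a node without children simply has a NULL firstChild.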
In DOMC, node inheritance is emulated with simple typedef statements and a union that contains all possible subclass attributes. To access an attribute that is specific to a child interface, it may be necessary to go through this union. For example, the systemId of a notation node is currently only accessible through the union, as follows:
    DOM_String *sysid;
    ...
    sysid = node->u.Notation.systemId;

Care must be taken when modifying these union members (this is not well defined yet). Attributes accessible through the union that may need to be modified have helper methods to make this less awkward. The DOM_Node_setNodeValue function must be used to set the nodeValue member.
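Because a union member is only meaningful for the matching node type, it seems prudent to check nodeType before reading it. The sketch below shows that guard on a cut-down model. MiniNode and notation_system_id are hypothetical names, the typedef of DOM_String as char is an assumption based on the 8-bit string encoding described in this document, and 12 is the W3C node type code for notation nodes.

```c
#include <assert.h>
#include <stddef.h>

typedef char DOM_String; /* assumption: 8-bit strings */

enum { ELEMENT_NODE = 1, NOTATION_NODE = 12 }; /* W3C DOM node type codes */

/* Cut-down stand-in for DOM_Node with two of the union members. */
typedef struct MiniNode {
    DOM_String *nodeName;
    unsigned short nodeType;
    union {
        struct { DOM_String *publicId; DOM_String *systemId; } Notation;
        struct { int length; } CharacterData;
    } u;
} MiniNode;

/* Return the systemId of a notation node, or NULL for any other type. */
const DOM_String *notation_system_id(const MiniNode *node)
{
    assert(node != NULL);
    if (node->nodeType != NOTATION_NODE)
        return NULL; /* u.Notation is not the active member */
    return node->u.Notation.systemId;
}
```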
The all-important DOM_Node structure follows although some fields are left out in the interest of brevity. It may be necessary to look at this structure in the domc.h header.
    struct DOM_Node {
        DOM_String *nodeName;
        DOM_String *nodeValue;
        unsigned short nodeType;
        DOM_Node *parentNode;
        DOM_NodeList *childNodes;
        DOM_Node *firstChild;
        DOM_Node *lastChild;
        DOM_Node *previousSibling;
        DOM_Node *nextSibling;
        DOM_NamedNodeMap *attributes;
        DOM_Document *ownerDocument;
        union {
            struct {
                DOM_DocumentType *doctype;
                DOM_Element *documentElement;
                DOM_String *version;
                DOM_String *encoding;
                int standalone;
            } Document;
            struct {
                DOM_NamedNodeMap *entities;
                DOM_NamedNodeMap *notations;
                DOM_String *publicId;
                DOM_String *systemId;
                DOM_String *internalSubset;
            } DocumentType;
            struct {
                int specified;
                DOM_Element *ownerElement;
            } Attr;
            struct {
                int length;
            } CharacterData;
            struct {
                DOM_String *publicId;
                DOM_String *systemId;
            } Notation;
            struct {
                DOM_String *publicId;
                DOM_String *systemId;
                DOM_String *notationName;
            } Entity;
            struct {
                DOM_String *target;
                DOM_String *data;
            } ProcessingInstruction;
        } u;
    };
The DOM specifications require support for entity references, which may result in the childNodes of an attribute containing a potentially complex subtree of DOM nodes. DOMC currently has very weak support for entity references and, as a result, attributes will never have children. The default module for loading and storing XML documents uses the Expat XML parser, which expands entity references by default. Expat recently added support for parsing external entities, but DOMC does not yet use this functionality.
The DOM recommendations specify that these lists are live, meaning that modifying the children of a node should be reflected in a list returned by the getElementsByTagName functions. Currently, DOMC does not update a DOM_NodeList returned by the getElementsByTagName functions if source nodes are subsequently removed or if a node is added that should be included.
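The practical consequence is that a returned list behaves like a snapshot and has to be queried again after the tree changes. The sketch below models this with hypothetical stand-ins (ListNode, Snapshot, collect_by_name); it is not how DOMC builds its lists, only a demonstration of how a snapshot goes stale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MATCHES 16

/* Hypothetical stand-in node with only the members the walk needs. */
typedef struct ListNode {
    const char *nodeName;
    struct ListNode *firstChild;
    struct ListNode *nextSibling;
} ListNode;

/* A fixed-size, non-live list of matches. */
typedef struct Snapshot {
    const ListNode *items[MAX_MATCHES];
    size_t length;
} Snapshot;

/* Append every descendant of root named `name` to snap, in document
 * order. Matches beyond MAX_MATCHES are silently dropped. */
void collect_by_name(const ListNode *root, const char *name, Snapshot *snap)
{
    const ListNode *c;

    assert(root != NULL && name != NULL && snap != NULL);
    for (c = root->firstChild; c != NULL; c = c->nextSibling) {
        if (strcmp(c->nodeName, name) == 0 && snap->length < MAX_MATCHES)
            snap->items[snap->length++] = c;
        collect_by_name(c, name, snap);
    }
}
```

If a matching node is unlinked after the first call, the first Snapshot still reports it; only a second call sees the change. With DOMC, each list obtained this way would still need to be released with DOM_Document_destroyNodeList.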
Currently, all of these functions set DOM_Exception if an error occurs; however, there is no return value with which to detect the error. A future version of DOMC will likely return a value indicating that an error has occurred.
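Until then, one workable pattern is to clear the exception value before the call and test it afterwards. The sketch below demonstrates the pattern on a model: model_exception stands in for DOM_Exception and is assumed to be an integer that is zero when no error has occurred, and model_truncate is a hypothetical function invented for this example. The code 1 matches the W3C value of INDEX_SIZE_ERR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int model_exception = 0;    /* stand-in for DOM_Exception */
enum { MODEL_INDEX_SIZE_ERR = 1 }; /* modeled on W3C INDEX_SIZE_ERR */

/* Hypothetical operation with no return value: truncate s at offset,
 * reporting an out-of-range offset only through model_exception. */
void model_truncate(char *s, size_t offset)
{
    assert(s != NULL);
    if (offset > strlen(s)) {
        model_exception = MODEL_INDEX_SIZE_ERR;
        return;
    }
    s[offset] = '\0';
}

/* Wrapper that turns the side-channel error into a return value:
 * 0 on success, -1 on failure. */
int checked_truncate(char *s, size_t offset)
{
    model_exception = 0; /* clear any stale error first */
    model_truncate(s, offset);
    return model_exception ? -1 : 0;
}
```

Clearing the value first matters because an error left over from an earlier call would otherwise be mistaken for a failure of the current one.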
DOM specifications require that character data is UTF-16 encoded. DOMC does not support UTF-16. The locale-dependent 8-bit encoding is used instead. This permits common char * strings to be used in place of DOM_String *. Many UNIX and Linux systems support the UTF-8 locale. If a DOMC program is running in a UTF-8 locale, the offsets of these string operations refer to characters rather than bytes or individual multibyte sequences. Thus the behavior of these functions should be very similar or identical to that of a DOM implementation that uses UTF-16. Also note that UTF-8 support may be disabled for the sake of installation simplicity. It may be necessary to obtain the source code and rebuild DOMC if i18n support is required.
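To make the difference between character offsets and byte offsets concrete, the sketch below converts a character index into a byte offset for a UTF-8 string. It decodes UTF-8 by hand, assumes the input is valid, and does not consult the locale, so it illustrates the idea rather than the method DOMC uses. The name utf8_byte_offset is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte offset of the character at char_index in the valid
 * UTF-8 string s. If s has fewer characters, return its byte length. */
size_t utf8_byte_offset(const char *s, size_t char_index)
{
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0' && char_index > 0) {
        i++; /* step over the lead byte */
        /* Step over continuation bytes, which look like 10xxxxxx. */
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        char_index--;
    }
    return i;
}
```

For the string "h\xC3\xA9llo" (the word "héllo" in UTF-8), character 2 starts at byte 3 because the second character occupies two bytes.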