<html>
<body>

<h1 align='right'><a name='ADVANCED'>Chapter 3 - More Mini-XML
Programming Techniques</a></h1>

<p>This chapter shows additional ways to use the Mini-XML
library in your programs.</p>

<h2><a name='LOAD_CALLBACKS'>Load Callbacks</a></h2>

<p><a href='#LOAD_XML'>Chapter 2</a> introduced the <a
href='#mxmlLoadFile'><tt>mxmlLoadFile()</tt></a> and <a
href='#mxmlLoadString'><tt>mxmlLoadString()</tt></a> functions.
The last argument to these functions is a callback function
which is used to determine the value type of each data node in
an XML document.</p>

<p>Mini-XML defines several standard callbacks for simple
XML data files:</p>

<ul>

    <li><tt>MXML_INTEGER_CALLBACK</tt> - All data nodes
    contain whitespace-separated integers.</li>

    <li><tt>MXML_OPAQUE_CALLBACK</tt> - All data nodes
    contain opaque strings ("CDATA").</li>

    <li><tt>MXML_REAL_CALLBACK</tt> - All data nodes contain
    whitespace-separated floating-point numbers.</li>

    <li><tt>MXML_TEXT_CALLBACK</tt> - All data nodes contain
    whitespace-separated strings.</li>

</ul>

<p>You can provide your own callback functions for more complex
XML documents. Your callback function will receive a pointer to
the current element node and must return the value type of the
immediate children for that element node: <tt>MXML_INTEGER</tt>,
<tt>MXML_OPAQUE</tt>, <tt>MXML_REAL</tt>, or <tt>MXML_TEXT</tt>.
The function is called <i>after</i> the element and its
attributes have been read, so you can look at the element name,
attributes, and attribute values to determine the proper value
type to return.</p>

<!-- NEED 2in -->
<p>The following callback function uses the value of an attribute
named "type", or the element name when that attribute is missing,
to determine the value type for its child nodes:</p>

<pre>
mxml_type_t
type_cb(mxml_node_t *node)
{
    const char *type;

    /*
     * You can look up attributes and/or use the
     * element name, hierarchy, etc...
     */

    type = mxmlElementGetAttr(node, "type");
    if (type == NULL)
        type = mxmlGetElement(node);

    if (!strcmp(type, "integer"))
        return (MXML_INTEGER);
    else if (!strcmp(type, "opaque"))
        return (MXML_OPAQUE);
    else if (!strcmp(type, "real"))
        return (MXML_REAL);
    else
        return (MXML_TEXT);
}
</pre>

<p>To use this callback function, pass its name when you
call any of the load functions:</p>

<pre>
FILE *fp;
mxml_node_t *tree;

fp = fopen("filename.xml", "r");
tree = mxmlLoadFile(NULL, fp, <b>type_cb</b>);
fclose(fp);
</pre>


<h2><a name='SAVE_CALLBACKS'>Save Callbacks</a></h2>

<p><a href='#LOAD_XML'>Chapter 2</a> also introduced the <a
href='#mxmlSaveFile'><tt>mxmlSaveFile()</tt></a>, <a
href='#mxmlSaveString'><tt>mxmlSaveString()</tt></a>, and <a
href='#mxmlSaveAllocString'><tt>mxmlSaveAllocString()</tt></a>
functions. The last argument to these functions is a callback
function which is used to automatically insert whitespace in an
XML document.</p>

<p>Your callback function will be called up to four times for
each element node with a pointer to the node and a "where" value
of <tt>MXML_WS_BEFORE_OPEN</tt>, <tt>MXML_WS_AFTER_OPEN</tt>,
<tt>MXML_WS_BEFORE_CLOSE</tt>, or <tt>MXML_WS_AFTER_CLOSE</tt>.
The callback function should return <tt>NULL</tt> if no
whitespace should be added and the string to insert (spaces,
tabs, carriage returns, and newlines) otherwise.</p>
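
<p>To make the four positions concrete, here is a small standalone
model of how a writer might consult such a callback around a single
element. It does not use Mini-XML itself: the <tt>ws_where_t</tt>
enumeration, <tt>ws_cb_t</tt> type, and the <tt>block_ws()</tt>,
<tt>append()</tt>, and <tt>emit_element()</tt> functions are
hypothetical names chosen for this sketch, and the real library's
internals may differ.</p>

```c
#include <string.h>

/* Hypothetical stand-ins for the MXML_WS_* positions */
typedef enum
{
  WS_BEFORE_OPEN,
  WS_AFTER_OPEN,
  WS_BEFORE_CLOSE,
  WS_AFTER_CLOSE
} ws_where_t;

/* Hypothetical callback type: takes an element name instead of a node */
typedef const char *(*ws_cb_t)(const char *name, ws_where_t where);

/* Example callback: newline before the open tag and after the close tag */
const char *
block_ws(const char *name, ws_where_t where)
{
  (void)name;

  if (where == WS_BEFORE_OPEN || where == WS_AFTER_CLOSE)
    return ("\n");

  return (NULL);
}

/* Append s to buf when s is not NULL, truncating if buf is full */
void
append(char *buf, size_t bufsize, const char *s)
{
  if (s != NULL)
    strncat(buf, s, bufsize - strlen(buf) - 1);
}

/* Write one element with text content, consulting cb at all four
 * positions; bufsize must be at least 1 */
void
emit_element(char *buf, size_t bufsize, const char *name,
             const char *text, ws_cb_t cb)
{
  buf[0] = '\0';

  append(buf, bufsize, cb(name, WS_BEFORE_OPEN));
  append(buf, bufsize, "<");
  append(buf, bufsize, name);
  append(buf, bufsize, ">");
  append(buf, bufsize, cb(name, WS_AFTER_OPEN));
  append(buf, bufsize, text);
  append(buf, bufsize, cb(name, WS_BEFORE_CLOSE));
  append(buf, bufsize, "</");
  append(buf, bufsize, name);
  append(buf, bufsize, ">");
  append(buf, bufsize, cb(name, WS_AFTER_CLOSE));
}
```

<p>In this model a <tt>NULL</tt> return simply contributes nothing
to the output, which is why a callback can ignore the positions it
does not care about.</p>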

<p>The following whitespace callback can be used to add
whitespace to XHTML output to make it more readable in a standard
text editor:</p>

<pre>
const char *
whitespace_cb(mxml_node_t *node,
              int where)
{
    const char *name;

    /*
     * We can conditionally break to a new line
     * before or after any element. These are
     * just common HTML elements...
     */

    name = mxmlGetElement(node);

    if (!strcmp(name, "html") ||
        !strcmp(name, "head") ||
        !strcmp(name, "body") ||
        !strcmp(name, "pre") ||
        !strcmp(name, "p") ||
        !strcmp(name, "h1") ||
        !strcmp(name, "h2") ||
        !strcmp(name, "h3") ||
        !strcmp(name, "h4") ||
        !strcmp(name, "h5") ||
        !strcmp(name, "h6"))
    {
        /*
         * Newlines before open and after
         * close...
         */

        if (where == MXML_WS_BEFORE_OPEN ||
            where == MXML_WS_AFTER_CLOSE)
            return ("\n");
    }
    else if (!strcmp(name, "dl") ||
             !strcmp(name, "ol") ||
             !strcmp(name, "ul"))
    {
        /*
         * Put a newline before and after list
         * elements...
         */

        return ("\n");
    }
    else if (!strcmp(name, "dd") ||
             !strcmp(name, "dt") ||
             !strcmp(name, "li"))
    {
        /*
         * Put a tab before &lt;li&gt;'s, &lt;dd&gt;'s,
         * and &lt;dt&gt;'s, and a newline after them...
         */

        if (where == MXML_WS_BEFORE_OPEN)
            return ("\t");
        else if (where == MXML_WS_AFTER_CLOSE)
            return ("\n");
    }

    /*
     * Return NULL for no added whitespace...
     */

    return (NULL);
}
</pre>

<p>To use this callback function, pass its name when you
call any of the save functions:</p>

<pre>
FILE *fp;
mxml_node_t *tree;

fp = fopen("filename.xml", "w");
mxmlSaveFile(tree, fp, <b>whitespace_cb</b>);
fclose(fp);
</pre>


<!-- NEED 10 -->
<h2>Custom Data Types</h2>

<p>Mini-XML supports custom data types via global load and save
callbacks. Only a single set of callbacks can be active at any
time; however, your callbacks can store additional information in
order to support multiple custom data types as needed. The
<tt>MXML_CUSTOM</tt> node type identifies custom data nodes.</p>

<p>The load callback receives a pointer to the current data node
and a string of opaque character data from the XML source with
character entities converted to the corresponding UTF-8
characters. For example, if we wanted to support a custom
date/time type whose value is encoded as "yyyy-mm-ddThh:mm:ssZ"
(ISO format), the load callback would look like the
following:</p>

<pre>
typedef struct
{
    unsigned year,      /* Year */
             month,     /* Month */
             day,       /* Day */
             hour,      /* Hour */
             minute,    /* Minute */
             second;    /* Second */
    time_t   unix_time; /* UNIX time ("unix" is a
                           predefined macro on some
                           compilers) */
} iso_date_time_t;

int
load_custom(mxml_node_t *node,
            const char *data)
{
    iso_date_time_t *dt;
    struct tm       tmdata;

    /*
     * Allocate data structure...
     */

    dt = calloc(1, sizeof(iso_date_time_t));
    if (dt == NULL)
        return (-1);

    /*
     * Try reading 6 unsigned integers from the
     * data string...
     */

    if (sscanf(data, "%u-%u-%uT%u:%u:%uZ",
               &amp;(dt->year), &amp;(dt->month),
               &amp;(dt->day), &amp;(dt->hour),
               &amp;(dt->minute),
               &amp;(dt->second)) != 6)
    {
        /*
         * Unable to read numbers, free the data
         * structure and return an error...
         */

        free(dt);

        return (-1);
    }

    /*
     * Range check values (the fields are unsigned,
     * so the time fields only need upper bounds)...
     */

    if (dt->month &lt; 1 || dt->month > 12 ||
        dt->day &lt; 1 || dt->day > 31 ||
        dt->hour > 23 ||
        dt->minute > 59 ||
        dt->second > 59)
    {
        /*
         * Date information is out of range...
         */

        free(dt);

        return (-1);
    }

    /*
     * Convert ISO time to UNIX time in
     * seconds. Note that mktime() treats the
     * fields as local time; use timegm() where
     * it is available if you need strict UTC...
     */

    memset(&amp;tmdata, 0, sizeof(tmdata));

    tmdata.tm_year = dt->year - 1900;
    tmdata.tm_mon  = dt->month - 1;
    tmdata.tm_mday = dt->day;
    tmdata.tm_hour = dt->hour;
    tmdata.tm_min  = dt->minute;
    tmdata.tm_sec  = dt->second;

    dt->unix_time = mktime(&amp;tmdata);

    /*
     * Assign the custom node data and the
     * destructor function...
     */

    mxmlSetCustom(node, dt, free);

    /*
     * Return with no errors...
     */

    return (0);
}
</pre>

<p>The function returns 0 on success or -1 if it is
unable to decode the custom data or the data contains an error.
Custom data nodes contain a <tt>void</tt> pointer to the
allocated custom data for the node and a pointer to a destructor
function which will free the custom data when the node is
deleted.</p>
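
<p>The trailing "Z" in the ISO format means the value is in UTC, but
standard C has no UTC counterpart to <tt>mktime()</tt>, which
interprets its input as local time. One portable alternative is to
compute the UNIX time directly from the date fields. The sketch
below does that without Mini-XML; the <tt>days_from_civil()</tt> and
<tt>iso_to_unix()</tt> names are hypothetical, and like the callback
above it only checks that the day is between 1 and 31, not the
actual length of each month.</p>

```c
#include <stdio.h>

/* Hypothetical helper: days between 1970-01-01 and the given
 * Gregorian calendar date (negative for earlier dates) */
long long
days_from_civil(long long year, int month, int day)
{
  long long era;            /* 400-year cycle number */
  long long yoe;            /* Year within the cycle, 0 to 399 */
  long long doy;            /* Day within a March-based year */
  long long doe;            /* Day within the cycle */
  int       mp;             /* Month number counted from March */

  if (month <= 2)
    year --;                /* January/February belong to the prior year */

  era = (year >= 0 ? year : year - 399) / 400;
  yoe = year - era * 400;
  mp  = month > 2 ? month - 3 : month + 9;
  doy = (153 * mp + 2) / 5 + day - 1;
  doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

  return (era * 146097 + doe - 719468);
}

/* Hypothetical helper: parse "yyyy-mm-ddThh:mm:ssZ" and store the UTC
 * UNIX time in *out; returns 0 on success or -1 on error */
int
iso_to_unix(const char *s, long long *out)
{
  unsigned year, month, day, hour, minute, second;
  char     zone;

  if (sscanf(s, "%u-%u-%uT%u:%u:%u%c", &year, &month, &day,
             &hour, &minute, &second, &zone) != 7 || zone != 'Z')
    return (-1);

  if (month < 1 || month > 12 || day < 1 || day > 31 ||
      hour > 23 || minute > 59 || second > 59)
    return (-1);

  *out = days_from_civil(year, (int)month, (int)day) * 86400LL +
         (long long)(hour * 3600 + minute * 60 + second);

  return (0);
}
```

<p>A load callback could store the result of such a function in its
data structure instead of calling <tt>mktime()</tt>.</p>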

<!-- NEED 15 -->
<p>The save callback receives the node pointer and returns an
allocated string containing the custom data value. The following
save callback could be used for our ISO date/time type:</p>

<pre>
char *
save_custom(mxml_node_t *node)
{
    char data[255];
    iso_date_time_t *dt;


    dt = (iso_date_time_t *)mxmlGetCustom(node);

    snprintf(data, sizeof(data),
             "%04u-%02u-%02uT%02u:%02u:%02uZ",
             dt->year, dt->month, dt->day,
             dt->hour, dt->minute, dt->second);

    return (strdup(data));
}
</pre>

<p>You register the callback functions using the <a
href='#mxmlSetCustomHandlers'><tt>mxmlSetCustomHandlers()</tt></a>
function:</p>

<pre>
mxmlSetCustomHandlers(<b>load_custom</b>,
                      <b>save_custom</b>);
</pre>


<!-- NEED 20 -->
<h2>Changing Node Values</h2>

<p>All of the examples so far have concentrated on creating and
loading new XML data nodes. Many applications, however, need to
manipulate or change the nodes during their operation, so
Mini-XML provides functions to change node values safely and
without leaking memory.</p>

<p>Existing nodes can be changed using the <a
href='#mxmlSetElement'><tt>mxmlSetElement()</tt></a>, <a
href='#mxmlSetInteger'><tt>mxmlSetInteger()</tt></a>, <a
href='#mxmlSetOpaque'><tt>mxmlSetOpaque()</tt></a>, <a
href='#mxmlSetReal'><tt>mxmlSetReal()</tt></a>, <a
href='#mxmlSetText'><tt>mxmlSetText()</tt></a>, and <a
href='#mxmlSetTextf'><tt>mxmlSetTextf()</tt></a> functions. For
example, use the following function call to change a text node
to contain the text "new" with leading whitespace:</p>

<pre>
mxml_node_t *node;

mxmlSetText(node, 1, "new");
</pre>


<h2>Formatted Text</h2>

<p>The <a href='#mxmlNewTextf'><tt>mxmlNewTextf()</tt></a> and <a
href='#mxmlSetTextf'><tt>mxmlSetTextf()</tt></a> functions create
and change text nodes, respectively, using <tt>printf</tt>-style
format strings and arguments. For example, use the following
function call to create a new text node containing a constructed
filename:</p>

<pre>
mxml_node_t *node;

node = mxmlNewTextf(node, 1, "%s/%s",
                    path, filename);
</pre>
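
<p>For readers unfamiliar with how a <tt>printf</tt>-style function
can build a string of any length, the following standalone sketch
shows one common C99 technique: measure with <tt>vsnprintf()</tt>,
allocate, then format. The <tt>format_alloc()</tt> function is a
hypothetical name used only for illustration and is not a claim
about how Mini-XML implements its formatted text functions.</p>

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated, formatted string,
 * or NULL on error; the caller frees the result */
char *
format_alloc(const char *format, ...)
{
  va_list ap, ap2;          /* Argument lists for the two passes */
  int     len;              /* Length of the formatted string */
  char    *s;               /* Allocated result */

  va_start(ap, format);
  va_copy(ap2, ap);

  /* First pass: a NULL buffer of size 0 only measures the output */
  len = vsnprintf(NULL, 0, format, ap);
  va_end(ap);

  if (len < 0 || (s = malloc((size_t)len + 1)) == NULL)
  {
    va_end(ap2);
    return (NULL);
  }

  /* Second pass: format into the exactly sized buffer */
  vsnprintf(s, (size_t)len + 1, format, ap2);
  va_end(ap2);

  return (s);
}
```

<p>For example, <tt>format_alloc("%s/%s", path, filename)</tt>
builds the same kind of constructed filename as the call above.</p>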


<h2>Indexing</h2>

<p>Mini-XML provides functions for managing indices of nodes.
The current implementation provides the same functionality as
<a href='#mxmlFindElement'><tt>mxmlFindElement()</tt></a>.
The advantage of using an index is that searching and
enumeration of elements is significantly faster. The only
disadvantage is that each index is a static snapshot of the XML
document, so indices are not well suited to XML data that is
updated more often than it is searched. The overhead of creating
an index is approximately equal to walking the XML document
tree. Nodes in the index are sorted by element name and
attribute value.</p>
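
<p>The general idea of a sorted snapshot can be sketched in a few
lines of standalone C: collect the entries once, sort them by
element name and then attribute value, and use a binary search for
lookups. The <tt>entry_t</tt> type and the
<tt>compare_entries()</tt>, <tt>index_build()</tt>, and
<tt>index_find()</tt> functions are hypothetical and only illustrate
the technique; Mini-XML's own index structure may be organized
differently.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: one element name and attribute value */
typedef struct
{
  const char *element;      /* Element name */
  const char *value;        /* Attribute value */
} entry_t;

/* Order entries by element name, then by attribute value */
int
compare_entries(const void *a, const void *b)
{
  const entry_t *ea = a;
  const entry_t *eb = b;
  int           diff = strcmp(ea->element, eb->element);

  return (diff != 0 ? diff : strcmp(ea->value, eb->value));
}

/* Build the snapshot by sorting the collected entries in place */
void
index_build(entry_t *entries, size_t count)
{
  qsort(entries, count, sizeof(entry_t), compare_entries);
}

/* Binary search the sorted snapshot; returns NULL when nothing matches */
const entry_t *
index_find(const entry_t *entries, size_t count,
           const char *element, const char *value)
{
  entry_t key;

  key.element = element;
  key.value   = value;

  return (bsearch(&key, entries, count, sizeof(entry_t),
                  compare_entries));
}
```

<p>Because the array is only sorted when it is built, entries added
to the document afterwards are not visible to the search, which is
the "static snapshot" trade-off described above.</p>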

<p>Indices are stored in <a href='#mxml_index_t'><tt>mxml_index_t</tt></a>
structures. The <a href='#mxmlIndexNew'><tt>mxmlIndexNew()</tt></a> function
creates a new index:</p>

<pre>
mxml_node_t *tree;
mxml_index_t *ind;

ind = mxmlIndexNew(tree, "element",
                   "attribute");
</pre>

<p>The first argument is the XML node tree to index. Normally this
will be a pointer to the <tt>?xml</tt> element.</p>

<p>The second argument contains the element to index; passing
<tt>NULL</tt> indexes all element nodes alphabetically.</p>

<p>The third argument contains the attribute to index; passing
<tt>NULL</tt> causes only the element name to be indexed.</p>

<p>Once the index is created, the <a
href='#mxmlIndexEnum'><tt>mxmlIndexEnum()</tt></a>, <a
href='#mxmlIndexFind'><tt>mxmlIndexFind()</tt></a>, and <a
href='#mxmlIndexReset'><tt>mxmlIndexReset()</tt></a> functions
are used to access the nodes in the index. The <a
href='#mxmlIndexReset'><tt>mxmlIndexReset()</tt></a> function
resets the "current" node pointer in the index, allowing you to
do new searches and enumerations on the same index. Typically
you will call this function prior to your calls to <a
href='#mxmlIndexEnum'><tt>mxmlIndexEnum()</tt></a> and <a
href='#mxmlIndexFind'><tt>mxmlIndexFind()</tt></a>.</p>

<p>The <a href='#mxmlIndexEnum'><tt>mxmlIndexEnum()</tt></a>
function enumerates each of the nodes in the index and can be
used in a loop as follows:</p>

<pre>
mxml_node_t *node;

mxmlIndexReset(ind);

while ((node = mxmlIndexEnum(ind)) != NULL)
{
    // do something with node
}
</pre>

<p>The <a href='#mxmlIndexFind'><tt>mxmlIndexFind()</tt></a>
function locates the next occurrence of the named element and
attribute value in the index. It can be used to find all
matching elements in an index, as follows:</p>

<pre>
mxml_node_t *node;

mxmlIndexReset(ind);

while ((node = mxmlIndexFind(ind, "element",
                             "attr-value"))
            != NULL)
{
    // do something with node
}
</pre>

<p>The second and third arguments represent the element name and
attribute value, respectively. A <tt>NULL</tt> pointer is used
to return all elements or attributes in the index. Passing
<tt>NULL</tt> for both the element name and attribute value
is equivalent to calling <tt>mxmlIndexEnum()</tt>.</p>

<p>When you are done using the index, delete it using the
<a href='#mxmlIndexDelete'><tt>mxmlIndexDelete()</tt></a>
function:</p>

<pre>
mxmlIndexDelete(ind);
</pre>

<h2>SAX (Stream) Loading of Documents</h2>

<p>Mini-XML supports an implementation of the Simple API for XML
(SAX) which allows you to load and process an XML document as a
stream of nodes. Aside from allowing you to process XML documents of
any size, the Mini-XML implementation also allows you to retain
portions of the document in memory for later processing.</p>

<p>The <a href='#mxmlSAXLoadFd'><tt>mxmlSAXLoadFd</tt></a>, <a
href='#mxmlSAXLoadFile'><tt>mxmlSAXLoadFile</tt></a>, and <a
href='#mxmlSAXLoadString'><tt>mxmlSAXLoadString</tt></a> functions
provide the SAX loading APIs. Each function works like the
corresponding <tt>mxmlLoad</tt> function but uses a callback to
process each node as it is read.</p>

<p>The callback function receives the node, an event code, and
a user data pointer you supply:</p>

<pre>
void
sax_cb(mxml_node_t *node,
       mxml_sax_event_t event,
       void *data)
{
    /* ... do something ... */
}
</pre>

<p>The event will be one of the following:</p>

<ul>

    <li><tt>MXML_SAX_CDATA</tt> - CDATA was just read</li>

    <li><tt>MXML_SAX_COMMENT</tt> - A comment was just read</li>

    <li><tt>MXML_SAX_DATA</tt> - Data (custom, integer, opaque, real, or text) was just read</li>

    <li><tt>MXML_SAX_DIRECTIVE</tt> - A processing directive was just read</li>

    <li><tt>MXML_SAX_ELEMENT_CLOSE</tt> - A close element was just read (<tt>&lt;/element&gt;</tt>)</li>

    <li><tt>MXML_SAX_ELEMENT_OPEN</tt> - An open element was just read (<tt>&lt;element&gt;</tt>)</li>

</ul>

<p>Elements are <em>released</em> after the close element is
processed. All other nodes are released after they are processed.
The SAX callback can <em>retain</em> the node using the <a
href='#mxmlRetain'><tt>mxmlRetain</tt></a> function. For example,
the following SAX callback will retain all nodes, effectively
simulating a normal in-memory load:</p>

<pre>
void
sax_cb(mxml_node_t *node,
       mxml_sax_event_t event,
       void *data)
{
    if (event != MXML_SAX_ELEMENT_CLOSE)
        mxmlRetain(node);
}
</pre>
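
<p>Retain and release follow the usual reference-counting pattern.
The standalone sketch below models it with a hypothetical
<tt>demo_node_t</tt> type and <tt>demo_new()</tt>,
<tt>demo_retain()</tt>, and <tt>demo_release()</tt> functions; it
assumes a new node starts with a count of 1 and is freed when the
count reaches 0, and it is only a model of the idea, not Mini-XML's
code.</p>

```c
#include <stdlib.h>

/* Hypothetical reference-counted node */
typedef struct
{
  int ref_count;            /* Number of owners */
  int *freed;               /* Set to 1 when the node is freed */
} demo_node_t;

/* Create a node owned by the caller (count starts at 1) */
demo_node_t *
demo_new(int *freed)
{
  demo_node_t *node = calloc(1, sizeof(demo_node_t));

  if (node != NULL)
  {
    node->ref_count = 1;
    node->freed     = freed;
  }

  return (node);
}

/* Add an owner and return the new count */
int
demo_retain(demo_node_t *node)
{
  return (++ node->ref_count);
}

/* Drop an owner, free the node at zero, and return the new count */
int
demo_release(demo_node_t *node)
{
  int remaining = -- node->ref_count;

  if (remaining == 0)
  {
    if (node->freed != NULL)
      *(node->freed) = 1;

    free(node);
  }

  return (remaining);
}
```

<p>In this model, a node that the callback retains once survives the
loader's own release, because the count drops from 2 to 1 instead of
from 1 to 0.</p>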

<p>More typically the SAX callback will only retain a small portion
of the document that is needed for post-processing. For example, the
following SAX callback will retain the title and headings in an
XHTML file. It also retains the (parent) elements like
<tt>&lt;html&gt;</tt>, <tt>&lt;head&gt;</tt>, and
<tt>&lt;body&gt;</tt>, and processing directives like
<tt>&lt;?xml ... ?&gt;</tt> and <tt>&lt;!DOCTYPE ... &gt;</tt>:</p>

<!-- NEED 10 -->
<pre>
void
sax_cb(mxml_node_t *node,
       mxml_sax_event_t event,
       void *data)
{
    if (event == MXML_SAX_ELEMENT_OPEN)
    {
        /*
         * Retain headings and titles...
         */

        const char *name = mxmlGetElement(node);

        if (!strcmp(name, "html") ||
            !strcmp(name, "head") ||
            !strcmp(name, "title") ||
            !strcmp(name, "body") ||
            !strcmp(name, "h1") ||
            !strcmp(name, "h2") ||
            !strcmp(name, "h3") ||
            !strcmp(name, "h4") ||
            !strcmp(name, "h5") ||
            !strcmp(name, "h6"))
            mxmlRetain(node);
    }
    else if (event == MXML_SAX_DIRECTIVE)
        mxmlRetain(node);
    else if (event == MXML_SAX_DATA)
    {
        if (mxmlGetRefCount(mxmlGetParent(node)) > 1)
        {
            /*
             * If the parent was retained, then retain
             * this data node as well.
             */

            mxmlRetain(node);
        }
    }
}
</pre>

<p>The resulting skeleton document tree can then be searched just
like one loaded using the <tt>mxmlLoad</tt> functions. For example,
a filter that reads an XHTML document from stdin and then shows the
title and headings in the document would look like:</p>

<pre>
mxml_node_t *doc, *title, *body, *heading;

doc = mxmlSAXLoadFd(NULL, 0,
                    MXML_TEXT_CALLBACK,
                    <b>sax_cb</b>, NULL);

title = mxmlFindElement(doc, doc, "title",
                        NULL, NULL,
                        MXML_DESCEND);

if (title)
    print_children(title);

body = mxmlFindElement(doc, doc, "body",
                       NULL, NULL,
                       MXML_DESCEND);

if (body)
{
    for (heading = mxmlGetFirstChild(body);
         heading;
         heading = mxmlGetNextSibling(heading))
        print_children(heading);
}
</pre>

</body>
</html>