eXML - Eiffel XML Parser Toolkit


Written by Andreas Leitner.

exml_0_1_6.zip (219651 bytes) - source code
http://exml.dhs.org/ (eXML home page)
http://www.eGroups.com/list/exml/ (eXML Mailing List)
http://leitner.dhs.org/ (Andreas Leitner home page)


Introduction

eXML is an XML 1.0 parser for Eiffel based on expat. Please see the expat home page for details on the underlying parser.

The current eXML version is 0.1.6. It uses the dynamic link libraries of expat version 1.0.2.

Copyright (c) 1999 Andreas Leitner. eXML is subject to the Eiffel Forum Freeware License.

Installation

To install the eXML library, unpack it in a directory of your choice. You will need to set the following environment variables to compile an eXML application:

EXML:
Pointing to the directory where you installed eXML.

GOBO:
Pointing to the directory where you installed GOBO. If you don't have GOBO, get it from http://www.gobo.demon.co.uk/eiffel/gobo/.


You need to do one or more of the following, depending on the configuration you use:

Windows (all):
All the needed libraries come precompiled. The DLLs are stored in the directory %EXML%\expat\bin. Be sure they are in your DLL load path. It's a good idea to put the DLLs in your system directory.

Windows (ISE Eiffel):
There is an extra C-library needed. Please go to the directory %EXML%\compiler_specific\ise\clib and run "nmake makefile.win.msc"

Linux (all):
Go to the directory $(EXML)/expat and run "make". This generates the expat object files that are needed when compiling an eXML application.

eXML can be downloaded from exml_0_1_6.zip.

Documentation

eXML provides two different types of parsers: an event based parser and a tree based parser. Have a look at http://www.megginson.com/SAX/event.html for a comparison between event based and tree based XML parsers. The tree based parser (XML_TREE_PARSER) is implemented as a descendant of the event based parser (XML_PARSER).
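To make the relationship between the two parser types concrete, here is a minimal sketch in Python (not Eiffel, and not eXML's actual classes). It uses Python's standard `xml.parsers.expat` module, which wraps the same expat library. The class names `EventParser` and `TreeParser` and the dictionary node layout are assumptions made for illustration only; they merely mirror the idea that a tree parser is an event parser whose handlers build a tree.

```python
import xml.parsers.expat


class EventParser:
    """Event-based parser: expat calls one method per parsing event."""

    def __init__(self):
        self._expat = xml.parsers.expat.ParserCreate()
        self._expat.buffer_text = True  # deliver adjacent text as one event
        self._expat.StartElementHandler = self.on_start_tag
        self._expat.EndElementHandler = self.on_end_tag
        self._expat.CharacterDataHandler = self.on_content

    def parse_string(self, data, is_final=True):
        self._expat.Parse(data, is_final)

    # The default handlers ignore every event; descendants redefine them.
    def on_start_tag(self, name, attributes):
        pass

    def on_end_tag(self, name):
        pass

    def on_content(self, text):
        pass


class TreeParser(EventParser):
    """Tree-based parser: redefines the event handlers to build a tree."""

    def __init__(self):
        super().__init__()
        self.root_element = None
        self._open = []  # stack of elements whose end tag is still pending

    def on_start_tag(self, name, attributes):
        node = {"name": name, "attributes": attributes, "children": []}
        if self._open:
            self._open[-1]["children"].append(node)
        else:
            self.root_element = node
        self._open.append(node)

    def on_end_tag(self, name):
        self._open.pop()

    def on_content(self, text):
        if text.strip():  # skip whitespace-only text between elements
            self._open[-1]["children"].append(text)


parser = TreeParser()
parser.parse_string("<ebook><page><topic>Main Page</topic></page></ebook>")
root = parser.root_element
```

With an event based parser the application reacts to each tag as it arrives and keeps nothing by default; the tree variant trades memory for the convenience of walking the finished structure afterwards.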

The Tree Based Parser

For most users of the library the tree based parser will be the better choice. For this reason I will limit the documentation to this topic for now. Objects of the type XML_TREE_PARSER can be fed an XML document split up into one or more strings. The following example parses an XML document that is stored completely in a string object named `buffer'.

.
.
.
local
        parser: XML_TREE_PARSER
.
.
.
                -- somewhere the parser must be created
        !! parser.make
.
.
.
                -- parse the XML document
        parser.parse_string (buffer)
                -- and tell the parser that the end of the document has been reached
        parser.set_end_of_file

        if not parser.is_correct then
                        -- whoops! there was an error in the document.
                        -- print out some information about that error
                print ("%N")
                print (parser.last_error_string)
                print (" at ln: ")
                print (parser.last_line_number)
                print (", cl: ")
                print (parser.last_column_number)
                print ("%N")
        else
                        -- the document was parsed successfully.
                        -- print out the structure of the document
                print (parser.out)
        end
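The same feed-then-finish pattern, including the case where the document arrives in several strings, can be sketched with Python's standard `xml.parsers.expat` module, which wraps the same expat library. This is an illustration in Python, not eXML code: the helper name `check_document` is hypothetical, and the message format simply imitates the Eiffel example above.

```python
import xml.parsers.expat


def check_document(chunks):
    """Feed the chunks to expat in order; return None or an error message."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        for chunk in chunks:
            parser.Parse(chunk, False)  # more input may follow
        parser.Parse("", True)          # the end of the document is reached
    except xml.parsers.expat.ExpatError as error:
        reason = xml.parsers.expat.ErrorString(error.code)
        return "%s at ln: %d, cl: %d" % (reason, error.lineno, error.offset)
    return None


# A well-formed document split into two strings.
ok = check_document(["<ebook><page>", "</page></ebook>"])

# The <page> element is never closed, so expat reports a mismatched tag.
broken = check_document(["<ebook><page>", "</ebook>"])
```

Signalling the end of input explicitly matters: a document that merely stops early is only detected as incomplete once the parser is told that no more data will come.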

The parser stores the information contained in the XML document as a tree consisting of XML_NODES. The following is an inheritance graph of all nodes currently implemented.

[Figure: inheritance graph of the node classes]

The inheritance graph of the parser itself is shown in the next picture. You can see that the tree parser inherits not only from the event based parser, but also from XML_DOCUMENT. Objects of the type XML_DOCUMENT represent a whole XML document. They have a `root_element' of the type XML_ELEMENT and will in the future hold additional information given in the XML document (e.g. the encoding).

[Figure: inheritance graph of the parser classes]

eBook

eBook is an example of an application that uses the eXML tree based parser. It is included in the eXML package. But eBook is more than a plain example; it is pretty much an application of its own. For example, this whole web site is generated using eBook. The text and structure of the site are stored in an XML document. eBook takes this XML document and some additional layout files and generates HTML pages as output.

There are several reasons why this can be useful. For example, eBook automatically generates the navigation menu of this site. If I want the site to have a new design, I only need to change the layout files and rerun eBook to generate the new HTML pages. eBook helps you to separate the actual information from the layout of HTML sites. It even makes it possible to create different layout files for different browsers, so you do not need to fiddle with compatibility problems.

Currently there is not much documentation available about eBook. Have a look at the test data in the directory "/examples/test_data/ebook/input" to see how eBook works.

Watch this space for more information about eBook...

One last note: if you try eBook and find it slow, try compiling it with assertion checking turned off.

eBook File Format

The main eBook file format is plain XML. XML is quite easy to understand, and it helps a lot if you know a bit about HTML. Have a look at the XML Version 1.0 Specification for full details about XML.

With eBook you have a clean separation between content and layout. The content is stored in an XML file. The main element is ebook. Within the ebook element you can list and nest as many page elements as you want. eBook translates each page element into an HTML page. The following sample would generate 3 (nearly) empty HTML files:

<ebook>
        <page>
        </page>

        <page>
        </page>

        <page>
        </page>
</ebook>

The previous example is not valid eBook input, because each page must contain exactly one topic element, exactly one text element and zero or more nested page elements. The topic element must contain only text. This text is used in the menu and as the headline of the corresponding page. The content of the text element is used as the page content. For now only plain text is allowed. If you need HTML tags, put them in a CDATA section inside the text element. Here is a more complete example:

<?xml version="1.0"?>

<!DOCTYPE ebook
[
<!ELEMENT ebook (page+)>
<!ELEMENT page (topic, text, page*)>
<!ATTLIST page
          key    CDATA "auto">
<!ELEMENT topic (#PCDATA)>
<!ELEMENT text (#PCDATA)>
]>

<ebook>
        <page key="index.html">
                <topic>
                        Main Page
                </topic>
                <text>
                        This is the main page.
                </text>
        </page>
        <page>
                <topic>
                        Second Page
                </topic>
                <text>
                        <![CDATA[

                        This is the second page. It is enclosed 
                        in a CDATA section, thus I can use regular 
                        HTML in here.
                        <A HREF="http://exml.dhs.org">I am a link!</A>
                        ]]>
                </text>
        </page>
        <page>
                <topic>
                        3rd Page
                </topic>
                <text>
                        This is the 3rd page. 
                        It includes a nested page.
                </text>
                <page>
                        <topic>
                                Nested page
                        </topic>
                        <text>
                                This page is nested in the 3rd page.
                        </text>
                </page>
        </page>
</ebook>

The first lines (the XML declaration and the document type declaration) define the format of this XML document. The key attribute on the first page start tag is optional. If specified, it forces eBook to use the value of the attribute as the file name for this page.

Currently eBook does nearly no validation of its input. Well-formedness errors are checked and reported, but other errors will most likely result in an unhandled exception.
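Since eBook itself checks so little, it can help to verify the structural rules described above before running it. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. It is not part of eBook or eXML; the function names `check_page` and `check_ebook` are hypothetical, and the sketch checks only the rules stated in this section (exactly one topic, exactly one text, text-only topic, any number of nested pages).

```python
import xml.etree.ElementTree as ET


def check_page(page, path):
    """Return a list of problems found in `page` and its nested pages."""
    problems = []
    for required in ("topic", "text"):
        count = len(page.findall(required))
        if count != 1:
            problems.append("%s: expected exactly one <%s>, found %d"
                            % (path, required, count))
    topic = page.find("topic")
    if topic is not None and len(topic) > 0:
        problems.append("%s: <topic> must contain only text" % path)
    # Nested pages are optional, but each one must follow the same rules.
    for index, nested in enumerate(page.findall("page"), start=1):
        problems.extend(check_page(nested, "%s/page[%d]" % (path, index)))
    return problems


def check_ebook(document):
    """Check a whole eBook document given as a string."""
    root = ET.fromstring(document)  # raises ParseError if not well-formed
    if root.tag != "ebook":
        return ["root element must be <ebook>, found <%s>" % root.tag]
    problems = []
    for index, page in enumerate(root.findall("page"), start=1):
        problems.extend(check_page(page, "page[%d]" % index))
    return problems


empty_pages = "<ebook><page></page><page></page></ebook>"
valid = ("<ebook><page key='index.html'><topic>Main Page</topic>"
         "<text>This is the main page.</text></page></ebook>")

problems_in_empty = check_ebook(empty_pages)
problems_in_valid = check_ebook(valid)
```

Each of the two empty pages is reported twice, once for the missing topic and once for the missing text, while the valid document yields an empty list.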

The Future

I have several ideas on how to extend and enhance eXML. There is plenty of work to do. If you are keen to help develop eXML, please contact me.

The following list shows spots that need work (in no specific order)

Known Bugs

Date | Description | Status
04/14/1999 | Compiling eBook with SmallEiffel and option "-boost" will result in C-error messages. Seems to be a bug in SmallEiffel. | Not fixed.
04/08/1999 | There are strange XML-errors reported when compiling with ISE. | Fixed since version 0.1.6.

Something else you might want to tell me?

The files clib/xmlparse.dll, clib/xmlparse.lib, clib/xmltok.dll and spec/windows/include/xmlparse.h are taken unmodified from expat and are therefore subject to the Mozilla Public License Version 1.0.

There is now a mailing list for eXML at www.egroups.com.

Andreas Leitner andreas.leitner@teleweb.at

9 March 1999
