About Weeds of El Limon: XML
The village of El Limon in the Dominican Republic is home to a remarkable development project which has built an irrigation and hydroelectric power system. The school has a laptop computer and will soon have an internet connection. Every night the children are entertained by multimedia CD-ROM titles originally designed for sale to people in affluent countries. Our mission was to develop the first multimedia product designed especially for them.
Olivia is a botanical illustrator and an expert on weeds, so we took notes and made illustrations of 32 weeds that we found in the village. We then had the task of converting a mass of text into presentable HTML. If I had to write 32 web pages by hand, even with the help of a WYSIWYG HTML editor, it would be difficult if not impossible to maintain a consistent look and feel. And then, if I didn't like the results or found that the pages were incompatible with a particular browser, making any changes would be a lot of work. If we were to get the descriptions translated into Spanish, we'd have to repeat all this work. In short, the project had grown beyond writing individual web pages to the next stage.
At the time we got home, the World Wide Web Consortium (W3C) had just issued XML 1.0 as a recommendation, and XML seemed to be just the tool I needed. With XML I was able to design my own specialized markup language for descriptions of weeds and write a translator program that produces a set of web pages complete with indexes and navigational hints for humans and search engines alike. Content and style are now separated, so we can update the descriptions of the plants without worrying about the mechanics of HTML. Because the pages are automatically generated, the look is absolutely consistent, reaching a level of professionalism that would be difficult to attain creating pages by hand. And if we don't like the way it looks or want to add a new feature, changing all of the pages is as simple as pushing a button.
My publishing system has a three-layer architecture. At the top of the picture, there is an XML parser which, given a document type definition (DTD), parses XML into a collection of objects. You don't want to write your own; many parsers are available for free, with source code, at this point. As much as I hate to admit it, I chose MSXML because it was the only well-documented XML parser at the time. If I had started this project today, I would probably have used IBM's XML for Java.
The output of an XML parser, however, is structured more like an XML document than like a collection of plant descriptions. I also don't want to be tied down to a product from Microsoft, so I created my own collection of objects that represent a description of a plant in a natural way. One factory class uses MSXML to generate a Species object that describes a plant. When I want to change to another parser, I simply need to rewrite that one class, since no other class contains a single reference to MSXML.
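A minimal sketch of this arrangement follows. It is an illustration under stated assumptions, not the original code: it swaps in the JDK's standard JAXP DOM parser in place of MSXML, and the fields of `Species` and the `SpeciesFactory.parse` method are hypothetical names chosen for the example. The sample input leaves out the DOCTYPE line so the parser does not go looking for limon.dtd.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Parser-independent description of one plant (hypothetical fields).
final class Species {
    final String id;
    final String family;             // null when the optional FAMILY element is absent
    final List<String> latinNames;
    final List<String> commonNames;

    Species(String id, String family, List<String> latinNames, List<String> commonNames) {
        this.id = id;
        this.family = family;
        this.latinNames = latinNames;
        this.commonNames = commonNames;
    }
}

// The only class that refers to the XML parser API. Switching parsers
// means rewriting this class and nothing else.
final class SpeciesFactory {
    static List<Species> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<Species> result = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName("SPECIES");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                List<String> families = texts(e, "FAMILY");
                result.add(new Species(
                        e.getAttribute("ID"),
                        families.isEmpty() ? null : families.get(0),
                        texts(e, "LATIN"),
                        texts(e, "COMMON")));
            }
            return result;
        } catch (Exception ex) {
            // Hide parser-specific exceptions from the rest of the program.
            throw new IllegalStateException("could not parse plant data", ex);
        }
    }

    // Collects the trimmed text of every descendant element with the given tag.
    private static List<String> texts(Element parent, String tag) {
        List<String> out = new ArrayList<>();
        NodeList nodes = parent.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }
}

final class SpeciesDemo {
    public static void main(String[] args) {
        String xml = "<PLANTDATA><SPECIES ID=\"6\">"
                + "<FAMILY>Cucurbitaceae</FAMILY>"
                + "<LATIN>Momordica charantia L.</LATIN>"
                + "<COMMON>balsam pear</COMMON>"
                + "</SPECIES></PLANTDATA>";
        for (Species s : SpeciesFactory.parse(xml)) {
            System.out.println(s.id + ": " + s.latinNames + " " + s.commonNames);
        }
    }
}
```

The rest of the program would only ever see `Species` objects, so none of the `org.w3c.dom` types leak past the factory.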
The HTML generator lies in a third layer. Since it's separate from the representation of the plants, it could easily be replaced with a layer that generates, say, a LaTeX document, or one that runs as a client-side Java applet, or one that generates HTML dynamically on a server. If I were to write an applet version, I probably would not include an XML parser with the applet, since it would take time to download and verify the parser, as well as to download and parse the XML; instead I would send the intermediate representation to the applet using Java's
The design of this program has been highly influenced by the books
Concurrent Programming in Java.
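The separation between the plant objects and the HTML generator can be sketched as below. This is a hedged illustration only: `HtmlGenerator`, its nested `Species` class, and the page layout are all hypothetical, and the nested class is a pared-down stand-in included so the block compiles on its own. The point it tries to show is that the generator reads plain Java objects and never touches XML.

```java
import java.util.Arrays;
import java.util.List;

final class HtmlGenerator {
    // Pared-down, hypothetical plant description used only by this sketch.
    static final class Species {
        final String latinName;
        final String family;
        final List<String> commonNames;

        Species(String latinName, String family, List<String> commonNames) {
            this.latinName = latinName;
            this.family = family;
            this.commonNames = commonNames;
        }
    }

    // Escapes the characters that have special meaning in HTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds one page from one plant object. Changing the look of every
    // page means changing only this method.
    static String render(Species s) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>").append(escape(s.latinName))
            .append("</title></head>\n<body>\n");
        html.append("<h1><i>").append(escape(s.latinName)).append("</i></h1>\n");
        html.append("<p>Family: ").append(escape(s.family)).append("</p>\n");
        html.append("<ul>\n");
        for (String name : s.commonNames) {
            html.append("<li>").append(escape(name)).append("</li>\n");
        }
        html.append("</ul>\n</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        Species s = new Species("Momordica charantia L.", "Cucurbitaceae",
                Arrays.asList("balsam pear", "cunde amor"));
        System.out.print(render(s));
    }
}
```

A generator for another output format would be a second class with the same input type, leaving the parsing layer untouched.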
<?xml version="1.0"?>
<!DOCTYPE PLANTDATA SYSTEM "limon.dtd">
<PLANTDATA>
  <SPECIES ID="6">
    <FAMILY>Cucurbitaceae</FAMILY>
    <LATIN>Momordica charantia L.</LATIN>
    <COMMON>balsam pear</COMMON>
    <COMMON>balsam apple</COMMON>
    <COMMON>cerasee bush</COMMON>
    <COMMON>archucha</COMMON>
    <COMMON>balsamina</COMMON>
    <COMMON>achochilla</COMMON>
    <COMMON>pepinillo</COMMON>
    <COMMON>cunde amor</COMMON>
    <COMMON>melao de Sao Caetano</COMMON>
    <COMMON>carcilla</COMMON>
    <TEXT TYPE="DESCRIPTION" SOURCE="Direnzo98">
      Vine, climbs by tendrils. Leaves are alternate, soft and lightly hairy.
      Leaves are deeply lobed with five lobes. (Length about <CM>3</CM>)
      Yellow flowers arise from leaf axils as do tendrils. Flower has five
      petals, bright orange small clusters of pistils and stamen at center.
      (Diameter about <CM>1.5</CM>) Pods are oval tapering to a point with
      rows of little spikes, green turning orange as they mature. Exploded
      pods show bright orange peels and four red seeds. Inside is sticky.
      Pod length (about <CM>2.5</CM>) Stem is hairy, very hairy at terminal
      end. Found growing on fence along main road in full sun.
    </TEXT>
  </SPECIES>
</PLANTDATA>
I had a vision in my mind of something like the above, but I started out knowing almost nothing about XML. I read the XML Specification and started experimenting until I had a DTD that did what I wanted. Here it is:
<!ELEMENT PLANTDATA ( SPECIES )+>
<!ELEMENT SPECIES ( FAMILY?,LATIN*,COMMON*,TEXT*,CITE*)>
<!ATTLIST SPECIES ID CDATA #REQUIRED>
<!ELEMENT FAMILY ( #PCDATA )>
<!ELEMENT LATIN ( #PCDATA )>
<!ELEMENT COMMON ( #PCDATA )>
<!ELEMENT TEXT ( #PCDATA | A | CM | REF )*>
<!ATTLIST TEXT TYPE CDATA #REQUIRED>
<!ATTLIST TEXT SOURCE CDATA #REQUIRED>
<!ATTLIST TEXT LANGUAGE CDATA "ENGLISH">
<!ELEMENT A (#PCDATA)>
<!ATTLIST A HREF CDATA #REQUIRED>
<!ELEMENT CM (#PCDATA)>
<!ELEMENT REF EMPTY>
<!ATTLIST REF ID CDATA #REQUIRED>
<!ELEMENT IMAGE (#PCDATA)>
<!ATTLIST IMAGE HREF CDATA #REQUIRED>
<!ATTLIST IMAGE SOURCE CDATA "">
<!ATTLIST IMAGE TYPE CDATA "PHOTO">
<!ELEMENT CITE EMPTY>
<!ATTLIST CITE SOURCE CDATA #REQUIRED>
<!ATTLIST CITE PAGE CDATA "">
<!ENTITY Agrave 'À'>
<!ENTITY Aacute 'Á'>
<!ENTITY agrave 'à'>
<!ENTITY aacute 'á'>
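To see what a DTD like this buys you, the sketch below validates two small documents with the JDK's standard JAXP parser (an assumption; it is not the parser used in the project). For self-containment it embeds a trimmed copy of the declarations as an internal DTD subset rather than reading limon.dtd, and the `DtdCheck` class and its `validate` method are hypothetical names. The content model for SPECIES fixes the order of the child elements, so a COMMON placed before FAMILY is reported as an error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdCheck {
    // Trimmed copy of the DTD (only FAMILY, LATIN and COMMON are kept),
    // written as an internal subset so no external file is needed.
    static final String DOCTYPE =
            "<!DOCTYPE PLANTDATA [\n"
          + "<!ELEMENT PLANTDATA ( SPECIES )+>\n"
          + "<!ELEMENT SPECIES ( FAMILY?,LATIN*,COMMON* )>\n"
          + "<!ATTLIST SPECIES ID CDATA #REQUIRED>\n"
          + "<!ELEMENT FAMILY ( #PCDATA )>\n"
          + "<!ELEMENT LATIN ( #PCDATA )>\n"
          + "<!ELEMENT COMMON ( #PCDATA )>\n"
          + "]>\n";

    // Returns the validation error messages for the given document body;
    // an empty list means the body conforms to the DTD above.
    static List<String> validate(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            String xml = "<?xml version=\"1.0\"?>\n" + DOCTYPE + body;
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Not well-formed, or the parser could not be set up.
            errors.add(String.valueOf(e.getMessage()));
        }
        return errors;
    }

    public static void main(String[] args) {
        String good = "<PLANTDATA><SPECIES ID=\"6\"><FAMILY>Cucurbitaceae</FAMILY>"
                + "<COMMON>balsam pear</COMMON></SPECIES></PLANTDATA>";
        String bad = "<PLANTDATA><SPECIES ID=\"6\"><COMMON>balsam pear</COMMON>"
                + "<FAMILY>Cucurbitaceae</FAMILY></SPECIES></PLANTDATA>";
        System.out.println("good: " + validate(good));
        System.out.println("bad:  " + validate(bad));
    }
}
```

The exact wording of the error messages depends on the parser implementation, so the sketch only distinguishes between an empty and a non-empty error list.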