Chapter 15 of Professional Java Server Programming, © 1999 Wrox Press.

Weeds of El Limon 2
    The problem
        The site
            
        How it works
        The unfriendly net
    The tools

Weeds of El Limon 2


The problem

A little more than a year ago, we visited the village of El Limon in the Dominican Republic to help string power lines for a micro-hydroelectric generator. (Read about the project at http://www.career.cornell.edu/cresp/ecopartners/) When we got there, the money to buy the cable hadn't arrived, so we spent our time describing and drawing commonly occurring weeds. The villagers had a solar-powered laptop computer and would soon be getting a cellular internet connection -- if we converted our weed descriptions to HTML, both they and other users of the web could access them.


However, making Weeds of El Limon meant making 32 nearly identical web pages for 32 weeds. It seemed reasonable to write 32 pages once, but what if I decided I didn't like the look and wanted to change it? The XML specification had just been approved, so I got the idea to write the weed descriptions in XML and write a program to convert them into attractive HTML. Being an early application of XML, Weeds of El Limon attracted attention from the XML community, and we wrote about it in Chapter 12 of XML Applications from Wrox Press.


A year later, we'd received e-mail from people interested in the weeds. Some had typed "prickle poppy" into a search engine and got our description, and we heard from a Peace Corps volunteer who was just about to visit the Dominican Republic. We asked ourselves, "How can we make our site more useful to people interested in plants?" Weeds of El Limon had problems: because we didn't take the right books with us, we weren't able to identify many of the plants, in particular the grasses. Also, our plants were numbered in the order in which we found them, not in the conventional alphabetical order. We'd gotten the rights to put a list of recommended street trees online, and making an improved Weeds of El Limon would be a way to test ideas for a more powerful publishing system.


To make the new site effective, we'd need a clear mission. We thought about the Peace Corps volunteer who wrote to us and, having found almost nothing else about tropical weeds online, decided that we could best help people like him by writing a brief introduction to weeds in the Caribbean. Our sampling procedure wasn't comprehensive or scientific: we just walked out of the schoolhouse we were staying in and dug up the first new weed we saw. This meant, however, that we observed and recorded the most ubiquitous species... the ones that you'd notice right after you stepped off the plane. Therefore, we bundled the 14 weeds we'd identified as Common Weeds of the Caribbean (http://www.tapir.org/weeds/), and redesigned the site to make it easier to use.

The site

Common Weeds contains three kinds of database-generated pages. Each plant has an individual page, and there are two index pages: an index by common name and a top page indexed by Latin name. When Common Weeds is completed, there will also be a few static pages with information about the authors, the software, and books about tropical weeds.



Common Weeds is a simpler site than the original Weeds of El Limon. In the original, with 32 weeds, I needed separate pages to make indices by common name, Latin name, and family. With just 14 weeds, it was practical to link to each weed from the top page, eliminating the need for separate indices by family and Latin name. Because we put more information on the top page, users need to click less and the site is easier to use.





Beneath the surface, the HTML is simpler. In Weeds of El Limon, I used a table cell background to color the bar at the top of the page, making white letters on a black background. I first tried this with the BGCOLOR attribute of the <TD> tag and with <FONT COLOR>, but the result on Netscape 2, which doesn't support colored table cells, was disastrous: white letters on a white background are impossible to read. To solve this problem, I used cascading style sheets (CSS) to set the cell and text colors. Since then, our staff artist has discovered just how bad the CSS implementations in Netscape 4 and IE 3 are -- and that, often, it is much better to use old-fashioned, conservative HTML that works, even if it does make the blood of the W3 Consortium boil. This time, to avoid trouble with table-cell backgrounds, I chose fail-safe colors. In a browser without table-cell backgrounds, the pale green bar at the top is white, which still works with black text.




Like the top page, the common name index is information rich. For the roughly 80% of web users with screens 800x600 or greater, all of the names fit on one screen. Although I generate the page dynamically out of a database, I set the break points of the columns by hand to guarantee effective layout.



Compared to the indices, the changes to the individual weed pages were minor. Although some pundits think that page numbers are obsolete on the web, I decided to keep numbering the weeds. In the original, we numbered weeds in the order in which we found them. Now they're numbered in the order in which they appear on the top page. To imitate the "feel" of a book, I added a navigation widget that lets viewers jump to any number with a single click.
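
A navigation widget of this kind can be sketched in a few lines of Java. This is only an illustration, not the code used on the site: the class name NavWidgetSketch, the method navBar, and the assumption that weed number N lives at N.html in the same directory are all hypothetical.

```java
// Hypothetical sketch of a "jump to any number" navigation bar.
// Assumes weed N is served as N.html in the current directory.
class NavWidgetSketch {

    // Builds one link per weed; the current weed is shown in bold, unlinked.
    static String navBar(int current, int count) {
        StringBuilder html = new StringBuilder();
        for (int n = 1; n <= count; n++) {
            if (n > 1) {
                html.append(' ');
            }
            if (n == current) {
                html.append("<B>").append(n).append("</B>");
            } else {
                html.append("<A HREF=\"").append(n).append(".html\">")
                    .append(n).append("</A>");
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // Print the bar for weed 4 of 14.
        System.out.println(navBar(4, 14));
    }
}
```

Because the output is plain anchors and bold tags, it stays within the conservative HTML the site aims for.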



How it works

Weeds of El Limon was a simple filter: it took a collection of XML files as input and created a set of static HTML files that I could put on my server. Afterwards, I built several database-driven sites and got hooked on the ability to provide multiple views of information stored in a database, and on the way two or more people can collaborate on maintaining a database-driven site. For the first phase of Common Weeds, I took advantage of the existing XML format. Rather than building a system to interactively add and update weed descriptions in the database, I simply import them into the database in XML format. If I need to change the descriptions in the short term, I can edit the XML and reload the database. This way I could quickly develop a database-driven site, and in the future I can add pages to edit the database directly. Since the original Weeds of El Limon software, WEEDS 1, was written in Java, it seemed natural to use servlets and JSP, which let me reuse some of the software and design.



WEEDS 2, the software which generates Common Weeds, is a 4-tier application because four separate processes are involved in each request: the web browser, the Apache web server, the Java virtual machine, and the MySQL database. Parts of WEEDS 2 run on the web server (URL rewriting rules), parts in the Java virtual machine (JSP and supporting classes), and parts in MySQL (database queries). Because I get a lot of mail about Weeds of El Limon from people in the third world, the output of WEEDS 2 is conservative HTML, roughly compliant with HTML 3.2, without style sheets, applets, or JavaScript. For any project, it's important to consider the audience when choosing which client-side features to use. An applet, for instance, can provide a sharp user interface for an intranet application, and a VRML world can draw an audience of thrill seekers. But since fancy features aren't supported in old browsers and aren't completely compatible between new browsers, early adopters risk limiting their audience and greatly raising the cost of developing and maintaining their sites. It's often easier to add interactivity on the server side, where you can maintain a large database and be a center for communication between your users. On the server, you keep control over your platform and your tools, and you don't have to worry about supporting multiple versions of the software you depend on.



For each of the three page types, there is a Java Server Page and a Java Bean. For instance, to generate individual weed pages, WEEDS 2 uses weed.jsp and WeedBean. The Java Server Page is, mostly, an HTML template filled with information it gets from its corresponding bean. There are a few advantages to this division. If I make a change in a Java class, I have to recompile the .java file and, possibly, restart the servlet engine. (Some servlet engines, such as Apache JServ and JRun, can be configured to automatically reload class files that change. Others, such as Sun's Java Server Web Development Kit, cannot.) JSP files, however, are compiled into Java servlets by the JSP engine. So long as the JSP file stays the same, the JSP engine reuses the servlet. When you change the JSP file, the JSP engine detects this and recompiles. Thus, it's as easy to edit a JSP file on your server as it is to edit an HTML file. By defining the look of a page in a JSP, I can make simple changes without the hassle of recompiling. At the same time, because complex Java methods are hidden inside beans, the Java left inside the JSP is simple and stereotypical, generally of the form <%= bean.getProperty() %>. This means that JSPs can be written and edited by people who know about graphic design and HTML, while programmers worry about the complexities of Java, databases, and object-oriented design.
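
To make the division concrete, here is a minimal sketch of the kind of bean a page template could draw on. It is an illustration only: SketchWeedBean and its properties (id, latinName, commonName, title) are hypothetical names, not the actual WeedBean, and the values are set by hand rather than loaded from the database.

```java
// Hypothetical bean: the names are illustrative, not the real WeedBean.
// A template would only need expressions such as <%= bean.getTitle() %>.
class SketchWeedBean {
    private int id;
    private String latinName = "";
    private String commonName = "";

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getLatinName() { return latinName; }
    public void setLatinName(String latinName) { this.latinName = latinName; }

    public String getCommonName() { return commonName; }
    public void setCommonName(String commonName) { this.commonName = commonName; }

    // Derived property: the formatting logic stays in Java,
    // so the template author never has to write it.
    public String getTitle() {
        return id + ". " + latinName + " (" + commonName + ")";
    }

    public static void main(String[] args) {
        SketchWeedBean bean = new SketchWeedBean();
        bean.setId(4);
        bean.setLatinName("Latin name");
        bean.setCommonName("common name");
        System.out.println(bean.getTitle());
    }
}
```

A page designer can rearrange where the title, names, and number appear in the template without touching or recompiling this class.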


A one-to-one mapping between JSPs and beans is only one possible design. You could, if you like, have multiple JSPs access the same bean, or have a page incorporate more than one bean. Java Beans can be used as reusable components across a site, to create navigation bars, or to insert advertisements. In my case, there were some methods (for instance, those that display certain objects in HTML form) and properties (such as the copyright notice) that were common to all the pages. I could have created an additional bean shared by all the pages, but instead I made IndexBean, CommonBean and WeedBean subclasses of GeneralBean to share these functions.
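
The inheritance approach might look roughly like the sketch below. The class names (SketchGeneralBean, SketchIndexBean, SketchPlantBean), the escapeHtml helper, and the placeholder copyright text are assumptions made for illustration; the real GeneralBean and its subclasses are not reproduced here.

```java
// Hypothetical base class: helpers shared by every page live here once.
abstract class SketchGeneralBean {

    // Property common to all pages (placeholder text, not the site's notice).
    public String getCopyright() {
        return "Copyright notice goes here";
    }

    // Shared helper: escape text so it can be placed safely inside HTML.
    protected static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Each page type supplies its own title.
    public abstract String getPageTitle();
}

// One subclass per page type; both inherit getCopyright() and escapeHtml().
class SketchIndexBean extends SketchGeneralBean {
    public String getPageTitle() {
        return "Index by Latin name";
    }
}

class SketchPlantBean extends SketchGeneralBean {
    private final String name;

    SketchPlantBean(String name) {
        this.name = name;
    }

    public String getPageTitle() {
        return escapeHtml(name);
    }
}
```

The trade-off against a separate shared bean is that each page still needs only one bean, at the price of tying every page bean to the same superclass.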


WEEDS 2, in addition to web pages, contains images of the weeds. The images are stored in .gif format inside BLOB columns in the database, and are served by the ViewWeed servlet. I use a servlet rather than a JSP here because ViewWeed simply retrieves the image from the database and sends it over the network verbatim, without filling in a template.
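
The core of such a servlet is a byte-for-byte copy from one stream to another. The sketch below shows only that copy, using plain java.io so it runs without a servlet engine or database; ImageCopySketch and copyVerbatim are hypothetical names, not the ViewWeed source. In a real servlet one would expect the input to come from something like ResultSet.getBinaryStream() and the output from the response's getOutputStream(), after setting the content type to image/gif.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies image bytes unchanged from source to sink.
class ImageCopySketch {

    // Copies every byte from 'in' to 'out' and returns how many were copied.
    static long copyVerbatim(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        // read() returns -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

Nothing in the loop inspects or transforms the data, which is why a template engine adds nothing here.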


The beans and servlet depend on supporting classes, in particular, DBSpecies, which provides an object-relational view of a single species in the database, and Weeds, which is the gateway to the database. In some sense, Weeds is the central class of the application since it holds the database connection and all of the SQL statements used to access the database. Also, Weeds contains utility methods that I want to share throughout the application. Currently, I create a new instance of Weeds for each web request. Although Weeds itself is lightweight (about 32 bytes), it takes time to create the database connection. This is OK now, because WEEDS 2 is currently fast enough for what I do with it.


If I need to speed my application up, I've got two options. The simplest is to change Weeds so it gets its connections from an existing mature and efficient connection pool, such as the one presented in Chapter 11. This would be a snap, since Weeds encapsulates the connection -- I wouldn't have to change a line of code elsewhere. If, however, WEEDS becomes a more complex application that, say, accesses more than one database or pools additional resources, it would also be possible to pool the Weeds class. Pooling Weeds wouldn't be very different from pooling connections, although unlike a connection pool, I couldn't simply copy the code off the net. Since, ultimately, only a single thread at a time can obtain an object from a pool, the time cost of pooling is determined by the time it takes to obtain a lock (that is, to call a synchronized method). Since it costs the same to pool a single object that contains references to N objects as it does to pool any other object, wrapping multiple objects that need to be pooled in a single object leads to a design which can be scaled to higher performance.
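
As a rough sketch of what pooling an object like Weeds involves, the class below hands out idle instances and creates new ones only when none are available. It is a deliberately minimal illustration under stated assumptions: SketchPool is a hypothetical name, there is no upper bound on pool size, and no checking that a returned object is still usable, all of which a mature pool would handle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical minimal object pool. The pooled object can itself hold
// several resources (connections, caches), and the cost is still one lock.
class SketchPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    SketchPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Only one thread at a time can be inside acquire() or release().
    synchronized T acquire() {
        T item = idle.poll();            // null when no idle object exists
        return (item != null) ? item : factory.get();
    }

    // Hands an object back so a later request can reuse it.
    synchronized void release(T item) {
        idle.push(item);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

A request handler would call acquire() at the start of a request and release() in a finally block, so the expensive construction happens only when the pool is empty.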


Locating all of the SQL in the Weeds class has other consequences. There is the disadvantage that SQL statements are declared in a file far away from the places where they are used, which makes the code harder for a newcomer to read. Each SQL statement is wrapped in a method, and it's a pain to think up a good name for each statement. On the other hand, since all of the SQL is in one place, all of the dependence on a particular database is in one place. Although SQL is supposed to be a standard, you'll discover many foibles in your database when you try to port an application from one database to another. If, for instance, I find the application doesn't work with database XYZ, I can make a subclass XYZWeeds which fixes the incompatibilities. This would also be a way to take advantage of special, nonportable features of a particular database, such as stored procedures, which could improve performance. With the SQL in one place, I can also change the names of tables and columns and make other changes in the structure of the database without affecting the rest of the program. (I've already done this several times.)
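
The subclassing idea can be sketched as follows. Everything here is hypothetical: the class names SketchWeeds and SketchXyzWeeds, the method names, and the species table with its columns are invented for illustration, and the sketch only builds the SQL text rather than running it. The incompatibility shown (LIMIT versus TOP for restricting row counts) is one common difference between SQL dialects, chosen as an example.

```java
// Hypothetical gateway: every SQL statement is wrapped in one named method.
class SketchWeeds {

    public String selectSpeciesSql() {
        return "SELECT latin_name, common_name FROM species WHERE id = ?";
    }

    // MySQL-style row limit.
    public String selectFirstNSql(int n) {
        return "SELECT id FROM species ORDER BY latin_name LIMIT " + n;
    }
}

// Port to an imagined database "XYZ" that has no LIMIT clause:
// only the one incompatible statement is overridden.
class SketchXyzWeeds extends SketchWeeds {

    @Override
    public String selectFirstNSql(int n) {
        return "SELECT TOP " + n + " id FROM species ORDER BY latin_name";
    }
}
```

Code elsewhere in the application keeps calling the same methods, so switching databases comes down to choosing which class to instantiate.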


Organizing a server application in several layers makes your application more flexible for the future. Because the JSPs, beans, and supporting classes form three distinct layers, it would be possible to, if desired, move one of the layers onto another server using Java RMI or IIOP. For large business applications, Enterprise Java Beans (EJB) provides a standard interface for application servers that provide services such as transaction management, security, distribution and persistence. I'll talk about EJB more when I discuss the Java Beans in this application.


The unfriendly net

I wish I could install the WEEDS 2 software and database on our web server in San Francisco and work on both the descriptions and the software from my computers at work and at home. Unfortunately, in Germany, where I currently live, residential phone calls are charged by the minute. Between high costs, a busy modem pool, and a congested transatlantic cable, it's not practical to work online.


To cope with this problem, we keep a master copy of each of our web sites on our development server in Germany, each on a virtual host in our home LAN. We don't have a database or servlet engine in San Francisco, but instead, we create a static copy of the site (plain HTML and GIF files) on our machine in Germany and install the static copy on our web server. This technique can't be used for sites which process complicated queries (such as full-text search) or that can be modified by users (such as a site with a message board). However, our static site can be stored on a floppy and viewed on computers without a network connection.


There are two steps to copy our site to our server. First, on our machine in Germany, we make a static copy with a web crawler, specifically pavuk (http://www.idata.sk/ondrej/pavuk/). Pavuk crawls through a web site and makes a collection of directories and files that mirror the original site. Then, we mirror the static copy of our site to our web server using rsync (http://rsync.samba.org), a utility that brings two files or directories into sync by sending only the parts that are different.


To copy our dynamic site with a web crawler, we need to make it look, to web clients, like a static site. That is, there can't be any .jsp, .cgi, or .asp files, or cgi-bin directories. Also, we can't pass parameters via the GET or POST methods (that is, no URLs that end like weed.html?id=7). Although it's possible to write a servlet which pretends to be a directory and reads the path after the servlet name (such as servlet/small/4.gif), JSPs can only read parameters via GET or POST. To get around this, I use Apache's mod_rewrite module to transform static URLs like weeds/weed/4.html into URLs with GET parameters, such as weeds/jsp/weed.jsp?id=4. URL rewriting is similar to servlet aliasing, but is much more powerful: mod_rewrite can use external programs to transform URLs, use mod_proxy to pass a request to another web server (this was useful in early testing, when the only JSP 1.0 compatible engine didn't work with Apache), or even distribute requests between multiple servers.
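
The mapping itself is a simple pattern substitution. The sketch below expresses it in Java purely to show the transformation; on the site this work is done inside Apache by mod_rewrite, not by Java code, and the names RewriteSketch and rewrite are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of the static-to-dynamic URL mapping described above.
// This is not how the site does it: Apache's mod_rewrite plays this role.
class RewriteSketch {

    // Matches paths such as weeds/weed/4.html and captures the number.
    private static final Pattern WEED_PAGE =
            Pattern.compile("^weeds/weed/(\\d+)\\.html$");

    // Returns the internal JSP URL for a weed page, or the path unchanged.
    static String rewrite(String path) {
        Matcher m = WEED_PAGE.matcher(path);
        if (m.matches()) {
            return "weeds/jsp/weed.jsp?id=" + m.group(1);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("weeds/weed/4.html"));
    }
}
```

The client, including a crawler, only ever sees the left-hand form; the query string exists solely between the web server and the JSP engine.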


There are other reasons to make a dynamic site look static. Major search engines and other robots often refuse to index sites that look dynamic, stopping when they see URLs that look dynamic (.jsp, .cgi, .asp, /cgi-bin, etc.), because some sites could send a crawler crawling through millions of dynamically generated pages, wasting the time and bandwidth of the crawler while overloading the dynamic site. Thus, many worthwhile dynamic sites lose the hits they could get by being indexed. If a finite subset of a database-driven site is worth indexing, making it look static could increase your traffic and help people find a useful resource. Also, when you hide the mechanics of your site, you make it a little harder for hackers to take advantage of published and unpublished weaknesses of your server software.
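
A crawler's test for "looks dynamic" could be as crude as the sketch below. This is a guess at the general shape of such a rule, built only from the patterns named above plus the presence of a query string; it is not taken from any real search engine, and the names CrawlerHeuristicSketch and looksDynamic are hypothetical.

```java
import java.util.Locale;

// Hypothetical heuristic: flags URLs a cautious crawler might skip.
class CrawlerHeuristicSketch {

    static boolean looksDynamic(String url) {
        String u = url.toLowerCase(Locale.ROOT);

        // Any query string suggests a generated page.
        int query = u.indexOf('?');
        if (query >= 0) {
            return true;
        }

        // Script-like extensions and CGI directories.
        return u.endsWith(".jsp")
                || u.endsWith(".cgi")
                || u.endsWith(".asp")
                || u.contains("/cgi-bin");
    }
}
```

Under a rule like this, weeds/weed/4.html passes while weeds/jsp/weed.jsp?id=4 does not, which is the reason for hiding the second form behind the first.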


The tools

Next, I'll talk about the hardware and software that Common Weeds depends on and how to set up a system to run it. Our development system is a 350 MHz Pentium II with 64 megs of RAM. It runs Debian Linux 2.0. I use Apache as a web server, with the optional mod_rewrite module compiled in (it is included with the Apache source but disabled by default; see http://www.apache.org/), as well as the Apache JServ servlet engine (see http://java.apache.org/). To add support for JSP, I use Sun's reference implementation, available from http://www.javasoft.com/products/jsp/download.html. I use the shareware database MySQL (http://www.mysql.org/) and the mm.mysql JDBC driver (http://www.worldserver.com/mm.mysql/).



Configuration depends on your web server and servlet engine. After getting Apache, JServ and the JSP reference implementation working, I had to make two changes to run my application. First, I had to add the mm.mysql driver to the permanent classpath of the servlet engine:



/usr/local/jserv/conf/jserv.properties

wrapper.classpath=/usr/local/jdk117_v1a/lib/mysql.jar

Second, I package the Java class files (servlets, beans, and supporting files) in a JAR file, which I add as a JServ repository. Unlike files on the permanent classpath, files in a repository are monitored by JServ and reloaded when they change, so I can develop without restarting the servlet engine.

/usr/local/jserv/conf/zone.properties

repositories=/home/www0/weeds/weeds2.jar


Produced by Honeylocust Media Systems, contact paul@honeylocust.com