Tapir User Manager: Installation and Configuration Management

Overview

Because TUM can be installed in diverse environments, it contains an installation and configuration management system that
  1. installs the TUM software package in the web server's file tree,
  2. allows simple customization of installation points, email addresses and other parameters,
  3. generates Apache configuration files enabling the features that TUM uses,
  4. generates site-specific cryptographic secrets,
  5. generates scripts for initializing the database,
  6. allows site developers to overlay specific TUM files with site-specific files.

Trade-offs exist in the configuration of server products. The most versatile place to store configuration information is in the database: there, the administrator of a site can change the properties of the site with a click of the mouse and see them applied instantly, without recompilation or rebooting. This would be ideal for a consumer product designed for low-volume sites. However, database access is expensive, and we can achieve better performance if some configuration information is statically built into the product at compile time or load time. TUM also uses the Apache web server, which reads its configuration files at load time and stores the parameters in RAM.

Because TUM needs to configure files in a number of languages (PHP scripts, Apache configuration files, and shell and SQL scripts), I chose the versatile M4 macro preprocessor for configuration. Nearly all of the files in the TUM package are processed by the M4 preprocessor before they are executed or installed in the server's file tree.

Operation

TUM configuration proceeds in three stages:
  1. Generation of random material,
  2. Generation of configuration files and non-PHP scripts,
  3. Installation of PHP scripts into the file tree.

Stages 1 and 2 are performed by the Configure script in the top directory. If the file conf/secrets.m4 does not exist, the Configure script runs script/generate-keys, which generates the random number used to cryptographically sign session cookies, the database passwords, and the password of the initial user. These values are generated randomly using the /dev/random device included in Linux and some other versions of UNIX, so that they are different for every TUM installation, preventing abuse.
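
The idea behind stage 1 can be sketched in a few lines of Python. This is only an illustration, not the contents of script/generate-keys: the macro names below are hypothetical, and os.urandom stands in for reading /dev/random directly.

```python
import os

def random_hex(num_bytes=16):
    """Return num_bytes of operating-system randomness as a hex string."""
    return os.urandom(num_bytes).hex()

def render_secrets(names):
    """Build the text of an M4 file that defines one random value per name."""
    lines = []
    for name in names:
        # GNU m4 quotes strings as `text'; under -P, define is spelled m4_define
        # and dnl (discard to end of line) is spelled m4_dnl.
        lines.append("m4_define(`%s', `%s')m4_dnl" % (name, random_hex()))
    return "\n".join(lines) + "\n"

# Hypothetical macro names, chosen only for this example.
SECRET_NAMES = ["M4_COOKIE_SECRET", "M4_DB_PASSWORD", "M4_INITIAL_USER_PASSWORD"]

if __name__ == "__main__":
    print(render_secrets(SECRET_NAMES), end="")
```

Writing the result to a file only when that file is absent, as Configure does with conf/secrets.m4, means the secrets survive later reconfiguration.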

In stage 2, the directories script/, site-script/, sql/, site-sql/ and external/ are recursively traversed. Every file ending in .m4 is processed through the M4 preprocessor, and the output is stored in a file of the same name with the .m4 extension removed. (For example, sql/tapir.sql.m4 is transformed into sql/tapir.sql.)
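
As a rough sketch of the stage 2 traversal, assuming nothing about how Configure is actually written: the function names are hypothetical, and passing conf/conf.m4 to m4 ahead of each template is an assumption about how the constants are made visible.

```python
import os

def output_name(path):
    """Strip a trailing .m4 extension: sql/tapir.sql.m4 -> sql/tapir.sql."""
    if not path.endswith(".m4"):
        raise ValueError("not an .m4 file: " + path)
    return path[:-len(".m4")]

def find_templates(top_dirs):
    """Recursively collect every file ending in .m4 under the given directories."""
    found = []
    for top in top_dirs:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in sorted(filenames):
                if name.endswith(".m4"):
                    found.append(os.path.join(dirpath, name))
    return found

def m4_command(template, conf="conf/conf.m4"):
    """Build (but do not run) an m4 invocation for one template.

    Assumption: the configuration file is read first so its definitions
    are available while the template is expanded.
    """
    return ["m4", "-P", conf, template]

STAGE2_DIRS = ["script", "site-script", "sql", "site-sql", "external"]

if __name__ == "__main__":
    for template in find_templates(STAGE2_DIRS):
        print(" ".join(m4_command(template)), ">", output_name(template))
```

Run from the top directory, this only prints the commands it would issue, so it can be compared against what Configure does without changing any files.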

Stage 3 is performed by script/tapir-install, which is generated in stage 2 from the script/tapir-install.m4 file. In stage 3, the directories src/ and site-src/ are recursively traversed: files ending in .m4 are processed, and a parallel tree of files is constructed under the web server's file tree, at //home/www/www.tcgreens.org/auth. (For instance, src/login-form.php.m4 could be transformed into /home/www/root/auth/login-form.php.)

Files in the site-src/ directory are overlaid on top of files in the src/ directory. That is, if you want to place a new file in the auth/ tree, you can put an .m4 file that will generate it in site-src. Also, if you want to replace a file in the src/ tree, you can do so by placing a file with the same name in site-src. This provides a mechanism for separating standard TUM files from site-specific additions and modifications that can be used with or without version control software such as CVS. If you're making modifications specific to a particular site, you should never modify files in the src/ tree.
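
The overlay rule can be sketched as a dictionary merge in which site-src/ entries are applied last. This is an illustration only; the function name and the file names other than login-form.php.m4 are hypothetical.

```python
def overlay(src_files, site_files):
    """Map each relative path to the tree that should supply it.

    Entries from site-src/ replace same-named entries from src/ and
    add any files that src/ does not have.
    """
    chosen = {}
    for rel in src_files:
        chosen[rel] = "src/" + rel
    for rel in site_files:
        chosen[rel] = "site-src/" + rel
    return chosen

if __name__ == "__main__":
    plan = overlay(
        ["login-form.php.m4", "logout.php.m4"],      # standard TUM files
        ["login-form.php.m4", "site-news.php.m4"],   # site-specific files
    )
    for rel in sorted(plan):
        print(rel, "<-", plan[rel])
```

Because the standard tree is never edited, upgrading TUM replaces src/ wholesale while site-src/ carries the local changes forward.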

One advantage of the install procedure is that it keeps junk files from accumulating in the web tree. If, for instance, you use emacs to edit files in the web tree, you'll create a number of backup (file~) and auto-save (#file) files that can, potentially, cause a security problem. Similarly, CVS generates a number of directories titled CVS. Because only files ending in .m4 are installed (along with .gif files, which are copied as they are), we avoid putting files into the web tree accidentally.
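
The effect is that of an allow-list on file names. A minimal sketch, assuming the decision is made purely on the extension (the function name is hypothetical):

```python
def is_installable(filename):
    """True only for the file types that are installed: .m4 templates and .gif images."""
    return filename.endswith(".m4") or filename.endswith(".gif")

if __name__ == "__main__":
    for name in ["login-form.php.m4", "login-form.php.m4~", "#login-form.php.m4#", "logo.gif"]:
        print(name, is_installable(name))
```

An allow-list fails safe: an editor or tool that starts leaving a new kind of stray file behind is excluded without anyone having to anticipate it.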

The tapir-install script has several options. By default, the tapir-install script works like the make command, and rebuilds a file only if its .m4 file has been updated since the file in the web tree was built. In this mode, no directories are created. This mode is useful for copying incremental changes during development. If the -a (all) option is used (script/tapir-install -a), then the script deletes the contents of the auth/ tree on the web server, creates any directories and rebuilds all of the files. It is necessary to use the -a option when installing TUM for the first time, when the conf.m4 file is changed, when a directory is created or when a file is deleted.
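
The make-like check in the default mode comes down to comparing modification times. The sketch below is an assumption about the logic, not the script's code; in particular, treating a missing target as stale is a guess, and the function names are hypothetical.

```python
import os

def is_stale(template_mtime, target_mtime):
    """True if the target is missing (None) or older than its template."""
    if target_mtime is None:
        return True
    return template_mtime > target_mtime

def needs_rebuild(template, target):
    """Apply is_stale to two paths on disk."""
    target_mtime = os.path.getmtime(target) if os.path.exists(target) else None
    return is_stale(os.path.getmtime(template), target_mtime)
```

A timestamp comparison cannot notice a change to conf.m4 or a file that was deleted from src/, which is why those cases need the -a option.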

The tapir-install script, by default, passes the output of the M4 preprocessor through the file script/nuke-comments, which currently removes full-line comments starting with # from the file. Because PHP currently parses each script completely each time a script is executed, this allows us to use comments freely without harming performance. (This will be less necessary when the Zend Script Cache is released.) Unfortunately, removing lines breaks the correspondence between line numbers in the .php.m4 and .php files, making debugging difficult. To keep the comments and preserve the line numbers, set the DEBUG environment variable to 1.
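
A filter of this kind could look like the following. It is a sketch, not the contents of script/nuke-comments: it assumes that a full-line comment is any line whose first non-blank character is #, and the function names are hypothetical.

```python
import os

def nuke_comments(text):
    """Drop every line whose first non-blank character is '#'.

    Trailing comments on code lines are left alone.
    """
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith("#")
    ]
    return "".join(kept)

def postprocess(text, environ=os.environ):
    """Strip comments unless DEBUG is set to 1, which keeps line numbers intact."""
    if environ.get("DEBUG") == "1":
        return text
    return nuke_comments(text)
```

An alternative that would preserve line numbers in every mode is to replace each comment line with an empty line instead of removing it, at the cost of slightly larger files.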

How we use M4

Documentation for M4 is available by typing info m4 on many Unix systems. M4 is a macro preprocessor that, mostly, copies input to the output stream, modifying the stream when certain commands are received. M4 functions look a lot like C functions or PHP functions, so we use M4's -P option to preface all M4 functions with the string m4_ (define() is replaced with m4_define()). We also use the convention that constants specified in conf/conf.m4 be prefixed by M4 and written in all caps (/local/local/tapir/apache).