HTML Redemption Language Manual

This is HTML Redemption Language (HRL for short), a language that redeems HTML by adding some tags that should have been in HTML from the beginning.

Features:

The HRL compiler, HRLC, translates HRL into HTML. It is meant to be used as a preprocessor, that is, it generates static web pages off-line. It really isn't fast enough to generate HTML on the fly.

HRLC was inspired by another package, hsc. In fact, most of the names of my tags are taken directly from hsc, except that the names of HRL tags are not preceded by $. HRL, however, is much more versatile than hsc, thanks to the power of Python.

HRLC is copyrighted by me, Carl Banks, under a BSD-style license. See the file LICENSE for details.

The home page of HRL is http://www.aerojockey.com/software/hrl.

Invocation

HRLC is a command line program. On a Unix or Windows console, invoke it this way:

    hrlc <inputfile> [<inputfile> [...]] <outputfile>

The multiple <inputfile>s are processed sequentially to form a single <outputfile>.

Note that <outputfile> is required. Use a hyphen "-" to specify standard input or output. There are currently no command line options.

There are several include files supplied with HRL, which may be specified as input files on the command line. These can beneficially modify the behavior of HRLC. For example, the following will automatically fill in the width and height attributes of image tags in myfile.hrl:

    hrlc imgsize.hri myfile.hrl myfile.html

For a complete list of include files, see the section Include Files below.

Variables and Computed Attributes

String-valued variables can be defined in your HRL files by using <define>, for example:

    <define name="homepage" value="http://www.aerojockey.com/">
    <define name="prefix" value="bin/images/">

You can then access a variable with a computed attribute, like this:

    <a href="((homepage))">

An attribute value surrounded by double parentheses, ((like this)), is a computed attribute. HRLC substitutes the value of the variable named inside the parentheses. The nice thing about the above example is that if the URL of my home page changes, rather than changing 200 or so links to it, I only need to change one variable definition and regenerate.

To output the value of a variable directly as text (rather than an attribute), you can use the <e> tag. For example:

    The URL of my home page is <e val="((homepage))">.

But computed attributes can be more powerful than simple variable substitution. In fact, a computed attribute can be any Python expression. The following exemplifies adding a prefix to an image location:

    <define name="prefix" value="/path/to/images/">
    <img src="((prefix+'hello.jpg'))">

If the location of your image files changes, then rather than editing 200 or so img tags, you only need to change the prefix variable and regenerate; the image tags are updated automatically.
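
To make the idea concrete, here is a minimal Python sketch of how a computed attribute could be evaluated. It is not HRLC's actual code; the function name eval_attribute and the variables dictionary are assumptions made up for this illustration.

```python
# Illustrative sketch only; eval_attribute is a hypothetical helper,
# not part of HRLC.

def eval_attribute(value, variables):
    """Return the attribute value, evaluating it if it is wrapped in (( ))."""
    if value.startswith("((") and value.endswith("))"):
        expression = value[2:-2].strip()
        # The expression is evaluated as Python, with the defined
        # variables available as global names.
        return eval(expression, dict(variables))
    return value  # an ordinary attribute is passed through unchanged


variables = {"homepage": "http://www.aerojockey.com/",
             "prefix": "/path/to/images/"}

plain = eval_attribute("index.html", variables)
link = eval_attribute("((homepage))", variables)
image = eval_attribute("((prefix+'hello.jpg'))", variables)
```

Here link holds the home page URL and image holds the prefix joined to the filename, while plain comes back untouched because it is not wrapped in double parentheses.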

Variables and computed attributes can also be used to control conditions. The <if> tag can selectively include or omit parts of the input. The following only prints "Success!" if the value of success is 'yes':

    <if cond="((success == 'yes'))">
    Success!
    </if>

Sometimes in a computed attribute, we want to use a function from a Python module. Python 1.5 users often want the string module to be available for computed attributes. However, it is quite unwieldy to import and use a module in a single expression. Fortunately, one can import a module, for use in all scriptlets and computed attributes, with the import tag:

    <import module="string">
    <e val="(( string.lower(text) ))">

Scriptlets

Using the <python> tag, Python can be embedded in the HRL file, allowing complex operations to be done automatically when generating the web page. I call the embedded Python a scriptlet, because it's a little script embedded in a web page. (The name is somewhat analogous to applet, which is an application embedded in a web page.)

Scriptlets bring the power of Python to web page design. Because Python is a full-fledged programming language, you can perform arbitrary calculations when generating the web page. This can, among other things, simplify the task of keeping your web page up-to-date. To exemplify:

    <python>
    from time import time, ctime
    hrl.doc.write ('Page last generated at ' + ctime(time()) + '.')
    </python>

This simple scriptlet automatically puts the time the page was generated into the HTML output, relieving the human user from having to update that information manually. Note the use of hrl.doc.write for outputting text. There are other useful functions supplied, such as hrl.error and hrl.warning. See the section Attributes of the hrl global variable below for a complete list.
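
If you want to try the body of a scriptlet outside HRLC, one possible approach is to supply a stand-in for the hrl object. The FakeDoc and FakeHrl classes below are hypothetical and exist only for this sketch; inside a real scriptlet, hrl is provided by HRLC.

```python
# Stand-in objects for trying scriptlet code outside HRLC. FakeDoc and
# FakeHrl are hypothetical names, not part of HRL.
from time import time, ctime


class FakeDoc:
    """Collects everything written to it, like a simple output object."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


class FakeHrl:
    def __init__(self):
        self.doc = FakeDoc()


hrl = FakeHrl()

# The body of the scriptlet, unchanged:
hrl.doc.write('Page last generated at ' + ctime(time()) + '.')

output = ''.join(hrl.doc.chunks)
```

After this runs, output contains the sentence that the scriptlet would have placed in the page.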

Python scriptlets can also access variables defined by the <define> tag. However, the reverse isn't quite true. A variable defined in a scriptlet is local to that scriptlet, and cannot be used in a computed attribute or another scriptlet, unless the variable is declared global. See the section Namespaces below for an explanation of this behavior.

Scriptlets can be much more powerful and complex than this, of course; the ultimate power of Python scriptlets is limited only by your computer's resources.

Scriptlets can be even more powerful when embedded in a macro definition, as we will see below.

Namespaces

As in Python, there are two namespaces in HRL, local and global. Each scriptlet gets a unique local namespace, so a variable set in one scriptlet will not be available in another. However, as in regular Python, you can use the global keyword to put the variable into the global namespace, where it can be accessed by other scriptlets, or by computed attributes:

    <python>
    global food
    food = 'spam'
    </python>

    <python>
    hrl.doc.write(food)
    </python>

    <e val="((food))">
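
The same rules can be reproduced in plain Python with exec(), which may help in understanding them. This is only a sketch of the behavior described above, not HRLC's implementation; run_scriptlet and shared_globals are names invented for the illustration.

```python
# Sketch of the namespace rules using plain exec(); run_scriptlet is a
# hypothetical helper, not HRLC's implementation.

shared_globals = {}


def run_scriptlet(source):
    """Run source with a fresh local namespace and the shared globals."""
    local_namespace = {}
    exec(source, shared_globals, local_namespace)
    return local_namespace


# Without "global", the variable stays in this scriptlet's local namespace.
first_locals = run_scriptlet("drink = 'tea'")

# With "global", the variable lands in the shared global namespace.
run_scriptlet("global food\nfood = 'spam'")

# A later scriptlet sees food...
later_locals = run_scriptlet("meal = food + ' and eggs'")

# ...but not drink, which was local to the first scriptlet.
try:
    run_scriptlet("order = drink")
    drink_visible = True
except NameError:
    drink_visible = False
```

The variable drink never leaves the first scriptlet, while food is visible everywhere once it has been declared global.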

There is one preset global variable: hrl. This is a collection of functions and other objects useful for someone writing scriptlets. For example, the function hrl.error prints an error message, and causes the processing to fail. The attributes of hrl are documented below, in the section "Attributes of the hrl global variable".

The local namespace is empty at top level, but inside a macro definition, the local namespace is populated with the attributes of the macro. Also, if the macro is a container, the local namespace contains the variable "content". The section Macros below documents some of the subtleties of macros in more detail.

Macros

Macros allow you to define your own tags. In the simplest case, consider a tag that inserts a little "New" icon. You could define it like this:

    <macro name="new">
    <img src="new.gif" alt="[NEW!]" vspace="5">
    </macro>

Once you've defined this macro, you only need use the new <new> tag to insert your "New" icon:

    <new>Updates to the "non-nude photographs of myself" section!

Macros can also define container tags. Suppose you want to boldface a lot of links. Perhaps the proper way to do this is with a style sheet, but it can also be done easily with macros. You can define a new <ab> tag like this:

    <macro name="ab" req="href" container>
    <b><a href="((href))"><content></a></b>
    </macro>

This macro deserves a closer look. The req attribute lists the required attributes of your macro, separated by commas. Here, there is only one required attribute, href. The container attribute should be present whenever the macro is a container, as it is here.

As mentioned above, the required attributes enter the local namespace of the macro definition, so you can access href with a computed attribute. And because href is a required attribute, you need not worry whether href was supplied; you can just use it. HRLC makes sure the href attribute is supplied, raising an error if it isn't.

The <content> tag is replaced by the content of the macro, i.e., the stuff between <ab> and </ab>. However, content is also a variable in the local namespace, meaning that it could be used in a computed attribute. Suppose you want to define a tag that creates a link where the URL is also the text of the link. You can define it this way:

    <macro name="ae" container>
    <a href="((content))"><content></a>
    </macro>

What about optional attributes? Suppose you want to write a very general image tag that automatically prepends "image/" and appends ".jpg" to the image filename, saving you much typing, but you still want the option of using some of the optional attributes of img. This is simple; the opt attribute of the macro definition specifies optional attributes as a comma-separated list. When the macro is invoked, optional attributes that aren't specified are initialized to a special object, Not_Specified. And if the value of a computed attribute is Not_Specified, that attribute is removed.

In other words, the following macro will work as intended:

    <macro name="z" req="src" opt="height,width,border,alt">
    <img src="(( 'image/%s.jpg' % src ))" height="((height))"
         width="((width))" border="((border))" alt="((alt))">
    </macro>

If, for example, the macro is not given a height attribute, then the value of height in the macro's local namespace will be Not_Specified. Consequently, the height attribute of the <img> tag evaluates to Not_Specified. Therefore, the height attribute of the <img> will be removed.
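
The removal rule can be pictured with a small Python sketch. The sentinel class and the render_tag helper below are stand-ins written for this illustration; they are not HRLC's own code, and only the name Not_Specified comes from HRL.

```python
# Sketch of dropping unspecified attributes; this Not_Specified is a
# stand-in sentinel and render_tag is a hypothetical helper.

class _NotSpecified:
    def __repr__(self):
        return "Not_Specified"


Not_Specified = _NotSpecified()


def render_tag(name, attributes):
    """Build an opening tag, omitting attributes equal to Not_Specified."""
    parts = [name]
    for key, value in attributes:
        if value is Not_Specified:
            continue  # the attribute is removed from the output
        parts.append('%s="%s"' % (key, value))
    return "<" + " ".join(parts) + ">"


# Roughly what happens for <z src="cat" alt="A cat">, where height,
# width, and border were not given.
src = "cat"
tag = render_tag("img", [
    ("src", "image/%s.jpg" % src),
    ("height", Not_Specified),
    ("width", Not_Specified),
    ("border", Not_Specified),
    ("alt", "A cat"),
])
```

Only src and alt survive in tag; the three unspecified attributes leave no trace in the output.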

Sometimes we want an optional attribute to affect the behavior of a macro: if the attribute is given, do this; if not, do that. The <ifspec> tag is helpful here; it checks whether an optional attribute was specified. For example, suppose you have a "document" macro that wraps the <html>, <head>, and <body> tags, and thus provides a consistent style for all web pages using it. Suppose that the document macro provides a link to the home page. However, we don't want a link to the home page on the home page itself. Therefore, we use an optional attribute to disable the home page link. To wit:

    <macro name="document" req="title" opt="homepage" container>
    <html>
    <head>
    <title><e val="((title))"></title>
    </head>
    <body bgcolor="white" text="blue">
    <h1><e val="((title))"></h1>
    <content>
    <ifspec not attr="homepage">
        <p><hr><p>
        <a href="http://www.aerojockey.com/">Home</a>
    </ifspec>
    </body>
    </html>
    </macro>

If this macro is invoked with the homepage attribute present, the link to the home page is not generated.

Now, because the attributes and content are placed into the local namespace, an embedded Python scriptlet also has access to the variables. One can create very powerful macros with this technique. As a simple example, suppose you want to make a long list of links. You want the list to have as few extraneous characters as possible. You can create a macro with an embedded Python scriptlet to do this:

    <macro name="listolinks" container verbatim>
    <python>
    from string import split, strip
    for line in map(strip,split(content,'\n')):
        if len(line) == 0: continue
        url,name = split(line,'=')
        hrl.doc.write ('<a href="%s">%s</a><br>\n'
                       % (strip(url),strip(name)))
    </python>
    </macro>

Note that this scriptlet accesses the content (look carefully in the for statement). Also notice that the macro definition includes an attribute, verbatim, that tells HRLC that the content is not HRL. The scriptlet takes each line of content, splits it at the equal sign, and outputs a link with text. With this macro, you can create a list of links like this:

    <listolinks>
    http://www.google.com/ = Google
    http://www.yahoo.com/ = Yahoo
    </listolinks>

There is virtually no limit to what you can accomplish when macros and scriptlets meet.
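
For readers on newer Python versions, where the functions of the string module used above have become string methods, the parsing logic of the scriptlet can be written as a standalone function. This is a sketch for experimenting outside HRLC; the name links_from_text is invented here, and inside a macro you would still write the result with hrl.doc.write.

```python
# The parsing logic of the listolinks scriptlet as a standalone
# function using string methods; links_from_text is a hypothetical name.

def links_from_text(content):
    """Turn 'url = name' lines into HTML links, skipping blank lines."""
    pieces = []
    for line in content.split('\n'):
        line = line.strip()
        if not line:
            continue
        # Like the scriptlet above, this assumes exactly one '=' per line.
        url, name = line.split('=')
        pieces.append('<a href="%s">%s</a><br>\n'
                      % (url.strip(), name.strip()))
    return ''.join(pieces)


html = links_from_text("""
http://www.google.com/ = Google
http://www.yahoo.com/ = Yahoo
""")
```

Note the limitation carried over from the original: a URL that itself contains an equal sign would break the split.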

A word of caution about macros: macros defined with the <macro> tag cannot nest. However, there is another tag, <nmacro>, which you can nest another macro inside of. The reason for macros not nesting by default is that, thanks to some limitations of sgmllib, nestable macros can interfere with scripts and other non-SGML data. Python scriptlets and JavaScript should not be used within an <nmacro> tag, because the code could accidentally resemble SGML, in which case sgmllib will mangle it. Because it seemed more common to want to embed <python> tags than other <macro>s, I made non-nestable macros the default.

Another word of caution about macros: the macro attributes are only accessible in the macro definition, NOT IN THE CONTENT. So, the following won't work:

    <somemacro name="hello">
    <e val="((name))">
    </somemacro>

HRLC will complain that "name" is not in scope. The full reason is complicated, and I won't explain it in detail here; in short, HRL uses lexical scoping. Despite appearances, this is the Right Way(tm) to do it.

Unicode and International Character Set Support

HRL supports Unicode and HTML encodings. The <encoding> tag can change the encoding of the output, and, when given a meta attribute, it will output a <meta http-equiv . . .> tag that a web server might use to tell the browser what the encoding is. Here is an example of its use:

    <head>
    <encoding enc="utf-8" meta>

It should appear early in the HRL file, and inside the header if you wish to use the meta attribute. (Note that even if you use a <meta http-equiv . . .> tag, it won't change the encoding on all browsers. If you want to use different output encodings, you should make sure the server notifies the browser of the correct encoding in the HTTP headers. Some servers can be configured to send the right encoding to the browser automatically.)

Input files can also have different encodings. They can specify the encoding with a special declaration borrowed from Emacs. Suppose your files are ordinary, 8-bit text files in the Greek character set (ISO-8859-7). You can put the following line atop your HRL file, and HRLC will consider it encoded in Greek:

    <!--  -*- coding: iso-8859-7 -*-  -->

Note that input and output encodings have no effect on each other. The default encoding for both input and output is ISO-8859-1, that is, Latin 1. If the input encoding is changed to Greek, as above, the output encoding remains Latin 1, and HRLC will output all the Greek characters as Unicode character entities. So it's a good idea to use the <encoding> tag if you're not using the Roman alphabet.
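
The effect of mismatched input and output encodings can be seen with Python's own codecs. The snippet below only demonstrates the principle with standard string methods; it is not taken from HRLC.

```python
# Demonstration with Python's built-in codecs; this mirrors the
# behavior described above but is not HRLC code.

greek_bytes = b'\xe1\xe2\xe3'            # alpha, beta, gamma in ISO-8859-7
text = greek_bytes.decode('iso-8859-7')  # input encoding: Greek

# Latin 1 cannot represent Greek letters, so each one becomes a
# numeric character entity.
as_latin1 = text.encode('iso-8859-1', 'xmlcharrefreplace')

# UTF-8 can represent them directly, two bytes per letter here.
as_utf8 = text.encode('utf-8')
```

With Latin 1 output, three input bytes turn into three six-character entities; with UTF-8 output, they turn into six bytes and no entities.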

HRLC supports Unicode character entities, as well as all named character entities defined by HTML 4.0.

Hooks

Hooks are an advanced concept. Whenever HRLC wants to output something, it calls a hook first (if any are defined), and outputs the return value instead. In this way, scriptlets can use hooks to postprocess the output of HRLC. They are useful for logging and for in-place substitutions.

To add a hook, you must call "hrl.add_hook(name,function)" from within a scriptlet, where name is the name of the hook, and function is a function to attach. You can remove a hook with "hrl.remove_hook(name,function)".

There are various hooks defined. There is a hook for every HTML tag. The name of the hook is the tag name, in lower case, plus '_tag'. Thus, the name of the hook for the <body> tag is 'body_tag'. For these hooks, the function is of the form:

    newtag, newattr = function(hrl, tag, attr)

where hrl is the global hrl variable (passed in case the function was not defined in HRL's global scope), tag is the name of the tag, and attr is a dictionary of its attributes.
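
As an illustration, a tag hook might look like the following. Only the call signature is taken from the description above; the function white_background and what it does are made up for this sketch.

```python
# Hypothetical tag hook; only the call signature comes from the manual.
# Inside a scriptlet it would be attached with
#     hrl.add_hook('body_tag', white_background)

def white_background(hrl, tag, attr):
    """Give the <body> tag a white background unless one is already set."""
    new_attr = dict(attr)                    # work on a copy
    new_attr.setdefault('bgcolor', 'white')
    return tag, new_attr


# Called by hand here, with None standing in for the hrl object.
newtag, newattr = white_background(None, 'body', {'text': 'blue'})
```

The hook returns the tag name unchanged and a new attribute dictionary with bgcolor filled in.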

There are also hooks for closing tags; their name is the tag name, in lower case, plus '_end'. For these hooks, the function is of the form (where the slash is not included in tag or newtag name):

    newtag = function(hrl, tag)

There are hooks for preprocessing text and entities. These are called "entity" for entities, "char" for numerical entities, "text" for regular text after substituting for &, >, and <, and "littext" for text before entity substitution. The function for these hooks has the form:

    newdata = function(hrl, data)

where data is the appropriate data for the hook.
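
A text hook for an in-place substitution could be as small as this. The function fix_spelling is a hypothetical example; only the signature is taken from the form shown above.

```python
# Hypothetical "text" hook performing an in-place substitution.
# Inside a scriptlet it would be attached with
#     hrl.add_hook('text', fix_spelling)

def fix_spelling(hrl, data):
    """Correct a common typo in regular text before it is output."""
    return data.replace('teh', 'the')


# Called by hand here, with None standing in for the hrl object.
newdata = fix_spelling(None, 'This is teh home page.')
```

Whatever the hook returns is what gets output in place of the original text.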

Finally, there is the "visit" hook, called whenever an input file is visited (included or by command line). The function for this hook has the form:

    function(hrl, input_pathname, output_pathname)

where input_pathname is the name (including path) of the visited file, and output_pathname is the name of the file being created.

Reference: HRL Tags

Here are the tags defined by HRL. Optional attributes of the tags are surrounded by square brackets.

<include file="filename" [verbatim]>

Include a file. HRLC searches for the file in the input file's directory, the output file's directory, the current directory, and the library directory. It is an error if the file is not found. The included file is processed as HRL, unless the verbatim attribute is given, in which case the file's contents are inserted as-is.

<python [verbatim]>
    python-code
</python>

Embed Python code in the HRL file. When HRLC comes to the <python> tag, it reads and executes the content. The Python scriptlet can output text by calling hrl.doc.write(). The text is processed as HRL, unless the verbatim attribute is given, in which case the output is inserted as-is.

<define name="name" value="value">

Defines a variable, and initializes it with value. If the variable already exists, it is overwritten. The variable can be accessed by Python scriptlets or computed attributes.

Computed attributes are attribute values surrounded by double parentheses. When HRLC encounters one, it evaluates the value as a Python expression. For example, I could define and use a homepage variable this way:

    <define name="homepage" value="http://www.aerojockey.com/">
    <a href="((homepage))">

In fact, a computed attribute can be any Python expression, so you can do things like <a href="((baseurl+filename))">.

<e val="value">

Echoes the value. This is useless, of course, unless value is a computed attribute.

<import module="pythonmodule">

Import a module into the global namespace. This makes the module available to all Python scriptlets and computed attributes.

<encoding enc="charset" [meta]>

Change the output encoding to charset. If the meta attribute is supplied, output a <meta http-equiv . . .> tag to inform the web browser of the encoding used.

<if [not] cond="condition">
    content
</if>

Only process the content if the condition is true. This is most often used with computed attributes, where the condition is a Python expression. The trueness of an expression in HRL is the same as in Python. Obviously, the not attribute inverts the test. Sorry, there is no else tag.

<ifspec [not] attr="attribute">
    content
</ifspec>

A special form of if intended for use in macro definitions. HRLC processes the content if the attribute was specified in the macro invocation. The not attribute inverts the test.

<macro name="macroname" [container [verbatim]]
        [req="requiredtags"] [opt="optionaltags"]>
    macro definition
</macro>

Defines a macro named macroname. Defining a macro is basically like defining a new tag. The macro is a container if the container attribute is given. If verbatim is given, the content (not the definition) of the macro is used as-is; otherwise, it is processed as HRL. req lists the required attributes for this macro; opt lists the optional attributes.

Macro tags cannot be nested. However, the macro tag may have a number suffixed to it. For example, <macro1 ...>, <macro2 ...>, etc., all define macros. It is legal to nest macros as long as the two macros use different numbers. (I recommend a suffix indicating the level of nesting.)

<content>

To be used only in the definition of container macros. When the macro is invoked, <content> is replaced by the contents of the macro.

<deftag name="tag" [container [text [formatted] [cdata]]]
        [req="requiredtags"] [opt="optionaltags"]
        [inside="insidetags"] [closes="closestags"]>

Define a new HTML tag. There are times when the built-in list of tags is not enough. Because HRLC checks the validity of HTML tags (somewhat), this tag provides a mechanism to add new tags to its list. You can also add new tags permanently by editing hrltags.py.

The new tag is a container if the container attribute is given, and the container may contain plain text if the text attribute is given. If formatted is specified, whitespace in the content is important (as in <pre>), otherwise whitespace is decimated. If cdata is specified, it means that the contents are not to be processed as HRL, but copied verbatim to output. req lists required attributes, separated by commas; opt lists optional attributes. inside lists possible container tags (as <li> must be inside <ul>,<ol>,etc.). closes lists container tags that are to be automatically closed if this tag appears.

Reference: Attributes of the hrl global variable

hrl.error(message)

Prints an error message, and causes the processing to fail.

hrl.warning(message)

Prints a warning message.

hrl.doc

Is a string output object. You can call its write method within a scriptlet to output HTML. For example:

    <python>
    hrl.doc.write ("<h1>Hello, World!</h1>")
    </python>

hrl.univattrs

Is a tuple of universal attributes, that is, attributes that HRL accepts for any tag. The initial value is ("id","style","class"). You may change it if you wish.

hrl.exit_scriptlet()

Exits early from a scriptlet.

hrl.search_open(filename, mode, dirlist)

Search the given list of directories for filename. Open it in the given mode if found; otherwise, raise IOError. Useful, perhaps, in conjunction with hrl.searchpath.
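
For a feel of what such a search involves, here is a rough Python equivalent. It is a guess at the general behavior, not the source of hrl.search_open; the name search_open_sketch is invented, and the temporary directories merely stand in for a search path.

```python
# Sketch of a directory search in the manner described above;
# search_open_sketch is a hypothetical stand-in, not hrl.search_open.
import os
import tempfile


def search_open_sketch(filename, mode, dirlist):
    """Open filename from the first directory in dirlist that contains it."""
    for directory in dirlist:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return open(candidate, mode)
    raise IOError('%s not found in %r' % (filename, list(dirlist)))


# Usage with two temporary directories standing in for a search path.
first_dir = tempfile.mkdtemp()
second_dir = tempfile.mkdtemp()
with open(os.path.join(second_dir, 'example.hri'), 'w') as handle:
    handle.write('<define name="x" value="1">')

with search_open_sketch('example.hri', 'r', [first_dir, second_dir]) as found:
    contents = found.read()

try:
    search_open_sketch('missing.hri', 'r', [first_dir, second_dir])
    missing_raised = False
except IOError:
    missing_raised = True
```

In a real scriptlet you would pass hrl.searchpath as the directory list instead of temporary directories.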

hrl.searchpath

A list of directories in which HRL searches for include files. Useful as an argument for hrl.search_open. You may append or insert your own paths into it, if that's useful.

hrl.input_pathname
hrl.input_filename
hrl.input_dirname

Full pathname, filename, and directory name of the current input file.

hrl.output_pathname
hrl.output_filename
hrl.output_dirname

Full pathname, filename, and directory name of the output file.

hrl.add_hook(name, function)

Adds a hook function.

hrl.remove_hook(name, function)

Removes a given hook function.

hrl.remove_outer_hook(name)

Removes the previously added hook function for given hook.

hrl.Not_Specified

Gives computed attributes direct access to the Not_Specified object, if they need it.

Include Files

HRL comes with some include files. These are described briefly below.

imgsize.hri

This file adds a hook to image tags that automatically calculates width and height attributes. It requires Python Imaging Library. Highly recommended.

imgdir.hri

This file adds hooks to img, link, and body tags. If you have a single image directory, then these hooks allow you to refer to image files with only the filename, not the directory. The image directory is specified by defining the variable imagedir. Also defines a macro aimg for linking to an image.

urllist.hri

This defines a macro, urllist, that allows easy definition of an organized list of links, with nesting. A link is a single line, beginning with a number of plus signs (+), followed by the URL, followed by a vertical bar (|), followed by the text of the link. The number of plus signs indicates the level of nesting. Empty lines cause empty space in the output. Lines beginning with asterisks (*) insert a header, the number of asterisks being the level of the header.
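
The line format can be made concrete with a small parser. This sketch is not the code in urllist.hri; the function parse_urllist_line, the tuple layout it returns, and the tolerance for spaces around the markers are all assumptions for illustration.

```python
# Sketch of a parser for the urllist line format. Not the code in
# urllist.hri; parse_urllist_line and its return layout are assumptions.

def parse_urllist_line(line):
    """Classify one line as ('blank',), ('header', level, text),
    or ('link', level, url, text)."""
    line = line.strip()
    if not line:
        return ('blank',)
    if line.startswith('*'):
        # The number of leading asterisks is the header level.
        level = len(line) - len(line.lstrip('*'))
        return ('header', level, line[level:].strip())
    # The number of leading plus signs is the nesting level.
    level = len(line) - len(line.lstrip('+'))
    url, text = line[level:].split('|', 1)
    return ('link', level, url.strip(), text.strip())


parsed = [parse_urllist_line(line) for line in [
    '* Search engines',
    '+ http://www.google.com/ | Google',
    '++ http://www.yahoo.com/ | Yahoo',
    '',
]]
```

Each entry of parsed says whether the line was a header, a link, or a blank line, along with its level.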

tildesub.hri

Enables tilde substitution. Especially for users of TeX. Causes all tildes (~) to be converted to non-breaking spaces (&nbsp;). Also defines a new entity, &tilde;, which, of course, allows a tilde to be output.

egg.hri

Defines a container tag, <easteregg>, that allows HRLC to output an HTML comment; comments are normally removed.

cite.hri

Simple macros for adding citations. Kind of like BibTeX, but much worse.

footnote.hri

Macros for adding footnotes. You can create a footnote with <footnote></footnote>, and output the complete list of footnotes at the bottom with <listfootnotes>.

php.hri

Defines a macro, <php>, that embeds a PHP script in the output. Entering a PHP script normally won't work, because sgmllib doesn't correctly interpret the <? and ?> markers.

Miscellaneous Features

HRLC knows all the HTML 4 entities and replaces them with their binary values in output if the output encoding supports it. HRLC leaves alone any entities it does not recognize, but issues a warning. Currently, no other named entities are supported, but you can define them with the <defent> tag.

HRLC decimates whitespace, except inside of formatted and cdata tags. Strings of whitespace including at least one newline are decimated to a single newline. Strings of whitespace not including a newline are decimated to a single space. The overall effect on the web page is nothing, since HTML itself decimates whitespace, but it does save significant bytage.
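
The two rules can be expressed with a single regular expression substitution. The following is a sketch of the rules as stated, not HRLC's implementation, and it ignores the exception for formatted and cdata tags; collapse_whitespace is an invented name.

```python
# Sketch of the whitespace rules above; collapse_whitespace is a
# hypothetical name and formatted/cdata tags are not handled.
import re


def collapse_whitespace(text):
    """Reduce each run of whitespace to one newline or one space."""
    def replacement(match):
        # A run containing a newline becomes a single newline;
        # any other run becomes a single space.
        return '\n' if '\n' in match.group() else ' '
    return re.sub(r'\s+', replacement, text)


collapsed = collapse_whitespace('<p>Hello,   world!  \n\n   <p>Bye\t now')
```

The result keeps one newline between the two paragraphs and single spaces everywhere else.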

HRLC removes from the output anything that sgmllib interprets as an SGML comment. This is the appropriate action; comments are useless to the web browser. However, if you would like to leave in a comment as an Easter Egg, the include file "egg.hri" defines a macro that allows HRLC to output an HTML comment.

HRL has rules for automatically inserting closing tags when it seems reasonable. For example, <li> will close the previous <li> tag, if that tag wasn't closed explicitly.

Pitfalls

Bugs

Possible Future Work

I enjoy this program and use it to generate my web pages, so I work on it pretty often. Future directions are pretty much governed by my own web page needs.

I would really like to get rid of sgmllib.

Feedback

I change my email address all the time to keep me one step ahead of the spammers. I keep my current email address updated at the HRL home page. I would appreciate reproducible bug reports, contributions of useful include files, comments on the quality of the documentation, etc.