HTMLTemplate

----------------------------------------------------------------------
SUMMARY

A fast, powerful, easy-to-use HTML templating system.

----------------------------------------------------------------------
DESCRIPTION

HTMLTemplate converts HTML/XHTML templates into simple Python object models that can be manipulated through callback functions in your scripts.


======= About HTML Templates =======

An HTML template is usually a complete HTML/XHTML document, though it may also be a fragment of one: HTMLTemplate doesn't require the document to be complete, or even to have a single root element.

To create a Template object model, selected HTML elements must be annotated with 'compiler directives', special tag attributes that indicate how the object model is to be constructed. Here are some examples:

	<h1 node="con:title">Welcome</h1>

	<img node="con:photo" src="" />

	<a node="rep:navlink" href="#">Catalogue</a>

	<span node="-sep:navlink"> | </span>

	<div node="del:"> ... </div>


One restriction does apply when authoring templates: the template's HTML elements must be correctly closed according to XHTML rules. For example, this markup is acceptable:

	<p>Hello World</p>

	<hr />

but this is not:

	<p>Hello World

	<hr>


======= Compiler Directives =======

HTMLTemplate defines four types of compiler directive:

- 'con' defines a Container node that can appear only once at the given location
- 'rep' defines a Repeater node that can appear any number of times
- 'sep' defines a separator string to be inserted between successive iterations of the Repeater node of the same name
- 'del' indicates a section of markup to be omitted from the compiled template

The special attribute's name can be anything (the default is 'node', but any other name may be specified via the Template constructor), and its values are typically of the form "FOO:BAR", where FOO is a three-letter code indicating the type of directive and BAR is the name of the node to create. Every node name must be a valid Python identifier, with two additional restrictions: 1. it cannot begin with an underscore, and 2. it cannot match the name of any method or property belonging to the Container, Repeater and Template classes. (Note: the 'del' directive doesn't create a named node, so it may simply be written as "del:".) Directive types and node names are both case-sensitive.

HTMLTemplate also supports a single directive modifier, '-', also known as the 'minus tags' modifier. When prepended to a directive, e.g. "-con:foo", the minus tags modifier indicates that the HTML element's tags should be omitted from the compiled node or separator string. Use this modifier when you add an arbitrary HTML element (typically <div> or <span>) to a template purely to construct a node or separator string, so that the rendered page isn't cluttered with leftover tags.
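The directive syntax can be illustrated with a small parser. This is a minimal Python 3 sketch, not HTMLTemplate's actual parser: the function parse_directive is hypothetical, it raises ValueError where the real compiler may behave differently, and its RESERVED set contains only the method and property names listed in the CLASSES section, which may not be the library's complete list.

```python
import keyword

# Hypothetical subset of the names reserved by the Container, Repeater and
# Template classes, taken from the CLASSES section (the real set may differ).
RESERVED = {"atts", "content", "raw", "omit", "omittags", "repeat",
            "render", "structure"}
DIRECTIVE_TYPES = {"con", "rep", "sep", "del"}


def parse_directive(value):
    """Split a directive value such as '-con:foo' into (omit_tags, type, name).

    Returns None if the value isn't a recognised directive (such attributes
    are left unchanged); raises ValueError for an invalid node name.
    """
    omit_tags = value.startswith("-")  # the 'minus tags' modifier
    if omit_tags:
        value = value[1:]
    kind, colon, name = value.partition(":")
    if not colon or kind not in DIRECTIVE_TYPES:  # case-sensitive match
        return None
    if kind == "del":
        return (omit_tags, kind, None)  # 'del' doesn't create a named node
    if (not name.isidentifier() or keyword.iskeyword(name)
            or name.startswith("_") or name in RESERVED):
        raise ValueError("invalid node name: %r" % name)
    return (omit_tags, kind, name)


print(parse_directive("-con:foo"))     # (True, 'con', 'foo')
print(parse_directive("rep:navlink"))  # (False, 'rep', 'navlink')
print(parse_directive("del:"))         # (False, 'del', None)
print(parse_directive("Con:foo"))      # None: directive types are case-sensitive
```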


======= The Template Object Model =======

The HTMLTemplate object model is really just a greatly simplified, highly specialised variation of the standard DOM, in which only specified HTML elements can be manipulated via a very compact, simple API that's designed specifically for templating.

The Template object model is constructed from three classes: Template, Container and Repeater.

The Template object forms the template's root node, representing the complete HTML document. It contains one or more Container/Repeater child nodes and implements the render() method used to generate finished pages.

A Container object represents a modifiable HTML element, e.g.:

	<title node="con:pagetitle">...</title>

The HTML element may be empty - e.g. '<br />' - in which case it has no content and only its tag attributes are modifiable, or non-empty - e.g. '<p>...</p>' - in which case it may contain either modifiable content (plain text/markup) or other Container/Repeater nodes. The standard Container node has a one-to-one relationship with its parent node, appearing only once [1] when rendered.

A Repeater node is a Container node that has a one-to-many relationship with its parent node, appearing zero or more times - once for each item [1] in the collection being iterated by its repeat() method. e.g.:

	<ul>
		<li node="rep:listitem">...</li>
	</ul>


The Repeater class's repeat() method operates in a similar fashion to Python's built-in map() function, except that it passes an extra argument (a copy of the Repeater node) to the supplied callback function and doesn't return a result. For example, the call:

	myRepeaterNode.repeat(callback, [1, 2, 3, 4, 5], *args)

will call the given callback function five times, each time passing it a copy of myRepeaterNode along with an item from the given list, as well as any additional arguments supplied by the user. The callback function can then manipulate the supplied Repeater object, inserting data into the HTML element's tag attributes and/or content, or modifying its child nodes, or calling its omit() method to prevent that instance of the Repeater from being rendered in the finished page.
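To make the map()-like behaviour concrete, here is a toy model in Python 3. ToyRepeater is hypothetical and much simpler than the real Repeater class: it stores only plain content, does no entity escaping, and uses a separator attribute to mimic a 'sep' node. It does follow the contract described above: one callback per item, each receiving a fresh copy of the node, the item and any extra arguments; repeat() itself returns None; and copies whose omit() method is called are left out.

```python
import copy


class ToyRepeater:
    """Hypothetical stand-in for a Repeater node wrapping a single element."""

    def __init__(self, tag, separator=""):
        self.tag = tag
        self.separator = separator  # plays the role of a 'sep' string
        self.content = ""
        self._omitted = False
        self._instances = []

    def omit(self):
        self._omitted = True

    def repeat(self, fn, sequence, *args):
        # Like map(), but each call also receives a fresh copy of the node and
        # any extra user arguments; the rendered copies are kept, not returned.
        self._instances = []
        for item in sequence:
            instance = copy.copy(self)
            instance._instances = []
            fn(instance, item, *args)
            if not instance._omitted:
                self._instances.append(instance)

    def render(self):
        # No entity escaping here, unlike the real content property.
        return self.separator.join(
            "<%s>%s</%s>" % (i.tag, i.content, i.tag) for i in self._instances)


def render_item(node, item, prefix):
    if item % 2 == 0:
        node.omit()  # this copy won't appear in the output
    else:
        node.content = "%s%d" % (prefix, item)


listitem = ToyRepeater("li", separator=" | ")
result = listitem.repeat(render_item, [1, 2, 3, 4, 5], "#")
print(result)             # None
print(listitem.render())  # <li>#1</li> | <li>#3</li> | <li>#5</li>
```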


-------

[1] Except when the object's omit() method is called, in which case the HTML element it represents is omitted from the finished page.


======= Controlling Template Rendering =======

Template rendering is controlled by a user-defined function, typically named 'renderTemplate', that's attached to the Template object as it's created and triggered automatically each time the Template object's render() method is called.

When the Template's render() method is called, the Template object calls its attached renderTemplate function, passing it a copy of itself along with any additional arguments passed via the render() call. This allows the renderTemplate function to manipulate this object model - inserting the user-supplied data into nodes as tag attributes and content, omitting unwanted sections, even rearranging the object model itself(!). Once the renderTemplate function returns, the now-modified object model is rendered to text and returned.

It's also possible to manipulate the Template object model directly, prior to calling its render() method. This can be useful if you have some data you want to appear in every rendered page but don't wish to re-render it each time for efficiency's sake.
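The render cycle described above can be sketched with a toy class in Python 3. ToyTemplate is hypothetical (plain string attributes stand in for Container nodes, and the rendered markup is hard-coded), but it follows the same pattern: render() hands a copy of the model plus its own arguments to the attached callback, then renders the modified copy. Data set directly on the original before rendering therefore appears in every page, while per-page changes don't leak between calls.

```python
import copy


class ToyTemplate:
    """Hypothetical stand-in for Template with two Container-like slots.

    Plain strings stand in for the 'title' and 'body' nodes' content.
    """

    def __init__(self, callback):
        self._callback = callback
        self.title = ""
        self.body = ""

    def render(self, *args):
        # Give the callback a copy, so the original model stays reusable.
        clone = copy.copy(self)
        self._callback(clone, *args)
        return "<title>%s</title>\n<p>%s</p>" % (clone.title, clone.body)


def renderTemplate(node, message):
    node.body = message


template = ToyTemplate(renderTemplate)
template.title = "My Site"  # set once, before rendering; shared by every page
page1 = template.render("Hello")
page2 = template.render("Goodbye")
```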


======= Compiling a Template =======

A single Template object can be used to render any number of pages.

To compile a template, create a new instance of HTMLTemplate's Template class with the main callback function and the HTML text as arguments:

	template = HTMLTemplate.Template(renderTemplate, html)

Three optional arguments may also be provided:

- attribute -- The name of the tag attribute used to hold compiler directives. The default is 'node', but it may be changed to any other valid attribute name, e.g. 'id', 'obj', 'foo:bar'. This can be useful if, for example, you have to edit or process your templates in an application that automatically rejects invalid HTML attribute names.

- codecs -- Allows the default HTML entity encoding/decoding functions to be replaced. These functions are applied to tag attribute values and when getting or setting a Container or Repeater node's content property. By default, only the four markup characters <>&" are converted. This minimal level of conversion is provided for security's sake, but you may want to replace these functions with your own if you also need to escape non-ASCII or other characters.

- warnings -- If true, a warning is issued when a non-directive attribute is encountered (default: False).
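As an illustration, a replacement codec pair that behaves like the default described above (converting only the four markup characters) might look like the following Python 3 sketch. The names encode_entities and decode_entities are hypothetical, not HTMLTemplate's own defaultEncoder/defaultDecoder; such a pair would be passed as codecs=(encode_entities, decode_entities).

```python
def encode_entities(text):
    # '&' must be replaced first so the entities added below aren't re-escaped.
    return (text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace('"', "&quot;"))


def decode_entities(text):
    # Reverse order: '&amp;' last, so '&amp;lt;' decodes to '&lt;', not '<'.
    return (text.replace("&quot;", '"').replace("&gt;", ">")
                .replace("&lt;", "<").replace("&amp;", "&"))


escaped = encode_entities('<a href="x">Tom & Jerry</a>')
# escaped == '&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;'
```

Replacing '&' first on encoding, and '&amp;' last on decoding, is what keeps the pair a clean round trip.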


----------------------------------------------------------------------
CLASSES

Node -- Abstract base class
	Properties:
		NAME : Container | Repeater -- a (child) node defined by the source HTML template ('NAME' = the node's name)



Attributes -- A simple dict-like structure containing an HTML tag's attributes; supports getting, setting and deleting of attributes by name, e.g. node.atts['href'] = 'foo.html'



Container(Node) -- A mutable HTML element ('con')
	Properties:
		atts : Attributes -- the tag's attributes

		content : string -- the HTML element's content with &<>" characters automatically escaped (note: when inserting raw HTML, use the raw property instead) [1]

		raw : string -- the HTML element's raw content (i.e. no automatic character escaping) [1]

	Methods:
		omit() -- don't render this node

		omittags() -- don't render this node's tags (only its content)



Repeater(Container) -- A mutable, repeatable HTML element ('rep')
	Methods:
		repeat(fn, sequence, *args) -- render an instance of this node for each item in sequence
			fn : function -- the function to call for each item in sequence [2]
			sequence : anything -- a list, tuple, or other iterable collection
			*args : anything -- any values to be passed directly to this node's callback function



Template(Node) -- The top-level template node
	Methods:
		__init__(self, callback, html, attribute='node', codecs=(defaultEncoder, defaultDecoder), warnings=False)
			callback : function -- the main function controlling template rendering [3]
			html : string or unicode -- the HTML template
			[attribute : string or unicode] -- name of the tag attribute used to hold compiler directives (default='node')
			[codecs : tuple] -- a tuple containing two functions used by attribute values and the content property to encode/decode HTML entities [4]
			[warnings : boolean] -- warn when non-directive attribute is encountered (default=False)

		render(*args) -- render this template
			*args : anything -- any values to be passed directly to this template's callback function 
		
		structure() -- print the object model's structure for diagnostic use



ParseError(Exception) -- A template parsing error


-------

[1] The content and raw properties can only be used when the Container/Repeater object is derived from a non-empty HTML element. If the HTML element is empty (e.g. <br />), the operation is ignored or an AttributeError occurs. (Since version 1.2.0, the content and raw properties are also supported on nodes containing child nodes; earlier versions required the element to contain plain text/markup only.)

[2] The Repeater's callback function must accept the following arguments: 
	node : instance -- a copy of the Repeater object
	item : anything -- an item from the sequence being iterated
	*args : anything -- zero or more additional parameters corresponding to any extra arguments passed by the user to the Repeater's repeat() method

[3] The Template's callback function must accept the following arguments: 
	node : instance -- a copy of the Template object
	*args : anything -- zero or more additional parameters corresponding to any extra arguments passed by the user to the Template's render() method

[4] The default codec functions encode/decode the four standard markup characters: &<>". When supplying your own, both replacement functions should accept and return a single string/unicode value. The first function should convert specified characters into HTML entities; the second should perform the reverse operation.
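For example, a pair meeting these requirements that additionally escapes every non-ASCII character as a numeric character reference could be written as below. This is a hypothetical Python 3 sketch built on the standard library's str.encode() with the 'xmlcharrefreplace' error handler and html.unescape(); it is not part of HTMLTemplate.

```python
import html


def encode_ascii_safe(text):
    # Escape the four markup characters ('&' first), then turn every
    # non-ASCII character into a numeric character reference such as &#233;.
    escaped = (text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace('"', "&quot;"))
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def decode_ascii_safe(text):
    # html.unescape reverses both kinds of reference (and any other named or
    # numeric reference it recognises). Caveat: it maps &#128;-&#159; to
    # Windows-1252 characters, so C1 control characters don't round-trip.
    return html.unescape(text)


sample = 'naïve "quote" & <tag> €'
encoded = encode_ascii_safe(sample)
```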

----------------------------------------------------------------------
EXAMPLES

Tutorials:

- Tutorial_1.py
- Tutorial_2.py


Bundled examples:

- Demo1_Quote.py
- Demo2_Table.py
- Demo3_Links.py
- Demo4_SimpleCalendar.py
- Demo5_AlternatingRowColors.py
- Demo6_UserList.py


Working scripts (see <http://freespace.virgin.net/hamish.sanderson/>):

- HTMLCalendar
- appscript.htmldoc
- iTunes_albums_to_HTML.py

----------------------------------------------------------------------
NOTES

======= Template design tips =======

- Where two or more sibling nodes share the same type and name, only the first is included in the compiled template and the rest are discarded. (If two or more sibling nodes share the same name but have different types, a ParseError will occur unless the first node is of type 'rep' and the other of type 'sep'.)

- The parser automatically removes the special attribute from any element it converts into a template node. Tag attributes whose name is the same as that used for special attributes but whose value isn't a recognised compiler directive are left unchanged.

- Separators must be declared after their corresponding Repeater nodes.

- When authoring a template, you'll sometimes want to group two or more adjacent nodes so they can be repeated as a single block. If the HTML doesn't already contain a suitable parent element to add the 'rep' compiler directive to, insert an extra <div> or <span> element that wraps these nodes and add the 'rep' directive to it to create your Repeater node. You can then use the 'minus tags' modifier to omit this element's tags from the rendered page. The Tutorials.txt file covers this technique in more detail.


======= Controller design tips =======

- When setting a node's content, make sure you write:

	node.foo.content = val

not:

	node.foo = val

The first assigns val as the node's content; the second replaces the node itself with val.


======= Template rendering tips =======

- The attributes property, atts, performs basic validation of user-supplied attribute names and values for security:
	- An attribute's name must match the pattern '^[a-zA-Z_][-.:a-zA-Z_0-9]*$'
	- An attribute's value may not contain both single and double quotes. 

- HTMLTemplate will single/double-quote attribute values as appropriate and escape them using the codecs functions, but it won't perform any other encoding of values, such as URL-encoding. Any such encoding is left to the user's code, e.g. using urllib's quote() and unquote() functions.
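The checks listed above, and the division of labour for URL-encoding, can be sketched in Python 3 as follows. check_attribute is a hypothetical re-creation of the described rules, not HTMLTemplate's own code; in Python 3 the quote() function mentioned above lives in urllib.parse.

```python
import re
from urllib.parse import quote

# The name pattern quoted above; fullmatch() is used because '$' with
# re.match() would also accept a name ending in a newline.
ATTR_NAME = re.compile(r"[a-zA-Z_][-.:a-zA-Z_0-9]*")


def check_attribute(name, value):
    """Hypothetical re-creation of the atts checks described above."""
    if not ATTR_NAME.fullmatch(name):
        raise ValueError("invalid attribute name: %r" % name)
    # None is allowed, for value-less attributes such as <hr norule>.
    if value is not None and "'" in value and '"' in value:
        raise ValueError("value contains both single and double quotes")


# URL-encoding is the caller's job, done before the value is assigned.
href = "search?q=" + quote("fish & chips")
check_attribute("href", href)
print(href)  # search?q=fish%20%26%20chips
```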


======= Miscellaneous notes =======

- The public class structure shown in this documentation is slightly simplified from the actual (multiple inheritance-based) implementation to make it easier to understand. This is not something end users should worry about.

----------------------------------------------------------------------
KNOWN ISSUES

- Jarek Zgoda reports that Python's HTMLParser module expands the following entity references where they appear in tag attribute values: &amp; &lt; &gt; &quot;. This is probably a bug. When defining HTML templates, use the equivalent character references (&#38; &#60; &#62; &#34;) within attribute values, as these are not affected. Note that entity references appearing within HTML elements' content are not affected by this problem, nor are values inserted during template rendering (which are already subject to their own escaping rules). [2004-06-16] Stephen D Evans points out that &apos; is also decoded, and that this partial decoding of attribute values is clearly deliberate, as there's a unit test to cover it. (Note: HTMLParser's treatment of tag attribute values is very strange, as this partial decoding makes it impossible to safely decode other entities later on; e.g. consider how something like <foo bar="&amp;copy;"/> should/would be treated.) UPDATE: HTMLTemplate now overrides the undocumented HTMLParser method responsible for this partial decoding of attribute values so that the original values are preserved. This is a bit of a kludge, but the HTMLParser class's implementation is unlikely to change, so it should not present any problems.

- Python's HTMLParser module automatically lowercases all tag and attribute names. This shouldn't present any problems for templating HTML (which is case-insensitive by nature) nor for templating XHTML (which is all-lowercase anyway), but will cause problems for any ad-hoc XML templating where tags and attributes contain both lower and uppercase characters.

- HTMLTemplate's current API isn't well suited to event-driven use or introspection. Modified designs are being explored to address these limitations; input is welcome.

- A built-in convenience option for loading HTML templates from disk, e.g. Template(fn, file='foo.html'), may be worth adding if there's sufficient demand for it. I'm somewhat reluctant to do this, as it isn't core functionality and file-based template management would be better done by a separate dedicated module, but pragmatism may be the better choice; we'll see. As regards implementing a separate templatemanager module (the preferred solution), a simple implementation might define a class, each instance of which represents a directory where templates are kept, allowing templates to be loaded by filename, e.g.:

	f = templatemanager.Folder('/path/to/templates/folder/')
	foo = HTMLTemplate.Template(render_foo, f.template('foo.htm'))
	bar = HTMLTemplate.Template(render_bar, f.template('bar.htm'))

A more sophisticated system could wrap HTMLTemplate completely to e.g. automatically associate on-disk Controller script and HTML template files with one another (e.g. name-based association would automatically pair 'FooTemplate.html' with 'FooController.py'), serve up compiled templates by name upon request and provide nifty stuff like automatic updates so that when a template's html or controller script is updated it automatically recompiles the template object, which'd be useful in long-running systems. Code coupling should be minimised and localised to allow the templatemanager module to be reused with other templating systems with minimal effort.
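The two HTMLParser issues described above (decoding of attribute values and lowercasing of tag and attribute names) can be observed with the standard library parser itself. This Python 3 snippet records what html.parser reports for each start tag. Note that in current Python 3 versions, attribute values are decoded with html.unescape(), which also decodes numeric character references such as &#38;, so the character-reference workaround suggested above would not help there; the snippet shows this too.

```python
from html.parser import HTMLParser


class AttributeDumper(HTMLParser):
    """Records each start tag's name and attributes as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))

    # Self-closing tags such as <foo/> go to handle_startendtag, which by
    # default calls handle_starttag and then handle_endtag.


parser = AttributeDumper()
parser.feed('<A HREF="page?a=1&amp;b=2" Title="&#38;copy;"></A>'
            '<foo bar="&amp;copy;"/>')
parser.close()
for tag, attrs in parser.tags:
    print(tag, attrs)
# a [('href', 'page?a=1&b=2'), ('title', '&copy;')]
# foo [('bar', '&copy;')]
```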


----------------------------------------------------------------------
TO DO

- Replace HTMLParser

----------------------------------------------------------------------
DEPENDENCIES

- Python 2.3+

----------------------------------------------------------------------
HISTORY

2006-03-10 -- 1.4.2; changed from LGPL to MIT license

2006-01-08 -- 1.4.1; fixed attribute value escaping (thanks TWY, RB)

2005-11-17 -- 1.4.0; attribute values are now escaped using supplied codecs functions (thanks TWY); added workaround to prevent HTMLParser from decoding some HTML entities in attribute values when parsing template markup

2005-10-15 -- 1.3.0; added 'warnings' parameter to Template constructor (thanks FS)

2005-04-18 -- 1.2.1; fixed rendering of processing instructions so they don't include an extra trailing '?' (thanks AC)

2005-02-01 -- 1.2.0; fixed parsing error bug when separator tags contained other attributes, e.g. <hr node="sep:item" width="3" /> (thanks KM); now supports content and raw properties on nodes containing sub-nodes; expanded source code comments

2005-01-07 -- 1.1.2; fixed missing <td> tags in Demo5_AlternatingRowColors.py that caused HTML table to render incorrectly in Firefox (thanks FS)

2004-08-08 -- 1.1.1; now handles tag attributes without values, e.g. <hr norule>, node.atts['norule']=None (thanks GD); better type checking on tag attribute assignment

2004-06-16 -- 1.1.0; internal optimisations boost rendering speed by 3-5x; fixed faulty error message in RichContent.__setattr__; added more information on Python's HTMLParser to Known Issues

2004-06-01 -- 1.0.0; final release

----------------------------------------------------------------------
AUTHOR

- HAS <hamish.sanderson@virgin.net> <http://freespace.virgin.net/hamish.sanderson/>

----------------------------------------------------------------------
CREDITS

- Many thanks to Richard Boulton, Bud P Bruegger, Antonio Cavedoni, Graham Dumpleton, Ronald van Engelen, Stephen Evans, Matthias Fiebig, Tomas Jogin, Edvard Majakari, Ksenia Marasanova, Felix Schwarz, Tung Wai Yip, Simon Willison and Jarek Zgoda for comments, suggestions and bug reports.

----------------------------------------------------------------------
COPYRIGHT

HTMLTemplate - A fast, powerful, easy-to-use HTML templating system.

Copyright (C) 2004 HAS <hamish.sanderson@virgin.net>