Mega merge from Sam.
commit 56ee34a7f0

THANKS (4 lines changed)
@@ -9,8 +9,10 @@ Harry Fuecks - Pipe characters in file names, filter bug
Eric van der Vlist - Filters to add language, category information
Chris Dolan - mkdir cache; default template_dirs; fix xsltproc
David Sifry - rss 2.0 xslt template based on http://atom.geekhood.net/
Morten Fredericksen - Support WordPress LinkManager OPML
Morten Frederiksen - Support WordPress LinkManager OPML
Harry Fuecks - default item date to feed date
Antonio Cavedoni - Django templates
Morten Frederiksen - expungeCache

This codebase represents a radical refactoring of Planet 2.0, which lists
the following contributors:

TODO (7 lines changed)
@@ -1,13 +1,6 @@
TODO
====

* Expire feed history

The feed cache doesn't currently expire old entries, so could get
large quite rapidly. We should probably have a config setting for
the cache expiry; the trouble is some channels might need a longer
or shorter one than others.

* Allow display normalisation to specified timezone

Some Planet admins would like their feed to be displayed in the local

@@ -61,8 +61,13 @@ material information.</dd>
can be found</dd>
<dt><ins>bill_of_materials</ins></dt>
<dd>Space-separated list of files to be copied as is directly from the <code>template_directories</code> to the <code>output_dir</code></dd>
<dt>filter</dt>
<dd>Regular expression that must be found in the textual portion of the entry</dd>
<dt>exclude</dt>
<dd>Regular expression that must <b>not</b> be found in the textual portion of the entry</dd>
<dt><ins>filters</ins></dt>
<dd>Space-separated list of filters to apply to each entry</dd>
<dd>Space-separated list of <a href="filters.html">filters</a> to apply to
each entry</dd>

</dl>
<dl class="compact code">
@@ -96,8 +101,8 @@ use for logging output. Note: this configuration value is processed
<a href="http://docs.python.org/lib/ConfigParser-objects.html">raw</a></dd>
<dt>feed_timeout</dt>
<dd>Number of seconds to wait for any given feed</dd>
<dt><del>new_feed_items</del></dt>
<dd>Number of items to take from new feeds</dd>
<dt>new_feed_items</dt>
<dd>Maximum number of items to include in the output from any one feed</dd>
<dt><ins>spider_threads</ins></dt>
<dd>The number of threads to use when spidering. When set to 0, the default,
no threads are used and spidering follows the traditional algorithm.</dd>
@@ -106,6 +111,10 @@ no threads are used and spidering follows the traditional algorithm.</dd>
directory to be used for an additional HTTP cache to front end the Venus
cache. If specified as a relative path, it is evaluated relative to the
<code>cache_directory</code>.</dd>
<dt><ins>cache_keep_entries</ins></dt>
<dd>Used by <code>expunge</code> to determine how many entries should be
kept for each source when expunging old entries from the cache directory.
This may be overridden on a per-subscription feed basis.</dd>
</dl>
<p>Additional options can be found in
<a href="normalization.html#overrides">normalization level overrides</a>.</p>

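The interplay between planet-wide defaults and per-feed overrides described above can be sketched with Python 3's configparser (the option names come from this document; the feed URL and the values are invented for illustration):

```python
# A sample Venus-style config.ini exercising the options documented
# above. Per-feed sections may shadow [Planet] defaults such as
# cache_keep_entries.
from configparser import RawConfigParser

SAMPLE = """
[Planet]
output_dir = output
spider_threads = 2
new_feed_items = 4
cache_keep_entries = 10
filters = regexp_sifter.py?require=python

[http://example.com/feed.xml]
name = Example feed
cache_keep_entries = 50
"""

config = RawConfigParser()
config.read_string(SAMPLE)

# The per-feed value wins over the [Planet] default for this feed.
keep = config.getint('http://example.com/feed.xml', 'cache_keep_entries')
print(keep)  # 50
```

Venus itself targets Python 2 and reads these values in `raw` mode, as noted above; the sketch only illustrates the override behaviour.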
docs/contributing.html (new file, 67 lines)
@@ -0,0 +1,67 @@
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Contributing</title>
</head>
<body>
<h2>Contributing</h2>
<p>If you make changes to Venus, you have no obligation to share them.
And unlike systems based on <code>CVS</code> or <code>subversion</code>,
there is no notion of “committers” — everybody is
a peer.</p>
<p>If you do choose to share your changes, the steps outlined below may
increase the chances of your code being picked up.</p>

<h3>Documentation and Tests</h3>
<p>For best results, include both documentation and tests in your
contribution.</p>
<p>Documentation can be found in the <code>docs</code> directory. It is
straight XHTML.</p>
<p>Test cases can be found in the
<a href="http://localhost/~rubys/venus/tests/">tests</a> directory, and
make use of the
<a href="http://docs.python.org/lib/module-unittest.html">Python unit testing framework</a>. To run them, simply enter:</p>
<blockquote><pre>python runtests.py</pre></blockquote>

<h3>Bzr</h3>
<p>If you have done a <a href="index.html">bzr get</a>, you have already set up
a repository. The only additional step you might need is to introduce
yourself to <a href="http://bazaar-vcs.org/">bzr</a>. Type in the following,
after replacing the <b>bold text</b> with your information:</p>

<blockquote><pre>bzr whoami '<b>Your Name</b> <<b>youremail</b>@<b>example.com</b>>'</pre></blockquote>

<p>Then, simply make the changes you like. When you are done, type:</p>

<blockquote><pre>bzr st</pre></blockquote>

<p>This will tell you which files you have modified, and which ones you may
have added. If you add files and you want them to be included, simply do a:</p>

<blockquote><pre>bzr add file1 file2...</pre></blockquote>

<p>You can also do a <code>bzr diff</code> to see if there are any changes
which you made that you don't want included. I can't tell you how many
debug print statements I have caught this way.</p>

<p>Next, type:</p>

<blockquote><pre>bzr commit</pre></blockquote>

<p>This will allow you to enter a comment describing your change. If your
repository is already on your web server, simply let others know where they
can find it. If not, you can simply ftp or scp the files to your web server
— no additional software needs to be installed on that machine.</p>

<h3>Telling others</h3>
<p>Once you have a change worth sharing, post a message on the
<a href="http://lists.planetplanet.org/mailman/listinfo/devel">mailing list</a>.</p>
<p>Also, consider setting up a <a href="http://bzr.mfd-consult.dk/bzr-feed/">bzr-feed</a> for your repository, so people who wish to do so can automatically
be notified of every change.</p>
<p>There is now even a nascent <a href="http://planet.intertwingly.net/venus/">planet</a> which combines these feeds of changes. You can <a href="http://planet.intertwingly.net/venus/atom.xml">subscribe</a> to it too.</p>
</body>
</html>

@@ -13,7 +13,7 @@
parameters come from the config file, and output goes to <code>stdout</code>.
Anything written to <code>stderr</code> is logged as an ERROR message. If no
<code>stdout</code> is produced, the entry is not written to the cache or
processed further.</p>
processed further; in fact, if the entry had previously been written to the cache, it will be removed.</p>

<p>Input to a filter is an aggressively
<a href="normalization.html">normalized</a> entry. For
@@ -46,9 +46,26 @@ expressions. Again, parameters can be passed as
<a href="../tests/data/filter/xpath-sifter2.ini">URI style</a>.
</p>

<p>The <a href="../filters/regexp_sifter.py">regexp sifter</a> operates just
like the xpath sifter, except it uses
<a href="http://docs.python.org/lib/re-syntax.html">regular expressions</a>
instead of XPath expressions.</p>

<h3>Notes</h3>

<ul>
<li>Filters are executed when a feed is fetched, and the results are placed
into the cache. Changing a configuration file alone is not sufficient to
change the contents of the cache — typically that only occurs after
a feed is modified.</li>

<li>Filters are simply invoked in the order they are listed in the
configuration file (think unix pipes). Planet-wide filters are executed before
feed-specific filters.</li>

<li>Any filters listed in the <code>[planet]</code> section of your config.ini
will be invoked on all feeds. Filters listed in individual
<code>[feed]</code> sections will only be invoked on those feeds.</li>

<li>The file extension of the filter is significant. <code>.py</code> invokes
python. <code>.xslt</code> invokes XSLT. <code>.sed</code> and
@@ -56,14 +73,6 @@ python. <code>.xslt</code> invokes XSLT. <code>.sed</code> and
perl or ruby or class/jar (java), aren't supported at the moment, but these
would be easy to add.</li>

<li>Any filters listed in the <code>[planet]</code> section of your config.ini
will be invoked on all feeds. Filters listed in individual
<code>[feed]</code> sections will only be invoked on those feeds.</li>

<li>Filters are simply invoked in the order they are listed in the
configuration file (think unix pipes). Planet-wide filters are executed before
feed-specific filters.</li>

<li>Templates written using htmltmpl currently only have access to a fixed set
of fields, whereas XSLT templates have access to everything.</li>
</ul>

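A filter, as described above, is just a program that reads the normalized entry on stdin and writes the result to stdout, with no stdout meaning the entry is dropped. A minimal Python sketch (the entry text and the draft marker are invented for illustration; this is not one of Venus' shipped filters):

```python
import sys

def sift(entry):
    """Illustrative transformation only: strip a hypothetical draft marker."""
    return entry.replace('<!-- draft -->', '')

# In a real filter the entry arrives on stdin; producing no stdout
# drops the entry from the cache, per the documentation above.
entry = '<entry><title>Hi<!-- draft --></title></entry>'
out = sift(entry)
if out.strip():
    sys.stdout.write(out)  # writes '<entry><title>Hi</title></entry>'
```

Because filters compose like unix pipes, several such programs can be chained by listing them in order in the config file.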
@@ -27,6 +27,7 @@
<li>Other
<ul>
<li><a href="migration.html">Migration from Planet 2.0</a></li>
<li><a href="contributing.html">Contributing</a></li>
</ul>
</li>
<li>Reference
@@ -38,6 +39,7 @@
<li><a href="http://bitworking.org/projects/httplib2/">httplib2</a></li>
<li><a href="http://www.w3.org/TR/xslt">XSLT</a></li>
<li><a href="http://www.gnu.org/software/sed/manual/html_mono/sed.html">sed</a></li>
<li><a href="http://www.djangoproject.com/documentation/templates/">Django templates</a></li>
</ul>
</li>
<li>Credits and License

@@ -107,6 +107,15 @@ not yet ported to the newer python so Venus will be less featureful.

<blockquote><pre>sudo apt-get install bzr python2.4-librdf</pre></blockquote>

<h3 id="windows">Windows instructions</h3>

<p>
htmltmpl templates (and Django too, since it currently piggybacks on
the htmltmpl implementation) on Windows require
the <a href="http://sourceforge.net/projects/pywin32/">pywin32</a>
module.
</p>

<h3 id="python22">Python 2.2 instructions</h3>

<p>If you are running Python 2.2, you may also need to install <a href="http://pyxml.sourceforge.net/">pyxml</a>. If the

@@ -101,6 +101,48 @@ The data values within the <code>Items</code> array are as follows:</p>
<code>new_</code> are only set if their values differ from the previous
Item.</p>

<h3>django</h3>

<p>
If you have the <a href="http://www.djangoproject.com/">Django</a>
framework installed,
<a href="http://www.djangoproject.com/documentation/templates/"
>Django templates</a> are automatically available to Venus
projects. You will have to save them with a <code>.html.dj</code>
extension in your themes. The variable set is the same as the one
from htmltmpl, above. In the Django template context you'll have
access to <code>Channels</code> and <code>Items</code> and you'll be
able to iterate through them.
</p>

<p>
You also have access to the <code>Config</code> dictionary, which contains
the Venus configuration variables from your <code>.ini</code> file.
</p>

<p>
If you lose your way and want to introspect all the variables in the
context, there's the useful <code>{% debug %}</code> template tag.
</p>

<p>
In <code>themes/django/</code> you'll find a sample Venus theme
that uses Django templates and might be a starting point for
your own custom themes.
</p>

<p>
All the standard Django template tags and filters are supposed to
work, with the notable exception of the <code>date</code> filter on
the updated and published dates of an item (it works on the main
<code>{{ date }}</code> variable).
</p>

<p>
Please note that Django, and therefore Venus' Django support,
requires at least Python 2.3.
</p>

<h3>xslt</h3>
<p><a href="http://www.w3.org/TR/xslt">XSLT</a> is a paradox: it actually
makes some simple things easier to do than htmltmpl, and certainly can

expunge.py (new file, 17 lines)
@@ -0,0 +1,17 @@
#!/usr/bin/env python
"""
Main program to run just the expunge portion of planet
"""

import os.path
import sys
from planet import expunge, config

if __name__ == '__main__':

    if len(sys.argv) == 2 and os.path.isfile(sys.argv[1]):
        config.load(sys.argv[1])
        expunge.expungeCache()
    else:
        print "Usage:"
        print "  python %s config.ini" % sys.argv[0]

filters/detitle.xslt (new file, 25 lines)
@@ -0,0 +1,25 @@
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns="http://www.w3.org/1999/xhtml">

  <!-- only retain titles that don't duplicate summary or content -->
  <xsl:template match="atom:title">
    <xsl:if test="string-length(.) &lt; 30 or
                  ( substring(.,1,string-length(.)-3) !=
                    substring(../atom:content,1,string-length(.)-3) and
                    substring(.,1,string-length(.)-3) !=
                    substring(../atom:summary,1,string-length(.)-3) )">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- pass through everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

filters/h1title.xslt (new file, 30 lines)
@@ -0,0 +1,30 @@
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <!-- Replace title with value of h1, if present -->
  <xsl:template match="atom:title">
    <xsl:apply-templates select="@*"/>
    <xsl:copy>
      <xsl:choose>
        <xsl:when test="count(//xhtml:h1) = 1">
          <xsl:value-of select="normalize-space(//xhtml:h1)"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="node()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

  <!-- Remove all h1s -->
  <xsl:template match="xhtml:h1"/>

  <!-- pass through everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

filters/regexp_sifter.py (new file, 44 lines)
@@ -0,0 +1,44 @@
import sys, re

# parse options
options = dict(zip(sys.argv[1::2],sys.argv[2::2]))

# read entry
doc = data = sys.stdin.read()

# Apply a sequence of patterns which turn a normalized Atom entry into
# a stream of text, after removal of non-human metadata.
for pattern,replacement in [
    (re.compile('<id>.*?</id>'),' '),
    (re.compile('<url>.*?</url>'),' '),
    (re.compile('<source>.*?</source>'),' '),
    (re.compile('<updated.*?</updated>'),' '),
    (re.compile('<published.*?</published>'),' '),
    (re.compile('<link .*?>'),' '),
    (re.compile('''<[^>]* alt=['"]([^'"]*)['"].*?>'''),r' \1 '),
    (re.compile('''<[^>]* title=['"]([^'"]*)['"].*?>'''),r' \1 '),
    (re.compile('''<[^>]* label=['"]([^'"]*)['"].*?>'''),r' \1 '),
    (re.compile('''<[^>]* term=['"]([^'"]*)['"].*?>'''),r' \1 '),
    (re.compile('<.*?>'),' '),
    (re.compile('\s+'),' '),
    (re.compile('&gt;'),'>'),
    (re.compile('&lt;'),'<'),
    (re.compile('&apos;'),"'"),
    (re.compile('&quot;'),'"'),
    (re.compile('&amp;'),'&'),
    (re.compile('\s+'),' ')
    ]:
    data=pattern.sub(replacement,data)

# process requirements
if options.has_key('--require'):
    for regexp in options['--require'].split('\n'):
        if regexp and not re.search(regexp,data): sys.exit(1)

# process exclusions
if options.has_key('--exclude'):
    for regexp in options['--exclude'].split('\n'):
        if regexp and re.search(regexp,data): sys.exit(1)

# if we get this far, the entry is to be included
print doc

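The `dict(zip(sys.argv[1::2], sys.argv[2::2]))` line above pairs each option flag with the value that follows it on the command line. The idiom in isolation (argument values invented):

```python
# Pair alternating command-line tokens into an option dict, as the
# regexp sifter above does with sys.argv.
argv = ['regexp_sifter.py', '--require', 'python', '--exclude', 'draft']

# argv[1::2] yields the flags, argv[2::2] the values that follow them.
options = dict(zip(argv[1::2], argv[2::2]))
print(options)  # {'--require': 'python', '--exclude': 'draft'}
```

This works because the filter invocation always supplies flags and values in strict alternation; a flag without a value would silently be dropped by `zip`.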
@@ -21,6 +21,7 @@ if __name__ == "__main__":
    offline = 0
    verbose = 0
    only_if_new = 0
    expunge = 0

    for arg in sys.argv[1:]:
        if arg == "-h" or arg == "--help":
@@ -31,6 +32,7 @@ if __name__ == "__main__":
            print " -o, --offline      Update the Planet from the cache only"
            print " -h, --help         Display this help message and exit"
            print " -n, --only-if-new  Only spider new feeds"
            print " -x, --expunge      Expunge old entries from cache"
            print
            sys.exit(0)
        elif arg == "-v" or arg == "--verbose":
@@ -39,6 +41,8 @@ if __name__ == "__main__":
            offline = 1
        elif arg == "-n" or arg == "--only-if-new":
            only_if_new = 1
        elif arg == "-x" or arg == "--expunge":
            expunge = 1
        elif arg.startswith("-"):
            print >>sys.stderr, "Unknown option:", arg
            sys.exit(1)
@@ -62,3 +66,7 @@ if __name__ == "__main__":
    from planet import splice
    doc = splice.splice()
    splice.apply(doc.toxml('utf-8'))

    if expunge:
        from planet import expunge
        expunge.expungeCache()

@@ -26,7 +26,7 @@ Todo:
  * error handling (example: no planet section)
"""

import os, sys, re
import os, sys, re, urllib
from ConfigParser import ConfigParser
from urlparse import urljoin

@@ -106,7 +106,9 @@ def __init__():
    define_planet('output_dir', 'output')
    define_planet('spider_threads', 0)

    define_planet_int('new_feed_items', 0)
    define_planet_int('feed_timeout', 20)
    define_planet_int('cache_keep_entries', 10)

    define_planet_list('template_files')
    define_planet_list('bill_of_materials')
@@ -126,6 +128,8 @@ def __init__():
    define_tmpl('content_type', '')
    define_tmpl('future_dates', 'keep')
    define_tmpl('xml_base', '')
    define_tmpl('filter', None)
    define_tmpl('exclude', None)

def load(config_file):
    """ initialize and load a configuration"""
@@ -330,7 +334,7 @@ def feedtype():

def subscriptions():
    """ list the feed subscriptions """
    return filter(lambda feed: feed!='Planet' and
    return __builtins__['filter'](lambda feed: feed!='Planet' and
        feed not in template_files()+filters()+reading_lists(),
        parser.sections())

@@ -350,6 +354,12 @@ def filters(section=None):
        filters += parser.get('Planet', 'filters').split()
    if section and parser.has_option(section, 'filters'):
        filters += parser.get(section, 'filters').split()
    if filter(section):
        filters.append('regexp_sifter.py?require=' +
            urllib.quote(filter(section)))
    if exclude(section):
        filters.append('regexp_sifter.py?exclude=' +
            urllib.quote(exclude(section)))
    return filters

def planet_options():

planet/expunge.py (new file, 68 lines)
@@ -0,0 +1,68 @@
""" Expunge old entries from a cache of entries """
import glob, os, planet, config, feedparser
from xml.dom import minidom
from spider import filename

def expungeCache():
    """ Expunge old entries from a cache of entries """
    import planet
    log = planet.getLogger(config.log_level(),config.log_format())

    log.info("Determining feed subscriptions")
    entry_count = {}
    sources = config.cache_sources_directory()
    for sub in config.subscriptions():
        data=feedparser.parse(filename(sources,sub))
        if not data.feed.has_key('id'): continue
        if config.feed_options(sub).has_key('cache_keep_entries'):
            entry_count[data.feed.id] = int(config.feed_options(sub)['cache_keep_entries'])
        else:
            entry_count[data.feed.id] = config.cache_keep_entries()

    log.info("Listing cached entries")
    cache = config.cache_directory()
    dir=[(os.stat(file).st_mtime,file) for file in glob.glob(cache+"/*")
        if not os.path.isdir(file)]
    dir.sort()
    dir.reverse()

    for mtime,file in dir:

        try:
            entry=minidom.parse(file)
            # determine source of entry
            entry.normalize()
            sources = entry.getElementsByTagName('source')
            if not sources:
                # no source determined, do not delete
                log.debug("No source found for %s", file)
                continue
            ids = sources[0].getElementsByTagName('id')
            if not ids:
                # feed id not found, do not delete
                log.debug("No source feed id found for %s", file)
                continue
            if ids[0].childNodes[0].nodeValue in entry_count:
                # subscribed to feed, update entry count
                entry_count[ids[0].childNodes[0].nodeValue] = entry_count[
                    ids[0].childNodes[0].nodeValue] - 1
                if entry_count[ids[0].childNodes[0].nodeValue] >= 0:
                    # maximum not reached, do not delete
                    log.debug("Maximum not reached for %s from %s",
                        file, ids[0].childNodes[0].nodeValue)
                    continue
                else:
                    # maximum reached
                    log.debug("Removing %s, maximum reached for %s",
                        file, ids[0].childNodes[0].nodeValue)
            else:
                # not subscribed
                log.debug("Removing %s, not subscribed to %s",
                    file, ids[0].childNodes[0].nodeValue)
            # remove old entry
            os.unlink(file)

        except:
            log.error("Error parsing %s", file)

# end of expungeCache()

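The heart of expungeCache is a per-feed countdown while walking cache entries newest-first. A compact Python sketch of that decision logic (feed ids and budgets invented; the real code also handles entries whose source cannot be determined):

```python
# Each subscribed feed id starts with its cache_keep_entries budget;
# walking entries newest-first, an entry is kept while its feed's
# budget is positive, and entries from unsubscribed feeds are expunged.
entry_count = {'feed:a': 2}

def keep(feed_id):
    """Return True if the entry should stay in the cache."""
    if feed_id not in entry_count:
        return False               # not subscribed: always expunge
    entry_count[feed_id] -= 1
    return entry_count[feed_id] >= 0

decisions = [keep('feed:a'), keep('feed:a'), keep('feed:a'), keep('feed:b')]
print(decisions)  # [True, True, False, False]
```

Sorting newest-first before counting is what guarantees the retained entries are the most recent ones.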
@@ -9,26 +9,7 @@ Example usage:

import html5lib
f = open("my_document.html")
p = html5lib.HTMLParser()
tree = p.parse(f)

By default the returned treeformat is a custom "simpletree", similar
to a DOM tree; each element has attributes childNodes and parent
holding the children and parent respectively, a name attribute
holding the Element name, a data attribute holding the element data
(for text and comment nodes) and an attributes dictionary holding the
element's attributes (for Element nodes).

To get output in ElementTree format:

import html5lib
from html5lib.treebuilders import etree
p = html5lib.HTMLParser(tree=etree.TreeBuilder)
elementtree = p.parse(f)

Note: Because HTML documents support various features not in the
default ElementTree (e.g. doctypes), we supply our own simple
serializer: html5lib.treebuilders.etree.tostring. At present this does not
have the encoding support offered by the elementtree serializer.

tree = p.parse(f)
"""
from html5parser import HTMLParser
from liberalxmlparser import XMLParser, XHTMLParser

@@ -112,7 +112,8 @@ spaceCharacters = frozenset((
    u"\n",
    u"\u000B",
    u"\u000C",
    u" "
    u" ",
    u"\r"
))

tableInsertModeElements = frozenset((
@@ -124,6 +125,7 @@ tableInsertModeElements = frozenset((
))

asciiLowercase = frozenset(string.ascii_lowercase)
asciiUppercase = frozenset(string.ascii_uppercase)
asciiLetters = frozenset(string.ascii_letters)
digits = frozenset(string.digits)
hexDigits = frozenset(string.hexdigits)
@@ -454,3 +456,222 @@ entities = {
    "zwj": u"\u200D",
    "zwnj": u"\u200C"
}

encodings = frozenset((
    "ansi_x3.4-1968",
    "iso-ir-6",
    "ansi_x3.4-1986",
    "iso_646.irv:1991",
    "ascii",
    "iso646-us",
    "us-ascii",
    "us",
    "ibm367",
    "cp367",
    "csascii",
    "ks_c_5601-1987",
    "korean",
    "iso-2022-kr",
    "csiso2022kr",
    "euc-kr",
    "iso-2022-jp",
    "csiso2022jp",
    "iso-2022-jp-2",
    "iso-ir-58",
    "chinese",
    "csiso58gb231280",
    "iso_8859-1:1987",
    "iso-ir-100",
    "iso_8859-1",
    "iso-8859-1",
    "latin1",
    "l1",
    "ibm819",
    "cp819",
    "csisolatin1",
    "iso_8859-2:1987",
    "iso-ir-101",
    "iso_8859-2",
    "iso-8859-2",
    "latin2",
    "l2",
    "csisolatin2",
    "iso_8859-3:1988",
    "iso-ir-109",
    "iso_8859-3",
    "iso-8859-3",
    "latin3",
    "l3",
    "csisolatin3",
    "iso_8859-4:1988",
    "iso-ir-110",
    "iso_8859-4",
    "iso-8859-4",
    "latin4",
    "l4",
    "csisolatin4",
    "iso_8859-6:1987",
    "iso-ir-127",
    "iso_8859-6",
    "iso-8859-6",
    "ecma-114",
    "asmo-708",
    "arabic",
    "csisolatinarabic",
    "iso_8859-7:1987",
    "iso-ir-126",
    "iso_8859-7",
    "iso-8859-7",
    "elot_928",
    "ecma-118",
    "greek",
    "greek8",
    "csisolatingreek",
    "iso_8859-8:1988",
    "iso-ir-138",
    "iso_8859-8",
    "iso-8859-8",
    "hebrew",
    "csisolatinhebrew",
    "iso_8859-5:1988",
    "iso-ir-144",
    "iso_8859-5",
    "iso-8859-5",
    "cyrillic",
    "csisolatincyrillic",
    "iso_8859-9:1989",
    "iso-ir-148",
    "iso_8859-9",
    "iso-8859-9",
    "latin5",
    "l5",
    "csisolatin5",
    "iso-8859-10",
    "iso-ir-157",
    "l6",
    "iso_8859-10:1992",
    "csisolatin6",
    "latin6",
    "hp-roman8",
    "roman8",
    "r8",
    "ibm037",
    "cp037",
    "csibm037",
    "ibm424",
    "cp424",
    "csibm424",
    "ibm437",
    "cp437",
    "437",
    "cspc8codepage437",
    "ibm500",
    "cp500",
    "csibm500",
    "ibm775",
    "cp775",
    "cspc775baltic",
    "ibm850",
    "cp850",
    "850",
    "cspc850multilingual",
    "ibm852",
    "cp852",
    "852",
    "cspcp852",
    "ibm855",
    "cp855",
    "855",
    "csibm855",
    "ibm857",
    "cp857",
    "857",
    "csibm857",
    "ibm860",
    "cp860",
    "860",
    "csibm860",
    "ibm861",
    "cp861",
    "861",
    "cp-is",
    "csibm861",
    "ibm862",
    "cp862",
    "862",
    "cspc862latinhebrew",
    "ibm863",
    "cp863",
    "863",
    "csibm863",
    "ibm864",
    "cp864",
    "csibm864",
    "ibm865",
    "cp865",
    "865",
    "csibm865",
    "ibm866",
    "cp866",
    "866",
    "csibm866",
    "ibm869",
    "cp869",
    "869",
    "cp-gr",
    "csibm869",
    "ibm1026",
    "cp1026",
    "csibm1026",
    "koi8-r",
    "cskoi8r",
    "koi8-u",
    "big5-hkscs",
    "ptcp154",
    "csptcp154",
    "pt154",
    "cp154",
    "utf-7",
    "utf-16be",
    "utf-16le",
    "utf-16",
    "utf-8",
    "iso-8859-13",
    "iso-8859-14",
    "iso-ir-199",
    "iso_8859-14:1998",
    "iso_8859-14",
    "latin8",
    "iso-celtic",
    "l8",
    "iso-8859-15",
    "iso_8859-15",
    "iso-8859-16",
    "iso-ir-226",
    "iso_8859-16:2001",
    "iso_8859-16",
    "latin10",
    "l10",
    "gbk",
    "cp936",
    "ms936",
    "gb18030",
    "shift_jis",
    "ms_kanji",
    "csshiftjis",
    "euc-jp",
    "gb2312",
    "big5",
    "csbig5",
    "windows-1250",
    "windows-1251",
    "windows-1252",
    "windows-1253",
    "windows-1254",
    "windows-1255",
    "windows-1256",
    "windows-1257",
    "windows-1258",
    "tis-620",
    "hz-gb-2312",
))

@@ -840,7 +840,8 @@ class InBodyPhase(Phase):
        self.tree.insertElement(name, attributes)

    def endTagP(self, name):
        self.tree.generateImpliedEndTags("p")
        if self.tree.elementInScope("p"):
            self.tree.generateImpliedEndTags("p")
        if self.tree.openElements[-1].name != "p":
            self.parser.parseError("Unexpected end tag (p).")
        while self.tree.elementInScope("p"):
@@ -1150,7 +1151,8 @@ class InTablePhase(Phase):
            self.parser.phase.processStartTag(name, attributes)

    def startTagTable(self, name, attributes):
        self.parser.parseError()
        self.parser.parseError(_(u"Unexpected start tag (table) in table "
            u"phase. Implies end tag (table)."))
        self.parser.phase.processEndTag("table")
        if not self.parser.innerHTML:
            self.parser.phase.processStartTag(name, attributes)
@@ -1168,14 +1170,16 @@ class InTablePhase(Phase):
        if self.tree.elementInScope("table", True):
            self.tree.generateImpliedEndTags()
            if self.tree.openElements[-1].name != "table":
                self.parser.parseError()
                self.parser.parseError(_(u"Unexpected end tag (table). "
                    u"Expected end tag (" + self.tree.openElements[-1].name +\
                    u")."))
            while self.tree.openElements[-1].name != "table":
                self.tree.openElements.pop()
            self.tree.openElements.pop()
            self.parser.resetInsertionMode()
        else:
            self.parser.parseError()
            # innerHTML case
            self.parser.parseError()

    def endTagIgnore(self, name):
        self.parser.parseError(_("Unexpected end tag (" + name +\
@@ -1787,7 +1791,7 @@ class TrailingEndPhase(Phase):
        pass

    def processComment(self, data):
        self.parser.insertCommenr(data, self.tree.document)
        self.tree.insertComment(data, self.tree.document)

    def processSpaceCharacters(self, data):
        self.parser.lastPhase.processSpaceCharacters(data)

@@ -1,7 +1,10 @@
import codecs
import re
import types

from constants import EOF
from constants import EOF, spaceCharacters, asciiLetters, asciiUppercase
from constants import encodings
from utils import MethodDispatcher

class HTMLInputStream(object):
    """Provides a unicode stream of characters to the HTMLTokenizer.
@@ -11,7 +14,7 @@ class HTMLInputStream(object):

    """

    def __init__(self, source, encoding=None):
    def __init__(self, source, encoding=None, chardet=True):
        """Initialises the HTMLInputStream.

        HTMLInputStream(source, [encoding]) -> Normalized stream from source
@@ -28,33 +31,30 @@ class HTMLInputStream(object):
        # List of where new lines occur
        self.newLines = []

        # Encoding Information
        self.charEncoding = encoding

        # Raw Stream
        self.rawStream = self.openStream(source)

        # Try to detect the encoding of the stream by looking for a BOM
        detectedEncoding = self.detectEncoding()

        # If an encoding was specified or detected from the BOM don't allow
        # the encoding to be changed futher into the stream
        if self.charEncoding or detectedEncoding:
            self.allowEncodingOverride = False
        else:
            self.allowEncodingOverride = True

        # If an encoding wasn't specified, use the encoding detected from the
        # BOM, if present, otherwise use the default encoding
        if not self.charEncoding:
            self.charEncoding = detectedEncoding or "cp1252"
        # Encoding Information
        #Number of bytes to use when looking for a meta element with
        #encoding information
        self.numBytesMeta = 512
        #Encoding to use if no other information can be found
        self.defaultEncoding = "windows-1252"

        #Autodetect encoding if no other information can be found?
        self.chardet = chardet

        #Detect encoding iff no explicit "transport level" encoding is supplied
        if encoding is None or not isValidEncoding(encoding):
            encoding = self.detectEncoding()
        self.charEncoding = encoding

        # Read bytes from stream decoding them into Unicode
        uString = self.rawStream.read().decode(self.charEncoding, 'replace')

        # Normalize new lines and null characters
        uString = re.sub('\r\n?', '\n', uString)
        uString = re.sub('\x00', '\xFFFD', uString)
        uString = re.sub('\x00', u'\uFFFD', uString)

        # Convert the unicode string into a list to be used as the data stream
        self.dataStream = uString
@@ -80,9 +80,39 @@ class HTMLInputStream(object):
        return stream

    def detectEncoding(self):
        # Attempts to detect the character encoding of the stream. If
        # an encoding can be determined from the BOM return the name of the
        # encoding otherwise return None

        #First look for a BOM
        #This will also read past the BOM if present
        encoding = self.detectBOM()
        #If there is no BOM need to look for meta elements with encoding
        #information
        if encoding is None:
            encoding = self.detectEncodingMeta()
        #Guess with chardet, if avaliable
        if encoding is None and self.chardet:
            try:
                import chardet
                buffer = self.rawStream.read()
                encoding = chardet.detect(buffer)['encoding']
                self.rawStream = self.openStream(buffer)
            except ImportError:
                pass
        # If all else fails use the default encoding
        if encoding is None:
            encoding = self.defaultEncoding

        #Substitute for equivalent encodings:
        encodingSub = {"iso-8859-1":"windows-1252"}

        if encoding.lower() in encodingSub:
            encoding = encodingSub[encoding.lower()]

        return encoding

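The cascade in `detectEncoding` above (BOM, then meta prescan, then optional chardet, then the `windows-1252` default) can be sketched standalone. This is a simplified illustration under stated assumptions, not the html5lib code; `sniff_encoding` and its fallback handling are hypothetical, and the meta-prescan step is omitted.

```python
import codecs

def sniff_encoding(data, default="windows-1252"):
    """Guess the encoding of raw bytes: BOM first, chardet if present,
    then a default, mirroring the detection order in the diff above."""
    # Step 1: a BOM wins outright
    for bom, name in ((codecs.BOM_UTF8, "utf-8"),
                      (codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if data.startswith(bom):
            return name
    # Step 2: chardet is optional; fall through quietly when absent
    try:
        import chardet
        guess = chardet.detect(data)["encoding"]
        if guess:
            return guess
    except ImportError:
        pass
    # Step 3: default, with the same iso-8859-1 alias substitution
    return {"iso-8859-1": "windows-1252"}.get(default.lower(), default)
```

As in the diff, the chardet import lives inside the function so the dependency stays optional.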
    def detectBOM(self):
        """Attempts to detect at BOM at the start of the stream. If
        an encoding can be determined from the BOM return the name of the
        encoding otherwise return None"""
        bomDict = {
            codecs.BOM_UTF8: 'utf-8',
            codecs.BOM_UTF16_LE: 'utf-16-le', codecs.BOM_UTF16_BE: 'utf-16-be',
@@ -103,24 +133,19 @@ class HTMLInputStream(object):
            encoding = bomDict.get(string)  # UTF-32
            seek = 4

        #AT - move this to the caller?
        # Set the read position past the BOM if one was found, otherwise
        # set it to the start of the stream
        self.rawStream.seek(encoding and seek or 0)

        return encoding

    def declareEncoding(self, encoding):
    def detectEncodingMeta(self):
        """Report the encoding declared by the meta element

        If the encoding is currently only guessed, then this
        will read subsequent characters in that encoding.

        If the encoding is not compatible with the guessed encoding
        and non-US-ASCII characters have been seen, return True indicating
        parsing will have to begin again.

        """
        pass
        parser = EncodingParser(self.rawStream.read(self.numBytesMeta))
        self.rawStream.seek(0)
        return parser.getEncoding()

    def determineNewLines(self):
        # Looks through the stream to find where new lines occur so
@@ -188,15 +213,277 @@ class HTMLInputStream(object):
            self.queue.insert(0, charStack.pop())
        return "".join(charStack)

if __name__ == "__main__":
    stream = HTMLInputStream("../tests/utf-8-bom.html")

    c = stream.char()
    while c:
        line, col = stream.position()
        if c == u"\n":
            print "Line %s, Column %s: Line Feed" % (line, col)
        else:
            print "Line %s, Column %s: %s" % (line, col, c.encode('utf-8'))
        c = stream.char()
    print "EOF"

class EncodingBytes(str):
    """String-like object with an assosiated position and various extra methods
    If the position is ever greater than the string length then an exception is
    raised"""
    def __init__(self, value):
        str.__init__(self, value)
        self._position = -1

    def __iter__(self):
        return self

    def next(self):
        self._position += 1
        rv = self[self.position]
        return rv

    def setPosition(self, position):
        if self._position >= len(self):
            raise StopIteration
        self._position = position

    def getPosition(self):
        if self._position >= len(self):
            raise StopIteration
        if self._position >= 0:
            return self._position
        else:
            return None

    position = property(getPosition, setPosition)

    def getCurrentByte(self):
        return self[self.position]

    currentByte = property(getCurrentByte)

    def skip(self, chars=spaceCharacters):
        """Skip past a list of characters"""
        while self.currentByte in chars:
            self.position += 1

    def matchBytes(self, bytes, lower=False):
        """Look for a sequence of bytes at the start of a string. If the bytes
        are found return True and advance the position to the byte after the
        match. Otherwise return False and leave the position alone"""
        data = self[self.position:self.position+len(bytes)]
        if lower:
            data = data.lower()
        rv = data.startswith(bytes)
        if rv == True:
            self.position += len(bytes)
        return rv

    def jumpTo(self, bytes):
        """Look for the next sequence of bytes matching a given sequence. If
        a match is found advance the position to the last byte of the match"""
        newPosition = self[self.position:].find(bytes)
        if newPosition > -1:
            self._position += (newPosition + len(bytes)-1)
            return True
        else:
            raise StopIteration

    def findNext(self, byteList):
        """Move the pointer so it points to the next byte in a set of possible
        bytes"""
        while (self.currentByte not in byteList):
            self.position += 1

class EncodingParser(object):
    """Mini parser for detecting character encoding from meta elements"""

    def __init__(self, data):
        """string - the data to work on for encoding detection"""
        self.data = EncodingBytes(data)
        self.encoding = None

    def getEncoding(self):
        methodDispatch = (
            ("<!--", self.handleComment),
            ("<meta", self.handleMeta),
            ("</", self.handlePossibleEndTag),
            ("<!", self.handleOther),
            ("<?", self.handleOther),
            ("<", self.handlePossibleStartTag))
        for byte in self.data:
            keepParsing = True
            for key, method in methodDispatch:
                if self.data.matchBytes(key, lower=True):
                    try:
                        keepParsing = method()
                        break
                    except StopIteration:
                        keepParsing = False
                        break
            if not keepParsing:
                break
        if self.encoding is not None:
            self.encoding = self.encoding.strip()
        return self.encoding

    def handleComment(self):
        """Skip over comments"""
        return self.data.jumpTo("-->")

    def handleMeta(self):
        if self.data.currentByte not in spaceCharacters:
            #if we have <meta not followed by a space so just keep going
            return True
        #We have a valid meta element we want to search for attributes
        while True:
            #Try to find the next attribute after the current position
            attr = self.getAttribute()
            if attr is None:
                return True
            else:
                if attr[0] == "charset":
                    tentativeEncoding = attr[1]
                    if isValidEncoding(tentativeEncoding):
                        self.encoding = tentativeEncoding
                        return False
                elif attr[0] == "content":
                    contentParser = ContentAttrParser(EncodingBytes(attr[1]))
                    tentativeEncoding = contentParser.parse()
                    if isValidEncoding(tentativeEncoding):
                        self.encoding = tentativeEncoding
                        return False

    def handlePossibleStartTag(self):
        return self.handlePossibleTag(False)

    def handlePossibleEndTag(self):
        self.data.position += 1
        return self.handlePossibleTag(True)

    def handlePossibleTag(self, endTag):
        if self.data.currentByte not in asciiLetters:
            #If the next byte is not an ascii letter either ignore this
            #fragment (possible start tag case) or treat it according to
            #handleOther
            if endTag:
                self.data.position -= 1
                self.handleOther()
            return True

        self.data.findNext(list(spaceCharacters) + ["<", ">"])
        if self.data.currentByte == "<":
            #return to the first step in the overall "two step" algorithm
            #reprocessing the < byte
            self.data.position -= 1
        else:
            #Read all attributes
            attr = self.getAttribute()
            while attr is not None:
                attr = self.getAttribute()
        return True

    def handleOther(self):
        return self.data.jumpTo(">")

    def getAttribute(self):
        """Return a name,value pair for the next attribute in the stream,
        if one is found, or None"""
        self.data.skip(list(spaceCharacters)+["/"])
        if self.data.currentByte == "<":
            self.data.position -= 1
            return None
        elif self.data.currentByte == ">":
            return None
        attrName = []
        attrValue = []
        spaceFound = False
        #Step 5 attribute name
        while True:
            if self.data.currentByte == "=" and attrName:
                break
            elif self.data.currentByte in spaceCharacters:
                spaceFound = True
                break
            elif self.data.currentByte in ("/", "<", ">"):
                return "".join(attrName), ""
            elif self.data.currentByte in asciiUppercase:
                attrName.extend(self.data.currentByte.lower())
            else:
                attrName.extend(self.data.currentByte)
            #Step 6
            self.data.position += 1
        #Step 7
        if spaceFound:
            self.data.skip()
        #Step 8
        if self.data.currentByte != "=":
            self.data.position -= 1
            return "".join(attrName), ""
        #XXX need to advance position in both spaces and value case
        #Step 9
        self.data.position += 1
        #Step 10
        self.data.skip()
        #Step 11
        if self.data.currentByte in ("'", '"'):
            #11.1
            quoteChar = self.data.currentByte
            while True:
                self.data.position += 1
                #11.3
                if self.data.currentByte == quoteChar:
                    self.data.position += 1
                    return "".join(attrName), "".join(attrValue)
                #11.4
                elif self.data.currentByte in asciiUppercase:
                    attrValue.extend(self.data.currentByte.lower())
                #11.5
                else:
                    attrValue.extend(self.data.currentByte)
        elif self.data.currentByte in (">", '<'):
            return "".join(attrName), ""
        elif self.data.currentByte in asciiUppercase:
            attrValue.extend(self.data.currentByte.lower())
        else:
            attrValue.extend(self.data.currentByte)
        while True:
            self.data.position += 1
            if self.data.currentByte in (
                list(spaceCharacters) + [">", '<']):
                return "".join(attrName), "".join(attrValue)
            elif self.data.currentByte in asciiUppercase:
                attrValue.extend(self.data.currentByte.lower())
            else:
                attrValue.extend(self.data.currentByte)


class ContentAttrParser(object):
    def __init__(self, data):
        self.data = data
    def parse(self):
        try:
            #Skip to the first ";"
            self.data.jumpTo(";")
            self.data.position += 1
            self.data.skip()
            #Check if the attr name is charset
            #otherwise return
            self.data.jumpTo("charset")
            self.data.position += 1
            self.data.skip()
            if not self.data.currentByte == "=":
                #If there is no = sign keep looking for attrs
                return None
            self.data.position += 1
            self.data.skip()
            #Look for an encoding between matching quote marks
            if self.data.currentByte in ('"', "'"):
                quoteMark = self.data.currentByte
                self.data.position += 1
                oldPosition = self.data.position
                self.data.jumpTo(quoteMark)
                return self.data[oldPosition:self.data.position]
            else:
                #Unquoted value
                oldPosition = self.data.position
                try:
                    self.data.findNext(spaceCharacters)
                    return self.data[oldPosition:self.data.position]
                except StopIteration:
                    #Return the whole remaining value
                    return self.data[oldPosition:]
        except StopIteration:
            return None

def isValidEncoding(encoding):
    """Determine if a string is a supported encoding"""
    return (encoding is not None and type(encoding) == types.StringType and
            encoding.lower().strip() in encodings)

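`ContentAttrParser` above pulls the charset token out of a `content` attribute value such as `text/html; charset=utf-8`. The same extraction can be sketched with a regular expression; this is a simplified stand-in for illustration, not the byte-walking parser in the diff, and `charset_from_content` is a hypothetical name.

```python
import re

def charset_from_content(value):
    """Extract the charset from a content-type style attribute value.
    Handles charset=utf-8, charset="utf-8" and charset='utf-8'."""
    m = re.search(r"""charset\s*=\s*("([^"]*)"|'([^']*)'|([^\s;]+))""",
                  value, re.IGNORECASE)
    if not m:
        return None
    # group 2 = double-quoted, group 3 = single-quoted, group 4 = bare token
    return m.group(2) or m.group(3) or m.group(4)
```

Unlike the regex, the diff's parser also has to cope with truncated input, which is why it signals failure through `StopIteration` from `jumpTo`.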
@@ -111,10 +111,6 @@ class XmlElementPhase(html5parser.Phase):
    def endTagOther(self, name):
        for node in self.tree.openElements[::-1]:
            if node.name == name:
                self.tree.generateImpliedEndTags()
                if self.tree.openElements[-1].name != name:
                    self.parser.parseError(_("Unexpected end tag " + name +\
                        "."))
                while self.tree.openElements.pop() != node:
                    pass
                break

@@ -303,9 +303,8 @@ class TreeBuilder(object):
            if (name in frozenset(("dd", "dt", "li", "p", "td", "th", "tr"))
                and name != exclude):
                self.openElements.pop()
                # XXX Until someone has broven that the above breaks stuff I think
                # we should keep it in.
                # self.processEndTag(name)
                # XXX This is not entirely what the specification says. We should
                # investigate it more closely.
                self.generateImpliedEndTags(exclude)

    def getDocument(self):

@@ -1,7 +1,10 @@
try:
    from xml.etree import ElementTree
except ImportError:
    from elementtree import ElementTree
    try:
        from elementtree import ElementTree
    except:
        pass

import _base

@@ -158,14 +158,21 @@ def content(xentry, name, detail, bozo):
    for div in body.childNodes:
        if div.nodeType != Node.ELEMENT_NODE: continue
        if div.nodeName != 'div': continue
        div.normalize()
        if len(div.childNodes) == 1 and \
            div.firstChild.nodeType == Node.TEXT_NODE:
            data = div.firstChild
        else:
            data = div
        xcontent.setAttribute('type', 'xhtml')
        break
        try:
            div.normalize()
            if len(div.childNodes) == 1 and \
                div.firstChild.nodeType == Node.TEXT_NODE:
                data = div.firstChild
            else:
                data = div
            xcontent.setAttribute('type', 'xhtml')
            break
        except:
            # in extremely nested cases, the Python runtime decides
            # that normalize() must be in an infinite loop; mark
            # the content as escaped html and proceed on...
            xcontent.setAttribute('type', 'html')
            data = xdoc.createTextNode(detail.value.decode('utf-8'))

    if data: xcontent.appendChild(data)

@@ -99,7 +99,7 @@ def scrub(feed_uri, data):
    # resolve relative URIs and sanitize
    for entry in data.entries + [data.feed]:
        for key in entry.keys():
            if key == 'content':
            if key == 'content' and not entry.has_key('content_detail'):
                node = entry.content[0]
            elif key.endswith('_detail'):
                node = entry[key]

48
planet/shell/dj.py
Normal file
@@ -0,0 +1,48 @@
import os.path
import urlparse
import datetime

import tmpl
from planet import config

def DjangoPlanetDate(value):
    return datetime.datetime(*value[:6])

# remap PlanetDate to be a datetime, so Django template authors can use
# the "date" filter on these values
tmpl.PlanetDate = DjangoPlanetDate

def run(script, doc, output_file=None, options={}):
    """process a Django template file"""

    # this is needed to use the Django template system as standalone
    # I need to re-import the settings at every call because I have to
    # set the TEMPLATE_DIRS variable programmatically
    from django.conf import settings
    try:
        settings.configure(
            DEBUG=True, TEMPLATE_DEBUG=True,
            TEMPLATE_DIRS=(os.path.dirname(script),)
        )
    except EnvironmentError:
        pass
    from django.template import Context
    from django.template.loader import get_template

    # set up the Django context by using the default htmltmpl
    # datatype converters
    context = Context()
    context.update(tmpl.template_info(doc))
    context['Config'] = config.planet_options()
    t = get_template(script)

    if output_file:
        reluri = os.path.splitext(os.path.basename(output_file))[0]
        context['url'] = urlparse.urljoin(config.link(), reluri)
        f = open(output_file, 'w')
        f.write(t.render(context))
        f.close()
    else:
        # @@this is useful for testing purposes, but does it
        # belong here?
        return t.render(context)
@@ -194,7 +194,9 @@ def writeCache(feed_uri, feed_info, data):
        for filter in config.filters(feed_uri):
            output = shell.run(filter, output, mode="filter")
            if not output: break
        if not output: continue
        if not output:
            if os.path.exists(cache_file): os.remove(cache_file)
            continue

        # write out and timestamp the results
        write(output, cache_file)

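The `writeCache` hunk above changes the behaviour when a filter chain swallows an entry: instead of just skipping it, any previously cached copy is also removed. The control flow can be sketched in isolation; `apply_filters` and its plain-file handling are hypothetical stand-ins for the real `shell.run` pipeline.

```python
import os

def apply_filters(output, filters, cache_file):
    """Run each filter over the entry; an empty result at any stage
    kills the entry AND deletes its stale cache file."""
    for f in filters:
        output = f(output)
        if not output:
            break
    if not output:
        # drop the stale cache entry instead of leaving the old copy behind
        if os.path.exists(cache_file):
            os.remove(cache_file)
        return None
    with open(cache_file, "w") as fp:
        fp.write(output)
    return output
```

The point of the change is the deletion branch: without it, an entry newly rejected by a filter would keep serving its old cached version.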
@@ -67,6 +67,8 @@ def splice():

    # insert entry information
    items = 0
    count = {}
    new_feed_items = config.new_feed_items()
    for mtime,file in dir:
        if index != None:
            base = os.path.basename(file)
@@ -75,15 +77,23 @@ def splice():
        try:
            entry=minidom.parse(file)

            # verify that this entry is currently subscribed to
            # verify that this entry is currently subscribed to and that the
            # number of entries contributed by this feed does not exceed
            # config.new_feed_items
            entry.normalize()
            sources = entry.getElementsByTagName('source')
            if sources:
                ids = sources[0].getElementsByTagName('id')
                if ids and ids[0].childNodes[0].nodeValue not in sub_ids:
                    ids = sources[0].getElementsByTagName('planet:id')
                    if not ids: continue
                    if ids[0].childNodes[0].nodeValue not in sub_ids: continue
                if ids:
                    id = ids[0].childNodes[0].nodeValue
                    count[id] = count.get(id,0) + 1
                    if new_feed_items and count[id] > new_feed_items: continue

                if id not in sub_ids:
                    ids = sources[0].getElementsByTagName('planet:id')
                    if not ids: continue
                    id = ids[0].childNodes[0].nodeValue
                    if id not in sub_ids: continue

            # add entry to feed
            feed.appendChild(entry.documentElement)

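The `splice()` change above counts entries per source feed and skips any entry beyond `new_feed_items`. The capping logic can be sketched separately; here entries are plain `(feed_id, entry)` pairs rather than the DOM nodes the real code walks, and `cap_per_feed` is a hypothetical helper name.

```python
def cap_per_feed(entries, new_feed_items):
    """Keep at most new_feed_items entries per feed id.
    entries: iterable of (feed_id, entry) pairs, newest first."""
    count = {}
    kept = []
    for feed_id, entry in entries:
        count[feed_id] = count.get(feed_id, 0) + 1
        # mirror the diff: a falsy limit means "no cap"
        if new_feed_items and count[feed_id] > new_feed_items:
            continue
        kept.append(entry)
    return kept
```

Because the caller iterates files newest-first, the cap keeps each feed's most recent entries and drops the overflow.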
20
tests/data/expunge/config.ini
Normal file
@@ -0,0 +1,20 @@
[Planet]
name = test planet
cache_directory = tests/work/expunge/cache
cache_keep_entries = 1

[tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed1]
name = no source

[tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed2]
name = no source id

[tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed3]
name = global setting

[tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed4]
name = local setting
cache_keep_entries = 2

#[tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed5]
#name = unsubbed
8
tests/data/expunge/test1.entry
Normal file
@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test1/1</id>
<link href="http://example.com/1/1"/>
<title>Test 1/1</title>
<content>Entry with missing source</content>
<updated>2007-03-01T01:01:00Z</updated>
</entry>
11
tests/data/expunge/test2.entry
Normal file
@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test2/1</id>
<link href="http://example.com/2/1"/>
<title>Test 2/1</title>
<content>Entry with missing source id</content>
<updated>2007-03-01T02:01:00Z</updated>
<source>
<title>Test 2/1 source</title>
</source>
</entry>
12
tests/data/expunge/test3a.entry
Normal file
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test3/1</id>
<link href="http://example.com/3/1"/>
<title>Test 3/1</title>
<content>Entry for global setting 1</content>
<updated>2007-03-01T03:01:00Z</updated>
<source>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed3</id>
<title>Test 3 source</title>
</source>
</entry>
12
tests/data/expunge/test3b.entry
Normal file
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test3/2</id>
<link href="http://example.com/3/2"/>
<title>Test 3/2</title>
<content>Entry for global setting 2</content>
<updated>2007-03-01T03:02:00Z</updated>
<source>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed3</id>
<title>Test 3 source</title>
</source>
</entry>
12
tests/data/expunge/test3c.entry
Normal file
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test3/3</id>
<link href="http://example.com/3/3"/>
<title>Test 3/3</title>
<content>Entry for global setting 3</content>
<updated>2007-03-01T03:03:00Z</updated>
<source>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed3</id>
<title>Test 3 source</title>
</source>
</entry>
12
tests/data/expunge/test4a.entry
Normal file
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test4/1</id>
<link href="http://example.com/4/1"/>
<title>Test 4/1</title>
<content>Entry for local setting 1</content>
<updated>2007-03-01T04:01:00Z</updated>
<source>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed4</id>
<title>Test 4 source</title>
</source>
</entry>
12
tests/data/expunge/test4b.entry
Normal file
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test4/2</id>
<link href="http://example.com/4/2"/>
<title>Test 4/2</title>
<content>Entry for local setting 2</content>
<updated>2007-03-01T04:02:00Z</updated>
<source>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed4</id>
<title>Test 4 source</title>
</source>
</entry>
12
tests/data/expunge/test4c.entry
Normal file
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test4/3</id>
<link href="http://example.com/4/3"/>
<title>Test 4/3</title>
<content>Entry for local setting 3</content>
<updated>2007-03-01T04:03:00Z</updated>
<source>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed4</id>
<title>Test 4 source</title>
</source>
</entry>
12
tests/data/expunge/test5.entry
Normal file
@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-test5/1</id>
<link href="http://example.com/5/1"/>
<title>Test 5/1</title>
<content>Entry from unsubbed feed</content>
<updated>2007-03-01T05:01:00Z</updated>
<source>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed5</id>
<title>Test 5 source</title>
</source>
</entry>
5
tests/data/expunge/testfeed1.atom
Normal file
@@ -0,0 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<link rel="self" href="http://bzr.mfd-consult.dk/venus/tests/data/expunge/testfeed1.atom"/>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed1</id>
</feed>
5
tests/data/expunge/testfeed2.atom
Normal file
@@ -0,0 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<link rel="self" href="http://bzr.mfd-consult.dk/venus/tests/data/expunge/testfeed2.atom"/>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed2</id>
</feed>
5
tests/data/expunge/testfeed3.atom
Normal file
@@ -0,0 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<link rel="self" href="http://bzr.mfd-consult.dk/venus/tests/data/expunge/testfeed3.atom"/>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed3</id>
</feed>
5
tests/data/expunge/testfeed4.atom
Normal file
@@ -0,0 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<link rel="self" href="http://bzr.mfd-consult.dk/venus/tests/data/expunge/testfeed4.atom"/>
<id>tag:bzr.mfd-consult.dk,2007:venus-expunge-testfeed4</id>
</feed>
1
tests/data/filter/django/config.html.dj
Normal file
@@ -0,0 +1 @@
{{ Config.name }}
2
tests/data/filter/django/test.ini
Normal file
@@ -0,0 +1,2 @@
[Planet]
name: Django on Venus
20
tests/data/filter/django/test.xml
Normal file
@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

<title>Example Feed</title>
<link href="http://example.org/"/>
<updated>2003-12-13T18:30:02Z</updated>
<author>
<name>John Doe</name>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

<entry>
<title>Atom-Powered Robots Run Amok</title>
<link href="http://example.org/2003/12/13/atom03"/>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<summary>Some text.</summary>
</entry>

</feed>
1
tests/data/filter/django/title.html.dj
Normal file
@@ -0,0 +1 @@
{% for item in Items %}{{ item.title }}{% endfor %}
2
tests/data/filter/regexp-sifter.ini
Normal file
@@ -0,0 +1,2 @@
[Planet]
filter=two
36
tests/data/reconstitute/stack_overflow.xml
Normal file
@@ -0,0 +1,36 @@
<!--
Description: content with extremely nested markup
Expect: content[0].type == 'text/html'
-->

<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<content type="html">
<![CDATA[
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
|
||||
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
|
||||
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
|
||||
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
|
||||
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
|
||||
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
|
||||
<span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span><span>
|
||||
<span><span><span><span><span><span><span><span>
|
||||
|
||||
Stack overflow
|
||||
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
|
||||
</span></span></span>]]></content>
|
||||
</entry>
|
||||
</feed>
|
20  tests/test_docs.py  Normal file
@@ -0,0 +1,20 @@
#!/usr/bin/env python

import unittest, os
from xml.dom import minidom
from glob import glob

class DocsTest(unittest.TestCase):

    def test_well_formed(self):
        for doc in glob('docs/*'):
            if os.path.isdir(doc): continue
            if doc.endswith('.css') or doc.endswith('.js'): continue

            try:
                minidom.parse(doc)
            except:
                self.fail('Not well formed: ' + doc);
                break
        else:
            self.assertTrue(True);
83  tests/test_expunge.py  Normal file
@@ -0,0 +1,83 @@
#!/usr/bin/env python
import unittest, os, glob, shutil, time
from planet.spider import filename
from planet import feedparser, config
from planet.expunge import expungeCache
from xml.dom import minidom
import planet

workdir = 'tests/work/expunge/cache'
sourcesdir = 'tests/work/expunge/cache/sources'
testentries = 'tests/data/expunge/test*.entry'
testfeeds = 'tests/data/expunge/test*.atom'
configfile = 'tests/data/expunge/config.ini'

class ExpungeTest(unittest.TestCase):

    def setUp(self):
        # silence errors
        planet.logger = None
        planet.getLogger('CRITICAL',None)

        try:
            os.makedirs(workdir)
            os.makedirs(sourcesdir)
        except:
            self.tearDown()
            os.makedirs(workdir)
            os.makedirs(sourcesdir)

    def tearDown(self):
        shutil.rmtree(workdir)
        os.removedirs(os.path.split(workdir)[0])

    def test_expunge(self):
        config.load(configfile)

        # create test entries in cache with correct timestamp
        for entry in glob.glob(testentries):
            e = minidom.parse(entry)
            e.normalize()
            eid = e.getElementsByTagName('id')
            eupdated = e.getElementsByTagName('updated')
            # guard before use: skip entries missing an id or updated element
            if not eid or not eupdated: continue
            efile = filename(workdir, eid[0].childNodes[0].nodeValue)
            emtime = time.mktime(feedparser._parse_date_w3dtf(
                eupdated[0].childNodes[0].nodeValue))
            shutil.copyfile(entry, efile)
            os.utime(efile, (emtime, emtime))

        # create test feeds in cache
        sources = config.cache_sources_directory()
        for feed in glob.glob(testfeeds):
            f = minidom.parse(feed)
            f.normalize()
            fid = f.getElementsByTagName('id')
            if not fid: continue
            ffile = filename(sources, fid[0].childNodes[0].nodeValue)
            shutil.copyfile(feed, ffile)

        # verify that exactly nine entries + one source dir were produced
        files = glob.glob(workdir+"/*")
        self.assertEqual(10, len(files))

        # verify that exactly four feeds were produced in source dir
        files = glob.glob(sources+"/*")
        self.assertEqual(4, len(files))

        # expunge...
        expungeCache()

        # verify that five entries and one source dir are left
        files = glob.glob(workdir+"/*")
        self.assertEqual(6, len(files))

        # verify that the right five entries are left
        self.assertTrue(os.path.join(workdir,
            'bzr.mfd-consult.dk,2007,venus-expunge-test1,1') in files)
        self.assertTrue(os.path.join(workdir,
            'bzr.mfd-consult.dk,2007,venus-expunge-test2,1') in files)
        self.assertTrue(os.path.join(workdir,
            'bzr.mfd-consult.dk,2007,venus-expunge-test3,3') in files)
        self.assertTrue(os.path.join(workdir,
            'bzr.mfd-consult.dk,2007,venus-expunge-test4,2') in files)
        self.assertTrue(os.path.join(workdir,
            'bzr.mfd-consult.dk,2007,venus-expunge-test4,3') in files)
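The test above drives Venus's real expungeCache() end to end. The core mechanic it verifies — ranking a source's cached entries by timestamp and unlinking the stale ones — can be sketched independently of Venus. The function below is a hypothetical simplification for illustration only, not the planet.expunge implementation (which works per subscription and reads its keep-count from the config):

```python
import os, glob

def expunge(cache_dir, keep=5):
    """Sketch: delete all but the `keep` most recently updated entry
    files in cache_dir, ranking by filesystem mtime (Venus stamps each
    cached entry's mtime from its atom:updated date).  Returns the
    paths that were kept, newest first."""
    entries = [f for f in glob.glob(os.path.join(cache_dir, '*'))
               if os.path.isfile(f)]
    entries.sort(key=os.path.getmtime, reverse=True)  # newest first
    for stale in entries[keep:]:
        os.remove(stale)                              # drop the oldest
    return entries[:keep]
```

The real expunge additionally skips the sources subdirectory and applies the limit per feed rather than globally, which is why the test expects five surviving entries spread across four feeds.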
43  tests/test_filter_django.py  Normal file
@@ -0,0 +1,43 @@
#!/usr/bin/env python

import os.path
import unittest, xml.dom.minidom, datetime

from planet import config, logger
from planet.shell import dj

class DjangoFilterTests(unittest.TestCase):

    def test_django_filter(self):
        config.load('tests/data/filter/django/test.ini')
        results = dj.tmpl.template_info("<feed/>")
        self.assertEqual(results['name'], 'Django on Venus')

    def test_django_date_type(self):
        config.load('tests/data/filter/django/test.ini')
        results = dj.tmpl.template_info("<feed/>")
        self.assertEqual(type(results['date']), datetime.datetime)

    def test_django_item_title(self):
        config.load('tests/data/filter/django/test.ini')
        feed = open('tests/data/filter/django/test.xml')
        input = feed.read(); feed.close()
        results = dj.run(
            os.path.realpath('tests/data/filter/django/title.html.dj'), input)
        self.assertEqual(results, "Atom-Powered Robots Run Amok\n")

    def test_django_config_context(self):
        config.load('tests/data/filter/django/test.ini')
        feed = open('tests/data/filter/django/test.xml')
        input = feed.read(); feed.close()
        results = dj.run(
            os.path.realpath('tests/data/filter/django/config.html.dj'), input)
        self.assertEqual(results, "Django on Venus\n")


try:
    from django.conf import settings
except ImportError:
    logger.warn("Django is not available => can't test django filters")
    for method in dir(DjangoFilterTests):
        if method.startswith('test_'): delattr(DjangoFilterTests,method)
@@ -89,14 +89,40 @@ class FilterTests(unittest.TestCase):

        self.assertNotEqual('', output)

    def test_regexp_filter(self):
        config.load('tests/data/filter/regexp-sifter.ini')

        testfile = 'tests/data/filter/category-one.xml'

        output = open(testfile).read()
        for filter in config.filters():
            output = shell.run(filter, output, mode="filter")

        self.assertEqual('', output)

        testfile = 'tests/data/filter/category-two.xml'

        output = open(testfile).read()
        for filter in config.filters():
            output = shell.run(filter, output, mode="filter")

        self.assertNotEqual('', output)

try:
    from subprocess import Popen, PIPE

    _no_sed = False
    try:
        sed = Popen(['sed','--version'],stdout=PIPE,stderr=PIPE)
        sed.communicate()
        if sed.returncode != 0:
            _no_sed = True
    except WindowsError:
        _no_sed = True

    if _no_sed:
        logger.warn("sed is not available => can't test stripAd_yahoo")
        del FilterTests.test_stripAd_yahoo

try:
    import libxml2
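test_regexp_filter expects a sifter that emits the entry unchanged when it passes and the empty string when it does not (an empty filter result tells Venus to drop the entry, which is what makes the retroactive filter test below possible). A minimal stand-in for such a sifter — a hypothetical simplification, not the regexp sifter actually shipped in this commit — might look like:

```python
import re

def sift(entry_xml, filter_re=None, exclude_re=None):
    """Apply a filter/exclude regexp pair to one entry.

    Returns the entry unchanged when it survives, or '' to signal it
    should be dropped.  (Simplification: the real sifter matches only
    the textual portion of the entry, not the raw XML.)"""
    if filter_re and not re.search(filter_re, entry_xml):
        return ''   # required pattern missing: drop
    if exclude_re and re.search(exclude_re, entry_xml):
        return ''   # forbidden pattern present: drop
    return entry_xml
```

Wired up as a pipe filter (printing sift(sys.stdin.read(), ...)), this reproduces the shape the test asserts: one test document filters to the empty string, the other passes through intact.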
@@ -73,6 +73,14 @@ class SpiderTest(unittest.TestCase):
        self.spiderFeed(testfeed % '1b')
        self.verify_spiderFeed()

    def test_spiderFeed_retroactive_filter(self):
        config.load(configfile)
        self.spiderFeed(testfeed % '1b')
        self.assertEqual(5, len(glob.glob(workdir+"/*")))
        config.parser.set('Planet', 'filter', 'two')
        self.spiderFeed(testfeed % '1b')
        self.assertEqual(1, len(glob.glob(workdir+"/*")))

    def test_spiderUpdate(self):
        config.load(configfile)
        self.spiderFeed(testfeed % '1a')
@@ -24,3 +24,11 @@ class SpliceTest(unittest.TestCase):
        self.assertEqual(8,len(doc.getElementsByTagName('entry')))
        self.assertEqual(3,len(doc.getElementsByTagName('planet:source')))
        self.assertEqual(11,len(doc.getElementsByTagName('planet:name')))

    def test_splice_new_feed_items(self):
        config.load(configfile)
        config.parser.set('Planet','new_feed_items','3')
        doc = splice()
        self.assertEqual(9,len(doc.getElementsByTagName('entry')))
        self.assertEqual(4,len(doc.getElementsByTagName('planet:source')))
        self.assertEqual(13,len(doc.getElementsByTagName('planet:name')))
@@ -30,11 +30,15 @@ a:active {
a:focus {
}

a.inactive {
    color: #558;
}

a.rising {
    font-weight: bold;
}

body > h1 {
    font-size: x-large;
    text-transform: uppercase;
    letter-spacing: 0.25em;

@@ -74,6 +78,11 @@
    border-bottom: 1px solid #ccc;
}

#sidebar h2 a img {
    margin-bottom: 2px;
    vertical-align: middle;
}

#sidebar p {
    font-size: x-small;
    padding-left: 20px;

@@ -163,6 +172,10 @@
    text-decoration: none !important;
}

#sidebar input[name=q] {
    margin: 4px 0 0 24px;
}

/* ---------------------------- Footer --------------------------- */

#footer ul {

@@ -177,6 +190,10 @@
    display: inline;
}

#footer ul li ul {
    display: none;
}

#footer img {
    display: none;
}

@@ -419,7 +436,7 @@ math[display=block] {
    overflow: auto;
}

.numberedEq span, .eqno {
    float: right;
}
@@ -40,7 +40,7 @@
    <xsl:text> </xsl:text>
  </div>

  <h1>Footnotes</h1>
  <xsl:text> </xsl:text>

  <div id="sidebar">

@@ -80,6 +80,7 @@

  <xsl:text> </xsl:text>
  <div id="footer">
    <h2>Subscriptions</h2>
    <ul>
      <xsl:for-each select="planet:source">
        <xsl:sort select="planet:name"/>
@@ -45,11 +45,14 @@ function navkey(event) {
  if (!checkbox || !checkbox.checked) return;

  if (!event) event=window.event;
  if (event.originalTarget &&
      event.originalTarget.nodeName.toLowerCase() == 'input' &&
      event.originalTarget.id != 'navkeys') return;

  if (!document.documentElement) return;
  if (!entries[0].anchor || !entries[0].anchor.offsetTop) return;

  key=event.keyCode;
  if (key == 'J'.charCodeAt(0)) nextArticle(event);
  if (key == 'K'.charCodeAt(0)) prevArticle(event);
}

@@ -215,14 +218,12 @@ function moveSidebar() {

  var h1 = sidebar.previousSibling;
  while (h1.nodeType != 1) h1=h1.previousSibling;
  if (h1.nodeName.toLowerCase() == 'h1') h1.parentNode.removeChild(h1);

  var footer = document.getElementById('footer');
  var ul = footer.lastChild;
  while (ul.nodeType != 1) ul=ul.previousSibling;

  var twisty = document.createElement('a');
  twisty.appendChild(document.createTextNode('\u25bc'));
  twisty.title = 'hide';

@@ -239,10 +240,19 @@ function moveSidebar() {
    ul.style.display = display;
    createCookie("subscriptions", display, 365);
  }

  var cookie = readCookie("subscriptions");
  if (cookie && cookie == 'none') twisty.onclick();

  for (var node=footer.lastChild; node; node=footer.lastChild) {
    if (twisty && node.nodeType == 1 && node.nodeName.toLowerCase() == 'h2') {
      node.appendChild(twisty);
      twisty = null;
    }
    footer.removeChild(node);
    sidebar.insertBefore(node, sidebar.firstChild);
  }

  var body = document.getElementById('body');
  sidebar.parentNode.removeChild(sidebar);
  body.parentNode.insertBefore(sidebar, body);
@@ -1,4 +1,5 @@
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns:access="http://www.bloglines.com/about/specs/fac-1.0"
                xmlns:atom="http://www.w3.org/2005/Atom"
                xmlns:indexing="urn:atom-extension:indexing"
                xmlns:planet="http://planet.intertwingly.net/"

@@ -20,15 +21,29 @@
  <!-- Strip site meter -->
  <xsl:template match="xhtml:div[comment()[. = ' Site Meter ']]"/>

  <!-- add Google/LiveJournal-esque and Bloglines noindex directive -->
  <xsl:template match="atom:feed">
    <xsl:copy>
      <xsl:attribute name="indexing:index">no</xsl:attribute>
      <xsl:apply-templates select="@*"/>
      <access:restriction relationship="allow"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- popular customization: add planet name to each entry title
  <xsl:template match="atom:entry/atom:title">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:value-of select="../atom:source/planet:name"/>
      <xsl:text>: </xsl:text>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
  -->

  <!-- indent atom elements -->
  <xsl:template match="atom:*">
    <!-- double space before atom:entries -->
39  themes/django/bland.css  Normal file
@@ -0,0 +1,39 @@
body {
    margin: 50px 60px;
    font-family: Georgia, Times New Roman, serif;
}

h1 {
    font: normal 4em Georgia, serif;
    color: #900;
    margin-bottom: 0px;
}

.updated, .entry-tools {
    font: .8em Verdana, Arial, sans-serif;
    margin-bottom: 2em;
}

#channels {
    float: right;
    width: 30%;
    padding: 20px;
    margin: 20px;
    margin-top: 0px;
    border: 1px solid #FC6;
    background: #FFC;
}

#channels h2 {
    margin-top: 0px;
}

#channels ul {
    margin-bottom: 0px;
}

.entry {
    border-top: 1px solid #CCC;
    padding-top: 1em;
}
11  themes/django/config.ini  Normal file
@@ -0,0 +1,11 @@
# This theme is an example Planet Venus theme using the
# Django template engine.

[Planet]
template_files:
    index.html.dj

template_directories:

bill_of_materials:
    bland.css
49  themes/django/index.html.dj  Normal file
@@ -0,0 +1,49 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>{{ name }}</title>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <link rel="stylesheet" href="bland.css" type="text/css" />
</head>

<body>

<h1>{{ name }}</h1>

<p class="updated">
  last updated by <a href="http://intertwingly.net/code/venus/">Venus</a>
  on {{ date }} on behalf of {{ author_name }}
</p>

<div id="channels">
  <h2>Feeds</h2>

  <ul>
    {% for channel in Channels %}
    <li>{{ channel.title }} by {{ channel.author_name }}</li>
    {% endfor %}
  </ul>
</div>

{% for item in Items %}
  {% ifchanged item.channel_name %}
    <h3>{{ item.channel_name }}</h3>
  {% endifchanged %}

  <div class="entry">
    {% if item.title %}<h4>{{ item.title }}</h4>{% endif %}

    {{ item.content }}

    <p class="entry-tools">
      by {{ item.channel_author }} on
      {{ item.date }} ·
      <a href="{{ item.link }}">permalink</a>
    </p>
  </div>
{% endfor %}

</body>
</html>
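The template above uses {% ifchanged item.channel_name %} to emit a channel heading only when consecutive items switch feeds, relying on Venus handing items to the template in order. The same run-grouping can be expressed in plain Python with itertools.groupby (the item data below is made up for illustration):

```python
from itertools import groupby

# hypothetical items, in the order Venus would pass them to the template
items = [
    {'channel_name': 'Channel A', 'title': 'First post'},
    {'channel_name': 'Channel A', 'title': 'Second post'},
    {'channel_name': 'Channel B', 'title': 'Third post'},
]

html = []
for channel, group in groupby(items, key=lambda item: item['channel_name']):
    html.append('<h3>%s</h3>' % channel)   # heading once per run, like ifchanged
    for item in group:
        html.append('<h4>%s</h4>' % item['title'])

page = '\n'.join(html)
```

Like {% ifchanged %}, groupby only coalesces adjacent items: if the same channel reappears later in the list, it gets a fresh heading.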