diff --git a/docs/config.html b/docs/config.html new file mode 100644 index 0000000..9370ee8 --- /dev/null +++ b/docs/config.html @@ -0,0 +1,116 @@ + + + + +Venus Configuration + + + +

Configuration

+

Configuration files are in ConfigParser format, which basically means the same format as INI files: they consist of a series of [sections], in square brackets, with each section containing a list of name:value pairs (or name=value pairs, if you prefer).
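As a quick illustration, here is a minimal sketch using Python's standard configparser module, which accepts both delimiters by default. The planet name and link are made-up sample values.

```python
import configparser

# A tiny config that mixes both delimiter styles described above.
SAMPLE = """\
[planet]
name: Planet Example
link = https://planet.example.org/
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# "name: value" and "name = value" both yield plain, whitespace-trimmed strings.
print(parser.get("planet", "name"))  # Planet Example
print(parser.get("planet", "link"))  # https://planet.example.org/
```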

+

You are welcome to place your entire configuration into one file. Alternatively, you may factor out the templating into a "theme", and the list of subscriptions into one or more "reading lists".

+

[planet]

+

This is the only required section, which is a bit odd as none of the parameters listed below are required. Even so, you really do want to provide many of these, especially the ones that identify your planet and either (or both) of template_files and output_theme.
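For illustration only, a minimal [planet] section might look like the one embedded below; every value is a made-up example, and the template file names are hypothetical. The sketch also shows that the space-separated list parameters split naturally on whitespace when read with Python's configparser.

```python
import configparser

# Hypothetical minimal [planet] section; all values are illustrative.
SAMPLE = """\
[planet]
name = Planet Example
link = https://planet.example.org/
owner_name = Jane Doe
owner_email = jane@example.org
output_dir = output
template_files = index.html.tmpl atom.xml.xslt
"""

# interpolation=None keeps '%' characters (for example in a strftime-style
# date_format value) from being treated as interpolation markers.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

planet = parser["planet"]
# Space-separated list parameters become Python lists with str.split().
templates = planet.get("template_files", "").split()
print(planet["name"], templates)
```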

+

Below is a complete list of predefined planet configuration parameters, including ones not (yet) implemented by Venus and ones that are either new or implemented differently by Venus.

+
+
+
name
+
Your planet's name
+
link
+
Link to the main page
+
owner_name
+
Your name
+
owner_email
+
Your e-mail address
+
cache_directory
+
Where cached feeds are stored
+
log_level
+
One of DEBUG, INFO, WARNING, ERROR or CRITICAL
+
output_theme
+
Directory containing a config.ini file which is merged with this one. This is typically used to specify templating and bill of material information.
+
output_dir
+
Directory to place output files
+
items_per_page
+
How many items to put on each page. Whereas Planet 2.0 allows this to be overridden on a per-template basis, Venus currently takes the maximum value for this across all templates.
+
days_per_page
+
How many complete days of posts to put on each page. This is the absolute, hard limit (over the item limit).
+
date_format
+
strftime format for the default 'date' template variable
+
new_date_format
+
strftime format for the 'new_date' template variable. This only applies to htmltmpl templates.
+
encoding
+
Output encoding for the file. Python 2.3+ users can use the special "xml" value to output ASCII with XML character references.
+
locale
+
Locale to use for, e.g., strings in dates. The default is taken from your system.
+
feed_timeout
+
Number of seconds to wait for any given feed
+
new_feed_items
+
Number of items to take from new feeds
+
activity_threshold
+
If non-zero, all feeds which have not been updated in the indicated number of days will be marked as inactive.
+
template_files
+
Space-separated list of output template files
+
template_directories
+
Space-separated list of directories in which template_files can be found
+
bill_of_materials
+
Space-separated list of files to be copied as is directly from the template_directories to the output_dir
+
filters
+
Space-separated list of filters to apply to each entry
+
+
+ +

[DEFAULT]

+

Values placed in this section are used as default values for all sections. While few values make sense in all sections, in most cases unused parameters cause few problems.
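The fallback behaviour can be seen with Python's configparser, which implements the same [DEFAULT] semantics. This is a sketch with invented sample values: a subscription section that defines nothing inherits name, while [planet] keeps its own value.

```python
import configparser

SAMPLE = """\
[DEFAULT]
name = (unnamed feed)

[planet]
name = Planet Example

[http://example.com/feed.xml]
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

# An explicit value in a section wins over the DEFAULT value.
print(parser.get("planet", "name"))  # Planet Example
# A section without its own value falls back to DEFAULT.
print(parser.get("http://example.com/feed.xml", "name"))  # (unnamed feed)
```

Note that the DEFAULT value is visible in every section, whether or not that section has any use for it, which is why unused parameters need to be harmless.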

+ +

[subscription]

+

All sections other than planet and DEFAULT, and other than those named in [planet]'s filters or template_files parameters, are treated as subscriptions; their names typically take the form of a URI.
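The classification rule can be sketched in a few lines of Python. The function name classify_sections, its return shape, and the option names in the sample config are hypothetical; this illustrates the rule as stated, not Venus's own code.

```python
import configparser


def classify_sections(parser):
    """Split sections into subscriptions, templates and filters (hypothetical helper)."""
    planet = parser["planet"] if parser.has_section("planet") else {}
    templates = set(planet.get("template_files", "").split())
    filters = set(planet.get("filters", "").split())
    result = {"subscriptions": [], "templates": [], "filters": []}
    # parser.sections() never includes DEFAULT.
    for section in parser.sections():
        if section == "planet":
            continue
        if section in templates:
            result["templates"].append(section)
        elif section in filters:
            result["filters"].append(section)
        else:
            result["subscriptions"].append(section)
    return result


SAMPLE = """\
[planet]
template_files = index.html.tmpl
filters = excerpt.py

[index.html.tmpl]
example_option = 1

[excerpt.py]
example_option = 2

[http://example.com/feed.xml]
name = Example Feed
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)
print(classify_sections(parser))
```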

+

Parameters placed in this section are passed to templates. While you are free to include as few or as many parameters as you like, most of the predefined themes presume that at least name is defined.

+

The content_type parameter can be defined to indicate that this subscription is a reading list, i.e., an external list of subscriptions. At the moment, two formats of reading lists are supported: opml and foaf. In the future, support for formats like xoxo could be added.

+

Normalization overrides can also be defined here.

+ +

[template]

+

Sections which are listed in [planet] template_files are processed as templates. With Planet 2.0, it is possible to override parameters like items_per_page on a per-template basis, but Planet Venus doesn't currently implement this.

+ +

[filter]

+

Sections which are listed in [planet] filters are processed as filters.

+

Parameters which are listed in this section are passed to the filter in a language-specific manner. Given the way defaults work, filters should be prepared to ignore parameters that they didn't expect.

+ + diff --git a/docs/docs.css b/docs/docs.css new file mode 100644 index 0000000..c5a1baf --- /dev/null +++ b/docs/docs.css @@ -0,0 +1,106 @@ +body { + background-color: #fff; + color: #333; + font-family: 'Lucida Grande', Verdana, Geneva, Lucida, Helvetica, sans-serif; + font-size: small; + margin: 40px; + padding: 0; +} + +a:link, a:visited { + background-color: transparent; + color: #333; + text-decoration: none !important; + border-bottom: 1px dotted #333 !important; + text-decoration: underline; + border-bottom: 0; +} + +a:hover { + background-color: transparent; + color: #993344; + text-decoration: none !important; + text-decoration: underline; + border-bottom: 1px dotted #993344 !important; + border-bottom: 0; +} + +code { + color: green; + font-size: large +} + +h1 { + margin: 8px 0 10px 20px; + padding: 0; + font-variant: small-caps; + letter-spacing: 0.1em; + font-family: "Book Antiqua", Georgia, Palatino, Times, "Times New Roman", serif; +} + +h2 { + clear: both; +} + +ul.outer > li { + margin: 14px 0 10px 0; +} + +.z { + float:left; + background: url(img/shadowAlpha.png) no-repeat bottom right !important; + background: url(img/shadow.gif) no-repeat bottom right; + margin: -15px 0 20px -15px !important; +} + +.z p { + margin: 14px 0 10px 15px !important; +} + +.z .sectionInner { + width: 730px; + background: none !important; + background: url(img/shadow2.gif) no-repeat left top; + padding: 0 !important; + padding: 0 6px 6px 10px; + } + +.z .sectionInner .sectionInner2 { + background-color: #fff; + border: 1px solid #a9a9a9; + padding: 4px; + margin: -6px 6px 6px -6px !important; + margin: 0; +} + +ins { + color: magenta; + text-decoration: none; +} + +dl.compact { + margin-bottom: 1em; + margin-top: 1em; +} + +dl.code > dt { + font-family: monospace; +} + +dl.compact > dt { + float: left; + margin-bottom: 0; + padding-right: 8px; + margin-top: 0; + list-style-type: none; +} + +dl.compact > dd { + margin-bottom: 0; + margin-top: 0; + margin-left: 10em; +} + 
+th, td { + font-size: small; +} diff --git a/docs/docs.js b/docs/docs.js new file mode 100644 index 0000000..0b2a925 --- /dev/null +++ b/docs/docs.js @@ -0,0 +1,53 @@ +window.onload=function() { + var vindex = document.URL.lastIndexOf('venus/'); + var base = document.URL.substring(0,vindex+6); + + var body = document.getElementsByTagName('body')[0]; + var div = document.createElement('div'); + div.setAttribute('class','z'); + var h1 = document.createElement('h1'); + var span = document.createElement('span'); + span.appendChild(document.createTextNode('\u2640')); + span.setAttribute('style','color: magenta'); + h1.appendChild(span); + h1.appendChild(document.createTextNode(' Planet Venus')); + + var inner2=document.createElement('div'); + inner2.setAttribute('class','sectionInner2'); + inner2.appendChild(h1); + + var p = document.createElement('p'); + p.appendChild(document.createTextNode("Planet Venus is an awesome \u2018river of news\u2019 feed reader. It downloads news feeds published by web sites and aggregates their content together into a single combined feed, latest news first.")); + inner2.appendChild(p); + + p = document.createElement('p'); + var a = document.createElement('a'); + a.setAttribute('href',base); + a.appendChild(document.createTextNode('Download')); + p.appendChild(a); + p.appendChild(document.createTextNode(" \u00b7 ")); + a = document.createElement('a'); + a.setAttribute('href',base+'docs/'); + a.appendChild(document.createTextNode('Documentation')); + p.appendChild(a); + p.appendChild(document.createTextNode(" \u00b7 ")); + a = document.createElement('a'); + a.setAttribute('href',base+'tests/'); + a.appendChild(document.createTextNode('Unit tests')); + p.appendChild(a); + p.appendChild(document.createTextNode(" \u00b7 ")); + a = document.createElement('a'); + a.setAttribute('href','http://lists.planetplanet.org/mailman/listinfo/devel'); + a.appendChild(document.createTextNode('Mailing list')); + p.appendChild(a); + inner2.appendChild(p); + 
+ var inner1=document.createElement('div'); + inner1.setAttribute('class','sectionInner'); + inner1.setAttribute('id','inner1'); + inner1.appendChild(inner2); + + div.appendChild(inner1); + + body.insertBefore(div, body.firstChild); +} diff --git a/docs/filters.html b/docs/filters.html new file mode 100644 index 0000000..7efdb88 --- /dev/null +++ b/docs/filters.html @@ -0,0 +1,61 @@ + + + + +Venus Filters + + +

Filters

+

Filters are simple Unix pipes. Input comes in on stdin, parameters come from the config file, and output goes to stdout. Anything written to stderr is logged as an ERROR message. If no stdout is produced, the entry is not written to the cache or processed further.
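Under those rules a filter can be very small. The sketch below is in Python; the "--name value" argument convention it assumes for parameters, and the exclude parameter itself, are hypothetical, so check how your version of Venus actually passes configuration to filters.

```python
import sys


def parse_params(argv):
    """Turn ['--name', 'value', ...] into {'name': 'value', ...}.

    Assumption: parameters arrive as alternating --name / value arguments.
    """
    names = [arg.lstrip("-") for arg in argv[0::2]]
    return dict(zip(names, argv[1::2]))


def filter_entry(entry, params):
    """Return the (possibly modified) entry, or None to drop it.

    'exclude' is a hypothetical parameter: entries containing that text
    are dropped. Unexpected parameters are simply ignored.
    """
    keyword = params.get("exclude")
    if keyword and keyword in entry:
        return None
    return entry


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout):
    """Read one entry from stdin and write the result (if any) to stdout."""
    params = parse_params(sys.argv[1:] if argv is None else argv)
    result = filter_entry(stdin.read(), params)
    if result is not None:
        stdout.write(result)
    # Writing nothing means the entry is neither cached nor processed further.

# A real filter script would end by calling main().
```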

+ +

Input to a filter is an aggressively normalized entry. For example, if a feed is RSS 1.0 with 10 items, the filter will be called ten times, each with a single Atom 1.0 entry, with all textConstructs expressed as XHTML, and everything encoded as UTF-8.

+ +

You will find a small set of example filters in the filters directory. The coral cdn filter will change links to images in the entry itself. The filters in the stripAd subdirectory will strip specific types of advertisements that you may find in feeds.

+ +

The excerpt filter adds metadata (in the form of a planet:excerpt element) to the feed itself. You can see examples of how parameters are passed to this program in either excerpt-images or opml-top100.ini.

+ +

The xpath sifter is a variation of the above, including or excluding feeds based on the presence (or absence) of data specified by xpath expressions.
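The include/exclude idea can be sketched with Python's xml.etree.ElementTree, which supports only a limited XPath subset. The sift helper and its require/exclude arguments are hypothetical; this is not the xpath sifter's implementation, just an illustration of the same decision applied to a normalized Atom entry.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def sift(entry_xml, require=None, exclude=None):
    """Return entry_xml if it passes both checks, else None (no output).

    require/exclude are ElementTree path expressions evaluated against the
    entry element, with the 'atom' prefix bound to the Atom namespace.
    """
    root = ET.fromstring(entry_xml)
    if require is not None and root.find(require, ATOM) is None:
        return None
    if exclude is not None and root.find(exclude, ATOM) is not None:
        return None
    return entry_xml


ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Hello</title>
  <category term="python"/>
</entry>"""

print(sift(ENTRY, require="atom:category[@term='python']") is not None)  # True
print(sift(ENTRY, exclude="atom:category") is None)  # True
```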

+ +

Notes

+ + + + diff --git a/docs/img/shadow.gif b/docs/img/shadow.gif new file mode 100644 index 0000000..f1e6cb5 Binary files /dev/null and b/docs/img/shadow.gif differ diff --git a/docs/img/shadow2.gif b/docs/img/shadow2.gif new file mode 100644 index 0000000..a0b9ed4 Binary files /dev/null and b/docs/img/shadow2.gif differ diff --git a/docs/img/shadowAlpha.png b/docs/img/shadowAlpha.png new file mode 100644 index 0000000..a2561df Binary files /dev/null and b/docs/img/shadowAlpha.png differ diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 0000000..c85f103 --- /dev/null +++ b/docs/index.html @@ -0,0 +1,42 @@ + + + + +Venus Documentation + + +

Table of Contents

+ + + diff --git a/docs/migration.html b/docs/migration.html new file mode 100644 index 0000000..8c6405c --- /dev/null +++ b/docs/migration.html @@ -0,0 +1,21 @@ + + + + +Venus Migration + + +

Migration from Planet 2.0

+

The intent is that existing Planet 2.0 users should be able to reuse their existing config.ini and .tmpl files, but the reality is that users will need to be aware of the following:

+ + + diff --git a/docs/normalization.html b/docs/normalization.html new file mode 100644 index 0000000..2f9a5f2 --- /dev/null +++ b/docs/normalization.html @@ -0,0 +1,87 @@ + + + + +Venus Normalization + + +

Normalization

+

Venus builds on, and extends, the Universal Feed Parser and BeautifulSoup to convert all feeds into Atom 1.0, with well-formed XHTML, encoded as UTF-8, meaning that you don't have to worry about funky feeds, tag soup, or character encoding.

+

Encoding

+

Input data in feeds may be encoded in a variety of formats, most commonly ASCII, ISO-8859-1, Windows-1252, and UTF-8. Additionally, many feeds make use of the wide range of character entity references provided by HTML. Each is converted to UTF-8, an encoding which is a proper superset of ASCII, supports the entire range of Unicode characters, and is one of only two encodings required to be supported by all conformant XML processors.
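A simplified Python sketch of the two conversions follows. It assumes the source encoding is already known (real feeds need detection and fallbacks), and it deliberately leaves XML's five predefined entities alone so that escaped markup stays escaped; to_utf8 is a hypothetical helper, not Venus's code.

```python
import re
from html.entities import name2codepoint

# XML defines these five entities itself, so they are left untouched.
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def _expand(match):
    name = match.group(1)
    if name in XML_ENTITIES or name not in name2codepoint:
        return match.group(0)
    return chr(name2codepoint[name])


def to_utf8(raw, encoding):
    """Decode feed bytes, expand HTML-only entity references, re-encode as UTF-8."""
    text = raw.decode(encoding)
    text = ENTITY_REF.sub(_expand, text)
    return text.encode("utf-8")


# 0x93 and 0x94 are Windows-1252 "smart quotes"; &eacute; is an HTML entity.
print(to_utf8(b"\x93caf&eacute; &amp; co\x94", "cp1252").decode("utf-8"))
```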

+

Encoding problems are one of the more common feed errors, and every attempt is made to correct common errors, such as the inclusion of the so-called moronic versions of smart-quotes. In rare cases where individual characters cannot be converted to valid UTF-8 or into characters allowed in XML 1.0 documents, such characters will be replaced with the Unicode Replacement character, with a title that describes the original character whenever possible.
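The replacement step can be sketched in Python using the XML 1.0 Char production directly. This hypothetical helper only substitutes U+FFFD; it does not attempt the descriptive title mentioned above.

```python
import re

# Anything NOT matched by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def replace_invalid_xml_chars(text):
    """Replace characters not allowed in XML 1.0 with U+FFFD."""
    return _INVALID_XML.sub("\uFFFD", text)


print(replace_invalid_xml_chars("bell:\x07 tab:\t ok"))
```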

+

In order to support the widest range of inputs, use of Python 2.3 or later, as well as the installation of the Python iconvcodec module, is recommended.

+

HTML

+

A number of different normalizations of HTML are performed. For starters, the HTML is sanitized, meaning that HTML tags and attributes that could introduce JavaScript or other security risks are removed.
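To make the idea concrete, here is a small allowlist sanitizer built on Python's html.parser. It is a sketch of the general technique only, with deliberately tiny, illustrative allowlists; it is not the Universal Feed Parser's sanitizer, which covers far more elements, attributes, and edge cases.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlists only; a production sanitizer allows far more.
ALLOWED_TAGS = {"a", "b", "em", "i", "p", "strong"}
ALLOWED_ATTRS = {"href", "title"}
DROP_WITH_CONTENT = {"script", "style"}  # drop the element and its text


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # disallowed tags are removed, but their text is kept
        kept = ""
        for name, value in attrs:
            if name not in ALLOWED_ATTRS or value is None:
                continue  # drops onclick, style, and other attributes
            if name == "href" and value.strip().lower().startswith("javascript:"):
                continue
            kept += f' {name}="{escape(value)}"'
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(fragment):
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```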

+

Then, relative links are resolved within the HTML. This is also done for links in other parts of the feed.
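Resolving a relative link amounts to joining it against a base URI, as in this Python sketch using urllib.parse.urljoin. The base URI here is an assumed example; working out the correct base for a given feed or entry is the harder part and is not shown.

```python
from urllib.parse import urljoin


def resolve_links(base, hrefs):
    """Resolve each (possibly relative) href against the given base URI."""
    return [urljoin(base, href) for href in hrefs]


base = "http://example.com/blog/2006/entry.html"
for link in resolve_links(base, ["../images/a.png", "/about", "#c1", "http://other.org/x"]):
    print(link)
```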

+

Finally, unmatched tags are closed. This is done with knowledge of the semantics of HTML. Additionally, a large subset of MathML, as well as a tiny profile of SVG, is also supported.

+

Atom 1.0

+

The Universal Feed Parser also normalizes the content of feeds. This involves a large number of elements; the best place to start is to look at annotated examples. Among other things, a large number of date formats are converted into RFC 3339-formatted dates.
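As one example of such a conversion, an RSS-style RFC 822 date can be turned into an RFC 3339 timestamp with Python's standard library. This sketch handles only RFC 822 input and normalizes to UTC; the Universal Feed Parser recognizes many more formats.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def rfc822_to_rfc3339(value):
    """Convert an RFC 822 date (as used by RSS 2.0 pubDate) to RFC 3339 in UTC."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:  # a "-0000" zone yields a naive datetime; treat it as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(rfc822_to_rfc3339("Tue, 10 Jun 2003 04:00:00 GMT"))  # 2003-06-10T04:00:00Z
```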

+

If no ids are found in entries, attempts are made to synthesize one using (in order):

+ +

If no updated dates are found in an entry, or if the dates found are in the future, the current time is substituted.
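The rule reduces to a small comparison, sketched here in Python with timezone-aware datetimes. The helper name is hypothetical, and now is an optional argument only so the behaviour can be checked deterministically.

```python
from datetime import datetime, timedelta, timezone


def effective_updated(updated, now=None):
    """Return `updated`, or the current time if it is missing or in the future."""
    if now is None:
        now = datetime.now(timezone.utc)
    if updated is None or updated > now:
        return now
    return updated


now = datetime(2006, 6, 1, 12, 0, tzinfo=timezone.utc)
print(effective_updated(now + timedelta(days=3), now))  # 2006-06-01 12:00:00+00:00
```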

+

Overrides

+

All of the above describes what Venus does automatically, either directly or through its dependencies. There are a number of errors which cannot be corrected automatically, and for these, there are configuration parameters that can be used to help.

+ + + diff --git a/docs/templates.html b/docs/templates.html new file mode 100644 index 0000000..8ccf572 --- /dev/null +++ b/docs/templates.html @@ -0,0 +1,121 @@ + + + + +Venus Templates + + +

Templates

+

Template names take the form name.ext.type, where name.ext identifies the name of the output file to be created in the output_dir, and type indicates which language processor to use for the template.
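Splitting a template name at its last dot recovers both parts, as this hypothetical Python helper shows; the example file names are illustrative.

```python
def split_template_name(template):
    """Split 'name.ext.type' into ('name.ext', 'type').

    The first part is the output file name; the second selects the
    language processor (for example 'tmpl' or 'xslt').
    """
    output_name, sep, processor = template.rpartition(".")
    if not sep or not output_name:
        raise ValueError(f"expected name.ext.type, got {template!r}")
    return output_name, processor


print(split_template_name("index.html.tmpl"))  # ('index.html', 'tmpl')
```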

+

As with filters, templates may be written in a variety of languages and are based on the standard Unix pipe convention of producing stdout from stdin, but in practice two languages are used more than others:

+

htmltmpl

+

Many find htmltmpl easier to get started with, as you can take a simple example of your output file, sprinkle in a few <TMPL_VAR>s and <TMPL_LOOP>s, and you are done. Eventually, however, you may find that your template involves <TMPL_IF> blocks inside of attribute values, and you may find the result difficult to read and create correctly.

+

It is also important to note that htmltmpl-based templates do not have access to the full set of information available in the feed, just the following (rather substantial) subset:

+ +
+ + + + + + + + + + + + + + + + + + +
VAR | type | source
author | String | author
author_name | String | author_detail.name
generator | String | generator
id | String | id
icon | String | icon
last_updated_822 | Rfc822 | updated_parsed
last_updated_iso | Rfc3399 | updated_parsed
last_updated | PlanetDate | updated_parsed
link | String | link
logo | String | logo
rights | String | rights_detail.value
subtitle | String | subtitle_detail.value
title | String | title_detail.value
title_plain | Plain | title_detail.value
url | String | links[rel='self'].href, headers['location']
+
+ +

Note: when multiple sources are listed, the last one wins.

+

In addition to these variables, Planet Venus makes available two arrays, Channels and Items, with one entry per subscription and per output entry respectively. The data values within the Channels array exactly match the above list. The data values within the Items array are as follows:

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
VAR | type | source
author | String | author
author_email | String | author_detail.email
author_name | String | author_detail.name
author_uri | String | author_detail.href
content_language | String | content[0].language
content | String | summary_detail.value, content[0].value
date | PlanetDate | published_parsed, updated_parsed
date_822 | Rfc822 | published_parsed, updated_parsed
date_iso | Rfc3399 | published_parsed, updated_parsed
id | String | id
link | String | links[rel='alternate'].href
new_channel | String | id
new_date | NewDate | published_parsed, updated_parsed
rights | String | rights_detail.value
title_language | String | title_detail.language
title_plain | Plain | title_detail.value
title | String | title_detail.value
summary_language | String | summary_detail.language
updated | PlanetDate | updated_parsed
updated_822 | Rfc822 | updated_parsed
updated_iso | Rfc3399 | updated_parsed
published | PlanetDate | published_parsed
published_822 | Rfc822 | published_parsed
published_iso | Rfc3399 | published_parsed
+
+

Note: variables above which start with new_ are only set if their values differ from the previous Item.
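The new_ rule can be sketched in Python over a list of dictionaries. The mark_new helper and the plain "date"/"channel" keys are hypothetical stand-ins for the real template data; the point is that a template can then emit, say, a date heading only when new_date is present.

```python
def mark_new(items, fields=("date", "channel")):
    """Copy items, adding new_<field> only where the value changed from the previous item."""
    previous = {}
    result = []
    for item in items:
        item = dict(item)  # leave the caller's data unmodified
        for field in fields:
            value = item.get(field)
            if value != previous.get(field):
                item["new_" + field] = value
            previous[field] = value
        result.append(item)
    return result


items = [
    {"date": "June 1", "channel": "a"},
    {"date": "June 1", "channel": "b"},
    {"date": "June 2", "channel": "b"},
]
for item in mark_new(items):
    print(item)
```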

+ +

xslt

+

XSLT is a paradox: it actually makes some simple things easier to do than htmltmpl, and certainly can make more difficult things possible; but it is fair to say that many find XSLT less approachable than htmltmpl.

+

But in any case, the XSLT support is easier to document as the input is a highly normalized feed, with a few extension elements.

+ + +