diff --git a/docs/config.html b/docs/config.html
new file mode 100644
index 0000000..9370ee8
--- /dev/null
+++ b/docs/config.html
@@ -0,0 +1,116 @@
+
+
+ + +Configuration files are in ConfigParser format which basically means the same
+format as INI files, i.e., they consist of a series of
+[sections]
, in square brackets, with each section containing a
+list of name:value
pairs (or name=value
pairs, if
+you prefer).
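As a rough illustration of the format, the sketch below reads a small configuration with Python 3's standard `configparser` module (the successor of the `ConfigParser` module named above). The planet name and the feed URL are made-up placeholders, and this is not how Venus itself loads its configuration.

```python
import configparser

# Hypothetical configuration text; the names and URLs are placeholders.
SAMPLE = """
[planet]
name: Example Planet
link = http://example.org/

[http://example.org/feed.atom]
name = Example Feed
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Both "name: value" and "name = value" pairs are accepted.
planet_name = parser.get("planet", "name")
planet_link = parser.get("planet", "link")
sections = parser.sections()
print(planet_name, planet_link, sections)
```

Note that the second section name is simply a URI, which is the usual shape of a subscription section.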
You are welcome to place your entire configuration into one file.
+Alternatively, you may factor out the templating into a "theme", and
+the list of subscriptions into one or more "reading lists".
+[planet]
This is the only required section, which is a bit odd as none of the
+parameters listed below are required. Even so, you really do want to
+provide many of these, especially ones that identify your planet and
+either (or both) of template_files
and theme
.
Below is a complete list of predefined planet configuration parameters,
+including ones not (yet) implemented by Venus and ones that
+are either new or implemented differently by Venus.
++ ++
+- name
+- Your planet's name
+- link
+- Link to the main page
+- owner_name
+- Your name
+- owner_email
+- Your e-mail address
+- cache_directory
+- Where cached feeds are stored
+- log_level
+- One of DEBUG, INFO, WARNING, ERROR or CRITICAL
+- output_theme
+- Directory containing a config.ini file which is merged
+with this one. This is typically used to specify templating and bill of
+materials information.
+- output_dir
+- Directory to place output files
+- items_per_page
+- How many items to put on each page. Whereas Planet 2.0 allows this to +be overridden on a per template basis, Venus currently takes the maximum value +for this across all templates.
+- days_per_page
+- How many complete days of posts to put on each page. This is the absolute, hard limit (over the item limit)
+- date_format
+- strftime format for the default 'date' template variable
+- new_date_format
+- strftime format for the 'new_date' template variable; only applies to htmltmpl templates
+- encoding
+- Output encoding for the file. Python 2.3+ users can use the special "xml" value to output ASCII with XML character references
+- locale
+- Locale to use for (e.g.) strings in dates; the default is taken from your system
+- feed_timeout
+- Number of seconds to wait for any given feed
+- new_feed_items
+- Number of items to take from new feeds
+- activity_threshold
+- If non-zero, all feeds which have not been updated in the indicated +number of days will be marked as inactive
+- template_files
+- Space-separated list of output template files
+- template_directories
+- Space-separated list of directories in which template_files can be found
+- bill_of_materials
+- Space-separated list of files to be copied as is directly from the template_directories to the output_dir
+- filters
+- Space-separated list of filters to apply to each entry
+
[DEFAULT]
Values placed in this section are used as default values for all sections.
+While it is true that few values make sense in all sections, in most cases
+unused parameters cause few problems.
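The way defaults propagate can be seen with Python 3's standard `configparser`. This is a minimal sketch under the assumption that Venus follows the standard module's behaviour here; the section names and values are invented for the example.

```python
import configparser

# Hypothetical configuration; section names and values are placeholders.
SAMPLE = """
[DEFAULT]
owner_name = Example Owner

[planet]
name = Example Planet

[http://example.org/feed.atom]
name = Example Feed
owner_name = Feed Author
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# [planet] does not set owner_name, so the DEFAULT value is used.
inherited = parser.get("planet", "owner_name")
# A section that sets the same name overrides the default.
overridden = parser.get("http://example.org/feed.atom", "owner_name")
# DEFAULT is not reported as a section of its own.
sections = parser.sections()
print(inherited, overridden, sections)
```

Because every section sees every default, a value placed in `[DEFAULT]` also shows up in sections that have no use for it, which is why unused parameters need to be harmless.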
+
+[subscription]
All sections other than planet, DEFAULT, or those
+named in [planet]'s filters or template_files parameters
+are treated as subscriptions and typically take the form of a
+URI.
Parameters placed in this section are passed to templates. While
+you are free to include as few or as many parameters as you like, most of
+the predefined themes presume that at least name
is defined.
The content_type
parameter can be defined to indicate that
+this subscription is a reading list, i.e., is an external list
+of subscriptions. At the moment, two formats of reading lists are supported:
+opml
and foaf
. In the future, support for formats
+like xoxo
could be added.
Normalization overrides can +also be defined here.
+
+[template]
Sections which are listed in [planet] template_files
are
+processed as templates. With Planet 2.0,
+it is possible to override parameters like items_per_page
+on a per template basis, but at the current time Planet Venus doesn't
+implement this.
[filter]
Sections which are listed in [planet] filters
are
+processed as filters.
Parameters which are listed in this section are passed to the filter +in a language specific manner. Given the way defaults work, filters +should be prepared to ignore parameters that they didn't expect.
+
+
diff --git a/docs/docs.css b/docs/docs.css
new file mode 100644
index 0000000..c5a1baf
--- /dev/null
+++ b/docs/docs.css
@@ -0,0 +1,106 @@
+body {
+  background-color: #fff;
+  color: #333;
+  font-family: 'Lucida Grande', Verdana, Geneva, Lucida, Helvetica, sans-serif;
+  font-size: small;
+  margin: 40px;
+  padding: 0;
+}
+
+a:link, a:visited {
+  background-color: transparent;
+  color: #333;
+  text-decoration: none !important;
+  border-bottom: 1px dotted #333 !important;
+  text-decoration: underline;
+  border-bottom: 0;
+}
+
+a:hover {
+  background-color: transparent;
+  color: #993344;
+  text-decoration: none !important;
+  text-decoration: underline;
+  border-bottom: 1px dotted #993344 !important;
+  border-bottom: 0;
+}
+
+code {
+  color: green;
+  font-size: large;
+}
+
+h1 {
+  margin: 8px 0 10px 20px;
+  padding: 0;
+  font-variant: small-caps;
+  letter-spacing: 0.1em;
+  font-family: "Book Antiqua", Georgia, Palatino, Times, "Times New Roman", serif;
+}
+
+h2 {
+  clear: both;
+}
+
+ul.outer > li {
+  margin: 14px 0 10px 0;
+}
+
+.z {
+  float:left;
+  background: url(img/shadowAlpha.png) no-repeat bottom right !important;
+  background: url(img/shadow.gif) no-repeat bottom right;
+  margin: -15px 0 20px -15px !important;
+}
+
+.z p {
+  margin: 14px 0 10px 15px !important;
+}
+
+.z .sectionInner {
+  width: 730px;
+  background: none !important;
+  background: url(img/shadow2.gif) no-repeat left top;
+  padding: 0 !important;
+  padding: 0 6px 6px 10px;
+  }
+
+.z .sectionInner .sectionInner2 {
+  background-color: #fff;
+  border: 1px solid #a9a9a9;
+  padding: 4px;
+  margin: -6px 6px 6px -6px !important;
+  margin: 0;
+}
+
+ins {
+  color: magenta;
+  text-decoration: none;
+}
+
+dl.compact {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+dl.code > dt {
+  font-family: monospace;
+}
+
+dl.compact > dt {
+  float: left;
+  margin-bottom: 0;
+  padding-right: 8px;
+  margin-top: 0;
+  list-style-type: none;
+}
+
+dl.compact > dd {
+  margin-bottom: 0;
+  margin-top: 0;
+  margin-left: 10em;
+}
+
+th, td {
+  font-size: small;
+}
diff --git a/docs/docs.js b/docs/docs.js
new file mode 100644
index 0000000..0b2a925
--- /dev/null
+++ b/docs/docs.js
@@ -0,0 +1,53 @@
+window.onload=function() {
+  var vindex = document.URL.lastIndexOf('venus/');
+  var base = document.URL.substring(0,vindex+6);
+
+  var body = document.getElementsByTagName('body')[0];
+  var div = document.createElement('div');
+  div.setAttribute('class','z');
+  var h1 = document.createElement('h1');
+  var span = document.createElement('span');
+  span.appendChild(document.createTextNode('\u2640'));
+  span.setAttribute('style','color: magenta');
+  h1.appendChild(span);
+  h1.appendChild(document.createTextNode(' Planet Venus'));
+
+  var inner2=document.createElement('div');
+  inner2.setAttribute('class','sectionInner2');
+  inner2.appendChild(h1);
+
+  var p = document.createElement('p');
+  p.appendChild(document.createTextNode("Planet Venus is an awesome \u2018river of news\u2019 feed reader. It downloads news feeds published by web sites and aggregates their content together into a single combined feed, latest news first."));
+  inner2.appendChild(p);
+
+  p = document.createElement('p');
+  var a = document.createElement('a');
+  a.setAttribute('href',base);
+  a.appendChild(document.createTextNode('Download'));
+  p.appendChild(a);
+  p.appendChild(document.createTextNode(" \u00b7 "));
+  a = document.createElement('a');
+  a.setAttribute('href',base+'docs/');
+  a.appendChild(document.createTextNode('Documentation'));
+  p.appendChild(a);
+  p.appendChild(document.createTextNode(" \u00b7 "));
+  a = document.createElement('a');
+  a.setAttribute('href',base+'tests/');
+  a.appendChild(document.createTextNode('Unit tests'));
+  p.appendChild(a);
+  p.appendChild(document.createTextNode(" \u00b7 "));
+  a = document.createElement('a');
+  a.setAttribute('href','http://lists.planetplanet.org/mailman/listinfo/devel');
+  a.appendChild(document.createTextNode('Mailing list'));
+  p.appendChild(a);
+  inner2.appendChild(p);
+
+  var inner1=document.createElement('div');
+  inner1.setAttribute('class','sectionInner');
+  inner1.setAttribute('id','inner1');
+  inner1.appendChild(inner2);
+
+  div.appendChild(inner1);
+
+  body.insertBefore(div, body.firstChild);
+}
diff --git a/docs/filters.html b/docs/filters.html
new file mode 100644
index 0000000..7efdb88
--- /dev/null
+++ b/docs/filters.html
@@ -0,0 +1,61 @@
+
+
+
+
+Filters are simple Unix pipes. Input comes in stdin
,
+parameters come from the config file, and output goes to stdout
.
+Anything written to stderr
is logged as an ERROR message. If no
+stdout
is produced, the entry is not written to the cache or
+processed further.
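The pipe convention can be sketched in Python as follows. `run_filter` and its `keyword` parameter are hypothetical names chosen for this example; how a parameter actually reaches a filter depends on the filter's language and is not shown. The in-memory streams stand in for stdin and stdout.

```python
import io


def run_filter(source, sink, keyword):
    """Copy one entry from source to sink only if it mentions keyword."""
    entry_xml = source.read()
    if keyword in entry_xml:
        sink.write(entry_xml)
    # Writing nothing means the entry is dropped.


# Demonstration with in-memory streams standing in for stdin and stdout.
kept = io.StringIO()
run_filter(io.StringIO("<entry><title>Python news</title></entry>"), kept, "Python")

dropped = io.StringIO()
run_filter(io.StringIO("<entry><title>Other news</title></entry>"), dropped, "Python")

print(repr(kept.getvalue()), repr(dropped.getvalue()))
```

A real filter script would pass `sys.stdin` and `sys.stdout` instead of the `io.StringIO` objects, and would avoid writing to stderr unless it wants an ERROR logged.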
Input to a filter is an aggressively
+normalized entry. For
+example, if a feed is RSS 1.0 with 10 items, the filter will be called ten
+times, each with a single Atom 1.0 entry, with all textConstructs
+expressed as XHTML, and everything encoded as UTF-8.
+ +You will find a small set of example filters in the filters directory. The coral cdn filter will change links +to images in the entry itself. The filters in the stripAd subdirectory will strip specific +types of advertisements that you may find in feeds.
+ +The excerpt filter adds metadata (in
+the form of a planet:excerpt
element) to the feed itself. You
+can see examples of how parameters are passed to this program in either
+excerpt-images or
+opml-top100.ini.
The xpath sifter is a variation of +the above, including or excluding feeds based on the presence (or absence) of +data specified by xpath +expressions.
+.py invokes python. .xslt invokes xslt. .sed and .tmpl (a.k.a. htmltmpl) are
+also options. Other languages, like perl or ruby or class/jar (java), aren't
+supported at the moment, but these would be easy to add.
+[planet]
section of your config.ini
+will be invoked on all feeds. Filters listed in individual
+[feed]
sections will only be invoked on those feeds.
The intent is that existing Planet 2.0 users should be able to reuse
+their existing config.ini
and .tmpl
files,
+but the reality is that users will need to be aware of the following:
.tmpl
and .ini
files should work,
+though some configuration options (e.g.,
+days_per_page
) have not yet been implemented
Venus builds on, and extends, the Universal Feed Parser and BeautifulSoup to
+convert all feeds into Atom 1.0, with well-formed XHTML, and encoded as UTF-8,
+meaning that you don't have to worry about funky feeds, tag soup, or character
+encoding.
+Input data in feeds may be encoded in a variety of formats, most commonly
+ASCII, ISO-8859-1, WIN-1252, and UTF-8. Additionally, many feeds make use of
+the wide range of character entity references provided by HTML. Each is converted to UTF-8, an encoding
+which is a proper superset of ASCII, supports the entire range of Unicode
+characters, and is one of only two
+encodings required to be supported by all conformant XML processors.
+Encoding problems are one of the more common feed errors, and every
+attempt is made to correct common errors, such as the inclusion of
+the so-called moronic versions
+of smart-quotes. In rare cases where individual characters cannot be
+converted to valid UTF-8 or into characters allowed in XML 1.0
+documents, such characters will be replaced with the Unicode
+Replacement character, with a title that describes the original character whenever possible.
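The general idea of repairing mislabelled smart-quotes can be sketched as below. `to_utf8` is a hypothetical helper written for this example; it only shows a decode-with-fallback strategy and is not the algorithm Venus or the Universal Feed Parser actually uses.

```python
def to_utf8(raw, declared="utf-8"):
    """Decode raw bytes and return UTF-8 bytes, falling back to Windows-1252."""
    try:
        text = raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # Bytes 0x80-0x9F are often Windows-1252 "smart" punctuation; bytes
        # that are undefined there become U+FFFD, the replacement character.
        text = raw.decode("cp1252", errors="replace")
    return text.encode("utf-8")


# Windows-1252 curly quotes sent in a feed that claims to be UTF-8.
smart = to_utf8(b"\x93quoted\x94")
# 0x81 has no meaning in Windows-1252 either, so it is replaced.
unknown = to_utf8(b"\x81")
# Input that already is valid UTF-8 passes through unchanged.
valid = to_utf8("caf\u00e9".encode("utf-8"))
```

The fallback is tried only after the declared encoding fails, so well-behaved feeds are never reinterpreted.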
+In order to support the widest range of inputs, use of Python 2.3 or later,
+as well as the installation of the python iconvcodec
, is
+recommended.
A number of different normalizations of HTML are performed. For starters, +the HTML is +sanitized, +meaning that HTML tags and attributes that could introduce javascript or +other security risks are removed.
+Then, relative links are resolved within the HTML. This is also done for links
+in other areas of the feed.
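Resolving a relative link means combining it with the URI of the document it appeared in. The sketch below shows this with the standard library's `urljoin`; the base URI and the links are made-up examples, and Venus itself relies on its dependencies for this step rather than on this code.

```python
from urllib.parse import urljoin

# Hypothetical base URI of an entry and some links found inside it.
base = "http://example.org/blog/2006/entry.html"
links = ["images/cat.png", "/feed.atom", "../about.html"]

resolved = [urljoin(base, href) for href in links]
for href in resolved:
    print(href)
```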
+Finally, unmatched tags are closed. This is done with
+knowledge of the semantics of HTML. Additionally, a large subset of MathML, as well as a
+tiny profile of SVG, are also supported.
+The Universal Feed Parser also normalizes the content of feeds. This involves a
+large number of elements; the best place to start is to look at the
+annotated examples. Among other things, a large number of date formats
+are converted into RFC 3339 formatted dates.
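To make the date conversion concrete, the sketch below turns one common input format, the RFC 822 style used by RSS, into an RFC 3339 timestamp in UTC. `rfc822_to_rfc3339` is a hypothetical helper using only the Python standard library; the Universal Feed Parser handles many more formats than this.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def rfc822_to_rfc3339(value):
    """Convert an RFC 822 style date with a zone into an RFC 3339 UTC string."""
    moment = parsedate_to_datetime(value).astimezone(timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


converted = rfc822_to_rfc3339("Sat, 07 Sep 2002 02:00:01 +0200")
print(converted)
```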
+If no ids are found in entries, attempts are made to synthesize one using (in order):
+
+If no updated dates are found in an entry, or if the dates found
+are in the future, the current time is substituted.
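The rule for updated dates can be sketched as follows. `effective_updated` is a hypothetical helper that restates the sentence above in code; it is not taken from the Venus sources.

```python
from datetime import datetime, timedelta, timezone


def effective_updated(updated, now):
    """Return the entry's updated date, or now if it is missing or in the future."""
    if updated is None or updated > now:
        return now
    return updated


# A fixed "current time" keeps the example deterministic.
now = datetime(2006, 8, 1, 12, 0, tzinfo=timezone.utc)
past = now - timedelta(days=2)
future = now + timedelta(days=2)

results = [effective_updated(d, now) for d in (past, future, None)]
```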
+All of the above describes what Venus does automatically, either directly
+or through its dependencies. There are a number of errors which cannot
+be corrected automatically, and for these, there are configuration parameters
+that can be used to help.
+ignore_in_feed
allows you to list any number of elements
+which are to be ignored in feeds. This is often handy in the case of feeds
+where the id
or updated
 values can't be trusted.
+title_type
, summary_type
,
+content_type
allow you to override the
+type
+attributes on these elements.
+name_type
does something similar for
+author names
Template names take the form
+name.
ext.
type, where
+name.
ext identifies the name of the output file
+to be created in the output_dir
, and type
+indicates which language processor to use for the template.
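The naming convention can be illustrated by splitting a template name at its last dot. `split_template_name` is a hypothetical helper for this example only, and the template names are placeholders.

```python
def split_template_name(template):
    """Split 'name.ext.type' into (output file name, template type)."""
    output_name, _, template_type = template.rpartition(".")
    return output_name, template_type


html_parts = split_template_name("index.html.tmpl")
atom_parts = split_template_name("atom.xml.xslt")
print(html_parts, atom_parts)
```

Under this reading, `index.html.tmpl` produces a file named `index.html` and is handled by the `tmpl` processor.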
Like with filters, templates may be written
+in a variety of languages and are based on the standard Unix pipe convention
+of producing stdout
from stdin
, but in practice
+two languages are used more than others:
Many find htmltmpl
+easier to get started with as you can take a simple example of your
+output file, sprinkle in a few <TMPL_VAR>
s and
+<TMPL_LOOP>
s and you are done. Eventually, however,
+you may find that your template involves <TMPL_IF>
+blocks inside of attribute values, and you may find the result difficult
+to read and create correctly.
It is also important to note that htmltmpl based templates do not +have access to the full set of information available in the feed, just +the following (rather substantial) subset:
+
+VAR               type        source
+author            String      author
+author_name       String      author_detail.name
+generator         String      generator
+id                String      id
+icon              String      icon
+last_updated_822  Rfc822      updated_parsed
+last_updated_iso  Rfc3399     updated_parsed
+last_updated      PlanetDate  updated_parsed
+link              String      link
+logo              String      logo
+rights            String      rights_detail.value
+subtitle          String      subtitle_detail.value
+title             String      title_detail.value
+title_plain       Plain       title_detail.value
+url               String      links[rel='self'].href
+                              headers['location']
Note: when multiple sources are listed, the last one wins.
+In addition to these variables, Planet Venus makes available two
+arrays, Channels
and Items
, with one entry
+per subscription and per output entry respectively. The data values
+within the Channels
array exactly match the above list.
+The data values within the Items
array are as follows:
+
+VAR               type        source
+author            String      author
+author_email      String      author_detail.email
+author_name       String      author_detail.name
+author_uri        String      author_detail.href
+content_language  String      content[0].language
+content           String      summary_detail.value
+                              content[0].value
+date              PlanetDate  published_parsed
+                              updated_parsed
+date_822          Rfc822      published_parsed
+                              updated_parsed
+date_iso          Rfc3399     published_parsed
+                              updated_parsed
+id                String      id
+link              String      links[rel='alternate'].href
+new_channel       String      id
+new_date          NewDate     published_parsed
+                              updated_parsed
+rights            String      rights_detail.value
+title_language    String      title_detail.language
+title_plain       Plain       title_detail.value
+title             String      title_detail.value
+summary_language  String      summary_detail.language
+updated           PlanetDate  updated_parsed
+updated_822       Rfc822      updated_parsed
+updated_iso       Rfc3399     updated_parsed
+published         PlanetDate  published_parsed
+published_822     Rfc822      published_parsed
+published_iso     Rfc3399     published_parsed
Note: variables above which start with
+new_
are only set if their values differ from the previous
+Item.
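The behaviour of the `new_` variables can be sketched as follows. `add_new_markers` and the plain `channel` and `date` keys are hypothetical simplifications for this example; the real values are derived from the sources listed in the table above.

```python
def add_new_markers(items):
    """Add new_channel / new_date only where the value differs from the previous item."""
    previous = {}
    result = []
    for item in items:
        marked = dict(item)
        for key in ("channel", "date"):
            value = item.get(key)
            if value is not None and value != previous.get(key):
                marked["new_" + key] = value
        previous = item
        result.append(marked)
    return result


items = [
    {"channel": "a", "date": "May 1"},
    {"channel": "a", "date": "May 1"},
    {"channel": "b", "date": "May 1"},
]
marked = add_new_markers(items)
```

A template can then test for `new_date` or `new_channel` to emit a heading only when the day or the source changes.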
XSLT is a paradox: it actually +makes some simple things easier to do than htmltmpl, and certainly can +make more difficult things possible; but it is fair to say that many +find XSLT less approachable than htmltmpl.
+But in any case, the XSLT support is easier to document as the +input is a highly normalized feed, +with a few extension elements.
+atom:feed will have the following child elements:
+planet:source element per subscription, with the same child elements as atom:source, as well as
+an additional child element in the planet namespace for each
+configuration parameter that applies to
+this subscription.
+planet:format indicating the format and version of the source feed.
+planet:bozo which is either true or false.
+atom:updated
and atom:published
will have
+a planet:format
attribute containing the referenced date
+formatted according to the [planet] date_format
specified
+in the configuration