Misc changes from Sam.

Jacques Distler 2006-11-12 17:45:55 -06:00
commit 838269ed8f
84 changed files with 39985 additions and 276 deletions


@@ -1 +1,3 @@
 *.tmplc
+.DS_Store
+cache

INSTALL

@@ -1,167 +0,0 @@
Installing Planet
-----------------
You'll need at least Python 2.2 installed on your system; we recommend
Python 2.4, though, as there may be bugs in the earlier libraries.
Everything Pythonesque Planet needs to provide basic operation should be
included in the distribution. Additionally:
* Usage of XSLT requires either xsltproc or python-libxslt.
* The current interface to filters written in non-templating languages
(e.g., python) uses the subprocess module which was introduced in
Python 2.4.
* Usage of FOAF as a reading list requires librdf.
Instructions:
i.
First you'll need to extract the files into a folder somewhere.
I expect you've already done this, after all, you're reading this
file. You can place this wherever you like, ~/planet is a good
choice, but so's anywhere else you prefer.
ii.
This is very important: from within that directory, type the following
command:
python runtests.py
This should take anywhere from one to ten seconds to execute. No network
connection is required, and the script cleans up after itself. If the
script completes with an "OK", you are good to go. Otherwise stopping here
and inquiring on the mailing list is a good idea as it can save you lots of
frustration down the road.
iii.
Make a copy of one of the 'ini' files in the 'examples' subdirectory,
and put it wherever you like; I like to use the Planet's name (so
~/planet/debian), but it's really up to you.
iv.
Edit the config.ini file in this directory to taste; it's pretty
well documented so you shouldn't have any problems here. Pay
particular attention to the 'output_dir' option, which should be
readable by your web server. If the directory you specify in your
'cache_dir' exists, make sure that it is empty.
v.
Run it: python planet.py pathto/config.ini
You'll want to add this to cron; make sure you run it from the
right directory.
vi. (Optional)
Tell us about it! We'd love to link to you on planetplanet.org :-)
vii. (Optional)
Build your own themes, templates, or filters! And share!
Template files
--------------
The template files used are given as a whitespace separated list in the
'template_files' option in config.ini. The extension at the end of the
file name indicates what processor to use. Templates may be implemented
using htmltmpl, xslt, or any programming language.
The final extension is removed to form the name of the file placed in the
output directory.
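The extension-stripping rule can be sketched in a few lines of Python (an illustration only; the file names are hypothetical):

```python
import os

def output_name(template_file):
    # The final extension only selects the processor (htmltmpl, xslt, ...);
    # stripping it yields the name written to the output directory.
    return os.path.splitext(os.path.basename(template_file))[0]

print(output_name('examples/index.html.tmpl'))     # index.html
print(output_name('themes/common/atom.xml.xslt'))  # atom.xml
```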
HtmlTmpl files
--------------
Reading through the example templates is recommended; they're designed to
pretty much drop straight into your site with little modification
anyway.
Inside these template files, <TMPL_VAR xxx> is replaced with the content
of the 'xxx' variable. The variables available are:
name .... } the value of the equivalent options
link .... } from the [Planet] section of your
owner_name . } Planet's config.ini file
owner_email }
url .... link with the output filename appended
generator .. version of planet being used
date .... { your date format
date_iso ... current date and time in { ISO date format
date_822 ... { RFC822 date format
There are also two loops, 'Items' and 'Channels'. All of the lines of
the template and variable substitutions are available for each item or
channel. Loops are created using <TMPL_LOOP LoopName>...</TMPL_LOOP>
and may be used as many times as you wish.
The 'Channels' loop iterates all of the channels (feeds) defined in the
configuration file, within it the following variables are available:
name .... value of the 'name' option in config.ini, or title
title .... title retrieved from the channel's feed
tagline .... description retrieved from the channel's feed
link .... link for the human-readable content (from the feed)
url .... url of the channel's feed itself
Additionally the value of any other option specified in config.ini
for the feed, or in the [DEFAULT] section, is available as a
variable of the same name.
Depending on the feed, a huge variety of other
variables may be available; the best way to find out what you
have is to use the 'planet-cache' tool to examine your cache files.
The 'Items' loop iterates all of the blog entries from all of the channels,
you do not place it inside a 'Channels' loop. Within it, the following
variables are available:
id .... unique id for this entry (sometimes just the link)
link .... link to a human-readable version at the origin site
title .... title of the entry
summary .... a short "first page" summary
content .... the full content of the entry
date .... { your date format
date_iso ... date and time of the entry in { ISO date format
date_822 ... { RFC822 date format
If the entry is the first entry to take place on its date,
the 'new_date' variable is set to that date.
This allows you to break up the page by day.
If the entry is from a different channel than the previous entry,
or is the first entry from this channel on this day,
the 'new_channel' variable is set to the same value as the
'channel_url' variable. This allows you to collate multiple
entries from the same person under the same banner.
Additionally the value of any variable that would be defined
for the channel is available, with 'channel_' prepended to the
name (e.g. 'channel_name' and 'channel_link').
Depending on the feed, a huge variety of other
variables may be available; the best way to find out what you
have is to use the 'planet-cache' tool to examine your cache files.
There are also a couple of other special things you can do in a template.
- If you want HTML escaping applied to the value of a variable, use the
<TMPL_VAR xxx ESCAPE="HTML"> form.
- If you want URI escaping applied to the value of a variable, use the
<TMPL_VAR xxx ESCAPE="URI"> form.
- To only include a section of the template if the variable has a
non-empty value, you can use <TMPL_IF xxx>....</TMPL_IF>. e.g.
<TMPL_IF new_date>
<h1><TMPL_VAR new_date></h1>
</TMPL_IF>
You may place a <TMPL_ELSE> within this block to specify an
alternative, or may use <TMPL_UNLESS xxx>...</TMPL_UNLESS> to
perform the opposite.
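The two ESCAPE forms correspond to ordinary HTML and URI escaping. A rough Python illustration of the difference (this is the idea, not the htmltmpl implementation):

```python
import html
import urllib.parse

value = 'Tom & Jerry <live>'

# ESCAPE="HTML": make the value safe to drop into markup
print(html.escape(value))         # Tom &amp; Jerry &lt;live&gt;

# ESCAPE="URI": make the value safe to embed in a link
print(urllib.parse.quote(value))  # Tom%20%26%20Jerry%20%3Clive%3E
```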

README

@@ -9,11 +9,11 @@ also actively being maintained.
 It uses Mark Pilgrim's Universal Feed Parser to read from CDF, RDF, RSS and
 Atom feeds; Leonard Richardson's Beautiful Soup to correct markup issues;
-and Tomas Styblo's templating engine to output static files in any
-format you can dream up.
+and either Tomas Styblo's templating engine or Daniel Veillard's implementation
+of XSLT to output static files in any format you can dream up.
-To get started, check out the INSTALL file in this directory. If you have any
-questions or comments, please don't hesitate to use the planet mailing list:
+To get started, check out the documentation in the docs directory. If you have
+any questions or comments, please don't hesitate to use the planet mailing list:
 http://lists.planetplanet.org/mailman/listinfo/devel

THANKS

@@ -4,6 +4,13 @@ Elias Torres - FOAF OnlineAccounts
 Jacques Distler - Template patches
 Michael Koziarski - HTTP Auth fix
 Brian Ewins - Win32 / Portalocker
+Joe Gregorio - Invoke same version of Python for filters
+Harry Fuecks - Pipe characters in file names, filter bug
+Eric van der Vlist - Filters to add language, category information
+Chris Dolan - mkdir cache; default template_dirs; fix xsltproc
+David Sifry - rss 2.0 xslt template based on http://atom.geekhood.net/
+Morten Fredericksen - Support WordPress LinkManager OPML
+Harry Fuecks - default item date to feed date
 This codebase represents a radical refactoring of Planet 2.0, which lists
 the following contributors:

TODO

@ -1,11 +1,6 @@
TODO TODO
==== ====
* Enable per-feed adjustments
The goal is to better cope with feeds that don't have dates or ids or
consitently encode or escape things incorrectly.
* Expire feed history * Expire feed history
The feed cache doesn't currently expire old entries, so could get The feed cache doesn't currently expire old entries, so could get

docs/config.html Normal file

@@ -0,0 +1,140 @@
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Configuration</title>
</head>
<body>
<h2>Configuration</h2>
<p>Configuration files are in <a href="http://docs.python.org/lib/module-ConfigParser.html">ConfigParser</a> format, which basically means the same
format as INI files, i.e., they consist of a series of
<code>[sections]</code>, in square brackets, with each section containing a
list of <code>name:value</code> pairs (or <code>name=value</code> pairs, if
you prefer).</p>
<p>You are welcome to place your entire configuration into one file.
Alternately, you may factor out the templating into a "theme", and
the list of subscriptions into one or more "reading lists".</p>
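For illustration, here is a minimal configuration in this format, parsed with Python's configparser (the Python 3 spelling of the ConfigParser module named above; the values are made up):

```python
import configparser

SAMPLE = """
[planet]
name: My Planet
link: http://example.org/
output_dir: /var/www/planet

[http://example.org/feed.atom]
name = Example Feed
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# name:value and name=value are interchangeable
print(config.get('planet', 'name'))   # My Planet

# every section other than [planet] is typically a subscription
subscriptions = [s for s in config.sections() if s != 'planet']
print(subscriptions)                  # ['http://example.org/feed.atom']
```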
<h3 id="planet"><code>[planet]</code></h3>
<p>This is the only required section, which is a bit odd as none of the
parameters listed below are required. Even so, you really do want to
provide many of these, especially ones that identify your planet and
either (or both) of <code>template_files</code> and <code>theme</code>.</p>
<p>Below is a complete list of predefined planet configuration parameters,
including <del>ones not (yet) implemented by Venus</del> and <ins>ones that
are either new or implemented differently by Venus</ins>.</p>
<blockquote>
<dl class="compact code">
<dt>name</dt>
<dd>Your planet's name</dd>
<dt>link</dt>
<dd>Link to the main page</dd>
<dt>owner_name</dt>
<dd>Your name</dd>
<dt>owner_email</dt>
<dd>Your e-mail address</dd>
</dl>
<dl class="compact code">
<dt>cache_directory</dt>
<dd>Where cached feeds are stored</dd>
<dt>output_dir</dt>
<dd>Directory to place output files</dd>
</dl>
<dl class="compact code">
<dt><ins>output_theme</ins></dt>
<dd>Directory containing a <code>config.ini</code> file which is merged
with this one. This is typically used to specify templating and bill of
materials information.</dd>
<dt>template_files</dt>
<dd>Space-separated list of output template files</dd>
<dt><ins>template_directories</ins></dt>
<dd>Space-separated list of directories in which <code>template_files</code>
can be found</dd>
<dt><ins>bill_of_materials</ins></dt>
<dd>Space-separated list of files to be copied as is directly from the <code>template_directories</code> to the <code>output_dir</code></dd>
<dt><ins>filters</ins></dt>
<dd>Space-separated list of filters to apply to each entry</dd>
</dl>
<dl class="compact code">
<dt>items_per_page</dt>
<dd>How many items to put on each page. <ins>Whereas Planet 2.0 allows this to
be overridden on a per template basis, Venus currently takes the maximum value
for this across all templates.</ins></dd>
<dt><del>days_per_page</del></dt>
<dd>How many complete days of posts to put on each page. This is the absolute, hard limit (over the item limit)</dd>
<dt>date_format</dt>
<dd><a href="http://docs.python.org/lib/module-time.html#l2h-2816">strftime</a> format for the default 'date' template variable</dd>
<dt>new_date_format</dt>
<dd><a href="http://docs.python.org/lib/module-time.html#l2h-2816">strftime</a> format for the 'new_date' template variable <ins>only applies to htmltmpl templates</ins></dd>
<dt><del>encoding</del></dt>
<dd>Output encoding for the file, Python 2.3+ users can use the special "xml" value to output ASCII with XML character references</dd>
<dt><del>locale</del></dt>
<dd>Locale to use for (e.g.) strings in dates, default is taken from your system</dd>
<dt>activity_threshold</dt>
<dd>If non-zero, all feeds which have not been updated in the indicated
number of days will be marked as inactive</dd>
</dl>
<dl class="compact code">
<dt>log_level</dt>
<dd>One of <code>DEBUG</code>, <code>INFO</code>, <code>WARNING</code>, <code>ERROR</code> or <code>CRITICAL</code></dd>
<dt><ins>log_format</ins></dt>
<dd><a href="http://docs.python.org/lib/node422.html">format string</a> to
use for logging output. Note: this configuration value is processed
<a href="http://docs.python.org/lib/ConfigParser-objects.html">raw</a></dd>
<dt>feed_timeout</dt>
<dd>Number of seconds to wait for any given feed</dd>
<dt><del>new_feed_items</del></dt>
<dd>Number of items to take from new feeds</dd>
</dl>
</blockquote>
<h3 id="default"><code>[DEFAULT]</code></h3>
<p>Values placed in this section are used as default values for all sections.
While it is true that few values make sense in all sections, in most cases
unused parameters cause few problems.</p>
<h3 id="subscription"><code>[</code><em>subscription</em><code>]</code></h3>
<p>All sections other than <code>planet</code> and <code>DEFAULT</code> that
are not named in <code>[planet]</code>'s <code>filters</code> or
<code>template_files</code> parameters
are treated as subscriptions and typically take the form of a
<acronym title="Uniform Resource Identifier">URI</acronym>.</p>
<p>Parameters placed in this section are passed to templates. While
you are free to include as few or as many parameters as you like, most of
the predefined themes presume that at least <code>name</code> is defined.</p>
<p>The <code>content_type</code> parameter can be defined to indicate that
this subscription is a <em>reading list</em>, i.e., is an external list
of subscriptions. At the moment, two formats of reading lists are supported:
<code>opml</code> and <code>foaf</code>. In the future, support for formats
like <code>xoxo</code> could be added.</p>
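For illustration, an opml reading list is just an outline of feed URLs; the subscriptions it contributes can be extracted along these lines (sample data, not the Venus code itself):

```python
import xml.etree.ElementTree as ET

OPML = """<opml version="1.1">
  <body>
    <outline type="rss" xmlUrl="http://example.org/a.xml" text="Feed A"/>
    <outline type="rss" xmlUrl="http://example.org/b.xml" text="Feed B"/>
  </body>
</opml>"""

root = ET.fromstring(OPML)
# each outline element carrying an xmlUrl attribute is one subscription
subscriptions = [o.get('xmlUrl') for o in root.iter('outline') if o.get('xmlUrl')]
print(subscriptions)
```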
<p><a href="normalization.html#overrides">Normalization overrides</a> can
also be defined here.</p>
<h3 id="template"><code>[</code><em>template</em><code>]</code></h3>
<p>Sections which are listed in <code>[planet] template_files</code> are
processed as <a href="templates.html">templates</a>. With Planet 2.0,
it is possible to override parameters like <code>items_per_page</code>
on a per template basis, but at the current time Planet Venus doesn't
implement this.</p>
<h3 id="filter"><code>[</code><em>filter</em><code>]</code></h3>
<p>Sections which are listed in <code>[planet] filters</code> are
processed as <a href="filters.html">filters</a>.</p>
<p>Parameters which are listed in this section are passed to the filter
in a language specific manner. Given the way defaults work, filters
should be prepared to ignore parameters that they didn't expect.</p>
</body>
</html>

docs/docs.css Normal file

@@ -0,0 +1,104 @@
body {
background-color: #fff;
color: #333;
font-family: 'Lucida Grande', Verdana, Geneva, Lucida, Helvetica, sans-serif;
font-size: small;
margin: 40px;
padding: 0;
}
a:link, a:visited {
background-color: transparent;
color: #333;
text-decoration: none !important;
border-bottom: 1px dotted #333 !important;
}
a:hover {
background-color: transparent;
color: #934;
text-decoration: none !important;
border-bottom: 1px dotted #993344 !important;
}
pre, code {
background-color: #FFF;
color: #00F;
font-size: large
}
h1 {
margin: 8px 0 10px 20px;
padding: 0;
font-variant: small-caps;
letter-spacing: 0.1em;
font-family: "Book Antiqua", Georgia, Palatino, Times, "Times New Roman", serif;
}
h2 {
clear: both;
}
ul, ul.outer > li {
margin: 14px 0 10px 0;
}
.z {
float:left;
background: url(img/shadowAlpha.png) no-repeat bottom right !important;
margin: -15px 0 20px -15px !important;
}
.z .logo {
color: magenta;
}
.z p {
margin: 14px 0 10px 15px !important;
}
.z .sectionInner {
width: 730px;
background: none !important;
padding: 0 !important;
}
.z .sectionInner .sectionInner2 {
border: 1px solid #a9a9a9;
padding: 4px;
margin: -6px 6px 6px -6px !important;
}
ins {
background-color: #FFF;
color: #F0F;
text-decoration: none;
}
dl.compact {
margin-bottom: 1em;
margin-top: 1em;
}
dl.code > dt {
font-family: monospace;
font-size: large;
}
dl.compact > dt {
float: left;
margin-bottom: 0;
padding-right: 8px;
margin-top: 0;
list-style-type: none;
}
dl.compact > dd {
margin-bottom: 0;
margin-top: 0;
margin-left: 10em;
}
th, td {
font-size: small;
}

docs/docs.js Normal file

@@ -0,0 +1,54 @@
window.onload=function() {
var vindex = document.URL.lastIndexOf('venus/');
var len = 'venus/'.length;
if (vindex<0) { vindex = document.URL.lastIndexOf('planet/'); len = 'planet/'.length; }
var base = document.URL.substring(0,vindex+len);
var body = document.getElementsByTagName('body')[0];
var div = document.createElement('div');
div.setAttribute('class','z');
var h1 = document.createElement('h1');
var span = document.createElement('span');
span.appendChild(document.createTextNode('\u2640'));
span.setAttribute('class','logo');
h1.appendChild(span);
h1.appendChild(document.createTextNode(' Planet Venus'));
var inner2=document.createElement('div');
inner2.setAttribute('class','sectionInner2');
inner2.appendChild(h1);
var p = document.createElement('p');
p.appendChild(document.createTextNode("Planet Venus is an awesome \u2018river of news\u2019 feed reader. It downloads news feeds published by web sites and aggregates their content together into a single combined feed, latest news first."));
inner2.appendChild(p);
p = document.createElement('p');
var a = document.createElement('a');
a.setAttribute('href',base+'index.html');
a.appendChild(document.createTextNode('Download'));
p.appendChild(a);
p.appendChild(document.createTextNode(" \u00b7 "));
a = document.createElement('a');
a.setAttribute('href',base+'docs/index.html');
a.appendChild(document.createTextNode('Documentation'));
p.appendChild(a);
p.appendChild(document.createTextNode(" \u00b7 "));
a = document.createElement('a');
a.setAttribute('href',base+'tests/');
a.appendChild(document.createTextNode('Unit tests'));
p.appendChild(a);
p.appendChild(document.createTextNode(" \u00b7 "));
a = document.createElement('a');
a.setAttribute('href','http://lists.planetplanet.org/mailman/listinfo/devel');
a.appendChild(document.createTextNode('Mailing list'));
p.appendChild(a);
inner2.appendChild(p);
var inner1=document.createElement('div');
inner1.setAttribute('class','sectionInner');
inner1.setAttribute('id','inner1');
inner1.appendChild(inner2);
div.appendChild(inner1);
body.insertBefore(div, body.firstChild);
}

docs/filters.html Normal file

@@ -0,0 +1,71 @@
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Filters</title>
</head>
<body>
<h2>Filters</h2>
<p>Filters are simple Unix pipes. Input comes in <code>stdin</code>,
parameters come from the config file, and output goes to <code>stdout</code>.
Anything written to <code>stderr</code> is logged as an ERROR message. If no
<code>stdout</code> is produced, the entry is not written to the cache or
processed further.</p>
<p>Input to a filter is an aggressively
<a href="normalization.html">normalized</a> entry. For
example, if a feed is RSS 1.0 with 10 items, the filter will be called ten
times, each with a single Atom 1.0 entry, with all textConstructs
expressed as XHTML, and everything encoded as UTF-8.</p>
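<p>A filter, then, is any program that reads one normalized Atom entry on
stdin and writes the (possibly modified) entry to stdout. A hypothetical
example, not one of the bundled filters, that upper-cases entry titles:</p>

```python
import sys
import xml.etree.ElementTree as ET

ATOM = 'http://www.w3.org/2005/Atom'

def filter_entry(xml_text):
    # one UTF-8 encoded Atom entry arrives per invocation
    entry = ET.fromstring(xml_text)
    title = entry.find('{%s}title' % ATOM)
    if title is not None and title.text:
        title.text = title.text.upper()
    # whatever reaches stdout replaces the entry; write nothing to drop it
    return ET.tostring(entry, encoding='unicode')

if __name__ == '__main__':
    sys.stdout.write(filter_entry(sys.stdin.read()))
```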
<p>You will find a small set of example filters in the <a
href="../filters">filters</a> directory. The <a
href="../filters/coral_cdn_filter.py">coral cdn filter</a> will change links
to images in the entry itself. The filters in the <a
href="../filters/stripAd/">stripAd</a> subdirectory will strip specific
types of advertisements that you may find in feeds.</p>
<p>The <a href="../filters/excerpt.py">excerpt</a> filter adds metadata (in
the form of a <code>planet:excerpt</code> element) to the feed itself. You
can see examples of how parameters are passed to this program in either
<a href="../tests/data/filter/excerpt-images.ini">excerpt-images</a> or
<a href="../examples/opml-top100.ini">opml-top100.ini</a>.
Alternately parameters may be passed
<abbr title="Uniform Resource Identifier">URI</abbr> style, for example:
<a href="../tests/data/filter/excerpt-images2.ini">excerpt-images2</a>.
</p>
<p>The <a href="../filters/xpath_sifter.py">xpath sifter</a> is a variation of
the above, including or excluding feeds based on the presence (or absence) of
data specified by <a href="http://www.w3.org/TR/xpath20/">xpath</a>
expressions. Again, parameters can be passed as
<a href="../tests/data/filter/xpath-sifter.ini">config options</a> or
<a href="../tests/data/filter/xpath-sifter2.ini">URI style</a>.
</p>
<h3>Notes</h3>
<ul>
<li>The file extension of the filter is significant. <code>.py</code> invokes
python. <code>.xslt</code> invokes XSLT. <code>.sed</code> and
<code>.tmpl</code> (a.k.a. htmltmpl) are also options. Other languages, like
perl or ruby or class/jar (java), aren't supported at the moment, but these
would be easy to add.</li>
<li>Any filters listed in the <code>[planet]</code> section of your config.ini
will be invoked on all feeds. Filters listed in individual
<code>[feed]</code> sections will only be invoked on those feeds.</li>
<li>Filters are simply invoked in the order they are listed in the
configuration file (think unix pipes). Planet-wide filters are executed before
feed-specific filters.</li>
<li>Templates written using htmltmpl currently only have access to a fixed set
of fields, whereas XSLT templates have access to everything.</li>
</ul>
</body>
</html>

docs/img/shadowAlpha.png Normal file (binary image, 3.3 KiB, not shown)

docs/index.html Normal file

@@ -0,0 +1,51 @@
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Documentation</title>
</head>
<body>
<h2>Table of Contents</h2>
<ul class="outer">
<li><a href="installation.html">Getting started</a></li>
<li>Basic Features
<ul>
<li><a href="config.html">Configuration</a></li>
<li><a href="templates.html">Templates</a></li>
</ul>
</li>
<li>Advanced Features
<ul>
<li><a href="venus.svg">Architecture</a></li>
<li><a href="normalization.html">Normalization</a></li>
<li><a href="filters.html">Filters</a></li>
</ul>
</li>
<li>Other
<ul>
<li><a href="migration.html">Migration from Planet 2.0</a></li>
</ul>
</li>
<li>Reference
<ul>
<li><a href="http://www.planetplanet.org/">Planet</a></li>
<li><a href="http://feedparser.org/docs/">Universal Feed Parser</a></li>
<li><a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a></li>
<li><a href="http://htmltmpl.sourceforge.net/">htmltmpl</a></li>
<li><a href="http://www.w3.org/TR/xslt">XSLT</a></li>
<li><a href="http://www.gnu.org/software/sed/manual/html_mono/sed.html">sed</a></li>
</ul>
</li>
<li>Credits and License
<ul>
<li><a href="../AUTHORS">Authors</a></li>
<li><a href="../THANKS">Contributors</a></li>
<li><a href="../LICENCE">License</a></li>
</ul>
</li>
</ul>
</body>
</html>

docs/installation.html Normal file

@@ -0,0 +1,112 @@
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Installation</title>
</head>
<body>
<h2>Installation</h2>
<p>Venus has been tested on Linux, Mac OS X, and Windows.</p>
<p>You'll need at least Python 2.2 installed on your system; we recommend
Python 2.4, though, as there may be bugs in the earlier libraries.</p>
<p>Everything Pythonesque Planet needs to provide basic operation should be
included in the distribution. Some optional features may require
additional libraries, for example:</p>
<ul>
<li>Usage of XSLT requires either
<a href="http://xmlsoft.org/XSLT/xsltproc2.html">xsltproc</a>
or <a href="http://xmlsoft.org/XSLT/python.html">python-libxslt</a>.</li>
<li>The current interface to filters written in non-templating languages
(e.g., python) uses the
<a href="http://docs.python.org/lib/module-subprocess.html">subprocess</a>
module which was introduced in Python 2.4.</li>
<li>Usage of FOAF as a reading list requires
<a href="http://librdf.org/">librdf</a>.</li>
</ul>
<h3>General Instructions</h3>
<p>
These instructions apply to any platform. Check the instructions
below for more specific instructions for your platform.
</p>
<ol>
<li><p>If you are reading this online, you will need to
<a href="../index.html">download</a> and extract the files into a folder somewhere.
You can place this wherever you like, <code>~/planet</code>
and <code>~/venus</code> are good
choices, but so's anywhere else you prefer.</p></li>
<li><p>This is very important: from within that directory, type the following
command:</p>
<blockquote><code>python runtests.py</code></blockquote>
<p>This should take anywhere from one to ten seconds to execute. No network
connection is required, and the script cleans up after itself. If the
script completes with an "OK", you are good to go. Otherwise stopping here
and inquiring on the
<a href="http://lists.planetplanet.org/mailman/listinfo/devel">mailing list</a>
is a good idea as it can save you lots of frustration down the road.</p></li>
<li><p>Make a copy of one of the <code>ini</code> files in the
<a href="../examples">examples</a> subdirectory,
and put it wherever you like; I like to use the Planet's name (so
<code>~/planet/debian</code>), but it's really up to you.</p></li>
<li><p>Edit the <code>config.ini</code> file in this directory to taste;
it's pretty well documented so you shouldn't have any problems here. Pay
particular attention to the <code>output_dir</code> option, which should be
readable by your web server. If the directory you specify in your
<code>cache_dir</code> exists, make sure that it is empty.</p></li>
<li><p>Run it: <code>python planet.py pathto/config.ini</code></p>
<p>You'll want to add this to cron; make sure you run it from the
right directory.</p></li>
<li><p>(Optional)</p>
<p>Tell us about it! We'd love to link to you on planetplanet.org :-)</p></li>
<li><p>(Optional)</p>
<p>Build your own themes, templates, or filters! And share!</p></li>
</ol>
<h3>Mac OS X and Fink Instructions</h3>
<p>
The <a href="http://fink.sourceforge.net/">Fink Project</a> packages
various open source software for MacOS. This makes it a little easier
to get started with projects like Planet Venus.
</p>
<p>
Note: in the following, we recommend explicitly
using <code>python2.4</code>. As of this writing, Fink is starting to
support <code>python2.5</code> but the XML libraries, for example, are
not yet ported to the newer python so Venus will be less featureful.
</p>
<ol>
<li><p>Install the XCode development tools from your Mac OS X install
disks</p></li>
<li><p><a href="http://fink.sourceforge.net/download/">Download</a>
and install Fink</p></li>
<li><p>Tell fink to install the Planet Venus prerequisites:<br />
<code>fink install python24 celementtree-py24 bzr-py24 libxslt-py24
libxml2-py24</code></p></li>
<li><p><a href="../index.html">Download</a> and extract the Venus files into a
folder somewhere</p></li>
<li><p>Run the tests: <code>python2.4 runtests.py</code><br /> This
will warn you that the RDF library is missing, but that's
OK.</p></li>
<li><p>Continue with the general steps above, starting with Step 3. You
may want to explicitly specify <code>python2.4</code>.</p></li>
</ol>
<h3>Ubuntu Linux (Edgy Eft) instructions</h3>
<p>Before starting, issue the following command:</p>
<ul>
<li><code>sudo apt-get install bzr python2.4-librdf</code></li>
</ul>
</body>
</html>

docs/migration.html Normal file

@@ -0,0 +1,42 @@
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Migration</title>
</head>
<body>
<h2>Migration from Planet 2.0</h2>
<p>The intent is that existing Planet 2.0 users should be able to reuse
their existing <code>config.ini</code> and <code>.tmpl</code> files,
but the reality is that users will need to be aware of the following:</p>
<ul>
<li>You will need to start over with a new cache directory as the format
of the cache has changed dramatically.</li>
<li>Existing <code>.tmpl</code> and <code>.ini</code> files should work,
though some <a href="config.html">configuration</a> options (e.g.,
<code>days_per_page</code>) have not yet been implemented</li>
<li>No testing has been done on Python 2.1, and it is presumed not to work.</li>
<li>To take advantage of all features, you should install the optional
XML and RDF libraries described on
the <a href="installation.html">Installation</a> page.</li>
</ul>
<p>
Common changes to config.ini include:
</p>
<ul>
<li><p>Filename changes:</p>
<pre>
examples/fancy/index.html.tmpl => themes/classic_fancy/index.html.tmpl
examples/atom.xml.tmpl => themes/common/atom.xml.xslt
examples/rss20.xml.tmpl => themes/common/rss20.xml.tmpl
examples/rss10.xml.tmpl => themes/common/rss10.xml.tmpl
examples/opml.xml.tmpl => themes/common/opml.xml.xslt
examples/foafroll.xml.tmpl => themes/common/foafroll.xml.xslt
</pre></li>
</ul>
</body>
</html>

docs/normalization.html Normal file

@@ -0,0 +1,92 @@
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Normalization</title>
</head>
<body>
<h2>Normalization</h2>
<p>Venus builds on, and extends, the <a
href="http://www.feedparser.org/">Universal Feed Parser</a> and <a
href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a> to
convert all feeds into Atom 1.0, with well formed XHTML, and encoded as UTF-8,
meaning that you don't have to worry about funky feeds, tag soup, or character
encoding.</p>
<h3>Encoding</h3>
<p>Input data in feeds may be encoded in a variety of formats, most commonly
ASCII, ISO-8859-1, Windows-1252, and UTF-8. Additionally, many feeds make use of
the wide range of
<a href="http://www.w3.org/TR/html401/sgml/entities.html">character entity
references</a> provided by HTML. Each is converted to UTF-8, an encoding
which is a proper superset of ASCII, supports the entire range of Unicode
characters, and is one of
<a href="http://www.w3.org/TR/2006/REC-xml-20060816/#charsets">only two</a>
encodings required to be supported by all conformant XML processors.</p>
<p>Encoding problems are one of the more common feed errors, and every
attempt is made to correct common errors, such as the inclusion of
the so-called
<a href="http://www.fourmilab.ch/webtools/demoroniser/">moronic</a> versions
of smart-quotes. In rare cases where individual characters cannot be
converted to valid UTF-8 or into
<a href="http://www.w3.org/TR/xml/#charsets">characters allowed in XML 1.0
documents</a>, such characters will be replaced with the Unicode
<a href="http://www.fileformat.info/info/unicode/char/fffd/index.htm">Replacement character</a>, with a title that describes the original character whenever possible.</p>
<p>In order to support the widest range of inputs, use of Python 2.3 or later,
as well as installation of the Python <code>iconvcodec</code> module, is
recommended.</p>
<h3>HTML</h3>
<p>A number of different normalizations of HTML are performed. For starters,
the HTML is
<a href="http://www.feedparser.org/docs/html-sanitization.html">sanitized</a>,
meaning that HTML tags and attributes that could introduce javascript or
other security risks are removed.</p>
<p>Then,
<a href="http://www.feedparser.org/docs/resolving-relative-links.html">relative
links are resolved</a> within the HTML. The same resolution is applied to
links elsewhere in the feed.</p>
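The resolution itself is essentially `urljoin` against the feed's base URI. A rough sketch follows; the attribute-scanning regex is only illustrative (a real implementation walks the parse tree), and the function name is an assumption:

```python
import re
from urllib.parse import urljoin  # urlparse.urljoin on Python 2

def resolve_links(fragment, base):
    """Rewrite href/src attribute values relative to a base URI.
    A regex is enough to show the idea on well-formed input."""
    return re.sub(r'((?:href|src)=")([^"]*)',
                  lambda m: m.group(1) + urljoin(base, m.group(2)),
                  fragment)
```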
<p>Finally, unmatched tags are closed. This is done with a
<a href="http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20HTML">knowledge of the semantics of HTML</a>. Additionally, a
<a href="http://golem.ph.utexas.edu/~distler/blog/archives/000165.html#sanitizespec">large
subset of MathML</a>, as well as a
<a href="http://www.w3.org/TR/SVGMobile/">tiny profile of SVG</a>
is also supported.</p>
<h3>Atom 1.0</h3>
<p>The Universal Feed Parser also
<a href="http://www.feedparser.org/docs/content-normalization.html">normalizes the content of feeds</a>. This involves a
<a href="http://www.feedparser.org/docs/reference.html">large number of elements</a>; the best place to start is to look at
<a href="http://www.feedparser.org/docs/annotated-examples.html">annotated examples</a>. Among other things a wide variety of
<a href="http://www.feedparser.org/docs/date-parsing.html">date formats</a>
are converted into
<a href="http://www.ietf.org/rfc/rfc3339.txt">RFC 3339</a> formatted dates.</p>
<p>If no <a href="http://www.feedparser.org/docs/reference-entry-id.html">ids</a> are found in entries, attempts are made to synthesize one using (in order):</p>
<ul>
<li><a href="http://www.feedparser.org/docs/reference-entry-link.html">link</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-title.html">title</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-summary.html">summary</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-content.html">content</a></li>
</ul>
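That fallback order can be sketched as below. This is a hypothetical helper, not the project's actual id-synthesis code, and the `urn:planet:` prefix is invented for the example:

```python
import hashlib

def synthesize_id(entry):
    """Pick the first available of link, title, summary, content
    and hash it into a stable synthetic id (illustrative only)."""
    for key in ('link', 'title', 'summary', 'content'):
        value = entry.get(key)
        if value:
            # Hashing gives a stable id for the same input value.
            digest = hashlib.md5(value.encode('utf-8')).hexdigest()
            return 'urn:planet:' + digest
    return None
```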
<p>If no <a href="http://www.feedparser.org/docs/reference-feed-updated.html">updated</a>
dates are found in an entry, or if the dates found
are in the future, the current time is substituted.</p>
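In other words (a minimal sketch with an assumed helper name; `time.struct_time` values compare element-wise, so the future check is a plain comparison):

```python
import time

def effective_updated(updated_parsed, now=None):
    """Return the entry's updated date, substituting the current
    time when it is missing or lies in the future."""
    if now is None:
        now = time.gmtime()
    if updated_parsed is None or updated_parsed > now:
        return now
    return updated_parsed
```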
<h3 id="overrides">Overrides</h3>
<p>All of the above describes what Venus does automatically, either directly
or through its dependencies. There are a number of errors which can not
be corrected automatically, and for these, there are configuration parameters
that can be used to help.</p>
<ul>
<li><code>ignore_in_feed</code> allows you to list any number of elements
or attributes which are to be ignored in feeds. This is often handy in the
case of feeds where the <code>id</code>, <code>updated</code> or
<code>xml:lang</code> values can't be trusted.</li>
<li><code>title_type</code>, <code>summary_type</code>,
<code>content_type</code> allow you to override the
<a href="http://www.feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.type"><code>type</code></a>
attributes on these elements.</li>
<li><code>name_type</code> does something similar for
<a href="http://www.feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.name">author names</a></li>
</ul>
</body>
</html>

docs/templates.html

@ -0,0 +1,129 @@
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Templates</title>
</head>
<body>
<h2>Templates</h2>
<p>Template names take the form
<em>name</em><code>.</code><em>ext</em><code>.</code><em>type</em>, where
<em>name</em><code>.</code><em>ext</em> identifies the name of the output file
to be created in the <code>output_directory</code>, and <em>type</em>
indicates which language processor to use for the template.</p>
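So, for example, a template named index.html.tmpl produces index.html using the htmltmpl processor. Splitting on the final dot is all it takes (illustrative helper, not Venus's actual code):

```python
def split_template_name(filename):
    """'index.html.tmpl' -> ('index.html', 'tmpl'): the output
    file name and the language-processor extension."""
    output, _, processor = filename.rpartition('.')
    return output, processor
```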
<p>As with <a href="filter.html">filters</a>, templates may be written
in a variety of languages and are based on the standard Unix pipe convention
of producing <code>stdout</code> from <code>stdin</code>, but in practice
two languages are used more than others:</p>
<h3>htmltmpl</h3>
<p>Many find <a href="http://htmltmpl.sourceforge.net/">htmltmpl</a>
easier to get started with as you can take a simple example of your
output file, sprinkle in a few <code>&lt;TMPL_VAR&gt;</code>s and
<code>&lt;TMPL_LOOP&gt;</code>s and you are done. Eventually, however,
you may find that your template involves <code>&lt;TMPL_IF&gt;</code>
blocks inside of attribute values, and you may find the result difficult
to read and create correctly.</p>
<p>It is also important to note that htmltmpl based templates do not
have access to the full set of information available in the feed, just
the following (rather substantial) subset:</p>
<blockquote>
<table border="1" cellpadding="5" cellspacing="0">
<tr><th>VAR</th><th>type</th><th>source</th></tr>
<tr><td>author</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-author.html">author</a></td></tr>
<tr><td>author_name</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-author_detail.html#reference.feed.author_detail.name">author_detail.name</a></td></tr>
<tr><td>generator</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-generator.html">generator</a></td></tr>
<tr><td>id</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-id.html">id</a></td></tr>
<tr><td>icon</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-icon.html">icon</a></td></tr>
<tr><td>last_updated_822</td><td>Rfc822</td><td><a href="http://feedparser.org/docs/reference-feed-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>last_updated_iso</td><td>Rfc3399</td><td><a href="http://feedparser.org/docs/reference-feed-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>last_updated</td><td>PlanetDate</td><td><a href="http://feedparser.org/docs/reference-feed-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>link</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-link.html">link</a></td></tr>
<tr><td>logo</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-logo.html">logo</a></td></tr>
<tr><td>rights</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-rights_detail.html#reference.feed.rights_detail.value">rights_detail.value</a></td></tr>
<tr><td>subtitle</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-subtitle_detail.html#reference.feed.subtitle_detail.value">subtitle_detail.value</a></td></tr>
<tr><td>title</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-title_detail.html#reference.feed.title_detail.value">title_detail.value</a></td></tr>
<tr><td>title_plain</td><td>Plain</td><td><a href="http://feedparser.org/docs/reference-feed-title_detail.html#reference.feed.title_detail.value">title_detail.value</a></td></tr>
<tr><td rowspan="2">url</td><td rowspan="2">String</td><td><a href="http://feedparser.org/docs/reference-feed-links.html#reference.feed.links.href">links[rel='self'].href</a></td></tr>
<tr><td><a href="http://feedparser.org/docs/reference-headers.html">headers['location']</a></td></tr>
</table>
</blockquote>
<p>Note: when multiple sources are listed, the last one wins.</p>
<p>In addition to these variables, Planet Venus makes available two
arrays, <code>Channels</code> and <code>Items</code>, with one entry
per subscription and per output entry respectively. The data values
within the <code>Channels</code> array exactly match the above list.
The data values within the <code>Items</code> array are as follows:</p>
<blockquote>
<table border="1" cellpadding="5" cellspacing="0">
<tr><th>VAR</th><th>type</th><th>source</th></tr>
<tr><td>author</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-author.html">author</a></td></tr>
<tr><td>author_email</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.email">author_detail.email</a></td></tr>
<tr><td>author_name</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.name">author_detail.name</a></td></tr>
<tr><td>author_uri</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.href">author_detail.href</a></td></tr>
<tr><td>content_language</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-content.html#reference.entry.content.language">content[0].language</a></td></tr>
<tr><td rowspan="2">content</td><td rowspan="2">String</td><td><a href="http://feedparser.org/docs/reference-entry-summary_detail.html#reference.entry.summary_detail.value">summary_detail.value</a></td></tr>
<tr><td><a href="http://feedparser.org/docs/reference-entry-content.html#reference.entry.content.value">content[0].value</a></td></tr>
<tr><td rowspan="2">date</td><td rowspan="2">PlanetDate</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td rowspan="2">date_822</td><td rowspan="2">Rfc822</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td rowspan="2">date_iso</td><td rowspan="2">Rfc3399</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td><ins>enclosure_href</ins></td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-enclosures.html#reference.entry.enclosures.href">enclosures[0].href</a></td></tr>
<tr><td><ins>enclosure_length</ins></td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-enclosures.html#reference.entry.enclosures.length">enclosures[0].length</a></td></tr>
<tr><td><ins>enclosure_type</ins></td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-enclosures.html#reference.entry.enclosures.type">enclosures[0].type</a></td></tr>
<tr><td><ins>guid_isPermaLink</ins></td><td>String</td><td><a href="http://blogs.law.harvard.edu/tech/rss#ltguidgtSubelementOfLtitemgt">isPermaLink</a></td></tr>
<tr><td>id</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-id.html">id</a></td></tr>
<tr><td>link</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-links.html#reference.entry.links.href">links[rel='alternate'].href</a></td></tr>
<tr><td>new_channel</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-id.html">id</a></td></tr>
<tr><td rowspan="2">new_date</td><td rowspan="2">NewDate</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>rights</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-rights_detail.html#reference.entry.rights_detail.value">rights_detail.value</a></td></tr>
<tr><td>title_language</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.language">title_detail.language</a></td></tr>
<tr><td>title_plain</td><td>Plain</td><td><a href="http://feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.value">title_detail.value</a></td></tr>
<tr><td>title</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.value">title_detail.value</a></td></tr>
<tr><td>summary_language</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-summary_detail.html#reference.entry.summary_detail.language">summary_detail.language</a></td></tr>
<tr><td>updated</td><td>PlanetDate</td><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>updated_822</td><td>Rfc822</td><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>updated_iso</td><td>Rfc3399</td><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>published</td><td>PlanetDate</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td>published_822</td><td>Rfc822</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td>published_iso</td><td>Rfc3399</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
</table>
</blockquote>
<p>Note: variables above which start with
<code>new_</code> are only set if their values differ from the previous
Item.</p>
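A sketch of how such new_ flags can be computed while walking the output entries (hypothetical helper, shown with plain dicts rather than the template engine's own data structures):

```python
def mark_new(items, keys=('channel', 'date')):
    """Set new_<key> on an item only when its value differs from
    the previous item's, mirroring the behaviour described above."""
    prev = {}
    for item in items:
        for key in keys:
            if item.get(key) != prev.get(key):
                item['new_' + key] = item.get(key)
        prev = item
    return items
```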
<h3>xslt</h3>
<p><a href="http://www.w3.org/TR/xslt">XSLT</a> is a paradox: it actually
makes some simple things easier to do than htmltmpl, and certainly can
make more difficult things possible; but it is fair to say that many
find XSLT less approachable than htmltmpl.</p>
<p>But in any case, the XSLT support is easier to document as the
input is a <a href="normalization.html">highly normalized</a> feed,
with a few extension elements.</p>
<ul>
<li><code>atom:feed</code> will have the following child elements:
<ul>
<li>A <code>planet:source</code> element per subscription, with the same child elements as <a href="http://www.atomenabled.org/developers/syndication/atom-format-spec.php#element.source"><code>atom:source</code></a>, as well as
an additional child element in the planet namespace for each
<a href="config.html#subscription">configuration parameter</a> that applies to
this subscription.</li>
<li><a href="http://www.feedparser.org/docs/reference-version.html"><code>planet:format</code></a> indicating the format and version of the source feed.</li>
<li><a href="http://www.feedparser.org/docs/reference-bozo.html"><code>planet:bozo</code></a> which is either <code>true</code> or <code>false</code>.</li>
</ul>
</li>
<li><code>atom:updated</code> and <code>atom:published</code> will have
a <code>planet:format</code> attribute containing the referenced date
formatted according to the <code>[planet] date_format</code> specified
in the configuration</li>
</ul>
</body>
</html>


@ -0,0 +1,82 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY categoryTerm "WebSemantique">
]>
<!--
This transformation is released under the same licence as Python
see http://www.intertwingly.net/code/venus/LICENCE.
Author: Eric van der Vlist <vdv@dyomedea.com>
This transformation is meant to be used as a filter that determines whether
Atom entries are relevant to a specific topic and adds the corresponding
<category/> element when they are.
This is done by a simple keyword matching mechanism.
To customize this filter to your needs:
1) Replace WebSemantique by your own category name in the definition of
the categoryTerm entity above.
2) Review the "upper" and "lower" variables that are used to convert text
nodes to lower case and to replace common punctuation signs with spaces, and
check that they meet your needs.
3) Define your own list of keywords in <d:keyword/> elements. Note that
leading and trailing spaces are significant: "> rdf <" will match rdf
as an entire word, while ">rdf<" would match the substring "rdf" and
"> rdf<" would match words starting with rdf. Also note that the test is done
after conversion to lowercase.
To use it with venus, just add this filter to the list of filters, for instance:
filters= categories.xslt guess_language.py
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:atom="http://www.w3.org/2005/Atom" xmlns="http://www.w3.org/2005/Atom"
xmlns:d="http://ns.websemantique.org/data/" exclude-result-prefixes="d atom" version="1.0">
<xsl:variable name="upper"
>,.;AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZzÀàÁáÂâÃãÄäÅ寿ÇçÈèÉéÊêËëÌìÍíÎîÏïÐðÑñÒòÓóÔôÕõÖöØøÙùÚúÛûÜüÝýÞþ</xsl:variable>
<xsl:variable name="lower"
> aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzzaaaaaaaaaaaaææcceeeeeeeeiiiiiiiiððnnooooooooooøøuuuuuuuuyyþþ</xsl:variable>
<d:keywords>
<d:keyword> wiki semantique </d:keyword>
<d:keyword> wikis semantiques </d:keyword>
<d:keyword> web semantique </d:keyword>
<d:keyword> websemantique </d:keyword>
<d:keyword> semantic web</d:keyword>
<d:keyword> semweb</d:keyword>
<d:keyword> rdf</d:keyword>
<d:keyword> owl </d:keyword>
<d:keyword> sparql </d:keyword>
<d:keyword> topic map</d:keyword>
<d:keyword> doap </d:keyword>
<d:keyword> foaf </d:keyword>
<d:keyword> sioc </d:keyword>
<d:keyword> ontology </d:keyword>
<d:keyword> ontologie</d:keyword>
<d:keyword> dublin core </d:keyword>
</d:keywords>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="atom:entry/atom:updated">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
<xsl:variable name="concatenatedText">
<xsl:for-each select="../atom:title|../atom:summary|../atom:content|../atom:category/@term">
<xsl:text> </xsl:text>
<xsl:value-of select="translate(., $upper, $lower)"/>
</xsl:for-each>
<xsl:text> </xsl:text>
</xsl:variable>
<xsl:if test="document('')/*/d:keywords/d:keyword[contains($concatenatedText, .)]">
<category term="WebSemantique"/>
</xsl:if>
</xsl:template>
<xsl:template match="atom:category[@term='&categoryTerm;']"/>
</xsl:stylesheet>


@ -0,0 +1,37 @@
This filter is released under the same licence as Python
see http://www.intertwingly.net/code/venus/LICENCE.
Author: Eric van der Vlist <vdv@dyomedea.com>
This filter guesses whether an Atom entry is written
in English or French. It should be trivial to choose between
two other languages, easy to extend to more than two languages
and useful to pass these languages as Venus configuration
parameters.
The code used to guess the language is the one that has been
described by Douglas Bagnall as the Python recipe titled
"Language detection using character trigrams"
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576.
To add support for a new language, this language must first be
"learned" using learn-language.py. This learning phase is nothing
more than saving a pickled version of the Trigram object for this
language.
To learn Finnish, you would execute:
$ ./learn-language.py http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt fi.data
where http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt is a text
representative of the Finnish language and "fi.data" is the name of the
data file for "fi" (ISO code for Finnish).
To install this filter, copy this directory under the Venus
filter directory and declare it in your filters list, for instance:
filters= categories.xslt guess-language/guess-language.py
NOTE: this filter depends on Amara
(http://uche.ogbuji.net/tech/4suite/amara/)

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -0,0 +1,58 @@
#!/usr/bin/env python
"""A filter to guess languages.
This filter guesses whether an Atom entry is written
in English or French. It should be trivial to choose between
two other languages, easy to extend to more than two languages
and useful to pass these languages as Venus configuration
parameters.
(See the README file for more details).
Requires Python 2.1, recommends 2.4.
"""
__authors__ = [ "Eric van der Vlist <vdv@dyomedea.com>"]
__license__ = "Python"
import amara
from sys import stdin, stdout
from trigram import Trigram
from xml.dom import XML_NAMESPACE as XML_NS
import cPickle
ATOM_NSS = {
u'atom': u'http://www.w3.org/2005/Atom',
u'xml': XML_NS
}
langs = {}
def tri(lang):
if not langs.has_key(lang):
f = open('filters/guess-language/%s.data' % lang, 'r')
t = cPickle.load(f)
f.close()
langs[lang] = t
return langs[lang]
def guess_language(entry):
text = u'';
for child in entry.xml_xpath(u'atom:title|atom:summary|atom:content'):
text = text + u' '+ child.__unicode__()
t = Trigram()
t.parseString(text)
if tri('fr') - t > tri('en') - t:
lang=u'en'
else:
lang=u'fr'
entry.xml_set_attribute((u'xml:lang', XML_NS), lang)
def main():
feed = amara.parse(stdin, prefixes=ATOM_NSS)
for entry in feed.xml_xpath(u'//atom:entry[not(@xml:lang)]'):
guess_language(entry)
feed.xml(stdout)
if __name__ == '__main__':
main()


@ -0,0 +1,25 @@
#!/usr/bin/env python
"""A filter to guess languages.
This utility saves a Trigram object on file.
(See the README file for more details).
Requires Python 2.1, recommends 2.4.
"""
__authors__ = [ "Eric van der Vlist <vdv@dyomedea.com>"]
__license__ = "Python"
from trigram import Trigram
from sys import argv
from cPickle import dump
def main():
tri = Trigram(argv[1])
out = open(argv[2], 'w')
dump(tri, out)
out.close()
if __name__ == '__main__':
main()


@ -0,0 +1,188 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
"""
This class is based on the Python recipe titled
"Language detection using character trigrams"
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576
by Douglas Bagnall.
It has been (slightly) adapted by Eric van der Vlist to support
Unicode and accept a method to parse strings.
"""
__authors__ = [ "Douglas Bagnall", "Eric van der Vlist <vdv@dyomedea.com>"]
__license__ = "Python"
import random
from urllib import urlopen
class Trigram:
"""
From one or more text files, the frequency of three character
sequences is calculated. When treated as a vector, this information
can be compared to other trigrams, and the difference between them
seen as an angle. The cosine of this angle varies between 1 for
complete similarity, and 0 for utter difference. Since letter
combinations are characteristic to a language, this can be used to
determine the language of a body of text. For example:
>>> reference_en = Trigram('/path/to/reference/text/english')
>>> reference_de = Trigram('/path/to/reference/text/german')
>>> unknown = Trigram('url://pointing/to/unknown/text')
>>> unknown.similarity(reference_de)
0.4
>>> unknown.similarity(reference_en)
0.95
would indicate the unknown text is almost certainly English. As
syntax sugar, the minus sign is overloaded to return the difference
between texts, so the above objects would give you:
>>> unknown - reference_de
0.6
>>> reference_en - unknown # order doesn't matter.
0.05
As it stands, the Trigram ignores character set information, which
means you can only accurately compare within a single encoding
(iso-8859-1 in the examples). A more complete implementation might
convert to unicode first.
As an extra bonus, there is a method to make up nonsense words in the
style of the Trigram's text.
>>> reference_en.makeWords(30)
My withillonquiver and ald, by now wittlectionsurper, may sequia,
tory, I ad my notter. Marriusbabilly She lady for rachalle spen
hat knong al elf
Beware when using urls: HTML won't be parsed out.
Most methods chatter away to standard output, to let you know they're
still there.
"""
length = 0
def __init__(self, fn=None):
self.lut = {}
if fn is not None:
self.parseFile(fn)
def _parseAFragment(self, line, pair=' '):
for letter in line:
d = self.lut.setdefault(pair, {})
d[letter] = d.get(letter, 0) + 1
pair = pair[1] + letter
return pair
def parseString(self, string):
self._parseAFragment(string)
self.measure()
def parseFile(self, fn, encoding="iso-8859-1"):
pair = ' '
if '://' in fn:
#print "trying to fetch url, may take time..."
f = urlopen(fn)
else:
f = open(fn)
for z, line in enumerate(f):
#if not z % 1000:
# print "line %s" % z
# \n's are spurious in a prose context
pair = self._parseAFragment(line.strip().decode(encoding) + ' ')
f.close()
self.measure()
def measure(self):
"""calculates the scalar length of the trigram vector and
stores it in self.length."""
total = 0
for y in self.lut.values():
total += sum([ x * x for x in y.values() ])
self.length = total ** 0.5
def similarity(self, other):
"""returns a number between 0 and 1 indicating similarity.
1 means an identical ratio of trigrams;
0 means no trigrams in common.
"""
if not isinstance(other, Trigram):
raise TypeError("can't compare Trigram with non-Trigram")
lut1 = self.lut
lut2 = other.lut
total = 0
for k in lut1.keys():
if k in lut2:
a = lut1[k]
b = lut2[k]
for x in a:
if x in b:
total += a[x] * b[x]
return float(total) / (self.length * other.length)
def __sub__(self, other):
"""indicates difference between trigram sets; 1 is entirely
different, 0 is entirely the same."""
return 1 - self.similarity(other)
def makeWords(self, count):
"""returns a string of made-up words based on the known text."""
text = []
k = ' '
while count:
n = self.likely(k)
text.append(n)
k = k[1] + n
if n in ' \t':
count -= 1
return ''.join(text)
def likely(self, k):
"""Returns a character likely to follow the given string
two character string, or a space if nothing is found."""
if k not in self.lut:
return ' '
        # if you were using this a lot, caching would be a good idea.
letters = []
for k, v in self.lut[k].items():
letters.append(k * v)
letters = ''.join(letters)
return random.choice(letters)
def test():
en = Trigram('http://gutenberg.net/dirs/etext97/lsusn11.txt')
#NB fr and some others have English license text.
# no has english excerpts.
fr = Trigram('http://gutenberg.net/dirs/etext03/candi10.txt')
fi = Trigram('http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt')
no = Trigram('http://gutenberg.net/dirs/1/2/8/4/12844/12844-8.txt')
se = Trigram('http://gutenberg.net/dirs/1/0/1/1/10117/10117-8.txt')
no2 = Trigram('http://gutenberg.net/dirs/1/3/0/4/13041/13041-8.txt')
en2 = Trigram('http://gutenberg.net/dirs/etext05/cfgsh10.txt')
fr2 = Trigram('http://gutenberg.net/dirs/1/3/7/0/13704/13704-8.txt')
print "calculating difference:"
print "en - fr is %s" % (en - fr)
print "fr - en is %s" % (fr - en)
print "en - en2 is %s" % (en - en2)
print "en - fr2 is %s" % (en - fr2)
print "fr - en2 is %s" % (fr - en2)
print "fr - fr2 is %s" % (fr - fr2)
print "fr2 - en2 is %s" % (fr2 - en2)
print "fi - fr is %s" % (fi - fr)
print "fi - en is %s" % (fi - en)
print "fi - se is %s" % (fi - se)
print "no - se is %s" % (no - se)
print "en - no is %s" % (en - no)
print "no - no2 is %s" % (no - no2)
print "se - no2 is %s" % (se - no2)
print "en - no2 is %s" % (en - no2)
print "fr - no2 is %s" % (fr - no2)
if __name__ == '__main__':
test()
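The heart of the class above, cosine similarity over trigram counts, fits in a few self-contained lines of modern Python, which may make the geometry easier to see (function names here are invented for the sketch):

```python
from collections import Counter

def trigrams(text):
    """Frequency of every three-character window in the text."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def similarity(a, b):
    """Cosine of the angle between two trigram frequency vectors:
    1.0 for an identical ratio of trigrams, 0.0 for none in common."""
    ta, tb = trigrams(a), trigrams(b)
    dot = sum(ta[k] * tb[k] for k in ta)
    norm = (sum(v * v for v in ta.values())
            * sum(v * v for v in tb.values())) ** 0.5
    return dot / norm if norm else 0.0
```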


@@ -20,6 +20,7 @@ if __name__ == "__main__":
     config_file = "config.ini"
     offline = 0
     verbose = 0
+    only_if_new = 0

     for arg in sys.argv[1:]:
         if arg == "-h" or arg == "--help":
@@ -29,12 +30,15 @@ if __name__ == "__main__":
             print " -v, --verbose     DEBUG level logging during update"
             print " -o, --offline     Update the Planet from the cache only"
             print " -h, --help        Display this help message and exit"
+            print " -n, --only-if-new Only spider new feeds"
             print
             sys.exit(0)
         elif arg == "-v" or arg == "--verbose":
             verbose = 1
         elif arg == "-o" or arg == "--offline":
             offline = 1
+        elif arg == "-n" or arg == "--only-if-new":
+            only_if_new = 1
         elif arg.startswith("-"):
             print >>sys.stderr, "Unknown option:", arg
             sys.exit(1)
@@ -46,11 +50,11 @@ if __name__ == "__main__":
     if verbose:
         import planet
-        planet.getLogger('DEBUG')
+        planet.getLogger('DEBUG',config.log_format())

     if not offline:
         from planet import spider
-        spider.spiderPlanet()
+        spider.spiderPlanet(only_if_new=only_if_new)
     from planet import splice
     doc = splice.splice()


@@ -9,7 +9,7 @@ config.__init__()
 from ConfigParser import ConfigParser
 from urlparse import urljoin

-def getLogger(level):
+def getLogger(level, format):
     """ get a logger with the specified log level """
     global logger

     if logger: return logger
@@ -19,7 +19,7 @@ def getLogger(level):
     except:
         import compat_logging as logging

-    logging.basicConfig()
+    logging.basicConfig(format=format)
     logging.getLogger().setLevel(logging.getLevelName(level))
     logger = logging.getLogger("planet.runner")
     try:


@@ -1090,7 +1090,7 @@ Logger.manager = Manager(Logger.root)
 BASIC_FORMAT = "%(levelname)s:%(name)s:%(message)s"

-def basicConfig():
+def basicConfig(format=BASIC_FORMAT):
     """
     Do basic configuration for the logging system by creating a
     StreamHandler with a default Formatter and adding it to the
@@ -1098,7 +1098,7 @@ def basicConfig():
     """
     if len(root.handlers) == 0:
         hdlr = StreamHandler()
-        fmt = Formatter(BASIC_FORMAT)
+        fmt = Formatter(format)
         hdlr.setFormatter(fmt)
         root.addHandler(hdlr)


@ -32,7 +32,7 @@ from urlparse import urljoin
parser = ConfigParser() parser = ConfigParser()
planet_predefined_options = [] planet_predefined_options = ['filters']
def __init__(): def __init__():
"""define the struture of an ini file""" """define the struture of an ini file"""
@ -43,6 +43,8 @@ def __init__():
if section and parser.has_option(section, option): if section and parser.has_option(section, option):
return parser.get(section, option) return parser.get(section, option)
elif parser.has_option('Planet', option): elif parser.has_option('Planet', option):
if option == 'log_format':
return parser.get('Planet', option, raw=True)
return parser.get('Planet', option) return parser.get('Planet', option)
else: else:
return default return default
@@ -69,8 +71,8 @@ def __init__():
         planet_predefined_options.append(name)
     # define a list planet-level variable
-    def define_planet_list(name):
-        setattr(config, name, lambda : expand(get(None,name,'')))
+    def define_planet_list(name, default=''):
+        setattr(config, name, lambda : expand(get(None,name,default)))
         planet_predefined_options.append(name)
     # define a string template-level variable
@@ -88,6 +90,7 @@ def __init__():
     define_planet('link', '')
     define_planet('cache_directory', "cache")
     define_planet('log_level', "WARNING")
+    define_planet('log_format', "%(levelname)s:%(name)s:%(message)s")
     define_planet('feed_timeout', 20)
     define_planet('date_format', "%B %d, %Y %I:%M %p")
     define_planet('new_date_format', "%B %d, %Y")
@@ -100,7 +103,7 @@ def __init__():
     define_planet_list('template_files')
     define_planet_list('bill_of_materials')
-    define_planet_list('template_directories')
+    define_planet_list('template_directories', '.')
     define_planet_list('filter_directories')
     # template options
@@ -123,7 +126,7 @@ def load(config_file):
     import config, planet
     from planet import opml, foaf
-    log = planet.getLogger(config.log_level())
+    log = planet.getLogger(config.log_level(),config.log_format())
     # Theme support
     theme = config.output_theme()
@@ -146,10 +149,11 @@ def load(config_file):
         # complete search list for theme directories
         dirs += [os.path.join(theme_dir,dir) for dir in
-            config.template_directories()]
+            config.template_directories() if dir not in dirs]
         # merge configurations, allowing current one to override theme
         template_files = config.template_files()
+        parser.set('Planet','template_files','')
         parser.read(config_file)
     for file in config.bill_of_materials():
         if not file in bom: bom.append(file)
@@ -178,6 +182,12 @@ def load(config_file):
             opml.opml2config(data, cached_config)
         elif content_type(list).find('foaf')>=0:
             foaf.foaf2config(data, cached_config)
+        else:
+            from planet import shell
+            import StringIO
+            cached_config.readfp(StringIO.StringIO(shell.run(
+                content_type(list), data.getvalue(), mode="filter")))
         if cached_config.sections() in [[], [list]]:
             raise Exception
@@ -314,7 +324,7 @@ def reading_lists():
     for section in parser.sections():
         if parser.has_option(section, 'content_type'):
             type = parser.get(section, 'content_type')
-            if type.find('opml')>=0 or type.find('foaf')>=0:
+            if type.find('opml')>=0 or type.find('foaf')>=0 or type.find('.')>=0:
                 result.append(section)
     return result
@@ -328,7 +338,8 @@ def filters(section=None):
 def planet_options():
     """ dictionary of planet wide options"""
-    return dict(map(lambda opt: (opt, parser.get('Planet',opt)),
+    return dict(map(lambda opt: (opt,
+        parser.get('Planet', opt, raw=(opt=="log_format"))),
         parser.options('Planet')))
 def feed_options(section):

View File

@@ -11,7 +11,7 @@ Recommended: Python 2.3 or later
 Recommended: CJKCodecs and iconv_codec <http://cjkpython.i18n.org/>
 """
-__version__ = "4.2-pre-" + "$Revision: 1.142 $"[11:16] + "-cvs"
+__version__ = "4.2-pre-" + "$Revision: 1.144 $"[11:16] + "-cvs"
 __license__ = """Copyright (c) 2002-2006, Mark Pilgrim, All rights reserved.
 Redistribution and use in source and binary forms, with or without modification,
@@ -218,6 +218,9 @@ class FeedParserDict(UserDict):
     def __getitem__(self, key):
         if key == 'category':
             return UserDict.__getitem__(self, 'tags')[0]['term']
+        if key == 'enclosures':
+            norel = lambda link: FeedParserDict([(name,value) for (name,value) in link.items() if name!='rel'])
+            return [norel(link) for link in UserDict.__getitem__(self, 'links') if link['rel']=='enclosure']
         if key == 'categories':
             return [(tag['scheme'], tag['term']) for tag in UserDict.__getitem__(self, 'tags')]
         realkey = self.keymap.get(key, key)
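The new `enclosures` accessor no longer stores enclosures separately: it derives them on the fly from the `links` list, keeping every link whose `rel` is `enclosure` and dropping the `rel` key itself. A plain-dict sketch of the same transformation (the URLs are made-up examples):

```python
# Links as feedparser would collect them for one entry.
links = [
    {'rel': 'alternate', 'type': 'text/html',
     'href': 'http://example.com/post'},
    {'rel': 'enclosure', 'type': 'audio/mpeg',
     'href': 'http://example.com/a.mp3', 'length': '123'},
]

# Keep only enclosure links, minus the now-redundant 'rel' key.
norel = lambda link: {name: value for name, value in link.items()
                      if name != 'rel'}
enclosures = [norel(link) for link in links if link['rel'] == 'enclosure']
```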
@@ -1303,15 +1306,15 @@ class _FeedParserMixin:
             attrsD.setdefault('type', 'application/atom+xml')
         else:
             attrsD.setdefault('type', 'text/html')
-        context = self._getContext()
         attrsD = self._itsAnHrefDamnIt(attrsD)
         if attrsD.has_key('href'):
             attrsD['href'] = self.resolveURI(attrsD['href'])
-        if attrsD.get('rel')=='enclosure' and not context.get('id'):
-            context['id'] = attrsD.get('href')
         expectingText = self.infeed or self.inentry or self.insource
+        context = self._getContext()
         context.setdefault('links', [])
         context['links'].append(FeedParserDict(attrsD))
+        if attrsD['rel'] == 'enclosure':
+            self._start_enclosure(attrsD)
         if attrsD.has_key('href'):
             expectingText = 0
         if (attrsD.get('rel') == 'alternate') and (self.mapContentType(attrsD.get('type')) in self.html_types):
@@ -1357,6 +1360,7 @@ class _FeedParserMixin:
             self._start_content(attrsD)
         else:
             self.pushContent('description', attrsD, 'text/html', self.infeed or self.inentry or self.insource)
+    _start_dc_description = _start_description
     def _start_abstract(self, attrsD):
         self.pushContent('description', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)
@@ -1368,6 +1372,7 @@ class _FeedParserMixin:
         value = self.popContent('description')
         self._summaryKey = None
     _end_abstract = _end_description
+    _end_dc_description = _end_description
     def _start_info(self, attrsD):
         self.pushContent('info', attrsD, 'text/plain', 1)
@@ -1427,7 +1432,8 @@ class _FeedParserMixin:
     def _start_enclosure(self, attrsD):
         attrsD = self._itsAnHrefDamnIt(attrsD)
         context = self._getContext()
-        context.setdefault('enclosures', []).append(FeedParserDict(attrsD))
+        attrsD['rel']='enclosure'
+        context.setdefault('links', []).append(FeedParserDict(attrsD))
         href = attrsD.get('href')
         if href and not context.get('id'):
             context['id'] = href

planet/idindex.py (new file, 97 lines)
View File

@@ -0,0 +1,97 @@
from glob import glob
import os, sys, dbhash

if __name__ == '__main__':
    rootdir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    sys.path.insert(0, rootdir)

from planet.spider import filename
from planet import config

def open():
    try:
        cache = config.cache_directory()
        index=os.path.join(cache,'index')
        if not os.path.exists(index): return None
        return dbhash.open(filename(index, 'id'),'w')
    except Exception, e:
        if e.__class__.__name__ == 'DBError': e = e.args[-1]
        from planet import logger as log
        log.error(str(e))

def destroy():
    from planet import logger as log
    cache = config.cache_directory()
    index=os.path.join(cache,'index')
    if not os.path.exists(index): return None
    idindex = filename(index, 'id')
    if os.path.exists(idindex): os.unlink(idindex)
    os.removedirs(index)
    log.info(idindex + " deleted")

def create():
    from planet import logger as log
    cache = config.cache_directory()
    index=os.path.join(cache,'index')
    if not os.path.exists(index): os.makedirs(index)
    index = dbhash.open(filename(index, 'id'),'c')

    try:
        import libxml2
    except:
        libxml2 = False
        from xml.dom import minidom

    for file in glob(cache+"/*"):
        if os.path.isdir(file):
            continue
        elif libxml2:
            try:
                doc = libxml2.parseFile(file)
                ctxt = doc.xpathNewContext()
                ctxt.xpathRegisterNs('atom','http://www.w3.org/2005/Atom')
                entry = ctxt.xpathEval('/atom:entry/atom:id')
                source = ctxt.xpathEval('/atom:entry/atom:source/atom:id')
                if entry and source:
                    index[filename('',entry[0].content)] = source[0].content
                doc.freeDoc()
            except:
                log.error(file)
        else:
            try:
                doc = minidom.parse(file)
                doc.normalize()
                ids = doc.getElementsByTagName('id')
                entry = [e for e in ids if e.parentNode.nodeName == 'entry']
                source = [e for e in ids if e.parentNode.nodeName == 'source']
                if entry and source:
                    index[filename('',entry[0].childNodes[0].nodeValue)] = \
                        source[0].childNodes[0].nodeValue
                doc.freeDoc()
            except:
                log.error(file)

    log.info(str(len(index.keys())) + " entries indexed")
    index.close()

    return open()

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print 'Usage: %s [-c|-d]' % sys.argv[0]
        sys.exit(1)

    config.load(sys.argv[1])

    if len(sys.argv) > 2 and sys.argv[2] == '-c':
        create()
    elif len(sys.argv) > 2 and sys.argv[2] == '-d':
        destroy()
    else:
        from planet import logger as log
        index = open()
        if index:
            log.info(str(len(index.keys())) + " entries indexed")
            index.close()
        else:
            log.info("no entries indexed")
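The minidom fallback branch in `create()` is the portable core of the indexer: pull the entry id and the parent feed's id out of one cached Atom entry. A Python 3 sketch of just that extraction, run on a made-up example document:

```python
from xml.dom import minidom

# A minimal cached Atom entry of the kind the spider writes out;
# the ids here are invented for illustration.
xml = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.com/blog/1</id>
  <source>
    <id>http://example.com/blog/</id>
  </source>
</entry>"""

doc = minidom.parseString(xml)
doc.normalize()
ids = doc.getElementsByTagName('id')

# Distinguish the entry's own id from the feed-level id by parent element,
# exactly as idindex.create() does.
entry = [e for e in ids if e.parentNode.nodeName == 'entry']
source = [e for e in ids if e.parentNode.nodeName == 'source']

entry_id = entry[0].childNodes[0].nodeValue
source_id = source[0].childNodes[0].nodeValue
```

The index then maps the cache filename of `entry_id` to `source_id`, so splice() can later tell which subscription an entry file belongs to.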

View File

@@ -48,6 +48,10 @@ class OpmlParser(ContentHandler,SGMLParser):
         # this is an entry in a subscription list, but some leave this
         # attribute off, and others have placed 'atom' in here
         if attrs.has_key('type'):
+            if attrs['type'] == 'link' and not attrs.has_key('url'):
+                # Auto-correct WordPress link manager OPML files
+                attrs = dict(attrs.items())
+                attrs['type'] = 'rss'
             if attrs['type'].lower() not in['rss','atom']: return
         # The feed itself is supposed to be in an attribute named 'xmlUrl'

View File

@@ -25,7 +25,11 @@ illegal_xml_chars = re.compile("[\x01-\x08\x0B\x0C\x0E-\x1F]")
 def createTextElement(parent, name, value):
     """ utility function to create a child element with the specified text"""
     if not value: return
-    if isinstance(value,str): value=value.decode('utf-8')
+    if isinstance(value,str):
+        try:
+            value=value.decode('utf-8')
+        except:
+            value=value.decode('iso-8859-1')
     xdoc = parent.ownerDocument
     xelement = xdoc.createElement(name)
     xelement.appendChild(xdoc.createTextNode(value))
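The new decode path tries UTF-8 first and falls back to ISO-8859-1, which accepts every possible byte sequence, so mislabelled Latin-1 feeds no longer crash `createTextElement`. A Python 3 sketch of the same policy on raw bytes:

```python
def decode_lenient(raw):
    # Mirror of the patched createTextElement logic: prefer UTF-8,
    # fall back to ISO-8859-1 (which can never fail to decode).
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('iso-8859-1')

utf8_text = 'caf\xe9'.encode('utf-8')       # b'caf\xc3\xa9'
latin1_text = 'caf\xe9'.encode('iso-8859-1') # b'caf\xe9', invalid UTF-8
```

The trade-off is that genuinely broken UTF-8 is silently reinterpreted as Latin-1 rather than rejected, which is the right call for an aggregator.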
@@ -100,6 +104,8 @@ def links(xentry, entry):
         xlink.setAttribute('type', link.get('type'))
         if link.has_key('rel'):
             xlink.setAttribute('rel', link.get('rel',None))
+        if link.has_key('length'):
+            xlink.setAttribute('length', link.get('length'))
         xentry.appendChild(xlink)
 def date(xentry, name, parsed):
@@ -157,7 +163,7 @@ def content(xentry, name, detail, bozo):
         xcontent.setAttribute('type', 'html')
         xcontent.appendChild(xdoc.createTextNode(detail.value.decode('utf-8')))
-    if detail.language:
+    if detail.get("language"):
         xcontent.setAttribute('xml:lang', detail.language)
     xentry.appendChild(xcontent)
@@ -170,13 +176,13 @@ def source(xsource, source, bozo, format):
     createTextElement(xsource, 'icon', source.get('icon', None))
     createTextElement(xsource, 'logo', source.get('logo', None))
+    if not source.has_key('logo') and source.has_key('image'):
+        createTextElement(xsource, 'logo', source.image.get('href',None))
     for tag in source.get('tags',[]):
         category(xsource, tag)
-    author_detail = source.get('author_detail',{})
-    if not author_detail.has_key('name') and source.has_key('planet_name'):
-        author_detail['name'] = source['planet_name']
-    author(xsource, 'author', author_detail)
+    author(xsource, 'author', source.get('author_detail',{}))
     for contributor in source.get('contributors',[]):
         author(xsource, 'contributor', contributor)
@@ -204,6 +210,8 @@ def reconstitute(feed, entry):
     if entry.has_key('language'):
         xentry.setAttribute('xml:lang', entry.language)
+    elif feed.feed.has_key('language'):
+        xentry.setAttribute('xml:lang', feed.feed.language)
     id(xentry, entry)
     links(xentry, entry)
@@ -217,18 +225,46 @@ def reconstitute(feed, entry):
     content(xentry, 'content', entry.get('content',[None])[0], bozo)
     content(xentry, 'rights', entry.get('rights_detail',None), bozo)
-    date(xentry, 'updated', entry.get('updated_parsed',time.gmtime()))
+    date(xentry, 'updated', entry_updated(feed.feed, entry, time.gmtime()))
     date(xentry, 'published', entry.get('published_parsed',None))
     for tag in entry.get('tags',[]):
         category(xentry, tag)
-    author(xentry, 'author', entry.get('author_detail',None))
+    # known, simple text extensions
+    for ns,name in [('feedburner','origlink')]:
+        if entry.has_key('%s_%s' % (ns,name)) and \
+            feed.namespaces.has_key(ns):
+            xoriglink = createTextElement(xentry, '%s:%s' % (ns,name),
+                entry['%s_%s' % (ns,name)])
+            xoriglink.setAttribute('xmlns:%s' % ns, feed.namespaces[ns])
+    author_detail = entry.get('author_detail',{})
+    if author_detail and not author_detail.has_key('name') and \
+        feed.feed.has_key('planet_name'):
+        author_detail['name'] = feed.feed['planet_name']
+    author(xentry, 'author', author_detail)
     for contributor in entry.get('contributors',[]):
         author(xentry, 'contributor', contributor)
     xsource = xdoc.createElement('source')
-    source(xsource, entry.get('source') or feed.feed, bozo, feed.version)
+    src = entry.get('source') or feed.feed
+    src_author = src.get('author_detail',{})
+    if (not author_detail or not author_detail.has_key('name')) and \
+        not src_author.has_key('name') and feed.feed.has_key('planet_name'):
+        if src_author: src_author = src_author.__class__(src_author.copy())
+        src['author_detail'] = src_author
+        src_author['name'] = feed.feed['planet_name']
+    source(xsource, src, bozo, feed.version)
     xentry.appendChild(xsource)

     return xdoc

+def entry_updated(feed, entry, default = None):
+    chks = ((entry, 'updated_parsed'),
+            (entry, 'published_parsed'),
+            (feed, 'updated_parsed'),)
+    for node, field in chks:
+        if node.has_key(field) and node[field]:
+            return node[field]
+    return default
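The `entry_updated` helper replaces a bare `entry.get('updated_parsed', ...)` with a fallback chain: the entry's updated date, then its published date, then the feed-level updated date, then a caller-supplied default. A Python 3 sketch of the same lookup over plain dicts (the sample time tuples are invented):

```python
def entry_updated(feed, entry, default=None):
    # Walk the candidates in priority order and return the first
    # one that is present and truthy.
    chks = ((entry, 'updated_parsed'),
            (entry, 'published_parsed'),
            (feed, 'updated_parsed'))
    for node, field in chks:
        if node.get(field):
            return node[field]
    return default

feed = {'updated_parsed': (2006, 11, 12, 0, 0, 0, 6, 316, 0)}
entry = {'published_parsed': (2006, 11, 10, 0, 0, 0, 4, 314, 0)}
```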

View File

@@ -6,13 +6,21 @@ logged_modes = []
 def run(template_file, doc, mode='template'):
     """ select a template module based on file extension and execute it """
-    log = planet.getLogger(planet.config.log_level())
+    log = planet.getLogger(planet.config.log_level(),planet.config.log_format())
     if mode == 'template':
         dirs = planet.config.template_directories()
     else:
         dirs = planet.config.filter_directories()

+    # parse out "extra" options
+    if template_file.find('?') < 0:
+        extra_options = {}
+    else:
+        import cgi
+        template_file, extra_options = template_file.split('?',1)
+        extra_options = dict(cgi.parse_qsl(extra_options))

     # see if the template can be located
     for template_dir in dirs:
         template_resolved = os.path.join(template_dir, template_file)
@@ -43,6 +51,7 @@ def run(template_file, doc, mode='template'):
     # Execute the shell module
     options = planet.config.template_options(template_file)
+    options.update(extra_options)
     log.debug("Processing %s %s using %s", mode,
         os.path.realpath(template_resolved), module_name)
     if mode == 'filter':
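The "extra options" parsing above lets a filter or template be referenced as `name?key=value&key2=value2`, with the query string turned into per-invocation options that override the config-file ones. A Python 3 sketch (the diff uses the Python 2 `cgi.parse_qsl`; `urllib.parse.parse_qsl` is the modern equivalent, and the filter name here is a made-up example):

```python
from urllib.parse import parse_qsl

template_file = 'excerpt.py?width=500&omit=img'

# Split the query string off the filter name and decode it into a dict,
# mirroring the patched planet.shell.run().
if template_file.find('?') < 0:
    extra_options = {}
else:
    template_file, query = template_file.split('?', 1)
    extra_options = dict(parse_qsl(query))
```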

View File

@@ -97,6 +97,9 @@ Items = [
     ['date_822', Rfc822, 'updated_parsed'],
     ['date_iso', Rfc3399, 'published_parsed'],
     ['date_iso', Rfc3399, 'updated_parsed'],
+    ['enclosure_href', String, 'links', {'rel': 'enclosure'}, 'href'],
+    ['enclosure_length', String, 'links', {'rel': 'enclosure'}, 'length'],
+    ['enclosure_type', String, 'links', {'rel': 'enclosure'}, 'type'],
     ['id', String, 'id'],
     ['link', String, 'links', {'rel': 'alternate'}, 'href'],
     ['new_channel', String, 'id'],
@@ -190,6 +193,13 @@ def template_info(source):
     for entry in data.entries:
         output['Items'].append(tmpl_mapper(entry, Items))

+    # synthesize isPermaLink attribute
+    for item in output['Items']:
+        if item.get('id') == item.get('link'):
+            item['guid_isPermaLink']='true'
+        else:
+            item['guid_isPermaLink']='false'

     # feed level information
     output['generator'] = config.generator_uri()
     output['name'] = config.name()
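The synthesized `guid_isPermaLink` mirrors RSS semantics: `isPermaLink="true"` on a `guid` means the guid doubles as the item's URL, which holds exactly when the entry's id equals its alternate link. A small sketch over plain item dicts (the ids and URLs are invented):

```python
items = [
    # Atom id that is also the permalink.
    {'id': 'http://example.com/post/1', 'link': 'http://example.com/post/1'},
    # Tag URI id: a valid guid, but not a fetchable URL.
    {'id': 'tag:example.com,2006:post-2', 'link': 'http://example.com/post/2'},
]

# Same rule as the patched template_info(): permalink iff id == link.
for item in items:
    if item.get('id') == item.get('link'):
        item['guid_isPermaLink'] = 'true'
    else:
        item['guid_isPermaLink'] = 'false'
```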

View File

@@ -1,5 +1,19 @@
 import os

+def quote(string, apos):
+    """ quote a string so that it can be passed as a parameter """
+    if type(string) == unicode:
+        string=string.encode('utf-8')
+    if apos.startswith("\\"): string=string.replace('\\','\\\\')
+
+    if string.find("'")<0:
+        return "'" + string + "'"
+    elif string.find('"')<0:
+        return '"' + string + '"'
+    else:
+        # unclear how to quote strings with both types of quotes for libxslt
+        return "'" + string.replace("'",apos) + "'"
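The quoting rule is: wrap in single quotes if the value has none, otherwise in double quotes, and if both kinds appear, replace apostrophes with a caller-chosen substitute (a typographic apostrophe for libxslt, an escaped one for the shell). A Python 3 sketch of the same helper:

```python
def quote(string, apos):
    # Wrap the value in whichever quote character it does not contain;
    # if it contains both, substitute apostrophes with `apos`.
    if apos.startswith("\\"):
        string = string.replace('\\', '\\\\')
    if "'" not in string:
        return "'" + string + "'"
    elif '"' not in string:
        return '"' + string + '"'
    else:
        return "'" + string.replace("'", apos) + "'"
```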
 def run(script, doc, output_file=None, options={}):
     """ process an XSLT stylesheet """
@@ -12,6 +26,22 @@ def run(script, doc, output_file=None, options={}):
     except:
         # otherwise, use the command line interface
         dom = None

+    # do it
+    result = None
+    if dom:
+        styledoc = libxml2.parseFile(script)
+        style = libxslt.parseStylesheetDoc(styledoc)
+        for key in options.keys():
+            options[key] = quote(options[key], apos="\xe2\x80\x99")
+        output = style.applyStylesheet(dom, options)
+        if output_file:
+            style.saveResultToFilename(output_file, output, 0)
+        else:
+            result = str(output)
+        style.freeStylesheet()
+        output.freeDoc()
+    elif output_file:
         import warnings
         if hasattr(warnings, 'simplefilter'):
             warnings.simplefilter('ignore', RuntimeWarning)
@@ -20,16 +50,28 @@ def run(script, doc, output_file=None, options={}):
         file.write(doc)
         file.close()

-    # do it
-    if dom:
-        styledoc = libxml2.parseFile(script)
-        style = libxslt.parseStylesheetDoc(styledoc)
-        result = style.applyStylesheet(dom, None)
-        style.saveResultToFilename(output_file, result, 0)
-        style.freeStylesheet()
-        result.freeDoc()
-    else:
-        os.system('xsltproc %s %s > %s' % (script, docfile, output_file))
+        cmdopts = []
+        for key,value in options.items():
+            cmdopts += ['--stringparam', key, quote(value, apos=r"\'")]
+
+        os.system('xsltproc %s %s %s > %s' %
+            (' '.join(cmdopts), script, docfile, output_file))
+        os.unlink(docfile)
+    else:
+        import sys
+        from subprocess import Popen, PIPE
+        options = sum([['--stringparam', key, value]
+            for key,value in options.items()], [])
+        proc = Popen(['xsltproc'] + options + [script, '-'],
+            stdin=PIPE, stdout=PIPE, stderr=PIPE)
+        result, stderr = proc.communicate(doc)
+        if stderr:
+            import planet
+            planet.logger.error(stderr)

     if dom: dom.freeDoc()
-    if docfile: os.unlink(docfile)
+
+    return result

View File

@@ -11,10 +11,12 @@ import planet, config, feedparser, reconstitute, shell
 # Regular expressions to sanitise cache filenames
 re_url_scheme    = re.compile(r'^\w+:/*(\w+:|www\.)?')
-re_slash         = re.compile(r'[?/:]+')
+re_slash         = re.compile(r'[?/:|]+')
 re_initial_cruft = re.compile(r'^[,.]*')
 re_final_cruft   = re.compile(r'[,.]*$')

+index = True

 def filename(directory, filename):
     """Return a filename suitable for the cache.
@@ -29,6 +31,8 @@ def filename(directory, filename):
             filename=filename.encode('idna')
         except:
             pass
+    if isinstance(filename,unicode):
+        filename=filename.encode('utf-8')
     filename = re_url_scheme.sub("", filename)
     filename = re_slash.sub(",", filename)
     filename = re_initial_cruft.sub("", filename)
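The four regexes turn a feed URL into a safe cache basename: strip the scheme (and a leading `www.`), collapse `?`, `/`, `:`, and now `|` runs into commas, then trim leading and trailing commas and dots. A Python 3 sketch of that pipeline (the helper name `cache_basename` and the sample URLs are made up for illustration):

```python
import re

# Same patterns as spider.py, including the new '|' in re_slash.
re_url_scheme    = re.compile(r'^\w+:/*(\w+:|www\.)?')
re_slash         = re.compile(r'[?/:|]+')
re_initial_cruft = re.compile(r'^[,.]*')
re_final_cruft   = re.compile(r'[,.]*$')

def cache_basename(url):
    # Apply the substitutions in the same order as filename().
    name = re_url_scheme.sub('', url)
    name = re_slash.sub(',', name)
    name = re_initial_cruft.sub('', name)
    name = re_final_cruft.sub('', name)
    return name
```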
@@ -59,10 +63,16 @@ def scrub(feed, data):

     # some data is not trustworthy
     for tag in config.ignore_in_feed(feed).split():
+        if tag.find('lang')>=0: tag='language'
+        if data.feed.has_key(tag): del data.feed[tag]
         for entry in data.entries:
             if entry.has_key(tag): del entry[tag]
             if entry.has_key(tag + "_detail"): del entry[tag + "_detail"]
             if entry.has_key(tag + "_parsed"): del entry[tag + "_parsed"]
+            for key in entry.keys():
+                if not key.endswith('_detail'): continue
+                for detail in entry[key].copy():
+                    if detail == tag: del entry[key][detail]

     # adjust title types
     if config.title_type(feed):
@@ -107,15 +117,22 @@ def scrub(feed, data):
                 source.author_detail['name'] = \
                     str(stripHtml(source.author_detail.name))

-def spiderFeed(feed):
+def spiderFeed(feed, only_if_new=0):
     """ Spider (fetch) a single feed """
     log = planet.logger

     # read cached feed info
     sources = config.cache_sources_directory()
+    if not os.path.exists(sources):
+        os.makedirs(sources, 0700)
     feed_source = filename(sources, feed)
     feed_info = feedparser.parse(feed_source)
-    if feed_info.feed.get('planet_http_status',None) == '410': return
+    if feed_info.feed and only_if_new:
+        log.info("Feed %s already in cache", feed)
+        return
+    if feed_info.feed.get('planet_http_status',None) == '410':
+        log.info("Feed %s gone", feed)
+        return

     # read feed itself
     modified = None
@@ -142,6 +159,10 @@ def spiderFeed(feed):
     # process based on the HTTP status code
     if data.status == 200 and data.has_key("url"):
         data.feed['planet_http_location'] = data.url
+        if feed == data.url:
+            log.info("Updating feed %s", feed)
+        else:
+            log.info("Updating feed %s @ %s", feed, data.url)
     elif data.status == 301 and data.has_key("entries") and len(data.entries)>0:
         log.warning("Feed has moved from <%s> to <%s>", feed, data.url)
         data.feed['planet_http_location'] = data.url
@@ -171,6 +192,7 @@ def spiderFeed(feed):
     if not data.version and feed_info.version:
         data.feed = feed_info.feed
         data.bozo = feed_info.feed.get('planet_bozo','true') == 'true'
+        data.version = feed_info.feed.get('planet_format')
     data.feed['planet_http_status'] = str(data.status)

     # capture etag and last-modified information
@@ -184,18 +206,28 @@ def spiderFeed(feed):
             data.feed['planet_http_last_modified'])

     # capture feed and data from the planet configuration file
-    if not data.feed.has_key('links'): data.feed['links'] = list()
-    for link in data.feed.links:
-        if link.rel == 'self': break
-    else:
-        data.feed.links.append(feedparser.FeedParserDict(
-            {'rel':'self', 'type':'application/atom+xml', 'href':feed}))
+    if data.version:
+        if not data.feed.has_key('links'): data.feed['links'] = list()
+        feedtype = 'application/atom+xml'
+        if data.version.startswith('rss'): feedtype = 'application/rss+xml'
+        if data.version in ['rss090','rss10']: feedtype = 'application/rdf+xml'
+        for link in data.feed.links:
+            if link.rel == 'self':
+                link['type'] = feedtype
+                break
+        else:
+            data.feed.links.append(feedparser.FeedParserDict(
+                {'rel':'self', 'type':feedtype, 'href':feed}))
     for name, value in config.feed_options(feed).items():
         data.feed['planet_'+name] = value

     # perform user configured scrub operations on the data
     scrub(feed, data)

+    from planet import idindex
+    global index
+    if index != None: index = idindex.open()

     # write each entry to the cache
     cache = config.cache_directory()
     for entry in data.entries:
@@ -211,16 +243,20 @@ def spiderFeed(feed):
         mtime = None
         if not entry.has_key('updated_parsed'):
             if entry.has_key('published_parsed'):
-                entry['updated_parsed'] = entry.published_parsed
-        if entry.has_key('updated_parsed'):
-            mtime = calendar.timegm(entry.updated_parsed)
-            if mtime > time.time(): mtime = None
+                entry['updated_parsed'] = entry['published_parsed']
+        if not entry.has_key('updated_parsed'):
+            try:
+                mtime = calendar.timegm(entry.updated_parsed)
+            except:
+                pass
         if not mtime:
             try:
                 mtime = os.stat(cache_file).st_mtime
             except:
-                mtime = time.time()
-        entry['updated_parsed'] = time.gmtime(mtime)
+                if data.feed.has_key('updated_parsed'):
+                    mtime = calendar.timegm(data.feed.updated_parsed)
+        if not mtime or mtime > time.time(): mtime = time.time()
+        entry['updated_parsed'] = time.gmtime(mtime)

         # apply any filters
         xdoc = reconstitute.reconstitute(data, entry)
@@ -228,12 +264,22 @@ def spiderFeed(feed):
         xdoc.unlink()
         for filter in config.filters(feed):
             output = shell.run(filter, output, mode="filter")
-            if not output: return
+            if not output: break
+        if not output: continue

         # write out and timestamp the results
         write(output, cache_file)
         os.utime(cache_file, (mtime, mtime))

+        # optionally index
+        if index != None:
+            feedid = data.feed.get('id', data.feed.get('link',None))
+            if feedid:
+                if type(feedid) == unicode: feedid = feedid.encode('utf-8')
+                index[filename('', entry.id)] = feedid
+
+    if index: index.close()

     # identify inactive feeds
     if config.activity_threshold(feed):
         updated = [entry.updated_parsed for entry in data.entries
@@ -254,6 +300,8 @@ def spiderFeed(feed):
     # report channel level errors
     if data.status == 226:
         if data.feed.has_key('planet_message'): del data.feed['planet_message']
+        if feed_info.feed.has_key('planet_updated'):
+            data.feed['planet_updated'] = feed_info.feed['planet_updated']
     elif data.status == 403:
         data.feed['planet_message'] = "403: forbidden"
     elif data.status == 404:
@@ -275,14 +323,17 @@ def spiderFeed(feed):
     write(xdoc.toxml('utf-8'), filename(sources, feed))
     xdoc.unlink()

-def spiderPlanet():
+def spiderPlanet(only_if_new = False):
     """ Spider (fetch) an entire planet """
-    log = planet.getLogger(config.log_level())
+    log = planet.getLogger(config.log_level(),config.log_format())
     planet.setTimeout(config.feed_timeout())

+    global index
+    index = True

     for feed in config.subscriptions():
         try:
-            spiderFeed(feed)
+            spiderFeed(feed, only_if_new=only_if_new)
         except Exception,e:
             import sys, traceback
             type, value, tb = sys.exc_info()

View File

@@ -4,11 +4,12 @@ from xml.dom import minidom
import planet, config, feedparser, reconstitute, shell
from reconstitute import createTextElement, date
from spider import filename
+from planet import idindex
def splice():
    """ Splice together a planet from a cache of entries """
    import planet
-    log = planet.getLogger(config.log_level())
+    log = planet.getLogger(config.log_level(),config.log_format())
    log.info("Loading cached data")
    cache = config.cache_directory()
@@ -62,9 +63,15 @@ def splice():
    reconstitute.source(xdoc.documentElement, data.feed, None, None)
    feed.appendChild(xdoc.documentElement)
+    index = idindex.open()
    # insert entry information
    items = 0
    for mtime,file in dir:
+        if index:
+            base = file.split('/')[-1]
+            if index.has_key(base) and index[base] not in sub_ids: continue
        try:
            entry=minidom.parse(file)
@@ -83,12 +90,14 @@ def splice():
        except:
            log.error("Error parsing %s", file)
+    if index: index.close()
    return doc
def apply(doc):
    output_dir = config.output_dir()
    if not os.path.exists(output_dir): os.makedirs(output_dir)
-    log = planet.getLogger(config.log_level())
+    log = planet.getLogger(config.log_level(),config.log_format())
    # Go-go-gadget-template
    for template_file in config.template_files():
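The new splice-time check above skips cached entries whose source feed has dropped out of the subscription list. A minimal standalone sketch of that skip logic, with a plain dict standing in for the on-disk id index (the filenames and feed ids here are illustrative, not from the project):

```python
# index maps a cache filename to the id of the feed it came from;
# sub_ids holds the ids of the currently subscribed feeds.
index = {"entry1.xml": "feed-a", "entry2.xml": "feed-b"}
sub_ids = {"feed-b"}

kept = []
for file in ["cache/entry1.xml", "cache/entry2.xml", "cache/entry3.xml"]:
    base = file.split('/')[-1]
    # skip entries whose source feed is known but no longer subscribed
    if base in index and index[base] not in sub_ids:
        continue
    kept.append(file)

print(kept)  # entry1.xml dropped: feed-a is not in sub_ids
```

Entries absent from the index (entry3.xml here) are kept, matching the diff's behavior of only filtering entries the index knows about.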


@@ -23,7 +23,7 @@ modules = map(fullmodname, glob.glob(os.path.join('tests', 'test_*.py')))
# enable warnings
import planet
-planet.getLogger("WARNING")
+planet.getLogger("WARNING",None)
# load all of the tests into a suite
try:
@@ -33,5 +33,11 @@ except Exception, exception:
    for module in modules: __import__(module)
    raise
+verbosity = 1
+if "-q" in sys.argv or '--quiet' in sys.argv:
+    verbosity = 0
+if "-v" in sys.argv or '--verbose' in sys.argv:
+    verbosity = 2
# run test suite
-unittest.TextTestRunner().run(suite)
+unittest.TextTestRunner(verbosity=verbosity).run(suite)
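The flag handling added above maps `-q`/`--quiet` and `-v`/`--verbose` onto unittest's verbosity levels (0 silent, 1 dots, 2 per-test lines). A self-contained sketch of the same pattern, with a trivial test case standing in for the project's suite:

```python
import sys
import unittest

class Smoke(unittest.TestCase):
    def test_ok(self):
        self.assertEqual(1 + 1, 2)

# default verbosity is 1; -q/--quiet drops to 0, -v/--verbose raises to 2
verbosity = 1
if "-q" in sys.argv or "--quiet" in sys.argv:
    verbosity = 0
if "-v" in sys.argv or "--verbose" in sys.argv:
    verbosity = 2

suite = unittest.TestLoader().loadTestsFromTestCase(Smoke)
result = unittest.TextTestRunner(verbosity=verbosity).run(suite)
```

Because the flags are checked against `sys.argv` directly, they can coexist with unittest's own argument handling without an option parser.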


@@ -18,9 +18,10 @@ os.chdir(sys.path[0])
# copy spider output to splice input
import planet
from planet import spider, config
-planet.getLogger('CRITICAL')
-spider.spiderPlanet('tests/data/spider/config.ini')
+planet.getLogger('CRITICAL',None)
+config.load('tests/data/spider/config.ini')
+spider.spiderPlanet()
if os.path.exists('tests/data/splice/cache'):
    shutil.rmtree('tests/data/splice/cache')
shutil.move('tests/work/spider/cache', 'tests/data/splice/cache')
@@ -31,7 +32,7 @@ dest1.write(source.read().replace('/work/spider/', '/data/splice/'))
dest1.close()
source.seek(0)
-dest2=open('tests/data/apply/config.ini', 'w')
+dest2=open('tests/work/apply_config.ini', 'w')
dest2.write(source.read().replace('[Planet]', '''[Planet]
output_theme = asf
output_dir = tests/work/apply'''))
@@ -41,12 +42,13 @@ source.close()
# copy splice output to apply input
from planet import splice
file=open('tests/data/apply/feed.xml', 'w')
-data=splice.splice('tests/data/splice/config.ini').toxml('utf-8')
+config.load('tests/data/splice/config.ini')
+data=splice.splice().toxml('utf-8')
file.write(data)
file.close()
# copy apply output to config/reading-list input
-config.load('tests/data/apply/config.ini')
+config.load('tests/work/apply_config.ini')
splice.apply(data)
shutil.move('tests/work/apply/opml.xml', 'tests/data/config')

File diff suppressed because one or more lines are too long


@@ -1,8 +1,8 @@
<?xml version="1.0"?>
-<opml xmlns="http://www.w3.org/1999/xhtml" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/" version="1.1">
+<opml version="1.1">
<head>
<title>test planet</title>
-<dateModified>August 25, 2006 01:41 PM</dateModified>
+<dateModified>October 14, 2006 01:02 PM</dateModified>
<ownerName>Anonymous Coward</ownerName>
<ownerEmail></ownerEmail>
</head>


@@ -0,0 +1,2 @@
[Planet]
filters = excerpt.py?omit=img


@@ -0,0 +1,11 @@
<!--
Description: link relationship
Expect: Items[0]['enclosure_href'] == 'http://example.com/music.mp3'
-->
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<link rel="enclosure" href="http://example.com/music.mp3"/>
</entry>
</feed>


@@ -0,0 +1,11 @@
<!--
Description: link relationship
Expect: Items[0]['enclosure_length'] == '100'
-->
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<link rel="enclosure" length="100"/>
</entry>
</feed>


@@ -0,0 +1,11 @@
<!--
Description: link relationship
Expect: Items[0]['enclosure_type'] == 'audio/mpeg'
-->
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<link rel="enclosure" type="audio/mpeg"/>
</entry>
</feed>


@@ -0,0 +1,7 @@
[Planet]
filters = translate.xslt
filter_directories = tests/data/filter
[translate.xslt]
in = aeiou
out = AEIOU


@@ -0,0 +1,20 @@
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:param name="in"/>
<xsl:param name="out"/>
<!-- translate $in characters to $out in attribute values -->
<xsl:template match="@*">
<xsl:attribute name="{name()}">
<xsl:value-of select="translate(.,$in,$out)"/>
</xsl:attribute>
</xsl:template>
<!-- pass through everything else -->
<xsl:template match="node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
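The stylesheet above applies XPath's `translate($in, $out)` to every attribute value, with `in` and `out` supplied from the `[translate.xslt]` config section. For equal-length `in`/`out` strings, Python's `str.translate` performs the same character-for-character mapping; a sketch (not the project's code, and it omits XSLT `translate`'s deletion behavior when `in` is longer than `out`):

```python
# Mirror the [translate.xslt] section of the config above.
IN, OUT = "aeiou", "AEIOU"
table = str.maketrans(IN, OUT)

def translate(value):
    """Map each character of IN to the corresponding character of OUT."""
    return value.translate(table)

print(translate("its just data"))  # -> "Its jUst dAtA"
```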


@@ -0,0 +1,2 @@
[Planet]
filters = xpath_sifter.py?require=//atom%3Acategory%5B%40term%3D%27two%27%5D
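The `require` argument is URL-encoded because filter parameters ride in a query string after the script name; decoded, it is the XPath expression `//atom:category[@term='two']`. A sketch of splitting such a filter spec into script name and decoded parameters (a hypothetical helper using modern `urllib.parse`; the project itself targets Python 2):

```python
from urllib.parse import parse_qs

def parse_filter(spec):
    """Split 'script.py?name=value' into the script name and its
    percent-decoded parameters (parse_qs returns lists of values)."""
    name, _, query = spec.partition('?')
    return name, parse_qs(query)

name, params = parse_filter(
    "xpath_sifter.py?require=//atom%3Acategory%5B%40term%3D%27two%27%5D")
print(name)               # xpath_sifter.py
print(params["require"])  # ["//atom:category[@term='two']"]
```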


@@ -0,0 +1,40 @@
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:planet="http://planet.intertwingly.net/"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns="http://www.w3.org/1999/xhtml">
<!-- indent atom and planet elements -->
<xsl:template match="atom:*|planet:*">
<!-- double space before atom:entries and planet:source -->
<xsl:if test="self::atom:entry | self::planet:source">
<xsl:text>&#10;</xsl:text>
</xsl:if>
<!-- indent start tag -->
<xsl:text>&#10;</xsl:text>
<xsl:for-each select="ancestor::*">
<xsl:text> </xsl:text>
</xsl:for-each>
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<!-- indent end tag if there are element children -->
<xsl:if test="*">
<xsl:text>&#10;</xsl:text>
<xsl:for-each select="ancestor::*">
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:if>
</xsl:copy>
</xsl:template>
<!-- pass through everything else -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>


@@ -0,0 +1,13 @@
<!--
Description: enclosure
Expect: links[0].rel == 'enclosure' and id == 'http://example.com/1'
-->
<rss>
<channel>
<item>
<enclosure href="http://example.com/1"/>
</item>
</channel>
</rss>


@@ -0,0 +1,12 @@
<!--
Description: feedburner origlink relationship
Expect: feedburner_origlink == 'http://example.com/1'
-->
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
<entry>
<feedburner:origlink>http://example.com/1</feedburner:origlink>
</entry>
</feed>


@@ -0,0 +1,11 @@
<!--
Description: link relationship
Expect: links[0].length == '4000000'
-->
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<link rel="enclosure" href="http://example.com/music.mp3" length="4000000"/>
</entry>
</feed>


@@ -0,0 +1,14 @@
<!--
Description: if item pubdate is missing, use the channel-level date
Expect: updated_parsed == (2006, 6, 21, 13, 16, 41, 2, 172, 0)
-->
<rss version="0.91">
<channel>
<pubDate>Wed, 21 Jun 2006 14:16:41 +0100</pubDate>
<item/>
</channel>
</rss>


@@ -0,0 +1,12 @@
<!--
Description: logo
Expect: source.logo == 'http://example.com/logo.jpg'
-->
<rss version="2.0">
<channel>
<image><url>http://example.com/logo.jpg</url></image>
<item/>
</channel>
</rss>


@@ -0,0 +1,14 @@
<!--
Description: link relationship
Expect: title_detail.language == 'en'
-->
<rss version="2.0">
<channel>
<language>en</language>
<item>
<title>foo</title>
</item>
</channel>
</rss>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://example.com/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><summary>the Blue Planet</summary><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://example.com/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><summary>the Blue Planet</summary><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://example.com/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><summary>the Red Planet</summary><updated planet:format="August 25, 2006 01:41 PM">2006-08-25T13:41:22Z</updated><source><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://example.com/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><summary>the Red Planet</summary><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><source><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title>Mercury</title><content>Messenger of the Roman Gods</content><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title>Mercury</title><content>Messenger of the Roman Gods</content><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title>Venus</title><content>the Jewel of the Sky</content><updated planet:format="February 02, 2006 12:00 AM">2006-02-02T00:00:00Z</updated><published planet:format="January 02, 2006 12:00 AM">2006-01-02T00:00:00Z</published><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title>Venus</title><content>the Jewel of the Sky</content><updated planet:format="February 02, 2006 12:00 AM">2006-02-02T00:00:00Z</updated><published planet:format="January 02, 2006 12:00 AM">2006-01-02T00:00:00Z</published><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><content>the Blue Planet</content><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><content>the Blue Planet</content><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><content>the Red Planet</content><updated planet:format="January 04, 2006 12:00 AM">2006-01-04T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><content>the Red Planet</content><updated planet:format="January 04, 2006 12:00 AM">2006-01-04T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title xml:lang="en-us">Mercury</title><content xml:lang="en-us">Messenger of the Roman Gods</content><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title xml:lang="en-us">Mercury</title><content xml:lang="en-us">Messenger of the Roman Gods</content><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title xml:lang="en-us">Venus</title><content xml:lang="en-us">the Morning Star</content><updated planet:format="January 02, 2006 12:00 AM">2006-01-02T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title xml:lang="en-us">Venus</title><content xml:lang="en-us">the Morning Star</content><updated planet:format="January 02, 2006 12:00 AM">2006-01-02T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><content xml:lang="en-us">the Blue Planet</content><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><content xml:lang="en-us">the Blue Planet</content><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><content>the Red Planet</content><updated planet:format="January 04, 2006 12:00 AM">2006-01-04T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><content>the Red Planet</content><updated planet:format="January 04, 2006 12:00 AM">2006-01-04T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed3/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title>Mercury</title><summary>Messenger of the Roman Gods</summary><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed3/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title>Mercury</title><summary>Messenger of the Roman Gods</summary><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></source></entry>


@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?>
-<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed3/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title>Venus</title><summary>the Morning Star</summary><updated planet:format="August 25, 2006 01:41 PM">2006-08-25T13:41:22Z</updated><source><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></source></entry>
+<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed3/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title>Venus</title><summary>the Morning Star</summary><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><source><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></source></entry>

View File

@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><link href="tests/data/spider/testfeed0.atom" rel="self" type="application/atom+xml"/><planet:name>not found</planet:name><planet:http_status>500</planet:http_status></feed> <feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><author><name>not found</name></author><link href="tests/data/spider/testfeed0.atom" rel="self" type="application/atom+xml"/><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:message>internal server error</planet:message><planet:bozo>true</planet:bozo><planet:http_status>500</planet:http_status><planet:name>not found</planet:name></feed>

View File

@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></feed> <feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></feed>

View File

@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></feed> <feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></feed>

View File

@ -1,2 +1,2 @@
<?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></feed> <feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>Its just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></feed>

87
tests/reconstitute.py Normal file
View File

@ -0,0 +1,87 @@
#!/usr/bin/env python
import os, sys, ConfigParser, shutil, glob
venus_base = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0,venus_base)

if __name__ == "__main__":

    hide_planet_ns = True

    while len(sys.argv) > 1:
        if sys.argv[1] == '-v' or sys.argv[1] == '--verbose':
            import planet
            planet.getLogger('DEBUG',None)
            del sys.argv[1]
        elif sys.argv[1] == '-p' or sys.argv[1] == '--planet':
            hide_planet_ns = False
            del sys.argv[1]
        else:
            break

    parser = ConfigParser.ConfigParser()
    parser.add_section('Planet')
    parser.add_section(sys.argv[1])

    work = reduce(os.path.join, ['tests','work','reconsititute'], venus_base)
    output = os.path.join(work, 'output')
    filters = os.path.join(venus_base,'filters')

    parser.set('Planet','cache_directory',work)
    parser.set('Planet','output_dir',output)
    parser.set('Planet','filter_directories',filters)
    if hide_planet_ns:
        parser.set('Planet','template_files','themes/common/atom.xml.xslt')
    else:
        parser.set('Planet','template_files','tests/data/reconstitute.xslt')

    for name, value in zip(sys.argv[2::2],sys.argv[3::2]):
        parser.set(sys.argv[1], name.lstrip('-'), value)

    from planet import config
    config.parser = parser

    from planet import spider
    spider.spiderPlanet(only_if_new=False)

    from planet import feedparser
    for source in glob.glob(os.path.join(work, 'sources/*')):
        feed = feedparser.parse(source).feed
        if feed.has_key('title'):
            config.parser.set('Planet','name',feed.title_detail.value)
        if feed.has_key('link'):
            config.parser.set('Planet','link',feed.link)
        if feed.has_key('author_detail'):
            if feed.author_detail.has_key('name'):
                config.parser.set('Planet','owner_name',feed.author_detail.name)
            if feed.author_detail.has_key('email'):
                config.parser.set('Planet','owner_email',feed.author_detail.email)

    from planet import splice
    doc = splice.splice()

    sources = doc.getElementsByTagName('planet:source')
    if hide_planet_ns and len(sources) == 1:
        source = sources[0]
        feed = source.parentNode

        child = feed.firstChild
        while child:
            next = child.nextSibling
            if child.nodeName not in ['planet:source','entry']:
                feed.removeChild(child)
            child = next

        while source.hasChildNodes():
            child = source.firstChild
            source.removeChild(child)
            feed.insertBefore(child, source)

    for source in doc.getElementsByTagName('source'):
        source.parentNode.removeChild(source)

    splice.apply(doc.toxml('utf-8'))

    if hide_planet_ns:
        atom = open(os.path.join(output,'atom.xml')).read()
    else:
        atom = open(os.path.join(output,'reconstitute')).read()

    shutil.rmtree(work)
    os.removedirs(os.path.dirname(work))

    print atom
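The `hide_planet_ns` branch above prunes the feed down to its entries and then hoists the children of the lone `planet:source` element up into the feed itself. That DOM move can be sketched in isolation with `xml.dom.minidom`; the one-element feed below is a made-up fixture, not Venus test data:

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    '<feed><entry/><source><title>T</title><id>I</id></source></feed>')
source = doc.getElementsByTagName('source')[0]
feed = source.parentNode

# Move each child of <source> up into <feed>, keeping document order
# by inserting it immediately before <source> itself.
while source.hasChildNodes():
    child = source.firstChild
    source.removeChild(child)
    feed.insertBefore(child, source)
feed.removeChild(source)

print(doc.toxml())
```

Each hoisted child lands immediately before `<source>`, so relative order is preserved; only after the element is emptied is `<source>` itself dropped from the feed.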

28
tests/test_filter_xslt.py Normal file
View File

@ -0,0 +1,28 @@
#!/usr/bin/env python
import unittest, xml.dom.minidom
from planet import shell, config, logger

class XsltFilterTests(unittest.TestCase):

    def test_xslt_filter(self):
        config.load('tests/data/filter/translate.ini')
        testfile = 'tests/data/filter/category-one.xml'

        input = open(testfile).read()
        output = shell.run(config.filters()[0], input, mode="filter")
        dom = xml.dom.minidom.parseString(output)
        catterm = dom.getElementsByTagName('category')[0].getAttribute('term')
        self.assertEqual('OnE', catterm)

try:
    import libxslt
except:
    try:
        from subprocess import Popen, PIPE
        xsltproc = Popen(['xsltproc','--version'], stdout=PIPE, stderr=PIPE)
        xsltproc.communicate()
        if xsltproc.returncode != 0: raise ImportError
    except:
        logger.warn("libxslt is not available => can't test xslt filters")
        del XsltFilterTests.test_xslt_filter
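The fallback probe above (try `libxslt`, else look for an `xsltproc` binary on the path) is a reusable pattern for skipping tests when a tool is missing. A standalone sketch, with the hypothetical helper name `command_available`:

```python
from subprocess import Popen, PIPE

def command_available(argv):
    # Run the command; a missing executable raises OSError, and a
    # nonzero exit status also counts as "not available".
    try:
        proc = Popen(argv, stdout=PIPE, stderr=PIPE)
        proc.communicate()
        return proc.returncode == 0
    except OSError:
        return False

have_xsltproc = command_available(['xsltproc', '--version'])
```

Catching `OSError` (rather than a bare `except`) keeps genuine errors visible while still treating an absent binary as a skip condition.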

View File

@ -14,10 +14,16 @@ class FilterTests(unittest.TestCase):
imgsrc = dom.getElementsByTagName('img')[0].getAttribute('src') imgsrc = dom.getElementsByTagName('img')[0].getAttribute('src')
self.assertEqual('http://example.com.nyud.net:8080/foo.png', imgsrc) self.assertEqual('http://example.com.nyud.net:8080/foo.png', imgsrc)
def test_excerpt_images(self): def test_excerpt_images1(self):
testfile = 'tests/data/filter/excerpt-images.xml'
config.load('tests/data/filter/excerpt-images.ini') config.load('tests/data/filter/excerpt-images.ini')
self.verify_images()
def test_excerpt_images2(self):
config.load('tests/data/filter/excerpt-images2.ini')
self.verify_images()
def verify_images(self):
testfile = 'tests/data/filter/excerpt-images.xml'
output = open(testfile).read() output = open(testfile).read()
for filter in config.filters(): for filter in config.filters():
output = shell.run(filter, output, mode="filter") output = shell.run(filter, output, mode="filter")
@ -58,8 +64,15 @@ class FilterTests(unittest.TestCase):
self.assertEqual(u'before--after', self.assertEqual(u'before--after',
excerpt.firstChild.firstChild.nodeValue) excerpt.firstChild.firstChild.nodeValue)
def test_xpath_filter(self): def test_xpath_filter1(self):
config.load('tests/data/filter/xpath-sifter.ini') config.load('tests/data/filter/xpath-sifter.ini')
self.verify_xpath()
def test_xpath_filter2(self):
config.load('tests/data/filter/xpath-sifter2.ini')
self.verify_xpath()
def verify_xpath(self):
testfile = 'tests/data/filter/category-one.xml' testfile = 'tests/data/filter/category-one.xml'
output = open(testfile).read() output = open(testfile).read()
@ -89,9 +102,10 @@ try:
import libxml2 import libxml2
except: except:
logger.warn("libxml2 is not available => can't test xpath_sifter") logger.warn("libxml2 is not available => can't test xpath_sifter")
del FilterTests.test_xpath_filter del FilterTests.test_xpath_filter1
del FilterTests.test_xpath_filter2
except ImportError: except ImportError:
logger.warn("Popen is not available => can't test filters") logger.warn("Popen is not available => can't test standard filters")
for method in dir(FilterTests): for method in dir(FilterTests):
if method.startswith('test_'): delattr(FilterTests,method) if method.startswith('test_'): delattr(FilterTests,method)

74
tests/test_idindex.py Normal file
View File

@ -0,0 +1,74 @@
#!/usr/bin/env python
import unittest
from planet import idindex, config, logger

class idIndexTest(unittest.TestCase):

    def setUp(self):
        # silence errors
        import planet
        planet.logger = None
        planet.getLogger('CRITICAL',None)

    def tearDown(self):
        idindex.destroy()

    def test_unicode(self):
        from planet.spider import filename
        index = idindex.create()
        iri = 'http://www.\xe8\xa9\xb9\xe5\xa7\x86\xe6\x96\xaf.com/'
        index[filename('', iri)] = 'data'
        index[filename('', iri.decode('utf-8'))] = 'data'
        index[filename('', u'1234')] = 'data'
        index.close()

    def test_index_spider(self):
        import test_spider
        config.load(test_spider.configfile)

        index = idindex.create()
        self.assertEqual(0, len(index))
        index.close()

        from planet.spider import spiderPlanet
        try:
            spiderPlanet()

            index = idindex.open()
            self.assertEqual(12, len(index))
            self.assertEqual('tag:planet.intertwingly.net,2006:testfeed1',
                index['planet.intertwingly.net,2006,testfeed1,1'])
            self.assertEqual('http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss',
                index['planet.intertwingly.net,2006,testfeed3,1'])
            index.close()
        finally:
            import os, shutil
            shutil.rmtree(test_spider.workdir)
            os.removedirs(os.path.split(test_spider.workdir)[0])

    def test_index_splice(self):
        import test_splice
        config.load(test_splice.configfile)
        index = idindex.create()

        self.assertEqual(12, len(index))
        self.assertEqual('tag:planet.intertwingly.net,2006:testfeed1',
            index['planet.intertwingly.net,2006,testfeed1,1'])
        self.assertEqual('http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss',
            index['planet.intertwingly.net,2006,testfeed3,1'])

        for key in index.keys():
            value = index[key]
            if value.find('testfeed2')>0: index[key] = value.swapcase()
        index.close()

        from planet.splice import splice
        doc = splice()
        self.assertEqual(8,len(doc.getElementsByTagName('entry')))
        self.assertEqual(4,len(doc.getElementsByTagName('planet:source')))
        self.assertEqual(12,len(doc.getElementsByTagName('planet:name')))

try:
    import dbhash  # the guard must actually import the module for ImportError to fire
except ImportError:
    logger.warn("dbhash is not available => can't test id index")
    for method in dir(idIndexTest):
        if method.startswith('test_'): delattr(idIndexTest,method)

View File

@ -76,6 +76,14 @@ class OpmlTest(unittest.TestCase):
text="sample feed"/>''', self.config) text="sample feed"/>''', self.config)
self.assertFalse(self.config.has_section("http://example.com/feed.xml")) self.assertFalse(self.config.has_section("http://example.com/feed.xml"))
def test_WordPress_link_manager(self):
# http://www.wasab.dk/morten/blog/archives/2006/10/22/wp-venus
opml2config('''<outline type="link"
xmlUrl="http://example.com/feed.xml"
text="sample feed"/>''', self.config)
self.assertEqual('sample feed',
self.config.get("http://example.com/feed.xml", 'name'))
# #
# xmlUrl # xmlUrl
# #

View File

@ -7,7 +7,7 @@ from planet import feedparser, config
feed = ''' feed = '''
<feed xmlns='http://www.w3.org/2005/Atom'> <feed xmlns='http://www.w3.org/2005/Atom'>
<author><name>F&amp;ouml;o</name></author> <author><name>F&amp;ouml;o</name></author>
<entry> <entry xml:lang="en">
<id>ignoreme</id> <id>ignoreme</id>
<author><name>F&amp;ouml;o</name></author> <author><name>F&amp;ouml;o</name></author>
<updated>2000-01-01T00:00:00Z</updated> <updated>2000-01-01T00:00:00Z</updated>
@ -23,7 +23,7 @@ feed = '''
configData = ''' configData = '''
[testfeed] [testfeed]
ignore_in_feed = id updated ignore_in_feed = id updated xml:lang
name_type = html name_type = html
title_type = html title_type = html
summary_type = html summary_type = html
@ -40,12 +40,14 @@ class ScrubTest(unittest.TestCase):
self.assertTrue(data.entries[0].has_key('id')) self.assertTrue(data.entries[0].has_key('id'))
self.assertTrue(data.entries[0].has_key('updated')) self.assertTrue(data.entries[0].has_key('updated'))
self.assertTrue(data.entries[0].has_key('updated_parsed')) self.assertTrue(data.entries[0].has_key('updated_parsed'))
self.assertTrue(data.entries[0].summary_detail.has_key('language'))
scrub('testfeed', data) scrub('testfeed', data)
self.assertFalse(data.entries[0].has_key('id')) self.assertFalse(data.entries[0].has_key('id'))
self.assertFalse(data.entries[0].has_key('updated')) self.assertFalse(data.entries[0].has_key('updated'))
self.assertFalse(data.entries[0].has_key('updated_parsed')) self.assertFalse(data.entries[0].has_key('updated_parsed'))
self.assertFalse(data.entries[0].summary_detail.has_key('language'))
self.assertEqual('F\xc3\xb6o', data.feed.author_detail.name) self.assertEqual('F\xc3\xb6o', data.feed.author_detail.name)
self.assertEqual('F\xc3\xb6o', data.entries[0].author_detail.name) self.assertEqual('F\xc3\xb6o', data.entries[0].author_detail.name)

View File

@ -13,7 +13,7 @@ class SpiderTest(unittest.TestCase):
def setUp(self): def setUp(self):
# silence errors # silence errors
planet.logger = None planet.logger = None
planet.getLogger('CRITICAL') planet.getLogger('CRITICAL',None)
try: try:
os.makedirs(workdir) os.makedirs(workdir)
@ -58,6 +58,8 @@ class SpiderTest(unittest.TestCase):
# verify that the file timestamps match atom:updated # verify that the file timestamps match atom:updated
data = feedparser.parse(files[2]) data = feedparser.parse(files[2])
self.assertEqual(['application/atom+xml'], [link.type
for link in data.entries[0].source.links if link.rel=='self'])
self.assertEqual('one', data.entries[0].source.planet_name) self.assertEqual('one', data.entries[0].source.planet_name)
self.assertEqual(os.stat(files[2]).st_mtime, self.assertEqual(os.stat(files[2]).st_mtime,
calendar.timegm(data.entries[0].updated_parsed)) calendar.timegm(data.entries[0].updated_parsed))
@ -82,5 +84,7 @@ class SpiderTest(unittest.TestCase):
data = feedparser.parse(workdir + data = feedparser.parse(workdir +
'/planet.intertwingly.net,2006,testfeed3,1') '/planet.intertwingly.net,2006,testfeed3,1')
self.assertEqual(['application/rss+xml'], [link.type
for link in data.entries[0].source.links if link.rel=='self'])
self.assertEqual('three', data.entries[0].source.author_detail.name) self.assertEqual('three', data.entries[0].source.author_detail.name)

View File

@ -4,7 +4,7 @@ import unittest
from planet import config from planet import config
from os.path import split from os.path import split
class ConfigTest(unittest.TestCase): class ThemesTest(unittest.TestCase):
def setUp(self): def setUp(self):
config.load('tests/data/config/themed.ini') config.load('tests/data/config/themed.ini')
@ -17,7 +17,8 @@ class ConfigTest(unittest.TestCase):
# administrivia # administrivia
def test_template(self): def test_template(self):
self.assertTrue('index.html.xslt' in config.template_files()) self.assertEqual(1, len([1 for file in config.template_files()
if file == 'index.html.xslt']))
def test_feeds(self): def test_feeds(self):
feeds = config.subscriptions() feeds = config.subscriptions()

View File

@ -7,6 +7,7 @@ template_files:
foafroll.xml.xslt foafroll.xml.xslt
index.html.xslt index.html.xslt
opml.xml.xslt opml.xml.xslt
validate.html.xslt
template_directories: template_directories:
../common ../common

View File

@ -56,6 +56,7 @@
</xsl:choose> </xsl:choose>
<img src="images/feed-icon-10x10.png" alt="(feed)"/> <img src="images/feed-icon-10x10.png" alt="(feed)"/>
</a> </a>
<xsl:text> </xsl:text>
<!-- name --> <!-- name -->
<a href="{atom:link[@rel='alternate']/@href}"> <a href="{atom:link[@rel='alternate']/@href}">
@ -153,7 +154,9 @@
<img src="{atom:source/atom:icon}" class="icon"/> <img src="{atom:source/atom:icon}" class="icon"/>
</xsl:if> </xsl:if>
<a href="{atom:source/atom:link[@rel='alternate']/@href}"> <a href="{atom:source/atom:link[@rel='alternate']/@href}">
<xsl:attribute name="title" select="{atom:source/atom:title}"/> <xsl:attribute name="title">
<xsl:value-of select="atom:source/atom:title"/>
</xsl:attribute>
<xsl:value-of select="atom:source/planet:name"/> <xsl:value-of select="atom:source/planet:name"/>
</a> </a>
<xsl:if test="string-length(atom:title) &gt; 0"> <xsl:if test="string-length(atom:title) &gt; 0">
@ -236,6 +239,9 @@
<!-- Feedburner detritus --> <!-- Feedburner detritus -->
<xsl:template match="xhtml:div[@class='feedflare']"/> <xsl:template match="xhtml:div[@class='feedflare']"/>
<!-- Strip site meter -->
<xsl:template match="xhtml:div[comment()[. = ' Site Meter ']]"/>
<!-- pass through everything else --> <!-- pass through everything else -->
<xsl:template match="@*|node()"> <xsl:template match="@*|node()">
<xsl:copy> <xsl:copy>

View File

@ -14,14 +14,18 @@
<xsl:template match="atom:link[@rel='service.post']"/> <xsl:template match="atom:link[@rel='service.post']"/>
<xsl:template match="atom:link[@rel='service.feed']"/> <xsl:template match="atom:link[@rel='service.feed']"/>
<!-- Feedburner detritus --> <!-- Feedburner detritus -->
<xsl:template match="xhtml:div[@class='feedflare']"/> <xsl:template match="xhtml:div[@class='feedflare']"/>
<!-- Strip site meter -->
<xsl:template match="xhtml:div[comment()[. = ' Site Meter ']]"/>
<!-- add Google/LiveJournal-esque noindex directive --> <!-- add Google/LiveJournal-esque noindex directive -->
<xsl:template match="atom:feed"> <xsl:template match="atom:feed">
<xsl:copy> <xsl:copy>
<xsl:attribute name="indexing:index">no</xsl:attribute> <xsl:attribute name="indexing:index">no</xsl:attribute>
<xsl:apply-templates select="@*|node()"/> <xsl:apply-templates select="@*|node()"/>
<xsl:text>&#10;</xsl:text>
</xsl:copy> </xsl:copy>
</xsl:template> </xsl:template>

View File

@ -10,7 +10,7 @@
<TMPL_LOOP Items> <TMPL_LOOP Items>
<item> <item>
<title><TMPL_VAR channel_name ESCAPE="HTML"><TMPL_IF title>: <TMPL_VAR title_plain ESCAPE="HTML"></TMPL_IF></title> <title><TMPL_VAR channel_name ESCAPE="HTML"><TMPL_IF title>: <TMPL_VAR title_plain ESCAPE="HTML"></TMPL_IF></title>
<guid><TMPL_VAR id ESCAPE="HTML"></guid> <guid isPermaLink="<TMPL_VAR guid_isPermaLink>"><TMPL_VAR id ESCAPE="HTML"></guid>
<link><TMPL_VAR link ESCAPE="HTML"></link> <link><TMPL_VAR link ESCAPE="HTML"></link>
<TMPL_IF content> <TMPL_IF content>
<description><TMPL_VAR content ESCAPE="HTML"></description> <description><TMPL_VAR content ESCAPE="HTML"></description>
@ -23,6 +23,9 @@
<author><TMPL_VAR author_email></author> <author><TMPL_VAR author_email></author>
</TMPL_IF> </TMPL_IF>
</TMPL_IF> </TMPL_IF>
<TMPL_IF enclosure_href>
<enclosure url="<TMPL_VAR enclosure_href ESCAPE="HTML">" length="<TMPL_VAR enclosure_length>" type="<TMPL_VAR enclosure_type>"/>
</TMPL_IF>
</item> </item>
</TMPL_LOOP> </TMPL_LOOP>

View File

@ -0,0 +1,146 @@
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns:atom="http://www.w3.org/2005/Atom"
                xmlns:xhtml="http://www.w3.org/1999/xhtml"
                xmlns:planet="http://planet.intertwingly.net/"
                xmlns="http://www.w3.org/1999/xhtml">

  <xsl:template match="atom:feed">
    <html xmlns="http://www.w3.org/1999/xhtml">

      <!-- head -->
      <xsl:text>&#10;&#10;</xsl:text>
      <head>
        <title><xsl:value-of select="atom:title"/></title>
        <meta name="robots" content="noindex,nofollow" />
        <meta name="generator" content="{atom:generator}" />
        <link rel="shortcut icon" href="/favicon.ico" />
        <style type="text/css">
          img{border:0}
          a{text-decoration:none}
          a:hover{text-decoration:underline}
          .message{border-bottom:1px dashed red} a.message:hover{cursor: help;text-decoration: none}
          dl{margin:0}
          dt{float:left;width:9em}
          dt:after{content:':'}
        </style>
      </head>

      <!-- body -->
      <xsl:text>&#10;&#10;</xsl:text>
      <body>
        <table border="1" cellpadding="3" cellspacing="0">
          <thead>
            <tr>
              <th></th>
              <th>Name</th>
              <th>Format</th>
              <xsl:if test="//planet:ignore_in_feed | //planet:filters |
                            //planet:*[contains(local-name(),'_type')]">
                <th>Notes</th>
              </xsl:if>
            </tr>
          </thead>
          <xsl:apply-templates select="planet:source">
            <xsl:sort select="planet:name"/>
          </xsl:apply-templates>
          <xsl:text>&#10;</xsl:text>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="planet:source">
    <xsl:variable name="validome_format">
      <xsl:choose>
        <xsl:when test="planet:format = 'rss090'">rss_0_90</xsl:when>
        <xsl:when test="planet:format = 'rss091n'">rss_0_91</xsl:when>
        <xsl:when test="planet:format = 'rss091u'">rss_0_91</xsl:when>
        <xsl:when test="planet:format = 'rss10'">rss_1_0</xsl:when>
        <xsl:when test="planet:format = 'rss092'">rss_0_90</xsl:when>
        <xsl:when test="planet:format = 'rss093'"></xsl:when>
        <xsl:when test="planet:format = 'rss094'">rss_0_90</xsl:when>
        <xsl:when test="planet:format = 'rss20'">rss_2_0</xsl:when>
        <xsl:when test="planet:format = 'rss'">rss_2_0</xsl:when>
        <xsl:when test="planet:format = 'atom01'"></xsl:when>
        <xsl:when test="planet:format = 'atom02'"></xsl:when>
        <xsl:when test="planet:format = 'atom03'">atom_0_3</xsl:when>
        <xsl:when test="planet:format = 'atom10'">atom_1_0</xsl:when>
        <xsl:when test="planet:format = 'atom'">atom_1_0</xsl:when>
        <xsl:when test="planet:format = 'cdf'"></xsl:when>
        <xsl:when test="planet:format = 'hotrss'"></xsl:when>
      </xsl:choose>
    </xsl:variable>

    <xsl:text>&#10;</xsl:text>
    <tr>
      <xsl:if test="planet:bozo='true'">
        <xsl:attribute name="bgcolor">#FCC</xsl:attribute>
      </xsl:if>
      <td>
        <a title="feed validator">
          <xsl:attribute name="href">
            <xsl:text>http://feedvalidator.org/check?url=</xsl:text>
            <xsl:choose>
              <xsl:when test="planet:http_location">
                <xsl:value-of select="planet:http_location"/>
              </xsl:when>
              <xsl:when test="atom:link[@rel='self']/@href">
                <xsl:value-of select="atom:link[@rel='self']/@href"/>
              </xsl:when>
            </xsl:choose>
          </xsl:attribute>
          <img src="http://feedvalidator.org/favicon.ico" hspace='2' vspace='1'/>
        </a>
        <a title="validome">
          <xsl:attribute name="href">
            <xsl:text>http://www.validome.org/rss-atom/validate?</xsl:text>
            <xsl:text>viewSourceCode=1&amp;version=</xsl:text>
            <xsl:value-of select="$validome_format"/>
            <xsl:text>&amp;url=</xsl:text>
            <xsl:choose>
              <xsl:when test="planet:http_location">
                <xsl:value-of select="planet:http_location"/>
              </xsl:when>
              <xsl:when test="atom:link[@rel='self']/@href">
                <xsl:value-of select="atom:link[@rel='self']/@href"/>
              </xsl:when>
            </xsl:choose>
          </xsl:attribute>
          <img src="http://validome.org/favicon.ico" hspace='2' vspace='1'/>
        </a>
      </td>
      <td>
        <a href="{atom:link[@rel='alternate']/@href}">
          <xsl:choose>
            <xsl:when test="planet:message">
              <xsl:attribute name="class">message</xsl:attribute>
              <xsl:attribute name="title">
                <xsl:value-of select="planet:message"/>
              </xsl:attribute>
            </xsl:when>
            <xsl:when test="atom:title">
              <xsl:attribute name="title">
                <xsl:value-of select="atom:title"/>
              </xsl:attribute>
            </xsl:when>
          </xsl:choose>
          <xsl:value-of select="planet:name"/>
        </a>
      </td>
      <td><xsl:value-of select="planet:format"/></td>
      <xsl:if test="planet:ignore_in_feed | planet:filters |
                    planet:*[contains(local-name(),'_type')]">
        <td>
          <dl>
            <xsl:for-each select="planet:ignore_in_feed | planet:filters |
                                  planet:*[contains(local-name(),'_type')]">
              <xsl:sort select="local-name()"/>
              <dt><xsl:value-of select="local-name()"/></dt>
              <dd><xsl:value-of select="."/></dd>
            </xsl:for-each>
          </dl>
        </td>
      </xsl:if>
    </tr>
  </xsl:template>
</xsl:stylesheet>

View File

@ -9,6 +9,7 @@ template_files:
index.html.xslt index.html.xslt
mobile.html.xslt mobile.html.xslt
opml.xml.xslt opml.xml.xslt
validate.html.xslt
template_directories: template_directories:
../asf ../asf