Misc changes from Sam.
commit 838269ed8f
@@ -1 +1,3 @@
 *.tmplc
+.DS_Store
+cache
INSTALL (deleted, 167 lines)
@@ -1,167 +0,0 @@
Installing Planet
-----------------

You'll need at least Python 2.2 installed on your system; we recommend
Python 2.4, though, as there may be bugs with the earlier libraries.

Everything Pythonesque Planet needs to provide basic operation should be
included in the distribution. Additionally:

  * Usage of XSLT requires either xsltproc or python-libxslt.
  * The current interface to filters written in non-templating languages
    (e.g., python) uses the subprocess module which was introduced in
    Python 2.4.
  * Usage of FOAF as a reading list requires librdf.

Instructions:

 i.
  First you'll need to extract the files into a folder somewhere.
  I expect you've already done this, after all, you're reading this
  file. You can place this wherever you like, ~/planet is a good
  choice, but so's anywhere else you prefer.

 ii.
  This is very important: from within that directory, type the following
  command:

    python runtests.py

  This should take anywhere from one to ten seconds to execute. No network
  connection is required, and the script cleans up after itself. If the
  script completes with an "OK", you are good to go. Otherwise stopping here
  and inquiring on the mailing list is a good idea as it can save you lots of
  frustration down the road.

 iii.
  Make a copy of one of the 'ini' files in the 'examples' subdirectory,
  and put it wherever you like; I like to use the Planet's name (so
  ~/planet/debian), but it's really up to you.

 iv.
  Edit the config.ini file in this directory to taste; it's pretty
  well documented so you shouldn't have any problems here. Pay
  particular attention to the 'output_dir' option, which should be
  readable by your web server. If the directory you specify in your
  'cache_dir' exists, make sure that it is empty.

 v.
  Run it: python planet.py pathto/config.ini

  You'll want to add this to cron; make sure you run it from the
  right directory.

 vi. (Optional)
  Tell us about it! We'd love to link to you on planetplanet.org :-)

 vii. (Optional)
  Build your own themes, templates, or filters! And share!


Template files
--------------

The template files used are given as a whitespace-separated list in the
'template_files' option in config.ini. The extension at the end of the
file name indicates what processor to use. Templates may be implemented
using htmltmpl, xslt, or any programming language.

The final extension is removed to form the name of the file placed in the
output directory.

HtmlTmpl files
--------------

Reading through the example templates is recommended; they're designed to
pretty much drop straight into your site with little modification
anyway.

Inside these template files, <TMPL_VAR xxx> is replaced with the content
of the 'xxx' variable. The variables available are:

  name ....... } the value of the equivalent options
  link ....... } from the [Planet] section of your
  owner_name . } Planet's config.ini file
  owner_email }

  url ........ link with the output filename appended
  generator .. version of planet being used

  date ....... current date and time, in your date format
  date_iso ... current date and time, in ISO date format
  date_822 ... current date and time, in RFC 822 date format

There are also two loops, 'Items' and 'Channels'. All of the lines of
the template and variable substitutions are available for each item or
channel. Loops are created using <TMPL_LOOP LoopName>...</TMPL_LOOP>
and may be used as many times as you wish.

The 'Channels' loop iterates all of the channels (feeds) defined in the
configuration file; within it the following variables are available:

  name ....... value of the 'name' option in config.ini, or title
  title ...... title retrieved from the channel's feed
  tagline .... description retrieved from the channel's feed
  link ....... link for the human-readable content (from the feed)
  url ........ url of the channel's feed itself

Additionally the value of any other option specified in config.ini
for the feed, or in the [DEFAULT] section, is available as a
variable of the same name.

Depending on the feed, there may be a huge variety of other
variables available; the best way to find out what you
have is to use the 'planet-cache' tool to examine your cache files.

The 'Items' loop iterates all of the blog entries from all of the channels;
you do not place it inside a 'Channels' loop. Within it, the following
variables are available:

  id ......... unique id for this entry (sometimes just the link)
  link ....... link to a human-readable version at the origin site

  title ...... title of the entry
  summary .... a short "first page" summary
  content .... the full content of the entry

  date ....... date and time of the entry, in your date format
  date_iso ... date and time of the entry, in ISO date format
  date_822 ... date and time of the entry, in RFC 822 date format

If the entry takes place on a date that no prior entry has
taken place on, the 'new_date' variable is set to that date.
This allows you to break up the page by day.

If the entry is from a different channel to the previous entry,
or is the first entry from this channel on this day,
the 'new_channel' variable is set to the same value as the
'channel_url' variable. This allows you to collate multiple
entries from the same person under the same banner.

Additionally the value of any variable that would be defined
for the channel is available, with 'channel_' prepended to the
name (e.g. 'channel_name' and 'channel_link').

Depending on the feed, there may be a huge variety of other
variables available; the best way to find out what you
have is to use the 'planet-cache' tool to examine your cache files.


There are also a couple of other special things you can do in a template.

 - If you want HTML escaping applied to the value of a variable, use the
   <TMPL_VAR xxx ESCAPE="HTML"> form.

 - If you want URI escaping applied to the value of a variable, use the
   <TMPL_VAR xxx ESCAPE="URI"> form.

 - To only include a section of the template if the variable has a
   non-empty value, you can use <TMPL_IF xxx>....</TMPL_IF>. e.g.

     <TMPL_IF new_date>
     <h1><TMPL_VAR new_date></h1>
     </TMPL_IF>

   You may place a <TMPL_ELSE> within this block to specify an
   alternative, or may use <TMPL_UNLESS xxx>...</TMPL_UNLESS> to
   perform the opposite.
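As a rough illustration of how the TMPL_VAR and TMPL_LOOP values described in the INSTALL file above get filled in, here is a minimal sketch of driving an htmltmpl template directly from Python. It is not part of Planet itself; it assumes the htmltmpl package's documented TemplateManager/TemplateProcessor interface, and the template file name and values are invented:

    # Minimal sketch: render an htmltmpl template from Python (Python 2 era).
    # Assumes htmltmpl's TemplateManager/TemplateProcessor API; the file
    # 'index.html.tmpl' and the values below are purely illustrative.
    from htmltmpl import TemplateManager, TemplateProcessor

    template = TemplateManager().prepare("index.html.tmpl")
    processor = TemplateProcessor()

    # Top-level <TMPL_VAR> values (in Planet these come from [Planet] in config.ini).
    processor.set("name", "Example Planet")
    processor.set("link", "http://planet.example.org/")

    # A <TMPL_LOOP Channels> is supplied as a list of dictionaries,
    # one dictionary per channel.
    processor.set("Channels", [
        {"name": "A Blog", "url": "http://blog.example.org/feed"},
    ])

    print processor.process(template)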
README (8 changed lines)
@@ -9,11 +9,11 @@ also actively being maintained.
 
 It uses Mark Pilgrim's Universal Feed Parser to read from CDF, RDF, RSS and
 Atom feeds; Leonard Richardson's Beautiful Soup to correct markup issues;
-and Tomas Styblo's templating engine to output static files in any
-format you can dream up.
+and either Tomas Styblo's templating engine or Daniel Veillard's implementation
+of XSLT to output static files in any format you can dream up.
 
-To get started, check out the INSTALL file in this directory. If you have any
-questions or comments, please don't hesitate to use the planet mailing list:
+To get started, check out the documentation in the docs directory. If you have
+any questions or comments, please don't hesitate to use the planet mailing list:
 
   http://lists.planetplanet.org/mailman/listinfo/devel
 
THANKS (7 added lines)
@@ -4,6 +4,13 @@ Elias Torres - FOAF OnlineAccounts
 Jacques Distler - Template patches
 Michael Koziarski - HTTP Auth fix
 Brian Ewins - Win32 / Portalocker
+Joe Gregorio - Invoke same version of Python for filters
+Harry Fuecks - Pipe characters in file names, filter bug
+Eric van der Vlist - Filters to add language, category information
+Chris Dolan - mkdir cache; default template_dirs; fix xsltproc
+David Sifry - rss 2.0 xslt template based on http://atom.geekhood.net/
+Morten Fredericksen - Support WordPress LinkManager OPML
+Harry Fuecks - default item date to feed date
 
 This codebase represents a radical refactoring of Planet 2.0, which lists
 the following contributors:
TODO (5 removed lines)
@@ -1,11 +1,6 @@
 TODO
 ====
 
-* Enable per-feed adjustments
-
-  The goal is to better cope with feeds that don't have dates or ids, or that
-  consistently encode or escape things incorrectly.
-
 * Expire feed history
 
   The feed cache doesn't currently expire old entries, so could get
docs/config.html (new file, 140 lines)
@@ -0,0 +1,140 @@
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
  "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Configuration</title>
</head>
<body>

<h2>Configuration</h2>
<p>Configuration files are in <a href="http://docs.python.org/lib/module-ConfigParser.html">ConfigParser</a>
format, which basically means the same
format as INI files, i.e., they consist of a series of
<code>[sections]</code>, in square brackets, with each section containing a
list of <code>name:value</code> pairs (or <code>name=value</code> pairs, if
you prefer).</p>
<p>You are welcome to place your entire configuration into one file.
Alternately, you may factor out the templating into a "theme", and
the list of subscriptions into one or more "reading lists".</p>
<h3 id="planet"><code>[planet]</code></h3>
<p>This is the only required section, which is a bit odd as none of the
parameters listed below are required. Even so, you really do want to
provide many of these, especially ones that identify your planet and
either (or both) of <code>template_files</code> and <code>theme</code>.</p>
<p>Below is a complete list of predefined planet configuration parameters,
including <del>ones not (yet) implemented by Venus</del> and <ins>ones that
are either new or implemented differently by Venus</ins>.</p>

<blockquote>
<dl class="compact code">
<dt>name</dt>
<dd>Your planet's name</dd>
<dt>link</dt>
<dd>Link to the main page</dd>
<dt>owner_name</dt>
<dd>Your name</dd>
<dt>owner_email</dt>
<dd>Your e-mail address</dd>

</dl>
<dl class="compact code">

<dt>cache_directory</dt>
<dd>Where cached feeds are stored</dd>
<dt>output_dir</dt>
<dd>Directory to place output files</dd>

</dl>
<dl class="compact code">

<dt><ins>output_theme</ins></dt>
<dd>Directory containing a <code>config.ini</code> file which is merged
with this one. This is typically used to specify templating and bill of
materials information.</dd>
<dt>template_files</dt>
<dd>Space-separated list of output template files</dd>
<dt><ins>template_directories</ins></dt>
<dd>Space-separated list of directories in which <code>template_files</code>
can be found</dd>
<dt><ins>bill_of_materials</ins></dt>
<dd>Space-separated list of files to be copied as is directly from the <code>template_directories</code> to the <code>output_dir</code></dd>
<dt><ins>filters</ins></dt>
<dd>Space-separated list of filters to apply to each entry</dd>

</dl>
<dl class="compact code">

<dt>items_per_page</dt>
<dd>How many items to put on each page. <ins>Whereas Planet 2.0 allows this to
be overridden on a per-template basis, Venus currently takes the maximum value
for this across all templates.</ins></dd>
<dt><del>days_per_page</del></dt>
<dd>How many complete days of posts to put on each page. This is the absolute, hard limit (over the item limit)</dd>
<dt>date_format</dt>
<dd><a href="http://docs.python.org/lib/module-time.html#l2h-2816">strftime</a> format for the default 'date' template variable</dd>
<dt>new_date_format</dt>
<dd><a href="http://docs.python.org/lib/module-time.html#l2h-2816">strftime</a> format for the 'new_date' template variable <ins>(only applies to htmltmpl templates)</ins></dd>
<dt><del>encoding</del></dt>
<dd>Output encoding for the file; Python 2.3+ users can use the special "xml" value to output ASCII with XML character references</dd>
<dt><del>locale</del></dt>
<dd>Locale to use for (e.g.) strings in dates; the default is taken from your system</dd>
<dt>activity_threshold</dt>
<dd>If non-zero, all feeds which have not been updated in the indicated
number of days will be marked as inactive</dd>

</dl>
<dl class="compact code">

<dt>log_level</dt>
<dd>One of <code>DEBUG</code>, <code>INFO</code>, <code>WARNING</code>, <code>ERROR</code> or <code>CRITICAL</code></dd>
<dt><ins>log_format</ins></dt>
<dd><a href="http://docs.python.org/lib/node422.html">format string</a> to
use for logging output. Note: this configuration value is processed
<a href="http://docs.python.org/lib/ConfigParser-objects.html">raw</a></dd>
<dt>feed_timeout</dt>
<dd>Number of seconds to wait for any given feed</dd>
<dt><del>new_feed_items</del></dt>
<dd>Number of items to take from new feeds</dd>
</dl>
</blockquote>

<h3 id="default"><code>[DEFAULT]</code></h3>
<p>Values placed in this section are used as default values for all sections.
While it is true that few values make sense in all sections, in most cases
unused parameters cause few problems.</p>

<h3 id="subscription"><code>[</code><em>subscription</em><code>]</code></h3>
<p>All sections other than <code>planet</code>, <code>DEFAULT</code>, or ones
named in <code>[planet]</code>'s <code>filters</code> or
<code>template_files</code> parameters
are treated as subscriptions and typically take the form of a
<acronym title="Uniform Resource Identifier">URI</acronym>.</p>
<p>Parameters placed in this section are passed to templates. While
you are free to include as few or as many parameters as you like, most of
the predefined themes presume that at least <code>name</code> is defined.</p>
<p>The <code>content_type</code> parameter can be defined to indicate that
this subscription is a <em>reading list</em>, i.e., is an external list
of subscriptions. At the moment, two formats of reading lists are supported:
<code>opml</code> and <code>foaf</code>. In the future, support for formats
like <code>xoxo</code> could be added.</p>
<p><a href="normalization.html#overrides">Normalization overrides</a> can
also be defined here.</p>

<h3 id="template"><code>[</code><em>template</em><code>]</code></h3>
<p>Sections which are listed in <code>[planet] template_files</code> are
processed as <a href="templates.html">templates</a>. With Planet 2.0,
it is possible to override parameters like <code>items_per_page</code>
on a per-template basis, but at the current time Planet Venus doesn't
implement this.</p>

<h3 id="filter"><code>[</code><em>filter</em><code>]</code></h3>
<p>Sections which are listed in <code>[planet] filters</code> are
processed as <a href="filters.html">filters</a>.</p>
<p>Parameters which are listed in this section are passed to the filter
in a language-specific manner. Given the way defaults work, filters
should be prepared to ignore parameters that they didn't expect.</p>
</body>
</html>
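As a small illustration of the configuration format described in the file above (not part of Venus; it only uses the standard-library ConfigParser module that page links to, and the file name "config.ini" plus the printed options come from that documentation):

    # Sketch: read a Planet-style config.ini with the standard library (Python 2).
    from ConfigParser import ConfigParser

    config = ConfigParser()
    config.read("config.ini")

    # A couple of the [planet] options documented above.
    print config.get("planet", "name")
    print config.get("planet", "output_dir")

    # Remaining sections are subscriptions, template sections, or filter sections.
    for section in config.sections():
        if section != "planet":
            print "other section:", section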
docs/docs.css (new file, 104 lines)
@@ -0,0 +1,104 @@
body {
  background-color: #fff;
  color: #333;
  font-family: 'Lucida Grande', Verdana, Geneva, Lucida, Helvetica, sans-serif;
  font-size: small;
  margin: 40px;
  padding: 0;
}

a:link, a:visited {
  background-color: transparent;
  color: #333;
  text-decoration: none !important;
  border-bottom: 1px dotted #333 !important;
}

a:hover {
  background-color: transparent;
  color: #934;
  text-decoration: none !important;
  border-bottom: 1px dotted #993344 !important;
}

pre, code {
  background-color: #FFF;
  color: #00F;
  font-size: large
}

h1 {
  margin: 8px 0 10px 20px;
  padding: 0;
  font-variant: small-caps;
  letter-spacing: 0.1em;
  font-family: "Book Antiqua", Georgia, Palatino, Times, "Times New Roman", serif;
}

h2 {
  clear: both;
}

ul, ul.outer > li {
  margin: 14px 0 10px 0;
}

.z {
  float:left;
  background: url(img/shadowAlpha.png) no-repeat bottom right !important;
  margin: -15px 0 20px -15px !important;
}

.z .logo {
  color: magenta;
}

.z p {
  margin: 14px 0 10px 15px !important;
}

.z .sectionInner {
  width: 730px;
  background: none !important;
  padding: 0 !important;
}

.z .sectionInner .sectionInner2 {
  border: 1px solid #a9a9a9;
  padding: 4px;
  margin: -6px 6px 6px -6px !important;
}

ins {
  background-color: #FFF;
  color: #F0F;
  text-decoration: none;
}

dl.compact {
  margin-bottom: 1em;
  margin-top: 1em;
}

dl.code > dt {
  font-family: monospace;
  font-size: large;
}

dl.compact > dt {
  float: left;
  margin-bottom: 0;
  padding-right: 8px;
  margin-top: 0;
  list-style-type: none;
}

dl.compact > dd {
  margin-bottom: 0;
  margin-top: 0;
  margin-left: 10em;
}

th, td {
  font-size: small;
}
docs/docs.js (new file, 54 lines)
@@ -0,0 +1,54 @@
window.onload=function() {
  // Work out the base URL of the venus (or planet) checkout from the page URL.
  var vindex = document.URL.lastIndexOf('venus/');
  if (vindex<0) vindex = document.URL.lastIndexOf('planet/');
  var base = document.URL.substring(0,vindex+6);

  // Build the banner: logo, blurb, and navigation links.
  var body = document.getElementsByTagName('body')[0];
  var div = document.createElement('div');
  div.setAttribute('class','z');
  var h1 = document.createElement('h1');
  var span = document.createElement('span');
  span.appendChild(document.createTextNode('\u2640'));
  span.setAttribute('class','logo');
  h1.appendChild(span);
  h1.appendChild(document.createTextNode(' Planet Venus'));

  var inner2=document.createElement('div');
  inner2.setAttribute('class','sectionInner2');
  inner2.appendChild(h1);

  var p = document.createElement('p');
  p.appendChild(document.createTextNode("Planet Venus is an awesome \u2018river of news\u2019 feed reader. It downloads news feeds published by web sites and aggregates their content together into a single combined feed, latest news first."));
  inner2.appendChild(p);

  // Navigation links: Download, Documentation, Unit tests, Mailing list.
  p = document.createElement('p');
  var a = document.createElement('a');
  a.setAttribute('href',base+'index.html');
  a.appendChild(document.createTextNode('Download'));
  p.appendChild(a);
  p.appendChild(document.createTextNode(" \u00b7 "));
  a = document.createElement('a');
  a.setAttribute('href',base+'docs/index.html');
  a.appendChild(document.createTextNode('Documentation'));
  p.appendChild(a);
  p.appendChild(document.createTextNode(" \u00b7 "));
  a = document.createElement('a');
  a.setAttribute('href',base+'tests/');
  a.appendChild(document.createTextNode('Unit tests'));
  p.appendChild(a);
  p.appendChild(document.createTextNode(" \u00b7 "));
  a = document.createElement('a');
  a.setAttribute('href','http://lists.planetplanet.org/mailman/listinfo/devel');
  a.appendChild(document.createTextNode('Mailing list'));
  p.appendChild(a);
  inner2.appendChild(p);

  var inner1=document.createElement('div');
  inner1.setAttribute('class','sectionInner');
  inner1.setAttribute('id','inner1');
  inner1.appendChild(inner2);

  div.appendChild(inner1);

  // Insert the banner at the very top of the page.
  body.insertBefore(div, body.firstChild);
}
docs/filters.html (new file, 71 lines)
@@ -0,0 +1,71 @@
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
  "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Filters</title>
</head>
<body>
<h2>Filters</h2>
<p>Filters are simple Unix pipes. Input comes in <code>stdin</code>,
parameters come from the config file, and output goes to <code>stdout</code>.
Anything written to <code>stderr</code> is logged as an ERROR message. If no
<code>stdout</code> is produced, the entry is not written to the cache or
processed further.</p>

<p>Input to a filter is an aggressively
<a href="normalization.html">normalized</a> entry. For
example, if a feed is RSS 1.0 with 10 items, the filter will be called ten
times, each with a single Atom 1.0 entry, with all textConstructs
expressed as XHTML, and everything encoded as UTF-8.</p>

<p>You will find a small set of example filters in the <a
href="../filters">filters</a> directory. The <a
href="../filters/coral_cdn_filter.py">coral cdn filter</a> will change links
to images in the entry itself. The filters in the <a
href="../filters/stripAd/">stripAd</a> subdirectory will strip specific
types of advertisements that you may find in feeds.</p>

<p>The <a href="../filters/excerpt.py">excerpt</a> filter adds metadata (in
the form of a <code>planet:excerpt</code> element) to the feed itself. You
can see examples of how parameters are passed to this program in either
<a href="../tests/data/filter/excerpt-images.ini">excerpt-images</a> or
<a href="../examples/opml-top100.ini">opml-top100.ini</a>.
Alternately, parameters may be passed
<abbr title="Uniform Resource Identifier">URI</abbr> style, for example:
<a href="../tests/data/filter/excerpt-images2.ini">excerpt-images2</a>.
</p>

<p>The <a href="../filters/xpath_sifter.py">xpath sifter</a> is a variation of
the above, including or excluding feeds based on the presence (or absence) of
data specified by <a href="http://www.w3.org/TR/xpath20/">xpath</a>
expressions. Again, parameters can be passed as
<a href="../tests/data/filter/xpath-sifter.ini">config options</a> or
<a href="../tests/data/filter/xpath-sifter2.ini">URI style</a>.
</p>

<h3>Notes</h3>

<ul>

<li>The file extension of the filter is significant. <code>.py</code> invokes
python. <code>.xslt</code> invokes XSLT. <code>.sed</code> and
<code>.tmpl</code> (a.k.a. htmltmpl) are also options. Other languages, like
perl or ruby or class/jar (java), aren't supported at the moment, but these
would be easy to add.</li>

<li>Any filters listed in the <code>[planet]</code> section of your config.ini
will be invoked on all feeds. Filters listed in individual
<code>[feed]</code> sections will only be invoked on those feeds.</li>

<li>Filters are simply invoked in the order they are listed in the
configuration file (think Unix pipes). Planet-wide filters are executed before
feed-specific filters.</li>

<li>Templates written using htmltmpl currently only have access to a fixed set
of fields, whereas XSLT templates have access to everything.</li>
</ul>
</body>
</html>
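To make the stdin/stdout contract described in filters.html concrete, here is a minimal illustrative filter in Python. It is not one of the filters shipped in the filters directory; the file name and the title tweak are invented, and only the pipe behaviour (read one normalized Atom entry, write it back out) comes from the page above:

    # hypothetical_filter.py -- illustrative only, not shipped with Venus.
    # Reads a single normalized Atom 1.0 entry on stdin, writes it to stdout.
    import sys
    from xml.dom import minidom

    doc = minidom.parse(sys.stdin)

    # Example transformation: prefix the entry title (when it is plain text).
    for title in doc.getElementsByTagName('title'):
        if title.firstChild and title.firstChild.nodeValue:
            title.firstChild.nodeValue = '[filtered] ' + title.firstChild.nodeValue
        break

    # Producing no stdout would cause the entry to be dropped from the cache,
    # so always emit the (possibly modified) document.
    sys.stdout.write(doc.toxml('utf-8'))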
docs/img/shadowAlpha.png (new binary file, 3.3 KiB; not shown)
docs/index.html (new file, 51 lines)
@@ -0,0 +1,51 @@
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
  "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Documentation</title>
</head>
<body>
<h2>Table of Contents</h2>
<ul class="outer">
<li><a href="installation.html">Getting started</a></li>
<li>Basic Features
<ul>
<li><a href="config.html">Configuration</a></li>
<li><a href="templates.html">Templates</a></li>
</ul>
</li>
<li>Advanced Features
<ul>
<li><a href="venus.svg">Architecture</a></li>
<li><a href="normalization.html">Normalization</a></li>
<li><a href="filters.html">Filters</a></li>
</ul>
</li>
<li>Other
<ul>
<li><a href="migration.html">Migration from Planet 2.0</a></li>
</ul>
</li>
<li>Reference
<ul>
<li><a href="http://www.planetplanet.org/">Planet</a></li>
<li><a href="http://feedparser.org/docs/">Universal Feed Parser</a></li>
<li><a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a></li>
<li><a href="http://htmltmpl.sourceforge.net/">htmltmpl</a></li>
<li><a href="http://www.w3.org/TR/xslt">XSLT</a></li>
<li><a href="http://www.gnu.org/software/sed/manual/html_mono/sed.html">sed</a></li>
</ul>
</li>
<li>Credits and License
<ul>
<li><a href="../AUTHORS">Authors</a></li>
<li><a href="../THANKS">Contributors</a></li>
<li><a href="../LICENCE">License</a></li>
</ul>
</li>
</ul>
</body>
</html>
docs/installation.html (new file, 112 lines)
@@ -0,0 +1,112 @@
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
  "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Installation</title>
</head>
<body>
<h2>Installation</h2>
<p>Venus has been tested on Linux, Mac OS X, and Windows.</p>

<p>You'll need at least Python 2.2 installed on your system; we recommend
Python 2.4, though, as there may be bugs with the earlier libraries.</p>

<p>Everything Pythonesque Planet needs to provide basic operation should be
included in the distribution. Some optional features may require
additional libraries, for example:</p>
<ul>
<li>Usage of XSLT requires either
<a href="http://xmlsoft.org/XSLT/xsltproc2.html">xsltproc</a>
or <a href="http://xmlsoft.org/XSLT/python.html">python-libxslt</a>.</li>
<li>The current interface to filters written in non-templating languages
(e.g., python) uses the
<a href="http://docs.python.org/lib/module-subprocess.html">subprocess</a>
module which was introduced in Python 2.4.</li>
<li>Usage of FOAF as a reading list requires
<a href="http://librdf.org/">librdf</a>.</li>
</ul>

<h3>General Instructions</h3>

<p>
These instructions apply to any platform. Check the instructions
below for more specific instructions for your platform.
</p>

<ol>
<li><p>If you are reading this online, you will need to
<a href="../index.html">download</a> and extract the files into a folder somewhere.
You can place this wherever you like; <code>~/planet</code>
and <code>~/venus</code> are good
choices, but so's anywhere else you prefer.</p></li>
<li><p>This is very important: from within that directory, type the following
command:</p>
<blockquote><code>python runtests.py</code></blockquote>
<p>This should take anywhere from one to ten seconds to execute. No network
connection is required, and the script cleans up after itself. If the
script completes with an "OK", you are good to go. Otherwise stopping here
and inquiring on the
<a href="http://lists.planetplanet.org/mailman/listinfo/devel">mailing list</a>
is a good idea as it can save you lots of frustration down the road.</p></li>
<li><p>Make a copy of one of the <code>ini</code> files in the
<a href="../examples">examples</a> subdirectory,
and put it wherever you like; I like to use the Planet's name (so
<code>~/planet/debian</code>), but it's really up to you.</p></li>
<li><p>Edit the <code>config.ini</code> file in this directory to taste;
it's pretty well documented so you shouldn't have any problems here. Pay
particular attention to the <code>output_dir</code> option, which should be
readable by your web server. If the directory you specify in your
<code>cache_dir</code> exists, make sure that it is empty.</p></li>
<li><p>Run it: <code>python planet.py pathto/config.ini</code></p>
<p>You'll want to add this to cron; make sure you run it from the
right directory.</p></li>
<li><p>(Optional)</p>
<p>Tell us about it! We'd love to link to you on planetplanet.org :-)</p></li>
<li><p>(Optional)</p>
<p>Build your own themes, templates, or filters! And share!</p></li>
</ol>

<h3>Mac OS X and Fink Instructions</h3>

<p>
The <a href="http://fink.sourceforge.net/">Fink Project</a> packages
various open source software for MacOS. This makes it a little easier
to get started with projects like Planet Venus.
</p>

<p>
Note: in the following, we recommend explicitly
using <code>python2.4</code>. As of this writing, Fink is starting to
support <code>python2.5</code>, but the XML libraries, for example, are
not yet ported to the newer python so Venus will be less featureful.
</p>

<ol>
<li><p>Install the XCode development tools from your Mac OS X install
disks</p></li>
<li><p><a href="http://fink.sourceforge.net/download/">Download</a>
and install Fink</p></li>
<li><p>Tell fink to install the Planet Venus prerequisites:<br />
<code>fink install python24 celementtree-py24 bzr-py24 libxslt-py24
libxml2-py24</code></p></li>
<li><p><a href="../index.html">Download</a> and extract the Venus files into a
folder somewhere</p></li>
<li><p>Run the tests: <code>python2.4 runtests.py</code><br /> This
will warn you that the RDF library is missing, but that's
OK.</p></li>
<li><p>Continue with the general steps above, starting with Step 3. You
may want to explicitly specify <code>python2.4</code>.</p></li>
</ol>

<h3>Ubuntu Linux (Edgy Eft) instructions</h3>

<p>Before starting, issue the following command:</p>
<ul>
<li><code>sudo apt-get install bzr python2.4-librdf</code></li>
</ul>

</body>
</html>
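A quick way to see which of the optional libraries mentioned in installation.html above are actually importable on a given machine is a throwaway check like the following. It is only a sketch: the module names are my assumption of the Python bindings the optional features rely on (libxml2/libxslt for XSLT, Redland's RDF module for FOAF reading lists, subprocess for external filters):

    # Throwaway sketch: report which optional Venus dependencies import cleanly.
    # Module names are assumptions about the relevant Python bindings.
    for name in ('libxml2', 'libxslt', 'RDF', 'subprocess'):
        try:
            __import__(name)
            print name, '... found'
        except ImportError:
            print name, '... missing (the corresponding optional feature is unavailable)'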
docs/migration.html (new file, 42 lines)
@@ -0,0 +1,42 @@
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
  "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Migration</title>
</head>
<body>
<h2>Migration from Planet 2.0</h2>
<p>The intent is that existing Planet 2.0 users should be able to reuse
their existing <code>config.ini</code> and <code>.tmpl</code> files,
but the reality is that users will need to be aware of the following:</p>
<ul>
<li>You will need to start over with a new cache directory as the format
of the cache has changed dramatically.</li>
<li>Existing <code>.tmpl</code> and <code>.ini</code> files should work,
though some <a href="config.html">configuration</a> options (e.g.,
<code>days_per_page</code>) have not yet been implemented</li>
<li>No testing has been done on Python 2.1, and it is presumed not to work.</li>
<li>To take advantage of all features, you should install the optional
XML and RDF libraries described on
the <a href="installation.html">Installation</a> page.</li>
</ul>

<p>
Common changes to config.ini include:
</p>
<ul>
<li><p>Filename changes:</p>
<pre>
examples/fancy/index.html.tmpl => themes/classic_fancy/index.html.tmpl
examples/atom.xml.tmpl         => themes/common/atom.xml.xslt
examples/rss20.xml.tmpl        => themes/common/rss20.xml.tmpl
examples/rss10.xml.tmpl        => themes/common/rss10.xml.tmpl
examples/opml.xml.tmpl         => themes/common/opml.xml.xslt
examples/foafroll.xml.tmpl     => themes/common/foafroll.xml.xslt
</pre></li>
</ul>
</body>
</html>
docs/normalization.html (new file, 92 lines)
@@ -0,0 +1,92 @@
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
  "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Normalization</title>
</head>
<body>
<h2>Normalization</h2>
<p>Venus builds on, and extends, the <a
href="http://www.feedparser.org/">Universal Feed Parser</a> and <a
href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a> to
convert all feeds into Atom 1.0, with well-formed XHTML, and encoded as UTF-8,
meaning that you don't have to worry about funky feeds, tag soup, or character
encoding.</p>
<h3>Encoding</h3>
<p>Input data in feeds may be encoded in a variety of formats, most commonly
ASCII, ISO-8859-1, WIN-1252, and UTF-8. Additionally, many feeds make use of
the wide range of
<a href="http://www.w3.org/TR/html401/sgml/entities.html">character entity
references</a> provided by HTML. Each is converted to UTF-8, an encoding
which is a proper superset of ASCII, supports the entire range of Unicode
characters, and is one of
<a href="http://www.w3.org/TR/2006/REC-xml-20060816/#charsets">only two</a>
encodings required to be supported by all conformant XML processors.</p>
<p>Encoding problems are one of the more common feed errors, and every
attempt is made to correct common errors, such as the inclusion of
the so-called
<a href="http://www.fourmilab.ch/webtools/demoroniser/">moronic</a> versions
of smart-quotes. In rare cases where individual characters can not be
converted to valid UTF-8 or into
<a href="http://www.w3.org/TR/xml/#charsets">characters allowed in XML 1.0
documents</a>, such characters will be replaced with the Unicode
<a href="http://www.fileformat.info/info/unicode/char/fffd/index.htm">Replacement character</a>, with a title that describes the original character whenever possible.</p>
<p>In order to support the widest range of inputs, use of Python 2.3 or later,
as well as the installation of the python <code>iconvcodec</code>, is
recommended.</p>
<h3>HTML</h3>
<p>A number of different normalizations of HTML are performed. For starters,
the HTML is
<a href="http://www.feedparser.org/docs/html-sanitization.html">sanitized</a>,
meaning that HTML tags and attributes that could introduce javascript or
other security risks are removed.</p>
<p>Then,
<a href="http://www.feedparser.org/docs/resolving-relative-links.html">relative
links are resolved</a> within the HTML. This is also done for links
in other areas in the feed too.</p>
<p>Finally, unmatched tags are closed. This is done with a
<a href="http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20HTML">knowledge of the semantics of HTML</a>. Additionally, a
<a href="http://golem.ph.utexas.edu/~distler/blog/archives/000165.html#sanitizespec">large
subset of MathML</a>, as well as a
<a href="http://www.w3.org/TR/SVGMobile/">tiny profile of SVG</a>,
is also supported.</p>
<h3>Atom 1.0</h3>
<p>The Universal Feed Parser also
<a href="http://www.feedparser.org/docs/content-normalization.html">normalizes the content of feeds</a>. This involves a
<a href="http://www.feedparser.org/docs/reference.html">large number of elements</a>; the best place to start is to look at
<a href="http://www.feedparser.org/docs/annotated-examples.html">annotated examples</a>. Among other things, a wide variety of
<a href="http://www.feedparser.org/docs/date-parsing.html">date formats</a>
are converted into
<a href="http://www.ietf.org/rfc/rfc3339.txt">RFC 3339</a> formatted dates.</p>
<p>If no <a href="http://www.feedparser.org/docs/reference-entry-id.html">ids</a> are found in entries, attempts are made to synthesize one using (in order):</p>
<ul>
<li><a href="http://www.feedparser.org/docs/reference-entry-link.html">link</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-title.html">title</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-summary.html">summary</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-content.html">content</a></li>
</ul>
<p>If no <a href="http://www.feedparser.org/docs/reference-feed-updated.html">updated</a> dates are found in an entry, or if the dates found
are in the future, the current time is substituted.</p>
<h3 id="overrides">Overrides</h3>
<p>All of the above describes what Venus does automatically, either directly
or through its dependencies. There are a number of errors which can not
be corrected automatically, and for these, there are configuration parameters
that can be used to help.</p>
<ul>
<li><code>ignore_in_feed</code> allows you to list any number of elements
or attributes which are to be ignored in feeds. This is often handy in the
case of feeds where the <code>id</code>, <code>updated</code> or
<code>xml:lang</code> values can't be trusted.</li>
<li><code>title_type</code>, <code>summary_type</code>, and
<code>content_type</code> allow you to override the
<a href="http://www.feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.type"><code>type</code></a>
attributes on these elements.</li>
<li><code>name_type</code> does something similar for
<a href="http://www.feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.name">author names</a>.</li>
</ul>
</body>
</html>
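Most of the normalization described in the page above comes from the Universal Feed Parser, and it is easy to poke at interactively. A small, stand-alone sketch (the feed URL is a placeholder):

    # Sketch: inspect what the Universal Feed Parser makes of an arbitrary feed.
    import feedparser

    d = feedparser.parse('http://feeds.example.org/sample.rss')  # placeholder URL

    print 'format detected:', d.version        # e.g. 'rss10', 'atom10'
    print 'had parse problems (bozo):', d.bozo

    for entry in d.entries:
        # 'id' may have been synthesized from link/title/summary/content,
        # and dates arrive already parsed into a normalized form.
        print entry.get('id'), entry.get('updated', 'no date')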
129
docs/templates.html
Normal file
129
docs/templates.html
Normal file
@ -0,0 +1,129 @@
|
|||||||
|
<!DOCTYPE html PUBLIC
|
||||||
|
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
|
||||||
|
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
|
||||||
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||||
|
<head>
|
||||||
|
<script type="text/javascript" src="docs.js"></script>
|
||||||
|
<link rel="stylesheet" type="text/css" href="docs.css"/>
|
||||||
|
<title>Venus Templates</title>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<h2>Templates</h2>
|
||||||
|
<p>Template names take the form
|
||||||
|
<em>name</em><code>.</code><em>ext</em><code>.</code><em>type</em>, where
|
||||||
|
<em>name</em><code>.</code><em>ext</em> identifies the name of the output file
|
||||||
|
to be created in the <code>output_directory</code>, and <em>type</em>
|
||||||
|
indicates which language processor to use for the template.</p>
|
||||||
|
<p>Like with <a href="filter.html">filters</a>, templates may be written
|
||||||
|
in a variety of languages and are based on the standard Unix pipe convention
|
||||||
|
of producing <code>stdout</code> from <code>stdin</code>, but in practice
|
||||||
|
two languages are used more than others:</p>
|
||||||
|
<h3>htmltmpl</h3>
|
||||||
|
<p>Many find <a href="http://htmltmpl.sourceforge.net/">htmltmpl</a>
|
||||||
|
easier to get started with as you can take a simple example of your
|
||||||
|
output file, sprinkle in a few <code><TMPL_VAR></code>s and
|
||||||
|
<code><TMPL_LOOP></code>s and you are done. Eventually, however,
|
||||||
|
you may find that your template involves <code><TMPL_IF></code>
|
||||||
|
blocks inside of attribute values, and you may find the result difficult
|
||||||
|
to read and create correctly.</p>
|
||||||
|
<p>It is also important to note that htmltmpl based templates do not
|
||||||
|
have access to the full set of information available in the feed, just
|
||||||
|
the following (rather substantial) subset:</p>
|
||||||
|
|
||||||
|
<blockquote>
|
||||||
|
<table border="1" cellpadding="5" cellspacing="0">
|
||||||
|
<tr><th>VAR</th><th>type</th><th>source</th></tr>
|
||||||
|
<tr><td>author</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-author.html">author</a></td></tr>
|
||||||
|
<tr><td>author_name</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-author_detail.html#reference.feed.author_detail.name">author_detail.name</a></td></tr>
|
||||||
|
<tr><td>generator</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-generator.html">generator</a></td></tr>
|
||||||
|
<tr><td>id</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-id.html">id</a></td></tr>
|
||||||
|
<tr><td>icon</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-icon.html">icon</a></td></tr>
|
||||||
|
<tr><td>last_updated_822</td><td>Rfc822</td><td><a href="http://feedparser.org/docs/reference-feed-icon.html">updated_parsed</a></td></tr>
|
||||||
|
<tr><td>last_updated_iso</td><td>Rfc3399</td><td><a href="http://feedparser.org/docs/reference-feed-icon.html">updated_parsed</a></td></tr>
|
||||||
|
<tr><td>last_updated</td><td>PlanetDate</td><td><a href="http://feedparser.org/docs/reference-feed-icon.html">updated_parsed</a></td></tr>
|
||||||
|
<tr><td>link</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-link.html">link</a></td></tr>
|
||||||
|
<tr><td>logo</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-logo.html">logo</a></td></tr>
|
||||||
|
<tr><td>rights</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-rights_detail.html#reference.feed.rights_detail.value">rights_detail.value</a></td></tr>
|
||||||
|
<tr><td>subtitle</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-subtitle_detail.html#reference.feed.subtitle_detail.value">subtitle_detail.value</a></td></tr>
|
||||||
|
<tr><td>title</td><td>String</td><td><a href="http://feedparser.org/docs/reference-feed-title_detail.html#reference.feed.title_detail.value">title_detail.value</a></td></tr>
|
||||||
|
<tr><td>title_plain</td><td>Plain</td><td><a href="http://feedparser.org/docs/reference-feed-title_detail.html#reference.feed.title_detail.value">title_detail.value</a></td></tr>
|
||||||
|
<tr><td rowspan="2">url</td><td rowspan="2">String</td><td><a href="http://feedparser.org/docs/reference-feed-links.html#reference.feed.links.href">links[rel='self'].href</a></td></tr>
|
||||||
|
<tr><td><a href="http://feedparser.org/docs/reference-headers.html">headers['location']</a></td></tr>
|
||||||
|
</table>
|
||||||
|
</blockquote>
|
||||||
|
|
||||||
|
<p>Note: when multiple sources are listed, the last one wins</p>
|
||||||
|
<p>In addition to these variables, Planet Venus makes available two
|
||||||
|
arrays, <code>Channels</code> and <code>Items</code>, with one entry
|
||||||
|
per subscription and per output entry respectively. The data values
|
||||||
|
within the <code>Channels</code> array exactly match the above list.
|
||||||
|
The data values within the <code>Items</code> array are as follows:</p>
|
||||||
|
|
||||||
|
<blockquote>
|
||||||
|
<table border="1" cellpadding="5" cellspacing="0">
|
||||||
|
<tr><th>VAR</th><th>type</th><th>source</th></tr>
|
||||||
|
<tr><td>author</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-author.html">author</a></td></tr>
|
||||||
|
<tr><td>author_email</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.email">author_detail.email</a></td></tr>
|
||||||
|
<tr><td>author_name</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.name">author_detail.name</a></td></tr>
|
||||||
|
<tr><td>author_uri</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.href">author_detail.href</a></td></tr>
|
||||||
|
<tr><td>content_language</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-content.html#reference.entry.content.language">content[0].language</a></td></tr>
|
||||||
|
<tr><td rowspan="2">content</td><td rowspan="2">String</td><td><a href="http://feedparser.org/docs/reference-entry-summary_detail.html#reference.entry.summary_detail.value">summary_detail.value</a></td></tr>
|
||||||
|
<tr><td><a href="http://feedparser.org/docs/reference-entry-content.html#reference.entry.content.value">content[0].value</a></td></tr>
|
||||||
|
<tr><td rowspan="2">date</td><td rowspan="2">PlanetDate</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
|
||||||
|
<tr><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
|
||||||
|
<tr><td rowspan="2">date_822</td><td rowspan="2">Rfc822</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
|
||||||
|
<tr><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
|
||||||
|
<tr><td rowspan="2">date_iso</td><td rowspan="2">Rfc3399</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
|
||||||
|
<tr><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
|
||||||
|
<tr><td><ins>enclosure_href</ins></td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-enclosures.html#reference.entry.enclosures.href">enclosures[0].href</a></td></tr>
|
||||||
|
<tr><td><ins>enclosure_length</ins></td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-enclosures.html#reference.entry.enclosures.length">enclosures[0].length</a></td></tr>
|
||||||
|
<tr><td><ins>enclosure_type</ins></td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-enclosures.html#reference.entry.enclosures.type">enclosures[0].type</a></td></tr>
<tr><td><ins>guid_isPermaLink</ins></td><td>String</td><td><a href="http://blogs.law.harvard.edu/tech/rss#ltguidgtSubelementOfLtitemgt">isPermaLink</a></td></tr>
<tr><td>id</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-id.html">id</a></td></tr>
<tr><td>link</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-links.html#reference.entry.links.href">links[rel='alternate'].href</a></td></tr>
<tr><td>new_channel</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-id.html">id</a></td></tr>
<tr><td rowspan="2">new_date</td><td rowspan="2">NewDate</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>rights</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-rights_detail.html#reference.entry.rights_detail.value">rights_detail.value</a></td></tr>
<tr><td>title_language</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.language">title_detail.language</a></td></tr>
<tr><td>title_plain</td><td>Plain</td><td><a href="http://feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.value">title_detail.value</a></td></tr>
<tr><td>title</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.value">title_detail.value</a></td></tr>
<tr><td>summary_language</td><td>String</td><td><a href="http://feedparser.org/docs/reference-entry-summary_detail.html#reference.entry.summary_detail.language">summary_detail.language</a></td></tr>
<tr><td>updated</td><td>PlanetDate</td><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>updated_822</td><td>Rfc822</td><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>updated_iso</td><td>Rfc3399</td><td><a href="http://feedparser.org/docs/reference-entry-updated_parsed.html">updated_parsed</a></td></tr>
<tr><td>published</td><td>PlanetDate</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td>published_822</td><td>Rfc822</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
<tr><td>published_iso</td><td>Rfc3399</td><td><a href="http://feedparser.org/docs/reference-entry-published_parsed.html">published_parsed</a></td></tr>
</table>

</blockquote>

<p>Note: variables above which start with <code>new_</code> are only set
if their values differ from the previous Item.</p>

<h3>xslt</h3>

<p><a href="http://www.w3.org/TR/xslt">XSLT</a> is a paradox: it actually
makes some simple things easier to do than htmltmpl, and certainly can
make more difficult things possible; but it is fair to say that many
find XSLT less approachable than htmltmpl.</p>

<p>But in any case, the XSLT support is easier to document as the
input is a <a href="normalization.html">highly normalized</a> feed,
with a few extension elements.</p>

<ul>
<li><code>atom:feed</code> will have the following child elements:
<ul>
<li>A <code>planet:source</code> element per subscription, with the same
child elements as <a href="http://www.atomenabled.org/developers/syndication/atom-format-spec.php#element.source"><code>atom:source</code></a>,
as well as an additional child element in the planet namespace for each
<a href="config.html#subscription">configuration parameter</a> that applies
to this subscription.</li>
<li><a href="http://www.feedparser.org/docs/reference-version.html"><code>planet:format</code></a> indicating the format and version of the source feed.</li>
<li><a href="http://www.feedparser.org/docs/reference-bozo.html"><code>planet:bozo</code></a> which is either <code>true</code> or <code>false</code>.</li>
</ul>
</li>
<li><code>atom:updated</code> and <code>atom:published</code> will have
a <code>planet:format</code> attribute containing the referenced date
formatted according to the <code>[planet] date_format</code> specified
in the configuration</li>
</ul>

</body>
</html>
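Aside (not part of this commit): the same highly normalized input described above is what non-XSLT filters and templates see on stdin. A minimal, hypothetical Python sketch of peeking at the planet:* extension elements; it matches them by their literal prefixed names rather than resolving the planet namespace URI, which a real filter would do.

#!/usr/bin/env python
# Hypothetical sketch only: report the planet:* extension elements described
# in the documentation above, then pass the document through unchanged.
import sys
from xml.dom import minidom

doc = minidom.parse(sys.stdin)
for name in ('planet:format', 'planet:bozo'):
    for node in doc.getElementsByTagName(name):
        if node.firstChild is not None:
            sys.stderr.write('%s = %s\n' % (name, node.firstChild.nodeValue))
sys.stdout.write(doc.toxml('utf-8'))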
|
82
examples/filters/categories/categories.xslt
Normal file
82
examples/filters/categories/categories.xslt
Normal file
@ -0,0 +1,82 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE xsl:stylesheet [
|
||||||
|
<!ENTITY categoryTerm "WebSemantique">
|
||||||
|
]>
|
||||||
|
<!--

This transformation is released under the same licence as Python;
see http://www.intertwingly.net/code/venus/LICENCE.

Author: Eric van der Vlist <vdv@dyomedea.com>

This transformation is meant to be used as a filter that determines if
Atom entries are relevant to a specific topic and adds the corresponding
<category/> element when that is the case.

This is done by a simple keyword matching mechanism.

To customize this filter to your needs:

1) Replace WebSemantique by your own category name in the definition of
the categoryTerm entity above.
2) Review the "upper" and "lower" variables that are used to convert text
nodes to lower case and to replace common punctuation signs with spaces,
and check that they meet your needs.
3) Define your own list of keywords in <d:keyword/> elements. Note that
the leading and trailing spaces are significant: "> rdf <" will match rdf
as an entire word, while ">rdf<" would match the substring "rdf" and
"> rdf<" would match words starting with rdf. Also note that the test is
done after conversion to lowercase.

To use it with Venus, just add this filter to the list of filters, for instance:

filters= categories.xslt guess_language.py

-->
|
||||||
|
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
|
||||||
|
xmlns:atom="http://www.w3.org/2005/Atom" xmlns="http://www.w3.org/2005/Atom"
|
||||||
|
xmlns:d="http://ns.websemantique.org/data/" exclude-result-prefixes="d atom" version="1.0">
|
||||||
|
<xsl:variable name="upper"
|
||||||
|
>,.;AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZzÀàÁáÂâÃãÄäÅ寿ÇçÈèÉéÊêËëÌìÍíÎîÏïÐðÑñÒòÓóÔôÕõÖöØøÙùÚúÛûÜüÝýÞþ</xsl:variable>
|
||||||
|
<xsl:variable name="lower"
|
||||||
|
> aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzzaaaaaaaaaaaaææcceeeeeeeeiiiiiiiiððnnooooooooooøøuuuuuuuuyyþþ</xsl:variable>
|
||||||
|
<d:keywords>
|
||||||
|
<d:keyword> wiki semantique </d:keyword>
|
||||||
|
<d:keyword> wikis semantiques </d:keyword>
|
||||||
|
<d:keyword> web semantique </d:keyword>
|
||||||
|
<d:keyword> websemantique </d:keyword>
|
||||||
|
<d:keyword> semantic web</d:keyword>
|
||||||
|
<d:keyword> semweb</d:keyword>
|
||||||
|
<d:keyword> rdf</d:keyword>
|
||||||
|
<d:keyword> owl </d:keyword>
|
||||||
|
<d:keyword> sparql </d:keyword>
|
||||||
|
<d:keyword> topic map</d:keyword>
|
||||||
|
<d:keyword> doap </d:keyword>
|
||||||
|
<d:keyword> foaf </d:keyword>
|
||||||
|
<d:keyword> sioc </d:keyword>
|
||||||
|
<d:keyword> ontology </d:keyword>
|
||||||
|
<d:keyword> ontologie</d:keyword>
|
||||||
|
<d:keyword> dublin core </d:keyword>
|
||||||
|
</d:keywords>
|
||||||
|
<xsl:template match="@*|node()">
|
||||||
|
<xsl:copy>
|
||||||
|
<xsl:apply-templates select="@*|node()"/>
|
||||||
|
</xsl:copy>
|
||||||
|
</xsl:template>
|
||||||
|
<xsl:template match="atom:entry/atom:updated">
|
||||||
|
<xsl:copy>
|
||||||
|
<xsl:apply-templates select="@*|node()"/>
|
||||||
|
</xsl:copy>
|
||||||
|
<xsl:variable name="concatenatedText">
|
||||||
|
<xsl:for-each select="../atom:title|../atom:summary|../atom:content|../atom:category/@term">
|
||||||
|
<xsl:text> </xsl:text>
|
||||||
|
<xsl:value-of select="translate(., $upper, $lower)"/>
|
||||||
|
</xsl:for-each>
|
||||||
|
<xsl:text> </xsl:text>
|
||||||
|
</xsl:variable>
|
||||||
|
<xsl:if test="document('')/*/d:keywords/d:keyword[contains($concatenatedText, .)]">
|
||||||
|
<category term="WebSemantique"/>
|
||||||
|
</xsl:if>
|
||||||
|
</xsl:template>
|
||||||
|
<xsl:template match="atom:category[@term='&categoryTerm;']"/>
|
||||||
|
</xsl:stylesheet>
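Aside, not part of the stylesheet above: the space-sensitive keyword matching its header comment describes can be illustrated in a few lines of plain Python. This is a simplified sketch; the stylesheet additionally maps punctuation and accented characters to spaces, which the sketch skips.

# Illustration only: mimic the essential keyword test, where the relevant
# text is lower-cased and wrapped in spaces before a substring check.
def matches(keyword, text):
    return keyword in ' ' + text.lower() + ' '

print matches(' rdf ', 'notes on RDF and OWL')   # True:  " rdf " matches whole words only
print matches(' rdf',  'notes on RDFa syntax')   # True:  " rdf" matches words starting with rdf
print matches(' rdf ', 'notes on RDFa syntax')   # False: no whole word "rdf" here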
|
37
examples/filters/guess-language/README
Normal file
37
examples/filters/guess-language/README
Normal file
@ -0,0 +1,37 @@
|
|||||||
|
This filter is released under the same licence as Python;
see http://www.intertwingly.net/code/venus/LICENCE.

Author: Eric van der Vlist <vdv@dyomedea.com>

This filter guesses whether an Atom entry is written
in English or French. It should be trivial to adapt it to choose between
two other languages, easy to extend it to more than two languages,
and useful to be able to pass these languages as Venus configuration
parameters.

The code used to guess the language is the one described by
Douglas Bagnall in the Python recipe titled
"Language detection using character trigrams":
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576.

To add support for a new language, this language must first be
"learned" using learn-language.py. This learning phase is nothing
more than saving a pickled version of the Trigram object for this
language.

To learn Finnish, you would execute:

$ ./learn-language.py http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt fi.data

where http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt is a text
representative of the Finnish language and "fi.data" is the name of the
data file for "fi" (the ISO code for Finnish).

To install this filter, copy this directory under the Venus
filter directory and declare it in your filters list, for instance:

filters= categories.xslt guess-language/guess-language.py

NOTE: this filter depends on Amara
(http://uche.ogbuji.net/tech/4suite/amara/)
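The README above notes that extending the filter to more than two languages should be easy. A minimal, hypothetical sketch of what that could look like, reusing the Trigram class and the per-language .data files it describes; the closest_language helper below is not part of this commit.

# Hypothetical sketch: pick the closest of several languages instead of
# only English vs. French.  Assumes one pickled Trigram per ISO code under
# filters/guess-language/<code>.data, as described in the README above.
import cPickle
from trigram import Trigram

def load(code):
    f = open('filters/guess-language/%s.data' % code)
    try:
        return cPickle.load(f)
    finally:
        f.close()

def closest_language(text, codes=('en', 'fr', 'fi')):
    t = Trigram()
    t.parseString(text)
    best, best_diff = None, 2.0
    for code in codes:
        diff = load(code) - t      # 1 - cosine similarity; smaller is closer
        if diff < best_diff:
            best, best_diff = code, diff
    return best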
15131
examples/filters/guess-language/en.data
Normal file
15131
examples/filters/guess-language/en.data
Normal file
File diff suppressed because it is too large
Load Diff
22710
examples/filters/guess-language/fr.data
Normal file
22710
examples/filters/guess-language/fr.data
Normal file
File diff suppressed because it is too large
Load Diff
58
examples/filters/guess-language/guess-language.py
Normal file
58
examples/filters/guess-language/guess-language.py
Normal file
@ -0,0 +1,58 @@
|
|||||||
|
#!/usr/bin/env python
"""A filter to guess languages.

This filter guesses whether an Atom entry is written
in English or French. It should be trivial to adapt it to choose between
two other languages, easy to extend it to more than two languages,
and useful to be able to pass these languages as Venus configuration
parameters.

(See the README file for more details.)

Requires Python 2.1, recommends 2.4.
"""
__authors__ = [ "Eric van der Vlist <vdv@dyomedea.com>"]
__license__ = "Python"

import amara
from sys import stdin, stdout
from trigram import Trigram
from xml.dom import XML_NAMESPACE as XML_NS
import cPickle

ATOM_NSS = {
    u'atom': u'http://www.w3.org/2005/Atom',
    u'xml': XML_NS
}

langs = {}

def tri(lang):
    # load and cache the pickled Trigram data for a language code
    if not langs.has_key(lang):
        f = open('filters/guess-language/%s.data' % lang, 'r')
        t = cPickle.load(f)
        f.close()
        langs[lang] = t
    return langs[lang]


def guess_language(entry):
    # concatenate the entry's text, compare against the reference trigrams,
    # and record the closer language as an xml:lang attribute
    text = u''
    for child in entry.xml_xpath(u'atom:title|atom:summary|atom:content'):
        text = text + u' ' + child.__unicode__()
    t = Trigram()
    t.parseString(text)
    if tri('fr') - t > tri('en') - t:
        lang = u'en'
    else:
        lang = u'fr'
    entry.xml_set_attribute((u'xml:lang', XML_NS), lang)

def main():
    feed = amara.parse(stdin, prefixes=ATOM_NSS)
    for entry in feed.xml_xpath(u'//atom:entry[not(@xml:lang)]'):
        guess_language(entry)
    feed.xml(stdout)

if __name__ == '__main__':
    main()
|
25
examples/filters/guess-language/learn-language.py
Executable file
25
examples/filters/guess-language/learn-language.py
Executable file
@ -0,0 +1,25 @@
|
|||||||
|
#!/usr/bin/env python
"""A utility to learn languages for the guess-language filter.

This utility saves a Trigram object to a file.

(See the README file for more details.)

Requires Python 2.1, recommends 2.4.
"""
__authors__ = [ "Eric van der Vlist <vdv@dyomedea.com>"]
__license__ = "Python"

from trigram import Trigram
from sys import argv
from cPickle import dump


def main():
    # build a Trigram from the reference text (file or URL) and pickle it
    tri = Trigram(argv[1])
    out = open(argv[2], 'w')
    dump(tri, out)
    out.close()

if __name__ == '__main__':
    main()
|
188
examples/filters/guess-language/trigram.py
Normal file
188
examples/filters/guess-language/trigram.py
Normal file
@ -0,0 +1,188 @@
|
|||||||
|
#!/usr/bin/python
|
||||||
|
# -*- coding: UTF-8 -*-
|
||||||
|
"""
|
||||||
|
This class is based on the Python recipe titled
|
||||||
|
"Language detection using character trigrams"
|
||||||
|
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576
|
||||||
|
by Douglas Bagnall.
|
||||||
|
It has been (slightly) adapted by Eric van der Vlist to support
|
||||||
|
Unicode and accept a method to parse strings.
|
||||||
|
"""
|
||||||
|
__authors__ = [ "Douglas Bagnall", "Eric van der Vlist <vdv@dyomedea.com>"]
|
||||||
|
__license__ = "Python"
|
||||||
|
|
||||||
|
import random
|
||||||
|
from urllib import urlopen
|
||||||
|
|
||||||
|
class Trigram:
|
||||||
|
"""
|
||||||
|
From one or more text files, the frequency of three character
|
||||||
|
sequences is calculated. When treated as a vector, this information
|
||||||
|
can be compared to other trigrams, and the difference between them
|
||||||
|
seen as an angle. The cosine of this angle varies between 1 for
|
||||||
|
complete similarity, and 0 for utter difference. Since letter
|
||||||
|
combinations are characteristic to a language, this can be used to
|
||||||
|
determine the language of a body of text. For example:
|
||||||
|
|
||||||
|
>>> reference_en = Trigram('/path/to/reference/text/english')
|
||||||
|
>>> reference_de = Trigram('/path/to/reference/text/german')
|
||||||
|
>>> unknown = Trigram('url://pointing/to/unknown/text')
|
||||||
|
>>> unknown.similarity(reference_de)
|
||||||
|
0.4
|
||||||
|
>>> unknown.similarity(reference_en)
|
||||||
|
0.95
|
||||||
|
|
||||||
|
would indicate the unknown text is almost certainly English. As
|
||||||
|
syntax sugar, the minus sign is overloaded to return the difference
|
||||||
|
between texts, so the above objects would give you:
|
||||||
|
|
||||||
|
>>> unknown - reference_de
|
||||||
|
0.6
|
||||||
|
>>> reference_en - unknown # order doesn't matter.
|
||||||
|
0.05
|
||||||
|
|
||||||
|
As it stands, the Trigram ignores character set information, which
|
||||||
|
means you can only accurately compare within a single encoding
|
||||||
|
(iso-8859-1 in the examples). A more complete implementation might
|
||||||
|
convert to unicode first.
|
||||||
|
|
||||||
|
As an extra bonus, there is a method to make up nonsense words in the
|
||||||
|
style of the Trigram's text.
|
||||||
|
|
||||||
|
>>> reference_en.makeWords(30)
|
||||||
|
My withillonquiver and ald, by now wittlectionsurper, may sequia,
|
||||||
|
tory, I ad my notter. Marriusbabilly She lady for rachalle spen
|
||||||
|
hat knong al elf
|
||||||
|
|
||||||
|
Beware when using urls: HTML won't be parsed out.
|
||||||
|
|
||||||
|
Most methods chatter away to standard output, to let you know they're
|
||||||
|
still there.
|
||||||
|
"""
|
||||||
|
|
||||||
|
length = 0
|
||||||
|
|
||||||
|
def __init__(self, fn=None):
|
||||||
|
self.lut = {}
|
||||||
|
if fn is not None:
|
||||||
|
self.parseFile(fn)
|
||||||
|
|
||||||
|
def _parseAFragment(self, line, pair=' '):
|
||||||
|
for letter in line:
|
||||||
|
d = self.lut.setdefault(pair, {})
|
||||||
|
d[letter] = d.get(letter, 0) + 1
|
||||||
|
pair = pair[1] + letter
|
||||||
|
return pair
|
||||||
|
|
||||||
|
def parseString(self, string):
|
||||||
|
self._parseAFragment(string)
|
||||||
|
self.measure()
|
||||||
|
|
||||||
|
def parseFile(self, fn, encoding="iso-8859-1"):
|
||||||
|
pair = ' '
|
||||||
|
if '://' in fn:
|
||||||
|
#print "trying to fetch url, may take time..."
|
||||||
|
f = urlopen(fn)
|
||||||
|
else:
|
||||||
|
f = open(fn)
|
||||||
|
for z, line in enumerate(f):
|
||||||
|
#if not z % 1000:
|
||||||
|
# print "line %s" % z
|
||||||
|
# \n's are spurious in a prose context
|
||||||
|
pair = self._parseAFragment(line.strip().decode(encoding) + ' ')
|
||||||
|
f.close()
|
||||||
|
self.measure()
|
||||||
|
|
||||||
|
|
||||||
|
def measure(self):
|
||||||
|
"""calculates the scalar length of the trigram vector and
|
||||||
|
stores it in self.length."""
|
||||||
|
total = 0
|
||||||
|
for y in self.lut.values():
|
||||||
|
total += sum([ x * x for x in y.values() ])
|
||||||
|
self.length = total ** 0.5
|
||||||
|
|
||||||
|
def similarity(self, other):
|
||||||
|
"""returns a number between 0 and 1 indicating similarity.
|
||||||
|
1 means an identical ratio of trigrams;
|
||||||
|
0 means no trigrams in common.
|
||||||
|
"""
|
||||||
|
if not isinstance(other, Trigram):
|
||||||
|
raise TypeError("can't compare Trigram with non-Trigram")
|
||||||
|
lut1 = self.lut
|
||||||
|
lut2 = other.lut
|
||||||
|
total = 0
|
||||||
|
for k in lut1.keys():
|
||||||
|
if k in lut2:
|
||||||
|
a = lut1[k]
|
||||||
|
b = lut2[k]
|
||||||
|
for x in a:
|
||||||
|
if x in b:
|
||||||
|
total += a[x] * b[x]
|
||||||
|
|
||||||
|
return float(total) / (self.length * other.length)
|
||||||
|
|
||||||
|
def __sub__(self, other):
|
||||||
|
"""indicates difference between trigram sets; 1 is entirely
|
||||||
|
different, 0 is entirely the same."""
|
||||||
|
return 1 - self.similarity(other)
|
||||||
|
|
||||||
|
|
||||||
|
def makeWords(self, count):
|
||||||
|
"""returns a string of made-up words based on the known text."""
|
||||||
|
text = []
|
||||||
|
k = ' '
|
||||||
|
while count:
|
||||||
|
n = self.likely(k)
|
||||||
|
text.append(n)
|
||||||
|
k = k[1] + n
|
||||||
|
if n in ' \t':
|
||||||
|
count -= 1
|
||||||
|
return ''.join(text)
|
||||||
|
|
||||||
|
|
||||||
|
def likely(self, k):
|
||||||
|
"""Returns a character likely to follow the given string
|
||||||
|
two character string, or a space if nothing is found."""
|
||||||
|
if k not in self.lut:
|
||||||
|
return ' '
|
||||||
|
# if you were using this a lot, caching would be a good idea.
|
||||||
|
letters = []
|
||||||
|
for k, v in self.lut[k].items():
|
||||||
|
letters.append(k * v)
|
||||||
|
letters = ''.join(letters)
|
||||||
|
return random.choice(letters)
|
||||||
|
|
||||||
|
|
||||||
|
def test():
|
||||||
|
en = Trigram('http://gutenberg.net/dirs/etext97/lsusn11.txt')
|
||||||
|
#NB fr and some others have English license text.
|
||||||
|
# 'no' (Norwegian) has English excerpts.
|
||||||
|
fr = Trigram('http://gutenberg.net/dirs/etext03/candi10.txt')
|
||||||
|
fi = Trigram('http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt')
|
||||||
|
no = Trigram('http://gutenberg.net/dirs/1/2/8/4/12844/12844-8.txt')
|
||||||
|
se = Trigram('http://gutenberg.net/dirs/1/0/1/1/10117/10117-8.txt')
|
||||||
|
no2 = Trigram('http://gutenberg.net/dirs/1/3/0/4/13041/13041-8.txt')
|
||||||
|
en2 = Trigram('http://gutenberg.net/dirs/etext05/cfgsh10.txt')
|
||||||
|
fr2 = Trigram('http://gutenberg.net/dirs/1/3/7/0/13704/13704-8.txt')
|
||||||
|
print "calculating difference:"
|
||||||
|
print "en - fr is %s" % (en - fr)
|
||||||
|
print "fr - en is %s" % (fr - en)
|
||||||
|
print "en - en2 is %s" % (en - en2)
|
||||||
|
print "en - fr2 is %s" % (en - fr2)
|
||||||
|
print "fr - en2 is %s" % (fr - en2)
|
||||||
|
print "fr - fr2 is %s" % (fr - fr2)
|
||||||
|
print "fr2 - en2 is %s" % (fr2 - en2)
|
||||||
|
print "fi - fr is %s" % (fi - fr)
|
||||||
|
print "fi - en is %s" % (fi - en)
|
||||||
|
print "fi - se is %s" % (fi - se)
|
||||||
|
print "no - se is %s" % (no - se)
|
||||||
|
print "en - no is %s" % (en - no)
|
||||||
|
print "no - no2 is %s" % (no - no2)
|
||||||
|
print "se - no2 is %s" % (se - no2)
|
||||||
|
print "en - no2 is %s" % (en - no2)
|
||||||
|
print "fr - no2 is %s" % (fr - no2)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
test()
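A small, self-contained check (not part of the module above) of the behaviour its docstring describes: similarity() is the cosine between trigram-count vectors, so identical text scores near 1 and texts with essentially no trigrams in common score near 0. This assumes trigram.py is importable from the current directory.

# Sanity sketch for the cosine-similarity behaviour described above.
from trigram import Trigram

a = Trigram(); a.parseString(u'the quick brown fox jumps over the lazy dog')
b = Trigram(); b.parseString(u'the quick brown fox jumps over the lazy dog')
c = Trigram(); c.parseString(u'zzzz qqqq xxxx')

print a.similarity(b)   # ~1.0: identical trigram distribution
print a - b             # ~0.0: the overloaded minus is 1 - similarity
print a.similarity(c)   # close to 0: essentially no trigrams in common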
|
@ -20,6 +20,7 @@ if __name__ == "__main__":
|
|||||||
config_file = "config.ini"
|
config_file = "config.ini"
|
||||||
offline = 0
|
offline = 0
|
||||||
verbose = 0
|
verbose = 0
|
||||||
|
only_if_new = 0
|
||||||
|
|
||||||
for arg in sys.argv[1:]:
|
for arg in sys.argv[1:]:
|
||||||
if arg == "-h" or arg == "--help":
|
if arg == "-h" or arg == "--help":
|
||||||
@ -29,12 +30,15 @@ if __name__ == "__main__":
|
|||||||
print " -v, --verbose DEBUG level logging during update"
|
print " -v, --verbose DEBUG level logging during update"
|
||||||
print " -o, --offline Update the Planet from the cache only"
|
print " -o, --offline Update the Planet from the cache only"
|
||||||
print " -h, --help Display this help message and exit"
|
print " -h, --help Display this help message and exit"
|
||||||
|
print " -n, --only-if-new Only spider new feeds"
|
||||||
print
|
print
|
||||||
sys.exit(0)
|
sys.exit(0)
|
||||||
elif arg == "-v" or arg == "--verbose":
|
elif arg == "-v" or arg == "--verbose":
|
||||||
verbose = 1
|
verbose = 1
|
||||||
elif arg == "-o" or arg == "--offline":
|
elif arg == "-o" or arg == "--offline":
|
||||||
offline = 1
|
offline = 1
|
||||||
|
elif arg == "-n" or arg == "--only-if-new":
|
||||||
|
only_if_new = 1
|
||||||
elif arg.startswith("-"):
|
elif arg.startswith("-"):
|
||||||
print >>sys.stderr, "Unknown option:", arg
|
print >>sys.stderr, "Unknown option:", arg
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
@ -46,11 +50,11 @@ if __name__ == "__main__":
|
|||||||
|
|
||||||
if verbose:
|
if verbose:
|
||||||
import planet
|
import planet
|
||||||
planet.getLogger('DEBUG')
|
planet.getLogger('DEBUG',config.log_format())
|
||||||
|
|
||||||
if not offline:
|
if not offline:
|
||||||
from planet import spider
|
from planet import spider
|
||||||
spider.spiderPlanet()
|
spider.spiderPlanet(only_if_new=only_if_new)
|
||||||
|
|
||||||
from planet import splice
|
from planet import splice
|
||||||
doc = splice.splice()
|
doc = splice.splice()
|
||||||
|
@ -9,7 +9,7 @@ config.__init__()
|
|||||||
from ConfigParser import ConfigParser
|
from ConfigParser import ConfigParser
|
||||||
from urlparse import urljoin
|
from urlparse import urljoin
|
||||||
|
|
||||||
def getLogger(level):
|
def getLogger(level, format):
|
||||||
""" get a logger with the specified log level """
|
""" get a logger with the specified log level """
|
||||||
global logger
|
global logger
|
||||||
if logger: return logger
|
if logger: return logger
|
||||||
@ -19,7 +19,7 @@ def getLogger(level):
|
|||||||
except:
|
except:
|
||||||
import compat_logging as logging
|
import compat_logging as logging
|
||||||
|
|
||||||
logging.basicConfig()
|
logging.basicConfig(format=format)
|
||||||
logging.getLogger().setLevel(logging.getLevelName(level))
|
logging.getLogger().setLevel(logging.getLevelName(level))
|
||||||
logger = logging.getLogger("planet.runner")
|
logger = logging.getLogger("planet.runner")
|
||||||
try:
|
try:
|
||||||
|
@ -1090,7 +1090,7 @@ Logger.manager = Manager(Logger.root)
|
|||||||
|
|
||||||
BASIC_FORMAT = "%(levelname)s:%(name)s:%(message)s"
|
BASIC_FORMAT = "%(levelname)s:%(name)s:%(message)s"
|
||||||
|
|
||||||
def basicConfig():
|
def basicConfig(format=BASIC_FORMAT):
|
||||||
"""
|
"""
|
||||||
Do basic configuration for the logging system by creating a
|
Do basic configuration for the logging system by creating a
|
||||||
StreamHandler with a default Formatter and adding it to the
|
StreamHandler with a default Formatter and adding it to the
|
||||||
@ -1098,7 +1098,7 @@ def basicConfig():
|
|||||||
"""
|
"""
|
||||||
if len(root.handlers) == 0:
|
if len(root.handlers) == 0:
|
||||||
hdlr = StreamHandler()
|
hdlr = StreamHandler()
|
||||||
fmt = Formatter(BASIC_FORMAT)
|
fmt = Formatter(format)
|
||||||
hdlr.setFormatter(fmt)
|
hdlr.setFormatter(fmt)
|
||||||
root.addHandler(hdlr)
|
root.addHandler(hdlr)
|
||||||
|
|
||||||
|
@ -32,7 +32,7 @@ from urlparse import urljoin
|
|||||||
|
|
||||||
parser = ConfigParser()
|
parser = ConfigParser()
|
||||||
|
|
||||||
planet_predefined_options = []
|
planet_predefined_options = ['filters']
|
||||||
|
|
||||||
def __init__():
|
def __init__():
|
||||||
"""define the struture of an ini file"""
|
"""define the struture of an ini file"""
|
||||||
@ -43,6 +43,8 @@ def __init__():
|
|||||||
if section and parser.has_option(section, option):
|
if section and parser.has_option(section, option):
|
||||||
return parser.get(section, option)
|
return parser.get(section, option)
|
||||||
elif parser.has_option('Planet', option):
|
elif parser.has_option('Planet', option):
|
||||||
|
if option == 'log_format':
|
||||||
|
return parser.get('Planet', option, raw=True)
|
||||||
return parser.get('Planet', option)
|
return parser.get('Planet', option)
|
||||||
else:
|
else:
|
||||||
return default
|
return default
|
||||||
@ -69,8 +71,8 @@ def __init__():
|
|||||||
planet_predefined_options.append(name)
|
planet_predefined_options.append(name)
|
||||||
|
|
||||||
# define a list planet-level variable
|
# define a list planet-level variable
|
||||||
def define_planet_list(name):
|
def define_planet_list(name, default=''):
|
||||||
setattr(config, name, lambda : expand(get(None,name,'')))
|
setattr(config, name, lambda : expand(get(None,name,default)))
|
||||||
planet_predefined_options.append(name)
|
planet_predefined_options.append(name)
|
||||||
|
|
||||||
# define a string template-level variable
|
# define a string template-level variable
|
||||||
@ -88,6 +90,7 @@ def __init__():
|
|||||||
define_planet('link', '')
|
define_planet('link', '')
|
||||||
define_planet('cache_directory', "cache")
|
define_planet('cache_directory', "cache")
|
||||||
define_planet('log_level', "WARNING")
|
define_planet('log_level', "WARNING")
|
||||||
|
define_planet('log_format', "%(levelname)s:%(name)s:%(message)s")
|
||||||
define_planet('feed_timeout', 20)
|
define_planet('feed_timeout', 20)
|
||||||
define_planet('date_format', "%B %d, %Y %I:%M %p")
|
define_planet('date_format', "%B %d, %Y %I:%M %p")
|
||||||
define_planet('new_date_format', "%B %d, %Y")
|
define_planet('new_date_format', "%B %d, %Y")
|
||||||
@ -100,7 +103,7 @@ def __init__():
|
|||||||
|
|
||||||
define_planet_list('template_files')
|
define_planet_list('template_files')
|
||||||
define_planet_list('bill_of_materials')
|
define_planet_list('bill_of_materials')
|
||||||
define_planet_list('template_directories')
|
define_planet_list('template_directories', '.')
|
||||||
define_planet_list('filter_directories')
|
define_planet_list('filter_directories')
|
||||||
|
|
||||||
# template options
|
# template options
|
||||||
@ -123,7 +126,7 @@ def load(config_file):
|
|||||||
|
|
||||||
import config, planet
|
import config, planet
|
||||||
from planet import opml, foaf
|
from planet import opml, foaf
|
||||||
log = planet.getLogger(config.log_level())
|
log = planet.getLogger(config.log_level(),config.log_format())
|
||||||
|
|
||||||
# Theme support
|
# Theme support
|
||||||
theme = config.output_theme()
|
theme = config.output_theme()
|
||||||
@ -146,10 +149,11 @@ def load(config_file):
|
|||||||
|
|
||||||
# complete search list for theme directories
|
# complete search list for theme directories
|
||||||
dirs += [os.path.join(theme_dir,dir) for dir in
|
dirs += [os.path.join(theme_dir,dir) for dir in
|
||||||
config.template_directories()]
|
config.template_directories() if dir not in dirs]
|
||||||
|
|
||||||
# merge configurations, allowing current one to override theme
|
# merge configurations, allowing current one to override theme
|
||||||
template_files = config.template_files()
|
template_files = config.template_files()
|
||||||
|
parser.set('Planet','template_files','')
|
||||||
parser.read(config_file)
|
parser.read(config_file)
|
||||||
for file in config.bill_of_materials():
|
for file in config.bill_of_materials():
|
||||||
if not file in bom: bom.append(file)
|
if not file in bom: bom.append(file)
|
||||||
@ -178,6 +182,12 @@ def load(config_file):
|
|||||||
opml.opml2config(data, cached_config)
|
opml.opml2config(data, cached_config)
|
||||||
elif content_type(list).find('foaf')>=0:
|
elif content_type(list).find('foaf')>=0:
|
||||||
foaf.foaf2config(data, cached_config)
|
foaf.foaf2config(data, cached_config)
|
||||||
|
else:
|
||||||
|
from planet import shell
|
||||||
|
import StringIO
|
||||||
|
cached_config.readfp(StringIO.StringIO(shell.run(
|
||||||
|
content_type(list), data.getvalue(), mode="filter")))
|
||||||
|
|
||||||
if cached_config.sections() in [[], [list]]:
|
if cached_config.sections() in [[], [list]]:
|
||||||
raise Exception
|
raise Exception
|
||||||
|
|
||||||
@ -314,7 +324,7 @@ def reading_lists():
|
|||||||
for section in parser.sections():
|
for section in parser.sections():
|
||||||
if parser.has_option(section, 'content_type'):
|
if parser.has_option(section, 'content_type'):
|
||||||
type = parser.get(section, 'content_type')
|
type = parser.get(section, 'content_type')
|
||||||
if type.find('opml')>=0 or type.find('foaf')>=0:
|
if type.find('opml')>=0 or type.find('foaf')>=0 or type.find('.')>=0:
|
||||||
result.append(section)
|
result.append(section)
|
||||||
return result
|
return result
|
||||||
|
|
||||||
@ -328,7 +338,8 @@ def filters(section=None):
|
|||||||
|
|
||||||
def planet_options():
|
def planet_options():
|
||||||
""" dictionary of planet wide options"""
|
""" dictionary of planet wide options"""
|
||||||
return dict(map(lambda opt: (opt, parser.get('Planet',opt)),
|
return dict(map(lambda opt: (opt,
|
||||||
|
parser.get('Planet', opt, raw=(opt=="log_format"))),
|
||||||
parser.options('Planet')))
|
parser.options('Planet')))
|
||||||
|
|
||||||
def feed_options(section):
|
def feed_options(section):
|
||||||
|
@ -11,7 +11,7 @@ Recommended: Python 2.3 or later
|
|||||||
Recommended: CJKCodecs and iconv_codec <http://cjkpython.i18n.org/>
|
Recommended: CJKCodecs and iconv_codec <http://cjkpython.i18n.org/>
|
||||||
"""
|
"""
|
||||||
|
|
||||||
__version__ = "4.2-pre-" + "$Revision: 1.142 $"[11:16] + "-cvs"
|
__version__ = "4.2-pre-" + "$Revision: 1.144 $"[11:16] + "-cvs"
|
||||||
__license__ = """Copyright (c) 2002-2006, Mark Pilgrim, All rights reserved.
|
__license__ = """Copyright (c) 2002-2006, Mark Pilgrim, All rights reserved.
|
||||||
|
|
||||||
Redistribution and use in source and binary forms, with or without modification,
|
Redistribution and use in source and binary forms, with or without modification,
|
||||||
@ -218,6 +218,9 @@ class FeedParserDict(UserDict):
|
|||||||
def __getitem__(self, key):
|
def __getitem__(self, key):
|
||||||
if key == 'category':
|
if key == 'category':
|
||||||
return UserDict.__getitem__(self, 'tags')[0]['term']
|
return UserDict.__getitem__(self, 'tags')[0]['term']
|
||||||
|
if key == 'enclosures':
|
||||||
|
norel = lambda link: FeedParserDict([(name,value) for (name,value) in link.items() if name!='rel'])
|
||||||
|
return [norel(link) for link in UserDict.__getitem__(self, 'links') if link['rel']=='enclosure']
|
||||||
if key == 'categories':
|
if key == 'categories':
|
||||||
return [(tag['scheme'], tag['term']) for tag in UserDict.__getitem__(self, 'tags')]
|
return [(tag['scheme'], tag['term']) for tag in UserDict.__getitem__(self, 'tags')]
|
||||||
realkey = self.keymap.get(key, key)
|
realkey = self.keymap.get(key, key)
|
||||||
@ -1303,15 +1306,15 @@ class _FeedParserMixin:
|
|||||||
attrsD.setdefault('type', 'application/atom+xml')
|
attrsD.setdefault('type', 'application/atom+xml')
|
||||||
else:
|
else:
|
||||||
attrsD.setdefault('type', 'text/html')
|
attrsD.setdefault('type', 'text/html')
|
||||||
|
context = self._getContext()
|
||||||
attrsD = self._itsAnHrefDamnIt(attrsD)
|
attrsD = self._itsAnHrefDamnIt(attrsD)
|
||||||
if attrsD.has_key('href'):
|
if attrsD.has_key('href'):
|
||||||
attrsD['href'] = self.resolveURI(attrsD['href'])
|
attrsD['href'] = self.resolveURI(attrsD['href'])
|
||||||
|
if attrsD.get('rel')=='enclosure' and not context.get('id'):
|
||||||
|
context['id'] = attrsD.get('href')
|
||||||
expectingText = self.infeed or self.inentry or self.insource
|
expectingText = self.infeed or self.inentry or self.insource
|
||||||
context = self._getContext()
|
|
||||||
context.setdefault('links', [])
|
context.setdefault('links', [])
|
||||||
context['links'].append(FeedParserDict(attrsD))
|
context['links'].append(FeedParserDict(attrsD))
|
||||||
if attrsD['rel'] == 'enclosure':
|
|
||||||
self._start_enclosure(attrsD)
|
|
||||||
if attrsD.has_key('href'):
|
if attrsD.has_key('href'):
|
||||||
expectingText = 0
|
expectingText = 0
|
||||||
if (attrsD.get('rel') == 'alternate') and (self.mapContentType(attrsD.get('type')) in self.html_types):
|
if (attrsD.get('rel') == 'alternate') and (self.mapContentType(attrsD.get('type')) in self.html_types):
|
||||||
@ -1357,6 +1360,7 @@ class _FeedParserMixin:
|
|||||||
self._start_content(attrsD)
|
self._start_content(attrsD)
|
||||||
else:
|
else:
|
||||||
self.pushContent('description', attrsD, 'text/html', self.infeed or self.inentry or self.insource)
|
self.pushContent('description', attrsD, 'text/html', self.infeed or self.inentry or self.insource)
|
||||||
|
_start_dc_description = _start_description
|
||||||
|
|
||||||
def _start_abstract(self, attrsD):
|
def _start_abstract(self, attrsD):
|
||||||
self.pushContent('description', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)
|
self.pushContent('description', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)
|
||||||
@ -1368,6 +1372,7 @@ class _FeedParserMixin:
|
|||||||
value = self.popContent('description')
|
value = self.popContent('description')
|
||||||
self._summaryKey = None
|
self._summaryKey = None
|
||||||
_end_abstract = _end_description
|
_end_abstract = _end_description
|
||||||
|
_end_dc_description = _end_description
|
||||||
|
|
||||||
def _start_info(self, attrsD):
|
def _start_info(self, attrsD):
|
||||||
self.pushContent('info', attrsD, 'text/plain', 1)
|
self.pushContent('info', attrsD, 'text/plain', 1)
|
||||||
@ -1427,7 +1432,8 @@ class _FeedParserMixin:
|
|||||||
def _start_enclosure(self, attrsD):
|
def _start_enclosure(self, attrsD):
|
||||||
attrsD = self._itsAnHrefDamnIt(attrsD)
|
attrsD = self._itsAnHrefDamnIt(attrsD)
|
||||||
context = self._getContext()
|
context = self._getContext()
|
||||||
context.setdefault('enclosures', []).append(FeedParserDict(attrsD))
|
attrsD['rel']='enclosure'
|
||||||
|
context.setdefault('links', []).append(FeedParserDict(attrsD))
|
||||||
href = attrsD.get('href')
|
href = attrsD.get('href')
|
||||||
if href and not context.get('id'):
|
if href and not context.get('id'):
|
||||||
context['id'] = href
|
context['id'] = href
|
||||||
|
97
planet/idindex.py
Normal file
97
planet/idindex.py
Normal file
@ -0,0 +1,97 @@
|
|||||||
|
from glob import glob
|
||||||
|
import os, sys, dbhash
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
rootdir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||||
|
sys.path.insert(0, rootdir)
|
||||||
|
|
||||||
|
from planet.spider import filename
|
||||||
|
from planet import config
|
||||||
|
|
||||||
|
def open():
|
||||||
|
try:
|
||||||
|
cache = config.cache_directory()
|
||||||
|
index=os.path.join(cache,'index')
|
||||||
|
if not os.path.exists(index): return None
|
||||||
|
return dbhash.open(filename(index, 'id'),'w')
|
||||||
|
except Exception, e:
|
||||||
|
if e.__class__.__name__ == 'DBError': e = e.args[-1]
|
||||||
|
from planet import logger as log
|
||||||
|
log.error(str(e))
|
||||||
|
|
||||||
|
def destroy():
|
||||||
|
from planet import logger as log
|
||||||
|
cache = config.cache_directory()
|
||||||
|
index=os.path.join(cache,'index')
|
||||||
|
if not os.path.exists(index): return None
|
||||||
|
idindex = filename(index, 'id')
|
||||||
|
if os.path.exists(idindex): os.unlink(idindex)
|
||||||
|
os.removedirs(index)
|
||||||
|
log.info(idindex + " deleted")
|
||||||
|
|
||||||
|
def create():
|
||||||
|
from planet import logger as log
|
||||||
|
cache = config.cache_directory()
|
||||||
|
index=os.path.join(cache,'index')
|
||||||
|
if not os.path.exists(index): os.makedirs(index)
|
||||||
|
index = dbhash.open(filename(index, 'id'),'c')
|
||||||
|
|
||||||
|
try:
|
||||||
|
import libxml2
|
||||||
|
except:
|
||||||
|
libxml2 = False
|
||||||
|
from xml.dom import minidom
|
||||||
|
|
||||||
|
for file in glob(cache+"/*"):
|
||||||
|
if os.path.isdir(file):
|
||||||
|
continue
|
||||||
|
elif libxml2:
|
||||||
|
try:
|
||||||
|
doc = libxml2.parseFile(file)
|
||||||
|
ctxt = doc.xpathNewContext()
|
||||||
|
ctxt.xpathRegisterNs('atom','http://www.w3.org/2005/Atom')
|
||||||
|
entry = ctxt.xpathEval('/atom:entry/atom:id')
|
||||||
|
source = ctxt.xpathEval('/atom:entry/atom:source/atom:id')
|
||||||
|
if entry and source:
|
||||||
|
index[filename('',entry[0].content)] = source[0].content
|
||||||
|
doc.freeDoc()
|
||||||
|
except:
|
||||||
|
log.error(file)
|
||||||
|
else:
|
||||||
|
try:
|
||||||
|
doc = minidom.parse(file)
|
||||||
|
doc.normalize()
|
||||||
|
ids = doc.getElementsByTagName('id')
|
||||||
|
entry = [e for e in ids if e.parentNode.nodeName == 'entry']
|
||||||
|
source = [e for e in ids if e.parentNode.nodeName == 'source']
|
||||||
|
if entry and source:
|
||||||
|
index[filename('',entry[0].childNodes[0].nodeValue)] = \
|
||||||
|
source[0].childNodes[0].nodeValue
|
||||||
|
doc.freeDoc()
|
||||||
|
except:
|
||||||
|
log.error(file)
|
||||||
|
|
||||||
|
log.info(str(len(index.keys())) + " entries indexed")
|
||||||
|
index.close()
|
||||||
|
|
||||||
|
return open()
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
if len(sys.argv) < 2:
|
||||||
|
print 'Usage: %s [-c|-d]' % sys.argv[0]
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
config.load(sys.argv[1])
|
||||||
|
|
||||||
|
if len(sys.argv) > 2 and sys.argv[2] == '-c':
|
||||||
|
create()
|
||||||
|
elif len(sys.argv) > 2 and sys.argv[2] == '-d':
|
||||||
|
destroy()
|
||||||
|
else:
|
||||||
|
from planet import logger as log
|
||||||
|
index = open()
|
||||||
|
if index:
|
||||||
|
log.info(str(len(index.keys())) + " entries indexed")
|
||||||
|
index.close()
|
||||||
|
else:
|
||||||
|
log.info("no entries indexed")
|
@ -48,6 +48,10 @@ class OpmlParser(ContentHandler,SGMLParser):
|
|||||||
# this is an entry in a subscription list, but some leave this
|
# this is an entry in a subscription list, but some leave this
|
||||||
# attribute off, and others have placed 'atom' in here
|
# attribute off, and others have placed 'atom' in here
|
||||||
if attrs.has_key('type'):
|
if attrs.has_key('type'):
|
||||||
|
if attrs['type'] == 'link' and not attrs.has_key('url'):
|
||||||
|
# Auto-correct WordPress link manager OPML files
|
||||||
|
attrs = dict(attrs.items())
|
||||||
|
attrs['type'] = 'rss'
|
||||||
if attrs['type'].lower() not in['rss','atom']: return
|
if attrs['type'].lower() not in['rss','atom']: return
|
||||||
|
|
||||||
# The feed itself is supposed to be in an attribute named 'xmlUrl'
|
# The feed itself is supposed to be in an attribute named 'xmlUrl'
|
||||||
|
@ -25,7 +25,11 @@ illegal_xml_chars = re.compile("[\x01-\x08\x0B\x0C\x0E-\x1F]")
|
|||||||
def createTextElement(parent, name, value):
|
def createTextElement(parent, name, value):
|
||||||
""" utility function to create a child element with the specified text"""
|
""" utility function to create a child element with the specified text"""
|
||||||
if not value: return
|
if not value: return
|
||||||
if isinstance(value,str): value=value.decode('utf-8')
|
if isinstance(value,str):
|
||||||
|
try:
|
||||||
|
value=value.decode('utf-8')
|
||||||
|
except:
|
||||||
|
value=value.decode('iso-8859-1')
|
||||||
xdoc = parent.ownerDocument
|
xdoc = parent.ownerDocument
|
||||||
xelement = xdoc.createElement(name)
|
xelement = xdoc.createElement(name)
|
||||||
xelement.appendChild(xdoc.createTextNode(value))
|
xelement.appendChild(xdoc.createTextNode(value))
|
||||||
@ -100,6 +104,8 @@ def links(xentry, entry):
|
|||||||
xlink.setAttribute('type', link.get('type'))
|
xlink.setAttribute('type', link.get('type'))
|
||||||
if link.has_key('rel'):
|
if link.has_key('rel'):
|
||||||
xlink.setAttribute('rel', link.get('rel',None))
|
xlink.setAttribute('rel', link.get('rel',None))
|
||||||
|
if link.has_key('length'):
|
||||||
|
xlink.setAttribute('length', link.get('length'))
|
||||||
xentry.appendChild(xlink)
|
xentry.appendChild(xlink)
|
||||||
|
|
||||||
def date(xentry, name, parsed):
|
def date(xentry, name, parsed):
|
||||||
@ -157,7 +163,7 @@ def content(xentry, name, detail, bozo):
|
|||||||
xcontent.setAttribute('type', 'html')
|
xcontent.setAttribute('type', 'html')
|
||||||
xcontent.appendChild(xdoc.createTextNode(detail.value.decode('utf-8')))
|
xcontent.appendChild(xdoc.createTextNode(detail.value.decode('utf-8')))
|
||||||
|
|
||||||
if detail.language:
|
if detail.get("language"):
|
||||||
xcontent.setAttribute('xml:lang', detail.language)
|
xcontent.setAttribute('xml:lang', detail.language)
|
||||||
|
|
||||||
xentry.appendChild(xcontent)
|
xentry.appendChild(xcontent)
|
||||||
@ -170,13 +176,13 @@ def source(xsource, source, bozo, format):
|
|||||||
createTextElement(xsource, 'icon', source.get('icon', None))
|
createTextElement(xsource, 'icon', source.get('icon', None))
|
||||||
createTextElement(xsource, 'logo', source.get('logo', None))
|
createTextElement(xsource, 'logo', source.get('logo', None))
|
||||||
|
|
||||||
|
if not source.has_key('logo') and source.has_key('image'):
|
||||||
|
createTextElement(xsource, 'logo', source.image.get('href',None))
|
||||||
|
|
||||||
for tag in source.get('tags',[]):
|
for tag in source.get('tags',[]):
|
||||||
category(xsource, tag)
|
category(xsource, tag)
|
||||||
|
|
||||||
author_detail = source.get('author_detail',{})
|
author(xsource, 'author', source.get('author_detail',{}))
|
||||||
if not author_detail.has_key('name') and source.has_key('planet_name'):
|
|
||||||
author_detail['name'] = source['planet_name']
|
|
||||||
author(xsource, 'author', author_detail)
|
|
||||||
for contributor in source.get('contributors',[]):
|
for contributor in source.get('contributors',[]):
|
||||||
author(xsource, 'contributor', contributor)
|
author(xsource, 'contributor', contributor)
|
||||||
|
|
||||||
@ -204,6 +210,8 @@ def reconstitute(feed, entry):
|
|||||||
|
|
||||||
if entry.has_key('language'):
|
if entry.has_key('language'):
|
||||||
xentry.setAttribute('xml:lang', entry.language)
|
xentry.setAttribute('xml:lang', entry.language)
|
||||||
|
elif feed.feed.has_key('language'):
|
||||||
|
xentry.setAttribute('xml:lang', feed.feed.language)
|
||||||
|
|
||||||
id(xentry, entry)
|
id(xentry, entry)
|
||||||
links(xentry, entry)
|
links(xentry, entry)
|
||||||
@ -217,18 +225,46 @@ def reconstitute(feed, entry):
|
|||||||
content(xentry, 'content', entry.get('content',[None])[0], bozo)
|
content(xentry, 'content', entry.get('content',[None])[0], bozo)
|
||||||
content(xentry, 'rights', entry.get('rights_detail',None), bozo)
|
content(xentry, 'rights', entry.get('rights_detail',None), bozo)
|
||||||
|
|
||||||
date(xentry, 'updated', entry.get('updated_parsed',time.gmtime()))
|
date(xentry, 'updated', entry_updated(feed.feed, entry, time.gmtime()))
|
||||||
date(xentry, 'published', entry.get('published_parsed',None))
|
date(xentry, 'published', entry.get('published_parsed',None))
|
||||||
|
|
||||||
for tag in entry.get('tags',[]):
|
for tag in entry.get('tags',[]):
|
||||||
category(xentry, tag)
|
category(xentry, tag)
|
||||||
|
|
||||||
author(xentry, 'author', entry.get('author_detail',None))
|
# known, simple text extensions
|
||||||
|
for ns,name in [('feedburner','origlink')]:
|
||||||
|
if entry.has_key('%s_%s' % (ns,name)) and \
|
||||||
|
feed.namespaces.has_key(ns):
|
||||||
|
xoriglink = createTextElement(xentry, '%s:%s' % (ns,name),
|
||||||
|
entry['%s_%s' % (ns,name)])
|
||||||
|
xoriglink.setAttribute('xmlns:%s' % ns, feed.namespaces[ns])
|
||||||
|
|
||||||
|
author_detail = entry.get('author_detail',{})
|
||||||
|
if author_detail and not author_detail.has_key('name') and \
|
||||||
|
feed.feed.has_key('planet_name'):
|
||||||
|
author_detail['name'] = feed.feed['planet_name']
|
||||||
|
author(xentry, 'author', author_detail)
|
||||||
for contributor in entry.get('contributors',[]):
|
for contributor in entry.get('contributors',[]):
|
||||||
author(xentry, 'contributor', contributor)
|
author(xentry, 'contributor', contributor)
|
||||||
|
|
||||||
xsource = xdoc.createElement('source')
|
xsource = xdoc.createElement('source')
|
||||||
source(xsource, entry.get('source') or feed.feed, bozo, feed.version)
|
src = entry.get('source') or feed.feed
|
||||||
|
src_author = src.get('author_detail',{})
|
||||||
|
if (not author_detail or not author_detail.has_key('name')) and \
|
||||||
|
not src_author.has_key('name') and feed.feed.has_key('planet_name'):
|
||||||
|
if src_author: src_author = src_author.__class__(src_author.copy())
|
||||||
|
src['author_detail'] = src_author
|
||||||
|
src_author['name'] = feed.feed['planet_name']
|
||||||
|
source(xsource, src, bozo, feed.version)
|
||||||
xentry.appendChild(xsource)
|
xentry.appendChild(xsource)
|
||||||
|
|
||||||
return xdoc
|
return xdoc
|
||||||
|
|
||||||
|
def entry_updated(feed, entry, default = None):
|
||||||
|
chks = ((entry, 'updated_parsed'),
|
||||||
|
(entry, 'published_parsed'),
|
||||||
|
(feed, 'updated_parsed'),)
|
||||||
|
for node, field in chks:
|
||||||
|
if node.has_key(field) and node[field]:
|
||||||
|
return node[field]
|
||||||
|
return default
|
||||||
|
@ -6,13 +6,21 @@ logged_modes = []
|
|||||||
|
|
||||||
def run(template_file, doc, mode='template'):
|
def run(template_file, doc, mode='template'):
|
||||||
""" select a template module based on file extension and execute it """
|
""" select a template module based on file extension and execute it """
|
||||||
log = planet.getLogger(planet.config.log_level())
|
log = planet.getLogger(planet.config.log_level(),planet.config.log_format())
|
||||||
|
|
||||||
if mode == 'template':
|
if mode == 'template':
|
||||||
dirs = planet.config.template_directories()
|
dirs = planet.config.template_directories()
|
||||||
else:
|
else:
|
||||||
dirs = planet.config.filter_directories()
|
dirs = planet.config.filter_directories()
|
||||||
|
|
||||||
|
# parse out "extra" options
|
||||||
|
if template_file.find('?') < 0:
|
||||||
|
extra_options = {}
|
||||||
|
else:
|
||||||
|
import cgi
|
||||||
|
template_file, extra_options = template_file.split('?',1)
|
||||||
|
extra_options = dict(cgi.parse_qsl(extra_options))
|
||||||
|
|
||||||
# see if the template can be located
|
# see if the template can be located
|
||||||
for template_dir in dirs:
|
for template_dir in dirs:
|
||||||
template_resolved = os.path.join(template_dir, template_file)
|
template_resolved = os.path.join(template_dir, template_file)
|
||||||
@ -43,6 +51,7 @@ def run(template_file, doc, mode='template'):
|
|||||||
|
|
||||||
# Execute the shell module
|
# Execute the shell module
|
||||||
options = planet.config.template_options(template_file)
|
options = planet.config.template_options(template_file)
|
||||||
|
options.update(extra_options)
|
||||||
log.debug("Processing %s %s using %s", mode,
|
log.debug("Processing %s %s using %s", mode,
|
||||||
os.path.realpath(template_resolved), module_name)
|
os.path.realpath(template_resolved), module_name)
|
||||||
if mode == 'filter':
|
if mode == 'filter':
|
||||||
|
@ -97,6 +97,9 @@ Items = [
|
|||||||
['date_822', Rfc822, 'updated_parsed'],
|
['date_822', Rfc822, 'updated_parsed'],
|
||||||
['date_iso', Rfc3399, 'published_parsed'],
|
['date_iso', Rfc3399, 'published_parsed'],
|
||||||
['date_iso', Rfc3399, 'updated_parsed'],
|
['date_iso', Rfc3399, 'updated_parsed'],
|
||||||
|
['enclosure_href', String, 'links', {'rel': 'enclosure'}, 'href'],
|
||||||
|
['enclosure_length', String, 'links', {'rel': 'enclosure'}, 'length'],
|
||||||
|
['enclosure_type', String, 'links', {'rel': 'enclosure'}, 'type'],
|
||||||
['id', String, 'id'],
|
['id', String, 'id'],
|
||||||
['link', String, 'links', {'rel': 'alternate'}, 'href'],
|
['link', String, 'links', {'rel': 'alternate'}, 'href'],
|
||||||
['new_channel', String, 'id'],
|
['new_channel', String, 'id'],
|
||||||
@ -190,6 +193,13 @@ def template_info(source):
|
|||||||
for entry in data.entries:
|
for entry in data.entries:
|
||||||
output['Items'].append(tmpl_mapper(entry, Items))
|
output['Items'].append(tmpl_mapper(entry, Items))
|
||||||
|
|
||||||
|
# synthesize isPermaLink attribute
|
||||||
|
for item in output['Items']:
|
||||||
|
if item.get('id') == item.get('link'):
|
||||||
|
item['guid_isPermaLink']='true'
|
||||||
|
else:
|
||||||
|
item['guid_isPermaLink']='false'
|
||||||
|
|
||||||
# feed level information
|
# feed level information
|
||||||
output['generator'] = config.generator_uri()
|
output['generator'] = config.generator_uri()
|
||||||
output['name'] = config.name()
|
output['name'] = config.name()
|
||||||
|
@ -1,5 +1,19 @@
|
|||||||
import os
|
import os
|
||||||
|
|
||||||
|
def quote(string, apos):
|
||||||
|
""" quote a string so that it can be passed as a parameter """
|
||||||
|
if type(string) == unicode:
|
||||||
|
string=string.encode('utf-8')
|
||||||
|
if apos.startswith("\\"): string.replace('\\','\\\\')
|
||||||
|
|
||||||
|
if string.find("'")<0:
|
||||||
|
return "'" + string + "'"
|
||||||
|
elif string.find("'")<0:
|
||||||
|
return '"' + string + '"'
|
||||||
|
else:
|
||||||
|
# unclear how to quote strings with both types of quotes for libxslt
|
||||||
|
return "'" + string.replace("'",apos) + "'"
|
||||||
|
|
||||||
def run(script, doc, output_file=None, options={}):
|
def run(script, doc, output_file=None, options={}):
|
||||||
""" process an XSLT stylesheet """
|
""" process an XSLT stylesheet """
|
||||||
|
|
||||||
@ -12,6 +26,22 @@ def run(script, doc, output_file=None, options={}):
|
|||||||
except:
|
except:
|
||||||
# otherwise, use the command line interface
|
# otherwise, use the command line interface
|
||||||
dom = None
|
dom = None
|
||||||
|
|
||||||
|
# do it
|
||||||
|
result = None
|
||||||
|
if dom:
|
||||||
|
styledoc = libxml2.parseFile(script)
|
||||||
|
style = libxslt.parseStylesheetDoc(styledoc)
|
||||||
|
for key in options.keys():
|
||||||
|
options[key] = quote(options[key], apos="\xe2\x80\x99")
|
||||||
|
output = style.applyStylesheet(dom, options)
|
||||||
|
if output_file:
|
||||||
|
style.saveResultToFilename(output_file, output, 0)
|
||||||
|
else:
|
||||||
|
result = str(output)
|
||||||
|
style.freeStylesheet()
|
||||||
|
output.freeDoc()
|
||||||
|
elif output_file:
|
||||||
import warnings
|
import warnings
|
||||||
if hasattr(warnings, 'simplefilter'):
|
if hasattr(warnings, 'simplefilter'):
|
||||||
warnings.simplefilter('ignore', RuntimeWarning)
|
warnings.simplefilter('ignore', RuntimeWarning)
|
||||||
@ -20,16 +50,28 @@ def run(script, doc, output_file=None, options={}):
|
|||||||
file.write(doc)
|
file.write(doc)
|
||||||
file.close()
|
file.close()
|
||||||
|
|
||||||
# do it
|
cmdopts = []
|
||||||
if dom:
|
for key,value in options.items():
|
||||||
styledoc = libxml2.parseFile(script)
|
cmdopts += ['--stringparam', key, quote(value, apos=r"\'")]
|
||||||
style = libxslt.parseStylesheetDoc(styledoc)
|
|
||||||
result = style.applyStylesheet(dom, None)
|
os.system('xsltproc %s %s %s > %s' %
|
||||||
style.saveResultToFilename(output_file, result, 0)
|
(' '.join(cmdopts), script, docfile, output_file))
|
||||||
style.freeStylesheet()
|
os.unlink(docfile)
|
||||||
result.freeDoc()
|
|
||||||
else:
|
else:
|
||||||
os.system('xsltproc %s %s > %s' % (script, docfile, output_file))
|
import sys
|
||||||
|
from subprocess import Popen, PIPE
|
||||||
|
|
||||||
|
options = sum([['--stringparam', key, value]
|
||||||
|
for key,value in options.items()], [])
|
||||||
|
|
||||||
|
proc = Popen(['xsltproc'] + options + [script, '-'],
|
||||||
|
stdin=PIPE, stdout=PIPE, stderr=PIPE)
|
||||||
|
|
||||||
|
result, stderr = proc.communicate(doc)
|
||||||
|
if stderr:
|
||||||
|
import planet
|
||||||
|
planet.logger.error(stderr)
|
||||||
|
|
||||||
if dom: dom.freeDoc()
|
if dom: dom.freeDoc()
|
||||||
if docfile: os.unlink(docfile)
|
|
||||||
|
return result
|
||||||
|
@ -11,10 +11,12 @@ import planet, config, feedparser, reconstitute, shell
|
|||||||
|
|
||||||
# Regular expressions to sanitise cache filenames
|
# Regular expressions to sanitise cache filenames
|
||||||
re_url_scheme = re.compile(r'^\w+:/*(\w+:|www\.)?')
|
re_url_scheme = re.compile(r'^\w+:/*(\w+:|www\.)?')
|
||||||
re_slash = re.compile(r'[?/:]+')
|
re_slash = re.compile(r'[?/:|]+')
|
||||||
re_initial_cruft = re.compile(r'^[,.]*')
|
re_initial_cruft = re.compile(r'^[,.]*')
|
||||||
re_final_cruft = re.compile(r'[,.]*$')
|
re_final_cruft = re.compile(r'[,.]*$')
|
||||||
|
|
||||||
|
index = True
|
||||||
|
|
||||||
def filename(directory, filename):
|
def filename(directory, filename):
|
||||||
"""Return a filename suitable for the cache.
|
"""Return a filename suitable for the cache.
|
||||||
|
|
||||||
@ -29,6 +31,8 @@ def filename(directory, filename):
|
|||||||
filename=filename.encode('idna')
|
filename=filename.encode('idna')
|
||||||
except:
|
except:
|
||||||
pass
|
pass
|
||||||
|
if isinstance(filename,unicode):
|
||||||
|
filename=filename.encode('utf-8')
|
||||||
filename = re_url_scheme.sub("", filename)
|
filename = re_url_scheme.sub("", filename)
|
||||||
filename = re_slash.sub(",", filename)
|
filename = re_slash.sub(",", filename)
|
||||||
filename = re_initial_cruft.sub("", filename)
|
filename = re_initial_cruft.sub("", filename)
|
||||||
@ -59,10 +63,16 @@ def scrub(feed, data):
|
|||||||
|
|
||||||
# some data is not trustworthy
|
# some data is not trustworthy
|
||||||
for tag in config.ignore_in_feed(feed).split():
|
for tag in config.ignore_in_feed(feed).split():
|
||||||
|
if tag.find('lang')>=0: tag='language'
|
||||||
|
if data.feed.has_key(tag): del data.feed[tag]
|
||||||
for entry in data.entries:
|
for entry in data.entries:
|
||||||
if entry.has_key(tag): del entry[tag]
|
if entry.has_key(tag): del entry[tag]
|
||||||
if entry.has_key(tag + "_detail"): del entry[tag + "_detail"]
|
if entry.has_key(tag + "_detail"): del entry[tag + "_detail"]
|
||||||
if entry.has_key(tag + "_parsed"): del entry[tag + "_parsed"]
|
if entry.has_key(tag + "_parsed"): del entry[tag + "_parsed"]
|
||||||
|
for key in entry.keys():
|
||||||
|
if not key.endswith('_detail'): continue
|
||||||
|
for detail in entry[key].copy():
|
||||||
|
if detail == tag: del entry[key][detail]
|
||||||
|
|
||||||
# adjust title types
|
# adjust title types
|
||||||
if config.title_type(feed):
|
if config.title_type(feed):
|
||||||
@ -107,15 +117,22 @@ def scrub(feed, data):
|
|||||||
source.author_detail['name'] = \
|
source.author_detail['name'] = \
|
||||||
str(stripHtml(source.author_detail.name))
|
str(stripHtml(source.author_detail.name))
|
||||||
|
|
||||||
def spiderFeed(feed):
|
def spiderFeed(feed, only_if_new=0):
|
||||||
""" Spider (fetch) a single feed """
|
""" Spider (fetch) a single feed """
|
||||||
log = planet.logger
|
log = planet.logger
|
||||||
|
|
||||||
# read cached feed info
|
# read cached feed info
|
||||||
sources = config.cache_sources_directory()
|
sources = config.cache_sources_directory()
|
||||||
|
if not os.path.exists(sources):
|
||||||
|
os.makedirs(sources, 0700)
|
||||||
feed_source = filename(sources, feed)
|
feed_source = filename(sources, feed)
|
||||||
feed_info = feedparser.parse(feed_source)
|
feed_info = feedparser.parse(feed_source)
|
||||||
if feed_info.feed.get('planet_http_status',None) == '410': return
|
if feed_info.feed and only_if_new:
|
||||||
|
log.info("Feed %s already in cache", feed)
|
||||||
|
return
|
||||||
|
if feed_info.feed.get('planet_http_status',None) == '410':
|
||||||
|
log.info("Feed %s gone", feed)
|
||||||
|
return
|
||||||
|
|
||||||
# read feed itself
|
# read feed itself
|
||||||
modified = None
|
modified = None
|
||||||
@ -142,6 +159,10 @@ def spiderFeed(feed):
|
|||||||
# process based on the HTTP status code
|
# process based on the HTTP status code
|
||||||
if data.status == 200 and data.has_key("url"):
|
if data.status == 200 and data.has_key("url"):
|
||||||
data.feed['planet_http_location'] = data.url
|
data.feed['planet_http_location'] = data.url
|
||||||
|
if feed == data.url:
|
||||||
|
log.info("Updating feed %s", feed)
|
||||||
|
else:
|
||||||
|
log.info("Updating feed %s @ %s", feed, data.url)
|
||||||
elif data.status == 301 and data.has_key("entries") and len(data.entries)>0:
|
elif data.status == 301 and data.has_key("entries") and len(data.entries)>0:
|
||||||
log.warning("Feed has moved from <%s> to <%s>", feed, data.url)
|
log.warning("Feed has moved from <%s> to <%s>", feed, data.url)
|
||||||
data.feed['planet_http_location'] = data.url
|
data.feed['planet_http_location'] = data.url
|
||||||
@ -171,6 +192,7 @@ def spiderFeed(feed):
|
|||||||
if not data.version and feed_info.version:
|
if not data.version and feed_info.version:
|
||||||
data.feed = feed_info.feed
|
data.feed = feed_info.feed
|
||||||
data.bozo = feed_info.feed.get('planet_bozo','true') == 'true'
|
data.bozo = feed_info.feed.get('planet_bozo','true') == 'true'
|
||||||
|
data.version = feed_info.feed.get('planet_format')
|
||||||
data.feed['planet_http_status'] = str(data.status)
|
data.feed['planet_http_status'] = str(data.status)
|
||||||
|
|
||||||
# capture etag and last-modified information
|
# capture etag and last-modified information
|
||||||
@@ -184,18 +206,28 @@ def spiderFeed(feed):
             data.feed['planet_http_last_modified'])
 
     # capture feed and data from the planet configuration file
-    if not data.feed.has_key('links'): data.feed['links'] = list()
-    for link in data.feed.links:
-        if link.rel == 'self': break
-    else:
-        data.feed.links.append(feedparser.FeedParserDict(
-            {'rel':'self', 'type':'application/atom+xml', 'href':feed}))
+    if data.version:
+        if not data.feed.has_key('links'): data.feed['links'] = list()
+        feedtype = 'application/atom+xml'
+        if data.version.startswith('rss'): feedtype = 'application/rss+xml'
+        if data.version in ['rss090','rss10']: feedtype = 'application/rdf+xml'
+        for link in data.feed.links:
+            if link.rel == 'self':
+                link['type'] = feedtype
+                break
+        else:
+            data.feed.links.append(feedparser.FeedParserDict(
+                {'rel':'self', 'type':feedtype, 'href':feed}))
     for name, value in config.feed_options(feed).items():
         data.feed['planet_'+name] = value
 
     # perform user configured scrub operations on the data
     scrub(feed, data)
 
+    from planet import idindex
+    global index
+    if index != None: index = idindex.open()
+
     # write each entry to the cache
     cache = config.cache_directory()
     for entry in data.entries:
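A standalone sketch of the version-to-MIME-type selection used for the self link in the hunk above (illustration only, not part of the commit):

# Illustration only: mirrors the self-link type selection in spiderFeed above.
def self_link_type(version):
    """Map a feedparser version string (e.g. 'atom10', 'rss20') to a MIME type."""
    feedtype = 'application/atom+xml'
    if version and version.startswith('rss'): feedtype = 'application/rss+xml'
    if version in ['rss090','rss10']: feedtype = 'application/rdf+xml'
    return feedtype

assert self_link_type('atom10') == 'application/atom+xml'
assert self_link_type('rss20')  == 'application/rss+xml'
assert self_link_type('rss10')  == 'application/rdf+xml'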
@@ -211,16 +243,20 @@ def spiderFeed(feed):
         mtime = None
         if not entry.has_key('updated_parsed'):
             if entry.has_key('published_parsed'):
-                entry['updated_parsed'] = entry.published_parsed
-        if entry.has_key('updated_parsed'):
-            mtime = calendar.timegm(entry.updated_parsed)
-            if mtime > time.time(): mtime = None
+                entry['updated_parsed'] = entry['published_parsed']
+        if not entry.has_key('updated_parsed'):
+            try:
+                mtime = calendar.timegm(entry.updated_parsed)
+            except:
+                pass
         if not mtime:
             try:
                 mtime = os.stat(cache_file).st_mtime
             except:
-                mtime = time.time()
-        entry['updated_parsed'] = time.gmtime(mtime)
+                if data.feed.has_key('updated_parsed'):
+                    mtime = calendar.timegm(data.feed.updated_parsed)
+        if not mtime or mtime > time.time(): mtime = time.time()
+        entry['updated_parsed'] = time.gmtime(mtime)
 
         # apply any filters
         xdoc = reconstitute.reconstitute(data, entry)
@@ -228,12 +264,22 @@ def spiderFeed(feed):
             xdoc.unlink()
         for filter in config.filters(feed):
             output = shell.run(filter, output, mode="filter")
-            if not output: return
+            if not output: break
+        if not output: continue
 
         # write out and timestamp the results
         write(output, cache_file)
         os.utime(cache_file, (mtime, mtime))
 
+        # optionally index
+        if index != None:
+            feedid = data.feed.get('id', data.feed.get('link',None))
+            if feedid:
+                if type(feedid) == unicode: feedid = feedid.encode('utf-8')
+                index[filename('', entry.id)] = feedid
+
+    if index: index.close()
+
     # identify inactive feeds
     if config.activity_threshold(feed):
         updated = [entry.updated_parsed for entry in data.entries
@@ -254,6 +300,8 @@ def spiderFeed(feed):
     # report channel level errors
     if data.status == 226:
         if data.feed.has_key('planet_message'): del data.feed['planet_message']
+        if feed_info.feed.has_key('planet_updated'):
+            data.feed['planet_updated'] = feed_info.feed['planet_updated']
     elif data.status == 403:
         data.feed['planet_message'] = "403: forbidden"
     elif data.status == 404:
@@ -275,14 +323,17 @@ def spiderFeed(feed):
     write(xdoc.toxml('utf-8'), filename(sources, feed))
     xdoc.unlink()
 
-def spiderPlanet():
+def spiderPlanet(only_if_new = False):
     """ Spider (fetch) an entire planet """
-    log = planet.getLogger(config.log_level())
+    log = planet.getLogger(config.log_level(),config.log_format())
     planet.setTimeout(config.feed_timeout())
 
+    global index
+    index = True
+
     for feed in config.subscriptions():
         try:
-            spiderFeed(feed)
+            spiderFeed(feed, only_if_new=only_if_new)
         except Exception,e:
             import sys, traceback
             type, value, tb = sys.exc_info()
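The new only_if_new argument threads through from spiderPlanet() to spiderFeed() so a caller can fetch only feeds that are not yet cached. A minimal usage sketch; the config path is assumed, not taken from the commit:

# Sketch: fetch only feeds that do not already have cached source info.
# 'config.ini' is an assumed path to a valid planet configuration.
from planet import config, spider

config.load('config.ini')
spider.spiderPlanet(only_if_new=True)   # feeds already in the cache are skipped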
@@ -4,11 +4,12 @@ from xml.dom import minidom
 import planet, config, feedparser, reconstitute, shell
 from reconstitute import createTextElement, date
 from spider import filename
+from planet import idindex
 
 def splice():
     """ Splice together a planet from a cache of entries """
     import planet
-    log = planet.getLogger(config.log_level())
+    log = planet.getLogger(config.log_level(),config.log_format())
 
     log.info("Loading cached data")
     cache = config.cache_directory()
@@ -62,9 +63,15 @@ def splice():
         reconstitute.source(xdoc.documentElement, data.feed, None, None)
         feed.appendChild(xdoc.documentElement)
 
+    index = idindex.open()
+
     # insert entry information
     items = 0
     for mtime,file in dir:
+        if index:
+            base = file.split('/')[-1]
+            if index.has_key(base) and index[base] not in sub_ids: continue
+
         try:
             entry=minidom.parse(file)
 
@@ -83,12 +90,14 @@ def splice():
         except:
             log.error("Error parsing %s", file)
 
+    if index: index.close()
+
     return doc
 
 def apply(doc):
     output_dir = config.output_dir()
     if not os.path.exists(output_dir): os.makedirs(output_dir)
-    log = planet.getLogger(config.log_level())
+    log = planet.getLogger(config.log_level(),config.log_format())
 
     # Go-go-gadget-template
     for template_file in config.template_files():
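splice() now opens the id index so that cached entries whose feed is no longer subscribed can be skipped. A hedged sketch of that lookup pattern; the sample key and feed id below are assumptions for illustration only:

# Sketch of the gating added to splice(): a cache file is skipped when the
# id index maps it to a feed id that is not in the current subscription list.
from planet import idindex

sub_ids = ['tag:planet.intertwingly.net,2006:testfeed1']      # assumed subscription ids
index = idindex.open()
if index:
    base = 'planet.intertwingly.net,2006,testfeed1,1'          # assumed cache file name
    if index.has_key(base) and index[base] not in sub_ids:
        print 'skipping', base
    index.close()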

runtests.py
@@ -23,7 +23,7 @@ modules = map(fullmodname, glob.glob(os.path.join('tests', 'test_*.py')))
 
 # enable warnings
 import planet
-planet.getLogger("WARNING")
+planet.getLogger("WARNING",None)
 
 # load all of the tests into a suite
 try:
@@ -33,5 +33,11 @@ except Exception, exception:
     for module in modules: __import__(module)
     raise
 
+verbosity = 1
+if "-q" in sys.argv or '--quiet' in sys.argv:
+    verbosity = 0
+if "-v" in sys.argv or '--verbose' in sys.argv:
+    verbosity = 2
+
 # run test suite
-unittest.TextTestRunner().run(suite)
+unittest.TextTestRunner(verbosity=verbosity).run(suite)
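With these flags the runner's verbosity can be tuned from the command line; the same pattern in isolation, as a sketch with an empty suite standing in for the real test modules:

# Sketch: pick a unittest verbosity level from command-line flags, as runtests.py does.
import sys, unittest

verbosity = 1
if "-q" in sys.argv or '--quiet' in sys.argv:   verbosity = 0
if "-v" in sys.argv or '--verbose' in sys.argv: verbosity = 2

suite = unittest.TestSuite()          # the real script loads tests/test_*.py here
unittest.TextTestRunner(verbosity=verbosity).run(suite)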
@@ -18,9 +18,10 @@ os.chdir(sys.path[0])
 # copy spider output to splice input
 import planet
 from planet import spider, config
-planet.getLogger('CRITICAL')
+planet.getLogger('CRITICAL',None)
 
-spider.spiderPlanet('tests/data/spider/config.ini')
+config.load('tests/data/spider/config.ini')
+spider.spiderPlanet()
 if os.path.exists('tests/data/splice/cache'):
     shutil.rmtree('tests/data/splice/cache')
 shutil.move('tests/work/spider/cache', 'tests/data/splice/cache')
@@ -31,7 +32,7 @@ dest1.write(source.read().replace('/work/spider/', '/data/splice/'))
 dest1.close()
 
 source.seek(0)
-dest2=open('tests/data/apply/config.ini', 'w')
+dest2=open('tests/work/apply_config.ini', 'w')
 dest2.write(source.read().replace('[Planet]', '''[Planet]
 output_theme = asf
 output_dir = tests/work/apply'''))
@@ -41,12 +42,13 @@ source.close()
 # copy splice output to apply input
 from planet import splice
 file=open('tests/data/apply/feed.xml', 'w')
-data=splice.splice('tests/data/splice/config.ini').toxml('utf-8')
+config.load('tests/data/splice/config.ini')
+data=splice.splice().toxml('utf-8')
 file.write(data)
 file.close()
 
 # copy apply output to config/reading-list input
-config.load('tests/data/apply/config.ini')
+config.load('tests/work/apply_config.ini')
 splice.apply(data)
 shutil.move('tests/work/apply/opml.xml', 'tests/data/config')
 
File diff suppressed because one or more lines are too long
@@ -1,8 +1,8 @@
 <?xml version="1.0"?>
-<opml xmlns="http://www.w3.org/1999/xhtml" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/" version="1.1">
+<opml version="1.1">
 <head>
   <title>test planet</title>
-  <dateModified>August 25, 2006 01:41 PM</dateModified>
+  <dateModified>October 14, 2006 01:02 PM</dateModified>
   <ownerName>Anonymous Coward</ownerName>
   <ownerEmail></ownerEmail>
 </head>

tests/data/filter/excerpt-images2.ini (new file)
@@ -0,0 +1,2 @@
+[Planet]
+filters = excerpt.py?omit=img

tests/data/filter/tmpl/enclosure_href.xml (new file)
@@ -0,0 +1,11 @@
+<!--
+Description: link relationship
+Expect: Items[0]['enclosure_href'] == 'http://example.com/music.mp3'
+-->
+
+<feed xmlns="http://www.w3.org/2005/Atom">
+  <entry>
+    <link rel="enclosure" href="http://example.com/music.mp3"/>
+  </entry>
+</feed>
+

tests/data/filter/tmpl/enclosure_length.xml (new file)
@@ -0,0 +1,11 @@
+<!--
+Description: link relationship
+Expect: Items[0]['enclosure_length'] == '100'
+-->
+
+<feed xmlns="http://www.w3.org/2005/Atom">
+  <entry>
+    <link rel="enclosure" length="100"/>
+  </entry>
+</feed>
+

tests/data/filter/tmpl/enclosure_type.xml (new file)
@@ -0,0 +1,11 @@
+<!--
+Description: link relationship
+Expect: Items[0]['enclosure_type'] == 'audio/mpeg'
+-->
+
+<feed xmlns="http://www.w3.org/2005/Atom">
+  <entry>
+    <link rel="enclosure" type="audio/mpeg"/>
+  </entry>
+</feed>
+

tests/data/filter/translate.ini (new file)
@@ -0,0 +1,7 @@
+[Planet]
+filters = translate.xslt
+filter_directories = tests/data/filter
+
+[translate.xslt]
+in = aeiou
+out = AEIOU

tests/data/filter/translate.xslt (new file)
@@ -0,0 +1,20 @@
+<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
+  <xsl:param name="in"/>
+  <xsl:param name="out"/>
+
+  <!-- translate $in characters to $out in attribute values -->
+  <xsl:template match="@*">
+    <xsl:attribute name="{name()}">
+      <xsl:value-of select="translate(.,$in,$out)"/>
+    </xsl:attribute>
+  </xsl:template>
+
+  <!-- pass through everything else -->
+  <xsl:template match="node()">
+    <xsl:copy>
+      <xsl:apply-templates select="@*|node()"/>
+    </xsl:copy>
+  </xsl:template>
+
+</xsl:stylesheet>
+
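Each name/value pair in the [translate.xslt] section of translate.ini is handed to the stylesheet as an xsl:param, so the filter above swaps $in characters for $out ones in attribute values. A sketch of driving it the way the new test does, using paths from files added in this commit:

# Sketch: run the translate.xslt filter on a test document via the shell module.
from planet import config, shell

config.load('tests/data/filter/translate.ini')
xml = open('tests/data/filter/category-one.xml').read()
out = shell.run(config.filters()[0], xml, mode="filter")   # 'in'/'out' arrive as XSLT params
print out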

tests/data/filter/xpath-sifter2.ini (new file)
@@ -0,0 +1,2 @@
+[Planet]
+filters = xpath_sifter.py?require=//atom%3Acategory%5B%40term%3D%27two%27%5D
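The require value is URL-encoded so it can ride inside the query-string style filter specification; decoded, it is an ordinary XPath expression (illustration only):

# Illustration: decode the xpath_sifter 'require' parameter used above.
import urllib
print urllib.unquote("//atom%3Acategory%5B%40term%3D%27two%27%5D")
# prints: //atom:category[@term='two']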

tests/data/reconstitute.xslt (new file)
@@ -0,0 +1,40 @@
+<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
+                xmlns:atom="http://www.w3.org/2005/Atom"
+                xmlns:planet="http://planet.intertwingly.net/"
+                xmlns:xhtml="http://www.w3.org/1999/xhtml"
+                xmlns="http://www.w3.org/1999/xhtml">
+
+  <!-- indent atom and planet elements -->
+  <xsl:template match="atom:*|planet:*">
+    <!-- double space before atom:entries and planet:source -->
+    <xsl:if test="self::atom:entry | self::planet:source">
+      <xsl:text> </xsl:text>
+    </xsl:if>
+
+    <!-- indent start tag -->
+    <xsl:text> </xsl:text>
+    <xsl:for-each select="ancestor::*">
+      <xsl:text> </xsl:text>
+    </xsl:for-each>
+
+    <xsl:copy>
+      <xsl:apply-templates select="@*|node()"/>
+
+      <!-- indent end tag if there are element children -->
+      <xsl:if test="*">
+        <xsl:text> </xsl:text>
+        <xsl:for-each select="ancestor::*">
+          <xsl:text> </xsl:text>
+        </xsl:for-each>
+      </xsl:if>
+    </xsl:copy>
+  </xsl:template>
+
+  <!-- pass through everything else -->
+  <xsl:template match="@*|node()">
+    <xsl:copy>
+      <xsl:apply-templates select="@*|node()"/>
+    </xsl:copy>
+  </xsl:template>
+
+</xsl:stylesheet>

tests/data/reconstitute/enclosure.xml (new file)
@@ -0,0 +1,13 @@
+<!--
+Description: enclosure
+Expect: links[0].rel == 'enclosure' and id == 'http://example.com/1'
+-->
+
+<rss>
+  <channel>
+    <item>
+      <enclosure href="http://example.com/1"/>
+    </item>
+  </channel>
+</rss>
+

tests/data/reconstitute/feedburner_origlink.xml (new file)
@@ -0,0 +1,12 @@
+<!--
+Description: feedburner origlink relationship
+Expect: feedburner_origlink == 'http://example.com/1'
+-->
+
+<feed xmlns="http://www.w3.org/2005/Atom"
+      xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
+  <entry>
+    <feedburner:origlink>http://example.com/1</feedburner:origlink>
+  </entry>
+</feed>
+

tests/data/reconstitute/link_length.xml (new file)
@@ -0,0 +1,11 @@
+<!--
+Description: link relationship
+Expect: links[0].length == '4000000'
+-->
+
+<feed xmlns="http://www.w3.org/2005/Atom">
+  <entry>
+    <link rel="enclosure" href="http://example.com/music.mp3" length="4000000"/>
+  </entry>
+</feed>
+

tests/data/reconstitute/missing_item_pubDate.xml (new file)
@@ -0,0 +1,14 @@
+<!--
+Description: if item pubdate is missing, use to channel level date
+Expect: updated_parsed == (2006, 6, 21, 13, 16, 41, 2, 172, 0)
+-->
+
+<rss version="0.91">
+  <channel>
+    <pubDate>Wed, 21 Jun 2006 14:16:41 +0100</pubDate>
+    <item/>
+  </channel>
+</rss>
+
+
+

tests/data/reconstitute/rss_image.xml (new file)
@@ -0,0 +1,12 @@
+<!--
+Description: logo
+Expect: source.logo == 'http://example.com/logo.jpg'
+-->
+
+<rss version="2.0">
+  <channel>
+    <image><url>http://example.com/logo.jpg</url></image>
+    <item/>
+  </channel>
+</rss>
+

tests/data/reconstitute/rss_lang.xml (new file)
@@ -0,0 +1,14 @@
+<!--
+Description: link relationship
+Expect: title_detail.language == 'en'
+-->
+
+<rss version="2.0">
+  <channel>
+    <language>en</language>
+    <item>
+      <title>foo</title>
+    </item>
+  </channel>
+</rss>
+

tests/data/splice/cache/example.com,3
@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://example.com/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><summary>the Blue Planet</summary><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://example.com/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><summary>the Blue Planet</summary><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></source></entry>

tests/data/splice/cache/example.com,4
@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://example.com/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><summary>the Red Planet</summary><updated planet:format="August 25, 2006 01:41 PM">2006-08-25T13:41:22Z</updated><source><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://example.com/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><summary>the Red Planet</summary><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><source><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title>Mercury</title><content>Messenger of the Roman Gods</content><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title>Mercury</title><content>Messenger of the Roman Gods</content><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title>Venus</title><content>the Jewel of the Sky</content><updated planet:format="February 02, 2006 12:00 AM">2006-02-02T00:00:00Z</updated><published planet:format="January 02, 2006 12:00 AM">2006-01-02T00:00:00Z</published><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title>Venus</title><content>the Jewel of the Sky</content><updated planet:format="February 02, 2006 12:00 AM">2006-02-02T00:00:00Z</updated><published planet:format="January 02, 2006 12:00 AM">2006-01-02T00:00:00Z</published><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><content>the Blue Planet</content><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><content>the Blue Planet</content><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><content>the Red Planet</content><updated planet:format="January 04, 2006 12:00 AM">2006-01-04T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><content>the Red Planet</content><updated planet:format="January 04, 2006 12:00 AM">2006-01-04T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title xml:lang="en-us">Mercury</title><content xml:lang="en-us">Messenger of the Roman Gods</content><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title xml:lang="en-us">Mercury</title><content xml:lang="en-us">Messenger of the Roman Gods</content><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title xml:lang="en-us">Venus</title><content xml:lang="en-us">the Morning Star</content><updated planet:format="January 02, 2006 12:00 AM">2006-01-02T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title xml:lang="en-us">Venus</title><content xml:lang="en-us">the Morning Star</content><updated planet:format="January 02, 2006 12:00 AM">2006-01-02T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><content xml:lang="en-us">the Blue Planet</content><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/3</id><link href="http://example.com/3" rel="alternate" type="text/html"/><title>Earth</title><content xml:lang="en-us">the Blue Planet</content><updated planet:format="January 03, 2006 12:00 AM">2006-01-03T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><content>the Red Planet</content><updated planet:format="January 04, 2006 12:00 AM">2006-01-04T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2/4</id><link href="http://example.com/4" rel="alternate" type="text/html"/><title>Mars</title><content>the Red Planet</content><updated planet:format="January 04, 2006 12:00 AM">2006-01-04T00:00:00Z</updated><source><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed3/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title>Mercury</title><summary>Messenger of the Roman Gods</summary><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed3/1</id><link href="http://example.com/1" rel="alternate" type="text/html"/><title>Mercury</title><summary>Messenger of the Roman Gods</summary><updated planet:format="January 01, 2006 12:00 AM">2006-01-01T00:00:00Z</updated><source><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed3/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title>Venus</title><summary>the Morning Star</summary><updated planet:format="August 25, 2006 01:41 PM">2006-08-25T13:41:22Z</updated><source><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></source></entry>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed3/2</id><link href="http://example.com/2" rel="alternate" type="text/html"/><title>Venus</title><summary>the Morning Star</summary><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><source><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></source></entry>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><link href="tests/data/spider/testfeed0.atom" rel="self" type="application/atom+xml"/><planet:name>not found</planet:name><planet:http_status>500</planet:http_status></feed>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><author><name>not found</name></author><link href="tests/data/spider/testfeed0.atom" rel="self" type="application/atom+xml"/><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:message>internal server error</planet:message><planet:bozo>true</planet:bozo><planet:http_status>500</planet:http_status><planet:name>not found</planet:name></feed>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>one</planet:name><planet:http_status>200</planet:http_status></feed>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed1</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed1a.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>one</planet:name><planet:http_status>200</planet:http_status></feed>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:name>two</planet:name><planet:http_status>200</planet:http_status></feed>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>tag:planet.intertwingly.net,2006:testfeed2</id><author><name>Sam Ruby</name><email>rubys@intertwingly.net</email><uri>http://www.intertwingly.net/blog/</uri></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed2.atom" rel="self" type="application/atom+xml"/><link href="http://www.intertwingly.net/blog/" rel="alternate" type="text/html"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="June 17, 2006 12:15 AM">2006-06-17T00:15:18Z</updated><planet:bozo>false</planet:bozo><planet:format>atom10</planet:format><planet:name>two</planet:name><planet:http_status>200</planet:http_status></feed>

@@ -1,2 +1,2 @@
 <?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><planet:name>three</planet:name><planet:http_status>200</planet:http_status></feed>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/"><id>http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss</id><author><name>three</name></author><link href="http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss" rel="alternate" type="text/html"/><link href="tests/data/spider/testfeed3.rss" rel="self" type="application/atom+xml"/><subtitle>It’s just data</subtitle><title>Sam Ruby</title><updated planet:format="October 14, 2006 01:02 PM">2006-10-14T13:02:18Z</updated><planet:format>rss20</planet:format><planet:name>three</planet:name><planet:bozo>true</planet:bozo><planet:http_status>200</planet:http_status></feed>

tests/reconstitute.py (new file)
@@ -0,0 +1,87 @@
+#!/usr/bin/env python
+import os, sys, ConfigParser, shutil, glob
+venus_base = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+sys.path.insert(0,venus_base)
+
+if __name__ == "__main__":
+
+    hide_planet_ns = True
+
+    while len(sys.argv) > 1:
+        if sys.argv[1] == '-v' or sys.argv[1] == '--verbose':
+            import planet
+            planet.getLogger('DEBUG',None)
+            del sys.argv[1]
+        elif sys.argv[1] == '-p' or sys.argv[1] == '--planet':
+            hide_planet_ns = False
+            del sys.argv[1]
+        else:
+            break
+
+    parser = ConfigParser.ConfigParser()
+    parser.add_section('Planet')
+    parser.add_section(sys.argv[1])
+    work = reduce(os.path.join, ['tests','work','reconsititute'], venus_base)
+    output = os.path.join(work, 'output')
+    filters = os.path.join(venus_base,'filters')
+    parser.set('Planet','cache_directory',work)
+    parser.set('Planet','output_dir',output)
+    parser.set('Planet','filter_directories',filters)
+    if hide_planet_ns:
+        parser.set('Planet','template_files','themes/common/atom.xml.xslt')
+    else:
+        parser.set('Planet','template_files','tests/data/reconstitute.xslt')
+
+    for name, value in zip(sys.argv[2::2],sys.argv[3::2]):
+        parser.set(sys.argv[1], name.lstrip('-'), value)
+
+    from planet import config
+    config.parser = parser
+
+    from planet import spider
+    spider.spiderPlanet(only_if_new=False)
+
+    from planet import feedparser
+    for source in glob.glob(os.path.join(work, 'sources/*')):
+        feed = feedparser.parse(source).feed
+        if feed.has_key('title'):
+            config.parser.set('Planet','name',feed.title_detail.value)
+        if feed.has_key('link'):
+            config.parser.set('Planet','link',feed.link)
+        if feed.has_key('author_detail'):
+            if feed.author_detail.has_key('name'):
+                config.parser.set('Planet','owner_name',feed.author_detail.name)
+            if feed.author_detail.has_key('email'):
+                config.parser.set('Planet','owner_email',feed.author_detail.email)
+
+    from planet import splice
+    doc = splice.splice()
+
+    sources = doc.getElementsByTagName('planet:source')
+    if hide_planet_ns and len(sources) == 1:
+        source = sources[0]
+        feed = source.parentNode
+        child = feed.firstChild
+        while child:
+            next = child.nextSibling
+            if child.nodeName not in ['planet:source','entry']:
+                feed.removeChild(child)
+            child = next
+        while source.hasChildNodes():
+            child = source.firstChild
+            source.removeChild(child)
+            feed.insertBefore(child, source)
+        for source in doc.getElementsByTagName('source'):
+            source.parentNode.removeChild(source)
+
+    splice.apply(doc.toxml('utf-8'))
+
+    if hide_planet_ns:
+        atom = open(os.path.join(output,'atom.xml')).read()
+    else:
+        atom = open(os.path.join(output,'reconstitute')).read()
+
+    shutil.rmtree(work)
+    os.removedirs(os.path.dirname(work))
+
+    print atom
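tests/reconstitute.py spiders a single feed named on the command line, splices it, and prints the resulting Atom document (pass -p to keep the planet: namespace). A hedged example of invoking it on one of the fixtures added above; the argument shape is inferred from the script itself, not documented elsewhere:

# Sketch: run the reconstitute harness over one of the new test feeds.
import subprocess

subprocess.call(['python', 'tests/reconstitute.py',
                 'tests/data/reconstitute/rss_image.xml'])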

tests/test_filter_xslt.py (new file)
@@ -0,0 +1,28 @@
+#!/usr/bin/env python
+
+import unittest, xml.dom.minidom
+from planet import shell, config, logger
+
+class XsltFilterTests(unittest.TestCase):
+
+    def test_xslt_filter(self):
+        config.load('tests/data/filter/translate.ini')
+        testfile = 'tests/data/filter/category-one.xml'
+
+        input = open(testfile).read()
+        output = shell.run(config.filters()[0], input, mode="filter")
+        dom = xml.dom.minidom.parseString(output)
+        catterm = dom.getElementsByTagName('category')[0].getAttribute('term')
+        self.assertEqual('OnE', catterm)
+
+try:
+    import libxslt
+except:
+    try:
+        from subprocess import Popen, PIPE
+        xsltproc=Popen(['xsltproc','--version'],stdout=PIPE,stderr=PIPE)
+        xsltproc.communicate()
+        if xsltproc.returncode != 0: raise ImportError
+    except:
+        logger.warn("libxslt is not available => can't test xslt filters")
+        del XsltFilterTests.test_xslt_filter
@@ -14,10 +14,16 @@ class FilterTests(unittest.TestCase):
         imgsrc = dom.getElementsByTagName('img')[0].getAttribute('src')
         self.assertEqual('http://example.com.nyud.net:8080/foo.png', imgsrc)
 
-    def test_excerpt_images(self):
-        testfile = 'tests/data/filter/excerpt-images.xml'
+    def test_excerpt_images1(self):
         config.load('tests/data/filter/excerpt-images.ini')
+        self.verify_images()
+
+    def test_excerpt_images2(self):
+        config.load('tests/data/filter/excerpt-images2.ini')
+        self.verify_images()
+
+    def verify_images(self):
+        testfile = 'tests/data/filter/excerpt-images.xml'
         output = open(testfile).read()
         for filter in config.filters():
             output = shell.run(filter, output, mode="filter")
@@ -58,8 +64,15 @@ class FilterTests(unittest.TestCase):
         self.assertEqual(u'before--after',
             excerpt.firstChild.firstChild.nodeValue)
 
-    def test_xpath_filter(self):
+    def test_xpath_filter1(self):
         config.load('tests/data/filter/xpath-sifter.ini')
+        self.verify_xpath()
+
+    def test_xpath_filter2(self):
+        config.load('tests/data/filter/xpath-sifter2.ini')
+        self.verify_xpath()
+
+    def verify_xpath(self):
         testfile = 'tests/data/filter/category-one.xml'
 
         output = open(testfile).read()
@@ -89,9 +102,10 @@ try:
     import libxml2
 except:
     logger.warn("libxml2 is not available => can't test xpath_sifter")
-    del FilterTests.test_xpath_filter
+    del FilterTests.test_xpath_filter1
+    del FilterTests.test_xpath_filter2
 
 except ImportError:
-    logger.warn("Popen is not available => can't test filters")
+    logger.warn("Popen is not available => can't test standard filters")
     for method in dir(FilterTests):
         if method.startswith('test_'): delattr(FilterTests,method)
||||||
|
74
tests/test_idindex.py
Normal file
74
tests/test_idindex.py
Normal file
@ -0,0 +1,74 @@
+#!/usr/bin/env python
+
+import unittest
+from planet import idindex, config, logger
+
+class idIndexTest(unittest.TestCase):
+
+    def setUp(self):
+        # silence errors
+        import planet
+        planet.logger = None
+        planet.getLogger('CRITICAL',None)
+
+    def tearDown(self):
+        idindex.destroy()
+
+    def test_unicode(self):
+        from planet.spider import filename
+        index = idindex.create()
+        iri = 'http://www.\xe8\xa9\xb9\xe5\xa7\x86\xe6\x96\xaf.com/'
+        index[filename('', iri)] = 'data'
+        index[filename('', iri.decode('utf-8'))] = 'data'
+        index[filename('', u'1234')] = 'data'
+        index.close()
+
+    def test_index_spider(self):
+        import test_spider
+        config.load(test_spider.configfile)
+
+        index = idindex.create()
+        self.assertEqual(0, len(index))
+        index.close()
+
+        from planet.spider import spiderPlanet
+        try:
+            spiderPlanet()
+
+            index = idindex.open()
+            self.assertEqual(12, len(index))
+            self.assertEqual('tag:planet.intertwingly.net,2006:testfeed1', index['planet.intertwingly.net,2006,testfeed1,1'])
+            self.assertEqual('http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss', index['planet.intertwingly.net,2006,testfeed3,1'])
+            index.close()
+        finally:
+            import os, shutil
+            shutil.rmtree(test_spider.workdir)
+            os.removedirs(os.path.split(test_spider.workdir)[0])
+
+    def test_index_splice(self):
+        import test_splice
+        config.load(test_splice.configfile)
+        index = idindex.create()
+
+        self.assertEqual(12, len(index))
+        self.assertEqual('tag:planet.intertwingly.net,2006:testfeed1', index['planet.intertwingly.net,2006,testfeed1,1'])
+        self.assertEqual('http://intertwingly.net/code/venus/tests/data/spider/testfeed3.rss', index['planet.intertwingly.net,2006,testfeed3,1'])
+
+        for key in index.keys():
+            value = index[key]
+            if value.find('testfeed2')>0: index[key] = value.swapcase()
+        index.close()
+
+        from planet.splice import splice
+        doc = splice()
+
+        self.assertEqual(8,len(doc.getElementsByTagName('entry')))
+        self.assertEqual(4,len(doc.getElementsByTagName('planet:source')))
+        self.assertEqual(12,len(doc.getElementsByTagName('planet:name')))
+
+try:
+    module = 'dbhash'
+except ImportError:
+    logger.warn("dbhash is not available => can't test id index")
+    for method in dir(idIndexTest):
+        if method.startswith('test_'): delattr(idIndexTest,method)
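For orientation, the id index maps cache filenames to the entry ids they were generated from. A rough dbm-backed sketch of that idea, using plain anydbm rather than the planet.idindex API (keys and values are illustrative):

    import anydbm, os, tempfile

    dbdir = tempfile.mkdtemp()
    index = anydbm.open(os.path.join(dbdir, 'index.db'), 'c')   # create on demand
    # key: cache filename, value: the id the entry was filed under
    index['planet.example.org,2006,feed1,1'] = 'tag:planet.example.org,2006:feed1'
    print len(index.keys())
    index.close()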
@@ -76,6 +76,14 @@ class OpmlTest(unittest.TestCase):
             text="sample feed"/>''', self.config)
         self.assertFalse(self.config.has_section("http://example.com/feed.xml"))
 
+    def test_WordPress_link_manager(self):
+        # http://www.wasab.dk/morten/blog/archives/2006/10/22/wp-venus
+        opml2config('''<outline type="link"
+            xmlUrl="http://example.com/feed.xml"
+            text="sample feed"/>''', self.config)
+        self.assertEqual('sample feed',
+            self.config.get("http://example.com/feed.xml", 'name'))
+
     #
     # xmlUrl
     #
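The new test covers OPML as exported by the WordPress link manager (see the linked blog post), whose outlines use type="link". A rough sketch of what mapping such an outline onto a ConfigParser amounts to; the helper outline_to_config is illustrative and not the planet opml2config implementation:

    import ConfigParser
    from xml.dom import minidom

    def outline_to_config(opml_fragment, config):
        # xmlUrl becomes the section name; text (or title) becomes its 'name' option
        outline = minidom.parseString(opml_fragment).documentElement
        url = outline.getAttribute('xmlUrl')
        if not url:
            return
        if not config.has_section(url):
            config.add_section(url)
        config.set(url, 'name',
                   outline.getAttribute('text') or outline.getAttribute('title'))

    config = ConfigParser.ConfigParser()
    outline_to_config('<outline type="link" xmlUrl="http://example.com/feed.xml" '
                      'text="sample feed"/>', config)
    print config.get('http://example.com/feed.xml', 'name')   # sample feed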
@@ -7,7 +7,7 @@ from planet import feedparser, config
 feed = '''
   <feed xmlns='http://www.w3.org/2005/Atom'>
     <author><name>F&ouml;o</name></author>
-    <entry>
+    <entry xml:lang="en">
       <id>ignoreme</id>
       <author><name>F&ouml;o</name></author>
       <updated>2000-01-01T00:00:00Z</updated>
@@ -23,7 +23,7 @@ feed = '''
 
 configData = '''
 [testfeed]
-ignore_in_feed = id updated
+ignore_in_feed = id updated xml:lang
 name_type = html
 title_type = html
 summary_type = html
@@ -40,12 +40,14 @@ class ScrubTest(unittest.TestCase):
         self.assertTrue(data.entries[0].has_key('id'))
         self.assertTrue(data.entries[0].has_key('updated'))
         self.assertTrue(data.entries[0].has_key('updated_parsed'))
+        self.assertTrue(data.entries[0].summary_detail.has_key('language'))
 
         scrub('testfeed', data)
 
         self.assertFalse(data.entries[0].has_key('id'))
         self.assertFalse(data.entries[0].has_key('updated'))
         self.assertFalse(data.entries[0].has_key('updated_parsed'))
+        self.assertFalse(data.entries[0].summary_detail.has_key('language'))
 
         self.assertEqual('F\xc3\xb6o', data.feed.author_detail.name)
         self.assertEqual('F\xc3\xb6o', data.entries[0].author_detail.name)
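A simplified sketch, not the planet.scrub implementation, of what the ignore_in_feed = id updated xml:lang setting exercised above does to a parsed entry (feedparser exposes xml:lang as the 'language' key of the *_detail dictionaries, which is exactly what the new assertions check):

    def scrub_entry(entry, ignore):
        # drop each named field, plus its *_parsed shadow; xml:lang lives in
        # the 'language' key of the *_detail dictionaries
        for name in ignore.split():
            if name == 'xml:lang':
                for key in entry.keys():
                    if key.endswith('_detail') and 'language' in entry[key]:
                        del entry[key]['language']
            else:
                for key in (name, name + '_parsed'):
                    if key in entry:
                        del entry[key]

    entry = {'id': 'x', 'updated': 'y', 'updated_parsed': (),
             'summary_detail': {'language': 'en', 'value': 'hi'}}
    scrub_entry(entry, 'id updated xml:lang')
    print entry   # {'summary_detail': {'value': 'hi'}}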
@@ -13,7 +13,7 @@ class SpiderTest(unittest.TestCase):
     def setUp(self):
         # silence errors
         planet.logger = None
-        planet.getLogger('CRITICAL')
+        planet.getLogger('CRITICAL',None)
 
         try:
             os.makedirs(workdir)
@@ -58,6 +58,8 @@ class SpiderTest(unittest.TestCase):
 
         # verify that the file timestamps match atom:updated
         data = feedparser.parse(files[2])
+        self.assertEqual(['application/atom+xml'], [link.type
+            for link in data.entries[0].source.links if link.rel=='self'])
         self.assertEqual('one', data.entries[0].source.planet_name)
         self.assertEqual(os.stat(files[2]).st_mtime,
             calendar.timegm(data.entries[0].updated_parsed))
@@ -82,5 +84,7 @@ class SpiderTest(unittest.TestCase):
 
         data = feedparser.parse(workdir +
             '/planet.intertwingly.net,2006,testfeed3,1')
+        self.assertEqual(['application/rss+xml'], [link.type
+            for link in data.entries[0].source.links if link.rel=='self'])
         self.assertEqual('three', data.entries[0].source.author_detail.name)
 
@@ -4,7 +4,7 @@ import unittest
 from planet import config
 from os.path import split
 
-class ConfigTest(unittest.TestCase):
+class ThemesTest(unittest.TestCase):
     def setUp(self):
         config.load('tests/data/config/themed.ini')
 
@@ -17,7 +17,8 @@ class ConfigTest(unittest.TestCase):
     # administrivia
 
     def test_template(self):
-        self.assertTrue('index.html.xslt' in config.template_files())
+        self.assertEqual(1, len([1 for file in config.template_files()
+            if file == 'index.html.xslt']))
 
     def test_feeds(self):
         feeds = config.subscriptions()
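The tightened assertion above checks that index.html.xslt shows up exactly once in the merged template list, i.e. that the theme and the project config don't both contribute it. The same check in isolation, with illustrative values:

    template_files = ['index.html.xslt', 'atom.xml.xslt', 'opml.xml.xslt']
    assert len([1 for file in template_files if file == 'index.html.xslt']) == 1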
@@ -7,6 +7,7 @@ template_files:
   foafroll.xml.xslt
   index.html.xslt
   opml.xml.xslt
+  validate.html.xslt
 
 template_directories:
   ../common
@@ -56,6 +56,7 @@
       </xsl:choose>
       <img src="images/feed-icon-10x10.png" alt="(feed)"/>
     </a>
+    <xsl:text> </xsl:text>
 
     <!-- name -->
     <a href="{atom:link[@rel='alternate']/@href}">
@@ -153,7 +154,9 @@
         <img src="{atom:source/atom:icon}" class="icon"/>
       </xsl:if>
       <a href="{atom:source/atom:link[@rel='alternate']/@href}">
-        <xsl:attribute name="title" select="{atom:source/atom:title}"/>
+        <xsl:attribute name="title">
+          <xsl:value-of select="atom:source/atom:title"/>
+        </xsl:attribute>
         <xsl:value-of select="atom:source/planet:name"/>
       </a>
       <xsl:if test="string-length(atom:title) > 0">
|
|||||||
<!-- Feedburner detritus -->
|
<!-- Feedburner detritus -->
|
||||||
<xsl:template match="xhtml:div[@class='feedflare']"/>
|
<xsl:template match="xhtml:div[@class='feedflare']"/>
|
||||||
|
|
||||||
|
<!-- Strip site meter -->
|
||||||
|
<xsl:template match="xhtml:div[comment()[. = ' Site Meter ']]"/>
|
||||||
|
|
||||||
<!-- pass through everything else -->
|
<!-- pass through everything else -->
|
||||||
<xsl:template match="@*|node()">
|
<xsl:template match="@*|node()">
|
||||||
<xsl:copy>
|
<xsl:copy>
|
||||||
|
@@ -14,14 +14,18 @@
   <xsl:template match="atom:link[@rel='service.post']"/>
   <xsl:template match="atom:link[@rel='service.feed']"/>
 
   <!-- Feedburner detritus -->
   <xsl:template match="xhtml:div[@class='feedflare']"/>
 
+  <!-- Strip site meter -->
+  <xsl:template match="xhtml:div[comment()[. = ' Site Meter ']]"/>
+
   <!-- add Google/LiveJournal-esque noindex directive -->
   <xsl:template match="atom:feed">
     <xsl:copy>
       <xsl:attribute name="indexing:index">no</xsl:attribute>
       <xsl:apply-templates select="@*|node()"/>
+      <xsl:text> </xsl:text>
     </xsl:copy>
   </xsl:template>
 
@@ -10,7 +10,7 @@
 <TMPL_LOOP Items>
 <item>
 <title><TMPL_VAR channel_name ESCAPE="HTML"><TMPL_IF title>: <TMPL_VAR title_plain ESCAPE="HTML"></TMPL_IF></title>
-<guid><TMPL_VAR id ESCAPE="HTML"></guid>
+<guid isPermaLink="<TMPL_VAR guid_isPermaLink>"><TMPL_VAR id ESCAPE="HTML"></guid>
 <link><TMPL_VAR link ESCAPE="HTML"></link>
 <TMPL_IF content>
 <description><TMPL_VAR content ESCAPE="HTML"></description>
@@ -23,6 +23,9 @@
 <author><TMPL_VAR author_email></author>
 </TMPL_IF>
 </TMPL_IF>
+<TMPL_IF enclosure_href>
+<enclosure url="<TMPL_VAR enclosure_href ESCAPE="HTML">" length="<TMPL_VAR enclosure_length>" type="<TMPL_VAR enclosure_type>"/>
+</TMPL_IF>
 </item>
 </TMPL_LOOP>
 
themes/common/validate.html.xslt (new file, 146 lines)
@@ -0,0 +1,146 @@
+<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
+                xmlns:atom="http://www.w3.org/2005/Atom"
+                xmlns:xhtml="http://www.w3.org/1999/xhtml"
+                xmlns:planet="http://planet.intertwingly.net/"
+                xmlns="http://www.w3.org/1999/xhtml">
+
+  <xsl:template match="atom:feed">
+    <html xmlns="http://www.w3.org/1999/xhtml">
+
+      <!-- head -->
+      <xsl:text> </xsl:text>
+      <head>
+        <title><xsl:value-of select="atom:title"/></title>
+        <meta name="robots" content="noindex,nofollow" />
+        <meta name="generator" content="{atom:generator}" />
+        <link rel="shortcut icon" href="/favicon.ico" />
+        <style type="text/css">
+          img{border:0}
+          a{text-decoration:none}
+          a:hover{text-decoration:underline}
+          .message{border-bottom:1px dashed red} a.message:hover{cursor: help;text-decoration: none}
+          dl{margin:0}
+          dt{float:left;width:9em}
+          dt:after{content:':'}
+        </style>
+      </head>
+
+      <!-- body -->
+      <xsl:text> </xsl:text>
+      <body>
+        <table border="1" cellpadding="3" cellspacing="0">
+          <thead>
+            <tr>
+              <th></th>
+              <th>Name</th>
+              <th>Format</th>
+              <xsl:if test="//planet:ignore_in_feed | //planet:filters |
+                            //planet:*[contains(local-name(),'_type')]">
+                <th>Notes</th>
+              </xsl:if>
+            </tr>
+          </thead>
+          <xsl:apply-templates select="planet:source">
+            <xsl:sort select="planet:name"/>
+          </xsl:apply-templates>
+          <xsl:text> </xsl:text>
+        </table>
+      </body>
+    </html>
+  </xsl:template>
+
+  <xsl:template match="planet:source">
+    <xsl:variable name="validome_format">
+      <xsl:choose>
+        <xsl:when test="planet:format = 'rss090'">rss_0_90</xsl:when>
+        <xsl:when test="planet:format = 'rss091n'">rss_0_91</xsl:when>
+        <xsl:when test="planet:format = 'rss091u'">rss_0_91</xsl:when>
+        <xsl:when test="planet:format = 'rss10'">rss_1_0</xsl:when>
+        <xsl:when test="planet:format = 'rss092'">rss_0_90</xsl:when>
+        <xsl:when test="planet:format = 'rss093'"></xsl:when>
+        <xsl:when test="planet:format = 'rss094'">rss_0_90</xsl:when>
+        <xsl:when test="planet:format = 'rss20'">rss_2_0</xsl:when>
+        <xsl:when test="planet:format = 'rss'">rss_2_0</xsl:when>
+        <xsl:when test="planet:format = 'atom01'"></xsl:when>
+        <xsl:when test="planet:format = 'atom02'"></xsl:when>
+        <xsl:when test="planet:format = 'atom03'">atom_0_3</xsl:when>
+        <xsl:when test="planet:format = 'atom10'">atom_1_0</xsl:when>
+        <xsl:when test="planet:format = 'atom'">atom_1_0</xsl:when>
+        <xsl:when test="planet:format = 'cdf'"></xsl:when>
+        <xsl:when test="planet:format = 'hotrss'"></xsl:when>
+      </xsl:choose>
+    </xsl:variable>
+
+    <xsl:text> </xsl:text>
+    <tr>
+      <xsl:if test="planet:bozo='true'">
+        <xsl:attribute name="bgcolor">#FCC</xsl:attribute>
+      </xsl:if>
+      <td>
+        <a title="feed validator">
+          <xsl:attribute name="href">
+            <xsl:text>http://feedvalidator.org/check?url=</xsl:text>
+            <xsl:choose>
+              <xsl:when test="planet:http_location">
+                <xsl:value-of select="planet:http_location"/>
+              </xsl:when>
+              <xsl:when test="atom:link[@rel='self']/@href">
+                <xsl:value-of select="atom:link[@rel='self']/@href"/>
+              </xsl:when>
+            </xsl:choose>
+          </xsl:attribute>
+          <img src="http://feedvalidator.org/favicon.ico" hspace='2' vspace='1'/>
+        </a>
+        <a title="validome">
+          <xsl:attribute name="href">
+            <xsl:text>http://www.validome.org/rss-atom/validate?</xsl:text>
+            <xsl:text>viewSourceCode=1&amp;version=</xsl:text>
+            <xsl:value-of select="$validome_format"/>
+            <xsl:text>&amp;url=</xsl:text>
+            <xsl:choose>
+              <xsl:when test="planet:http_location">
+                <xsl:value-of select="planet:http_location"/>
+              </xsl:when>
+              <xsl:when test="atom:link[@rel='self']/@href">
+                <xsl:value-of select="atom:link[@rel='self']/@href"/>
+              </xsl:when>
+            </xsl:choose>
+          </xsl:attribute>
+          <img src="http://validome.org/favicon.ico" hspace='2' vspace='1'/>
+        </a>
+      </td>
+      <td>
+        <a href="{atom:link[@rel='alternate']/@href}">
+          <xsl:choose>
+            <xsl:when test="planet:message">
+              <xsl:attribute name="class">message</xsl:attribute>
+              <xsl:attribute name="title">
+                <xsl:value-of select="planet:message"/>
+              </xsl:attribute>
+            </xsl:when>
+            <xsl:when test="atom:title">
+              <xsl:attribute name="title">
+                <xsl:value-of select="atom:title"/>
+              </xsl:attribute>
+            </xsl:when>
+          </xsl:choose>
+          <xsl:value-of select="planet:name"/>
+        </a>
+      </td>
+      <td><xsl:value-of select="planet:format"/></td>
+      <xsl:if test="planet:ignore_in_feed | planet:filters |
+                    planet:*[contains(local-name(),'_type')]">
+        <td>
+          <dl>
+            <xsl:for-each select="planet:ignore_in_feed | planet:filters |
+                                  planet:*[contains(local-name(),'_type')]">
+              <xsl:sort select="local-name()"/>
+              <dt><xsl:value-of select="local-name()"/></dt>
+              <dd><xsl:value-of select="."/></dd>
+            </xsl:for-each>
+          </dl>
+        </td>
+      </xsl:if>
+    </tr>
+  </xsl:template>
+</xsl:stylesheet>
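The new template renders one table row per subscription with two validator links: a feedvalidator.org check and a validome.org check, the latter needing a format hint translated from planet:format. A back-of-the-envelope sketch of the URL construction in plain Python; the helper name validator_links is illustrative, the real work is done by the XSLT above:

    def validator_links(feed_url, validome_format=''):
        # feedvalidator.org only needs the feed URL
        check = 'http://feedvalidator.org/check?url=' + feed_url
        # validome.org also takes a version hint such as 'atom_1_0' or 'rss_2_0'
        validome = ('http://www.validome.org/rss-atom/validate?viewSourceCode=1'
                    '&version=' + validome_format + '&url=' + feed_url)
        return check, validome

    print validator_links('http://example.com/feed.xml', 'atom_1_0')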
@@ -9,6 +9,7 @@ template_files:
   index.html.xslt
   mobile.html.xslt
   opml.xml.xslt
+  validate.html.xslt
 
 template_directories:
   ../asf