72 lines
3.1 KiB
HTML
72 lines
3.1 KiB
HTML
<!DOCTYPE html PUBLIC
|
|
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
|
|
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<script type="text/javascript" src="docs.js"></script>
|
|
<link rel="stylesheet" type="text/css" href="docs.css"/>
|
|
<title>Venus Filters</title>
|
|
</head>
|
|
<body>
|
|
<h2>Filters</h2>
|
|
<p>Filters are simple Unix pipes. Input comes in <code>stdin</code>,
|
|
parameters come from the config file, and output goes to <code>stdout</code>.
|
|
Anything written to <code>stderr</code> is logged as an ERROR message. If no
|
|
<code>stdout</code> is produced, the entry is not written to the cache or
|
|
processed further.</p>
|
|
|
|
<p>Input to a filter is a aggressively
|
|
<a href="normalization.html">normalized</a> entry. For
|
|
example, if a feed is RSS 1.0 with 10 items, the filter will be called ten
|
|
times, each with a single Atom 1.0 entry, with all textConstructs
|
|
expressed as XHTML, and everything encoded as UTF-8.</p>
|
|
|
|
<p>You will find a small set of example filters in the <a
|
|
href="../filters">filters</a> directory. The <a
|
|
href="../filters/coral_cdn_filter.py">coral cdn filter</a> will change links
|
|
to images in the entry itself. The filters in the <a
|
|
href="../filters/stripAd/">stripAd</a> subdirectory will strip specific
|
|
types of advertisements that you may find in feeds.</p>
|
|
|
|
<p>The <a href="../filters/excerpt.py">excerpt</a> filter adds metadata (in
|
|
the form of a <code>planet:excerpt</code> element) to the feed itself. You
|
|
can see examples of how parameters are passed to this program in either
|
|
<a href="../tests/data/filter/excerpt-images.ini">excerpt-images</a> or
|
|
<a href="../examples/opml-top100.ini">opml-top100.ini</a>.
|
|
Alternately parameters may be passed
|
|
<abbr title="Uniform Resource Identifier">URI</abbr> style, for example:
|
|
<a href="../tests/data/filter/excerpt-images2.ini">excerpt-images2</a>.
|
|
</p>
|
|
|
|
<p>The <a href="../filters/xpath_sifter.py">xpath sifter</a> is a variation of
|
|
the above, including or excluding feeds based on the presence (or absence) of
|
|
data specified by <a href="http://www.w3.org/TR/xpath20/">xpath</a>
|
|
expressions. Again, parameters can be passed as
|
|
<a href="../tests/data/filter/xpath-sifter.ini">config options</a> or
|
|
<a href="../tests/data/filter/xpath-sifter2.ini">URI style</a>.
|
|
</p>
|
|
|
|
<h3>Notes</h3>
|
|
|
|
<ul>
|
|
|
|
<li>The file extension of the filter is significant. <code>.py</code> invokes
|
|
python. <code>.xslt</code> involkes XSLT. <code>.sed</code> and
|
|
<code>.tmpl</code> (a.k.a. htmltmp) are also options. Other languages, like
|
|
perl or ruby or class/jar (java), aren't supported at the moment, but these
|
|
would be easy to add.</li>
|
|
|
|
<li>Any filters listed in the <code>[planet]</code> section of your config.ini
|
|
will be invoked on all feeds. Filters listed in individual
|
|
<code>[feed]</code> sections will only be invoked on those feeds.</li>
|
|
|
|
<li>Filters are simply invoked in the order they are listed in the
|
|
configuration file (think unix pipes). Planet wide filters are executed before
|
|
feed specific filters.</li>
|
|
|
|
<li>Templates written using htmltmpl currently only have access to a fixed set
|
|
of fields, whereas XSLT templates have access to everything.</li>
|
|
</ul>
|
|
</body>
|
|
</html>
|