Filters are simple Unix pipes. Input comes in on stdin, parameters come from the config file, and output goes to stdout. Anything written to stderr is logged as an ERROR message. If nothing is written to stdout, the entry is not written to the cache or processed further.
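The contract above can be sketched as a small Python script. This is only an illustration, not one of the shipped filters: the function names and the "sponsored" rule are made up for the example.

```python
import sys


def apply_filter(entry_text):
    """Return the text to emit, or an empty string to drop the entry."""
    # Hypothetical rule, for illustration only: drop entries mentioning "sponsored".
    if "sponsored" in entry_text.lower():
        return ""
    return entry_text


def main(stdin=None, stdout=None, stderr=None):
    """Read one entry from stdin and write the filtered entry to stdout."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    try:
        result = apply_filter(stdin.read())
    except Exception as exc:
        # Per the text, anything written to stderr is logged as an ERROR message.
        stderr.write("filter failed: %s\n" % exc)
        return
    # Writing an empty string produces no output, so the entry is dropped.
    stdout.write(result)
```

Ending the script with a call to `main()` would let it be tried by hand, for example with `python myfilter.py < entry.xml`, where both file names are placeholders.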
Input to a filter is an aggressively normalized entry. For example, if a feed is RSS 1.0 with 10 items, the filter will be called ten times, each with a single Atom 1.0 entry, with all textConstructs expressed as XHTML, and everything encoded as UTF-8.
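Because every filter sees a single Atom 1.0 entry with XHTML text constructs, it can rely on one parsing path. The sketch below reads the title of such an entry with Python's standard library. The sample entry is invented for the example, and the entries a filter actually receives may carry additional elements.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up entry in the normalized shape described above.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2024:1</id>
  <title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Hello <em>world</em></div></title>
  <updated>2024-01-01T00:00:00Z</updated>
</entry>"""


def title_text(entry_xml):
    """Return the plain text of the entry's title, ignoring XHTML markup."""
    entry = ET.fromstring(entry_xml)
    title = entry.find(ATOM + "title")
    if title is None:
        return ""
    # itertext() walks the XHTML div and its children in document order.
    return "".join(title.itertext()).strip()
```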
You will find a small set of example filters in the filters directory. The coral cdn filter will change links to images in the entry itself. The filters in the stripAd subdirectory will strip specific types of advertisements that you may find in feeds.
The excerpt filter adds metadata (in the form of a planet:excerpt element) to the feed itself. You can see examples of how parameters are passed to this program in either excerpt-images or opml-top100.ini. Alternatively, parameters may be passed URI style, for example: excerpt-images2.
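The text does not spell out what "URI style" looks like. Assuming it means a query string appended to the filter name, a filter reference could be taken apart as in the sketch below. The function name and the parameter names `omit` and `width` are illustrative only, and this is not the project's own parsing code.

```python
from urllib.parse import parse_qsl


def split_filter_spec(spec):
    """Split 'name?key=value&...' into (name, params)."""
    name, _, query = spec.partition("?")
    # parse_qsl also decodes percent-escapes such as %3A in the values.
    return name, dict(parse_qsl(query))
```

Under that assumption, `split_filter_spec("excerpt.py?omit=img&width=500")` yields the filter name `excerpt.py` together with a dictionary of string parameters, and a bare `excerpt.py` yields an empty dictionary.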
The xpath sifter is a variation of the above, including or excluding feeds based on the presence (or absence) of data specified by XPath expressions. Again, parameters can be passed as config options or URI style.
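The idea behind an XPath-based sifter can be sketched as follows. Since a filter sees one entry at a time, this sketch decides per entry. The parameter names `require` and `exclude` are assumptions, and ElementTree supports only a subset of XPath, so the shipped sifter may accept richer expressions.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}


def sift(entry_xml, require=None, exclude=None):
    """Return entry_xml if it passes both tests, else an empty string."""
    entry = ET.fromstring(entry_xml)
    # Drop the entry when a required node is absent.
    if require and entry.find(require, NS) is None:
        return ""
    # Drop the entry when an excluded node is present.
    if exclude and entry.find(exclude, NS) is not None:
        return ""
    return entry_xml
```

Returning an empty string maps onto the rule that an entry with no output is not processed further.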
The regexp sifter operates just like the xpath sifter, except it uses regular expressions instead of XPath expressions.
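A regular-expression variant of the same idea might look like the sketch below. It matches against the raw entry text; the function and parameter names are hypothetical, and the shipped sifter may well match against specific fields instead.

```python
import re


def regexp_sift(entry_xml, require=None, exclude=None):
    """Return entry_xml if it passes both pattern tests, else an empty string."""
    # Drop the entry when the required pattern does not occur anywhere.
    if require and not re.search(require, entry_xml):
        return ""
    # Drop the entry when the excluded pattern occurs anywhere.
    if exclude and re.search(exclude, entry_xml):
        return ""
    return entry_xml
```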
The file extension of a filter determines how it is run: .py invokes Python and .xslt invokes XSLT. .sed and .tmpl (a.k.a. htmltmpl) are also options. Other languages, like perl or ruby or class/jar (java), aren't supported at the moment, but these would be easy to add.
Filters listed in the [planet] section of your config.ini will be invoked on all feeds. Filters listed in individual [feed] sections will only be invoked on those feeds.
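The two rules above, dispatch by file extension and global versus per-feed filter lists, can be sketched together. The option name `filters`, the feed URL, the filter file names, and the choice to run global filters first are all assumptions made for the example.

```python
import configparser
import os

# Hypothetical config: option name, feed URL, and filter names are assumed.
SAMPLE_CONFIG = """
[planet]
filters = excerpt.py

[http://example.org/feed.xml]
filters = strip_ads.sed
"""

# Extension-to-handler table taken from the text above.
HANDLERS = {".py": "python", ".xslt": "xslt", ".sed": "sed", ".tmpl": "htmltmpl"}


def filters_for(config, feed):
    """Return the global filters followed by the filters for one feed."""
    names = config.get("planet", "filters", fallback="").split()
    if config.has_section(feed):
        names += config.get(feed, "filters", fallback="").split()
    return names


def handler_for(filter_name):
    """Pick a handler from the file extension, ignoring any ?query part."""
    path = filter_name.partition("?")[0]
    ext = os.path.splitext(path)[1]
    if ext not in HANDLERS:
        raise ValueError("unsupported filter type: %r" % ext)
    return HANDLERS[ext]


config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)
```

With this sample, the feed at `http://example.org/feed.xml` gets both filters, while any other feed gets only the one listed under `[planet]`.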