Retroactive filtering,
and make it clearer in the docs that filters are performed at spider time
parent 3aef94c214
commit bd1019e9fb

@@ -13,7 +13,7 @@
 parameters come from the config file, and output goes to <code>stdout</code>.
 Anything written to <code>stderr</code> is logged as an ERROR message. If no
 <code>stdout</code> is produced, the entry is not written to the cache or
-processed further.</p>
+processed further; in fact, if the entry had previously been written to the cache, it will be removed.</p>

 <p>Input to a filter is a aggressively
 <a href="normalization.html">normalized</a> entry. For
@@ -54,6 +54,18 @@ instead of XPath expressions.</p>
 <h3>Notes</h3>

 <ul>
+<li>Filters are executed when a feed is fetched, and the results are placed
+into the cache. Changing a configuration file alone is not sufficient to
+change the contents of the cache — typically that only occurs after
+a feed is modified.</li>

+<li>Filters are simply invoked in the order they are listed in the
+configuration file (think unix pipes). Planet wide filters are executed before
+feed specific filters.</li>

+<li>Any filters listed in the <code>[planet]</code> section of your config.ini
+will be invoked on all feeds. Filters listed in individual
+<code>[feed]</code> sections will only be invoked on those feeds.</li>

 <li>The file extension of the filter is significant. <code>.py</code> invokes
 python. <code>.xslt</code> involkes XSLT. <code>.sed</code> and
@@ -61,14 +73,6 @@ python. <code>.xslt</code> involkes XSLT. <code>.sed</code> and
 perl or ruby or class/jar (java), aren't supported at the moment, but these
 would be easy to add.</li>

-<li>Any filters listed in the <code>[planet]</code> section of your config.ini
-will be invoked on all feeds. Filters listed in individual
-<code>[feed]</code> sections will only be invoked on those feeds.</li>

-<li>Filters are simply invoked in the order they are listed in the
-configuration file (think unix pipes). Planet wide filters are executed before
-feed specific filters.</li>

 <li>Templates written using htmltmpl currently only have access to a fixed set
 of fields, whereas XSLT templates have access to everything.</li>
 </ul>
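The filter contract described in the documentation hunks above (entry on stdin, result on stdout, stderr reported as an ERROR, no stdout meaning the entry is dropped, filters chained like Unix pipes) can be illustrated with a small Python sketch. It is not Planet's `shell.run`; the helper `run_filter_chain`, its `log_error` parameter and the use of `subprocess.run` are assumptions made for illustration.

```python
import subprocess
import sys


def run_filter_chain(entry, commands, log_error=print):
    """Pipe `entry` through each filter command in order, like a Unix pipe.

    Hypothetical helper (not Planet's shell.run). Returns the last filter's
    stdout, or '' as soon as a filter writes nothing to stdout, which the
    caller should treat as "do not cache this entry".
    """
    output = entry
    for cmd in commands:
        result = subprocess.run(cmd, input=output, capture_output=True, text=True)
        if result.stderr:
            # anything a filter writes to stderr is reported as an error
            log_error("ERROR: " + result.stderr.strip())
        output = result.stdout
        if not output:
            break  # later filters are skipped once an entry is rejected
    return output


# Two throwaway filters written as Python one-liners:
upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
reject = [sys.executable, "-c", "import sys; sys.stdin.read()"]

print(run_filter_chain("<entry/>", [upper]))                # <ENTRY/>
print(repr(run_filter_chain("<entry/>", [upper, reject])))  # ''
```

Stopping at the first empty result mirrors the `if not output: break` in the `writeCache` hunk: once one filter rejects an entry, the remaining filters never see it.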
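The ordering and dispatch rules in the Notes list (planet-wide filters before feed-specific ones, each in listed order; interpreter chosen by file extension) can be sketched with the standard `configparser` module. The option name `filters`, the sample filter file names, and the concrete commands used for `.xslt` and `.sed` (`xsltproc`, `sed -f`) are assumptions; the section name `Planet` follows the test in this commit.

```python
import configparser
import os
import sys

# Hypothetical config: the option name and filter file names are made up.
SAMPLE_CONFIG = """
[Planet]
filters = strip_ads.py excerpt.xslt

[http://example.com/feed.atom]
filters = translate.sed
"""


def filters_for(parser, feed_uri):
    """Planet-wide filters first, then the feed's own, each in listed order."""
    planet_wide = parser.get("Planet", "filters", fallback="").split()
    per_feed = parser.get(feed_uri, "filters", fallback="").split()
    return planet_wide + per_feed


def filter_command(filter_path):
    """Pick an interpreter from the filter's file extension (assumed commands)."""
    ext = os.path.splitext(filter_path)[1].lower()
    if ext == ".py":
        return [sys.executable, filter_path]
    if ext == ".xslt":
        return ["xsltproc", filter_path, "-"]  # "-": read the entry from stdin
    if ext == ".sed":
        return ["sed", "-f", filter_path]
    # perl, ruby, java and other filter types are not supported
    raise ValueError("unsupported filter type: %r" % filter_path)


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)
for name in filters_for(parser, "http://example.com/feed.atom"):
    print(filter_command(name))
```

A feed with no section of its own simply gets the planet-wide list, because `fallback` covers both a missing section and a missing option.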
|
@@ -194,7 +194,9 @@ def writeCache(feed_uri, feed_info, data):
         for filter in config.filters(feed_uri):
             output = shell.run(filter, output, mode="filter")
             if not output: break
-        if not output: continue
+        if not output:
+            if os.path.exists(cache_file): os.remove(cache_file)
+            continue

         # write out and timestamp the results
         write(output, cache_file)
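The changed branch of `writeCache` is what makes filtering retroactive: an entry whose filters now produce no output has its cache file deleted instead of merely being skipped. A standalone sketch, assuming a hypothetical helper `write_or_evict` and ignoring the timestamping that Planet's own `write` performs:

```python
import os
import tempfile


def write_or_evict(output, cache_file):
    """Cache filtered output, or delete a stale cached copy if there is none.

    Hypothetical helper mirroring the changed branch of writeCache; Planet's
    own write() also timestamps the file, which is omitted here.
    Returns True if the entry was written to the cache.
    """
    if not output:
        if os.path.exists(cache_file):
            os.remove(cache_file)  # retroactive: drop what an earlier run cached
        return False
    with open(cache_file, "w", encoding="utf-8") as f:
        f.write(output)
    return True


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "entry")
    write_or_evict("<entry/>", path)  # first run: filters pass the entry
    write_or_evict("", path)          # later run: a filter now rejects it
    print(os.path.exists(path))       # False
```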
|
@@ -73,6 +73,14 @@ class SpiderTest(unittest.TestCase):
         self.spiderFeed(testfeed % '1b')
         self.verify_spiderFeed()

+    def test_spiderFeed_retroactive_filter(self):
+        config.load(configfile)
+        self.spiderFeed(testfeed % '1b')
+        self.assertEqual(5, len(glob.glob(workdir+"/*")))
+        config.parser.set('Planet', 'filter', 'two')
+        self.spiderFeed(testfeed % '1b')
+        self.assertEqual(1, len(glob.glob(workdir+"/*")))

     def test_spiderUpdate(self):
         config.load(configfile)
         self.spiderFeed(testfeed % '1a')
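The new test spiders the same feed twice, the second time with the `filter` option of the `Planet` section set to `two`, and expects the number of files in the cache directory to drop from 5 to 1. Under the retroactive behavior, any filter that prints nothing for an entry causes that entry's cached file to be removed. As an illustration only, the sketch below assumes a "must match this regular expression" filter; the names `sift` and `main` are hypothetical, and this is not claimed to be how Planet implements the `filter` option.

```python
import io
import re
import sys


def sift(entry_text, pattern):
    """Return the entry unchanged if it matches `pattern`; otherwise ''."""
    return entry_text if re.search(pattern, entry_text) else ""


def main(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Filter-program entry point: entry on stdin, pattern as argv[1]."""
    stdout.write(sift(stdin.read(), argv[1]))


# Usage: an entry without "two" yields no stdout, so it is not cached
# (and, with retroactive filtering, an earlier cached copy is removed).
out = io.StringIO()
main(["sift.py", "two"], io.StringIO("<title>entry one</title>"), out)
print(repr(out.getvalue()))  # ''
```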