Retroactive filtering, and make it clearer in the docs that filters are performed at spider time
parent 3aef94c214
commit bd1019e9fb
@@ -13,7 +13,7 @@
 parameters come from the config file, and output goes to <code>stdout</code>.
 Anything written to <code>stderr</code> is logged as an ERROR message. If no
 <code>stdout</code> is produced, the entry is not written to the cache or
-processed further.</p>
+processed further; in fact, if the entry had previously been written to the cache, it will be removed.</p>
 
 <p>Input to a filter is a aggressively
 <a href="normalization.html">normalized</a> entry. For
@@ -54,6 +54,18 @@ instead of XPath expressions.</p>
 <h3>Notes</h3>
 
 <ul>
+<li>Filters are executed when a feed is fetched, and the results are placed
+into the cache. Changing a configuration file alone is not sufficient to
+change the contents of the cache — typically that only occurs after
+a feed is modified.</li>
+
+<li>Filters are simply invoked in the order they are listed in the
+configuration file (think unix pipes). Planet wide filters are executed before
+feed specific filters.</li>
+
+<li>Any filters listed in the <code>[planet]</code> section of your config.ini
+will be invoked on all feeds. Filters listed in individual
+<code>[feed]</code> sections will only be invoked on those feeds.</li>
 
 <li>The file extension of the filter is significant. <code>.py</code> invokes
 python. <code>.xslt</code> involkes XSLT. <code>.sed</code> and
@@ -61,14 +73,6 @@ python. <code>.xslt</code> involkes XSLT. <code>.sed</code> and
 perl or ruby or class/jar (java), aren't supported at the moment, but these
 would be easy to add.</li>
 
-<li>Any filters listed in the <code>[planet]</code> section of your config.ini
-will be invoked on all feeds. Filters listed in individual
-<code>[feed]</code> sections will only be invoked on those feeds.</li>
-
-<li>Filters are simply invoked in the order they are listed in the
-configuration file (think unix pipes). Planet wide filters are executed before
-feed specific filters.</li>
-
 <li>Templates written using htmltmpl currently only have access to a fixed set
 of fields, whereas XSLT templates have access to everything.</li>
 </ul>
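The ordering rule stated in the Notes (planet-wide filters first, then feed-specific ones, chained like Unix pipes) can be sketched in Python. This is a minimal illustration and not Planet's implementation: `ordered_filters` and `run_pipeline` are hypothetical names, and filters are modelled as plain callables rather than external scripts selected by file extension.

```python
from typing import Callable, Iterable, List

# Assumption: a filter is modelled as a function from text to text.
Filter = Callable[[str], str]


def ordered_filters(planet_filters: Iterable[Filter],
                    feed_filters: Iterable[Filter]) -> List[Filter]:
    """Planet-wide filters come first, then feed-specific ones, each in listed order."""
    return list(planet_filters) + list(feed_filters)


def run_pipeline(entry: str, filters: Iterable[Filter]) -> str:
    """Feed each filter's output to the next one, like a Unix pipe."""
    output = entry
    for apply_filter in filters:
        output = apply_filter(output)
        if not output:
            break  # an empty result ends the chain early
    return output


if __name__ == "__main__":
    planet_wide = [str.strip]    # stands in for filters from the [planet] section
    feed_specific = [str.upper]  # stands in for filters from one feed's section
    print(run_pipeline("  hello  ", ordered_filters(planet_wide, feed_specific)))
```

Because the chain stops at the first empty result, a planet-wide filter that rejects an entry prevents any feed-specific filter from seeing it.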
@@ -194,7 +194,9 @@ def writeCache(feed_uri, feed_info, data):
         for filter in config.filters(feed_uri):
             output = shell.run(filter, output, mode="filter")
             if not output: break
-        if not output: continue
+        if not output:
+            if os.path.exists(cache_file): os.remove(cache_file)
+            continue
 
         # write out and timestamp the results
         write(output, cache_file)
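The effect of this hunk on a single cache entry can be shown with a small standalone sketch. `store_filtered` is a hypothetical helper (the commit does this inline in `writeCache`), and the plain file write below is a simplification that stands in for the project's own `write` function.

```python
import os
import tempfile


def store_filtered(output: str, cache_file: str) -> bool:
    """Write filtered output to the cache, or drop a stale entry if output is empty."""
    if not output:
        # Retroactive filtering: an entry cached on an earlier run is removed
        # once the filters stop producing output for it.
        if os.path.exists(cache_file):
            os.remove(cache_file)
        return False
    with open(cache_file, "w", encoding="utf-8") as handle:
        handle.write(output)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        cache_file = os.path.join(workdir, "entry")
        store_filtered("<entry/>", cache_file)  # first run: the entry is cached
        store_filtered("", cache_file)          # later run: a filter rejects it
        print(os.path.exists(cache_file))
```

Before the change, the second call would have left the old file in place, so a newly added filter had no effect on entries that were already cached.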
@@ -73,6 +73,14 @@ class SpiderTest(unittest.TestCase):
         self.spiderFeed(testfeed % '1b')
         self.verify_spiderFeed()
 
+    def test_spiderFeed_retroactive_filter(self):
+        config.load(configfile)
+        self.spiderFeed(testfeed % '1b')
+        self.assertEqual(5, len(glob.glob(workdir+"/*")))
+        config.parser.set('Planet', 'filter', 'two')
+        self.spiderFeed(testfeed % '1b')
+        self.assertEqual(1, len(glob.glob(workdir+"/*")))
+
     def test_spiderUpdate(self):
         config.load(configfile)
         self.spiderFeed(testfeed % '1a')