152 Commits

Author SHA1 Message Date
Sam Ruby
aeeca2a050 Fix infinite loop on errors when multithreading. Patch was supplied by Lawrence Oluyede 2007-04-02 08:47:15 -04:00
Sam Ruby
5977be5ea4 Add support for creativeCommons and cc namespaces. 2007-03-26 17:42:41 -04:00
Sam Ruby
d1c1bd2c23 Resync with html5lib 2007-03-16 15:43:12 -04:00
Sam Ruby
bd1019e9fb Retroactive filtering, and make it clearer in the docs that filters are performed at spider time 2007-03-14 08:16:04 -04:00
Sam Ruby
3aef94c214 Handle eventful.com atom feeds (w/nested Atom entries), example: http://eventful.com/atom/events?page_size=50&sort_order=Date&within=25&units=miles&display=Medium&q=&l=San+Diego+metro+area&t=next+week&c= 2007-03-13 11:10:42 -04:00
Sam Ruby
4b1ff4884e Add support for new_feed_items 2007-03-07 13:51:23 -05:00
Sam Ruby
33fb5bdd86 Recovery from bizarrely nested content 2007-03-05 09:54:32 -05:00
Morten Frederiksen
1f79279f6c Merged with intertwingly... 2007-03-04 15:21:44 +01:00
Morten Frederiksen
806b0ee53c Updated expunge test cases to pass... 2007-03-04 14:07:16 +01:00
Morten Frederiksen
a51d09ec07 Added expunge and preliminary test cases 2007-03-04 12:00:28 +01:00
Antonio Cavedoni
fb29d39501 Added, tested and documented the new Config variable available in the Django templates/filters: makes it easy to access .ini configuration items from one's templates 2007-03-01 20:37:02 +01:00
Sam Ruby
7d14cbcf64 Django fixes 2007-02-15 19:48:44 -05:00
Sam Ruby
a0642afec0 regexp sifter 2007-02-15 19:09:10 -05:00
Antonio Cavedoni
7cd69ce7d7 Started testing Django integration, somewhat improved documentation 2007-02-16 00:51:53 +01:00
Antonio Cavedoni
81d10e1f5c Added Django template handler for Planet Venus 2007-02-14 01:07:30 +01:00
Sam Ruby
bc33615ced resync with html5lib (includes improved <pre> support) 2007-01-26 19:22:35 -05:00
Sam Ruby
32a1c49090 Pull in the latest httplib2 2007-01-23 20:09:58 -05:00
Sam Ruby
77d15d22cf xml_base overrides 2007-01-22 13:46:45 -05:00
Sam Ruby
631dd44ff0 Resync with feedvalidator 2007-01-15 20:52:31 -05:00
Sam Ruby
e96dcb61da Resync with html5lib (r491) 2007-01-15 20:22:55 -05:00
Sam Ruby
be5c093b34 Fix regression on handling non-bozo xhtml content and summaries 2007-01-13 10:51:46 -05:00
Sam Ruby
f2ac92465d Properly handle content type text/plain 2007-01-12 06:19:19 -05:00
Sam Ruby
3024af031f Switch from Beautiful Soup to html5lib 2007-01-11 15:05:30 -05:00
Sam Ruby
04f35b8cca Fix typo 2006-12-27 16:07:38 -05:00
Sam Ruby
6f7eddf0f0 Allow one to subscribe to planet feeds 2006-12-27 15:37:16 -05:00
Sam Ruby
9d7627781c Pull in Joe's fix to consistently create the http cache directory 2006-12-27 13:38:25 -05:00
Joe Gregorio
7dd301cdf4 Moved creation of cache directory so that it is created consistently 2006-12-27 01:25:06 -05:00
Joe Gregorio
45eda384cb Fixed bug with config feed_timeouts not being ints. 2006-12-26 11:49:03 -05:00
Sam Ruby
6cc797ce0a added a new config option: future_dates 2006-12-07 18:31:45 -05:00
Sam Ruby
316a1afe5e Ensure planet information makes it into the source element 2006-11-26 11:10:47 -05:00
Sam Ruby
c20acf9944 Hash content to determine if it was modified 2006-11-22 12:31:22 -05:00
Sam Ruby
70f971750b Complete HttpThread refactoring 2006-11-21 09:11:52 -05:00
Sam Ruby
e85ae48722 More refactoring 2006-11-20 15:55:43 -05:00
Sam Ruby
c6c9bed994 Partial refactoring 2006-11-20 10:07:39 -05:00
Sam Ruby
20cb60df7c Resync with httplib2 2006-11-19 11:56:36 -05:00
Sam Ruby
1ce96ca53b Assign a css-id to each source 2006-11-16 20:18:34 -05:00
Sam Ruby
bf0c7b736d Fix regression where entry updated was always ignored 2006-11-16 15:51:27 -05:00
Sam Ruby
167f0de4da More bullet-proofing 2006-11-15 07:46:35 -05:00
Sam Ruby
6ebbed2ab7 Spider threads 2006-11-14 14:08:15 -05:00
Sam Ruby
ba25b691ff Fix windows regression 2006-11-14 11:05:09 -05:00
Sam Ruby
0df474c8ff Support backlevel versions of Python 2006-11-14 10:28:40 -05:00
Sam Ruby
88fd1b80ca support feedburner origlink relationship 2006-11-11 22:23:31 -05:00
Sam Ruby
41fd17f2c4 Continue to tweak logic to get entry updated time 2006-11-11 13:16:31 -05:00
Sam Ruby
fe3c1664c2 Only take the feed updated time the first time an entry is seen 2006-11-11 11:18:33 -05:00
Harry Fuecks
7696b0e9a7 Attempt to default to feed updated_parsed - response to problem handling RSS 0.9x feed - see http://lists.planetplanet.org/archives/devel/2006-November/001273.html 2006-11-10 15:08:18 +00:00
Joe Gregorio
45f0f92110 Switched to standard socket timeouts. http://mail.python.org/pipermail/python-list/2005-May/281697.html 2006-11-07 22:39:35 -05:00
Joe Gregorio
daec4769c7 Added in support for '-location' in httplib2 responses 2006-11-07 13:19:42 -05:00
Joe Gregorio
56a447e1be Updated to latest httplib2. Now deleting 'content-encoding' header from the httplib2 response before passing to feedparser 2006-11-05 22:48:30 -05:00
Joe Gregorio
4b9e85e4f7 Reverted feedparser to HEAD, i.e. it doesn't need changes to be used with an external http client. Made the changes suggested by Sam on how to get httplib2 and feedparser working together. Added a 'dict' attribute to httplib2.Response so that it works as feedparser expects. 2006-11-05 22:00:05 -05:00
Sam Ruby
7ca2f56d49 More bullet proofing 2006-11-05 05:01:24 -05:00