137 Commits

Author SHA1 Message Date
Sam Ruby
bc33615ced resync with html5lib (includes improved <pre> support) 2007-01-26 19:22:35 -05:00
Sam Ruby
32a1c49090 Pull in the latest httplib2 2007-01-23 20:09:58 -05:00
Sam Ruby
77d15d22cf xml_base overrides 2007-01-22 13:46:45 -05:00
Sam Ruby
631dd44ff0 Resync with feedvalidator 2007-01-15 20:52:31 -05:00
Sam Ruby
e96dcb61da Resync with html5lib (r491) 2007-01-15 20:22:55 -05:00
Sam Ruby
be5c093b34 Fix regression on handling non-bozo xhtml content and summaries 2007-01-13 10:51:46 -05:00
Sam Ruby
f2ac92465d Properly handle content type text/plain 2007-01-12 06:19:19 -05:00
Sam Ruby
3024af031f Switch from Beautiful Soup to html5lib 2007-01-11 15:05:30 -05:00
Sam Ruby
04f35b8cca Fix typo 2006-12-27 16:07:38 -05:00
Sam Ruby
6f7eddf0f0 Allow one to subscribe to planet feeds 2006-12-27 15:37:16 -05:00
Sam Ruby
9d7627781c Pull in Joe's fix to consistently create the http cache directory 2006-12-27 13:38:25 -05:00
Joe Gregorio
7dd301cdf4 Moved creation of cache directory so that it is created consistently 2006-12-27 01:25:06 -05:00
Joe Gregorio
45eda384cb Fixed bug with config feed_timeouts not being ints. 2006-12-26 11:49:03 -05:00
Sam Ruby
6cc797ce0a added a new config option: future_dates 2006-12-07 18:31:45 -05:00
Sam Ruby
316a1afe5e Ensure planet information makes it into the source element 2006-11-26 11:10:47 -05:00
Sam Ruby
c20acf9944 Hash content to determine if it was modified 2006-11-22 12:31:22 -05:00
Sam Ruby
70f971750b Complete HttpThread refactoring 2006-11-21 09:11:52 -05:00
Sam Ruby
e85ae48722 More refactoring 2006-11-20 15:55:43 -05:00
Sam Ruby
c6c9bed994 Partial refactoring 2006-11-20 10:07:39 -05:00
Sam Ruby
20cb60df7c Resync with httplib2 2006-11-19 11:56:36 -05:00
Sam Ruby
1ce96ca53b Assign a css-id to each source 2006-11-16 20:18:34 -05:00
Sam Ruby
bf0c7b736d Fix regression where entry updated was always ignored 2006-11-16 15:51:27 -05:00
Sam Ruby
167f0de4da More bullet-proofing 2006-11-15 07:46:35 -05:00
Sam Ruby
6ebbed2ab7 Spider threads 2006-11-14 14:08:15 -05:00
Sam Ruby
ba25b691ff Fix windows regression 2006-11-14 11:05:09 -05:00
Sam Ruby
0df474c8ff Support backlevel versions of Python 2006-11-14 10:28:40 -05:00
Sam Ruby
88fd1b80ca support feedburner origlink relationship 2006-11-11 22:23:31 -05:00
Sam Ruby
41fd17f2c4 Continue to tweak logic to get entry updated time 2006-11-11 13:16:31 -05:00
Sam Ruby
fe3c1664c2 Only take the feed updated time the first time an entry is seen 2006-11-11 11:18:33 -05:00
Harry Fuecks
7696b0e9a7 Attempt to default to feed updated_parsed - response to problem handling RSS 0.9x feed - see http://lists.planetplanet.org/archives/devel/2006-November/001273.html 2006-11-10 15:08:18 +00:00
Joe Gregorio
45f0f92110 Switched to standard socket timeouts. http://mail.python.org/pipermail/python-list/2005-May/281697.html 2006-11-07 22:39:35 -05:00
Joe Gregorio
daec4769c7 Added in support for '-location' in httplib2 responses 2006-11-07 13:19:42 -05:00
Joe Gregorio
56a447e1be Updated to latest httplib2. Now deleting 'content-encoding' header from the httplib2 response before passing to feedparser 2006-11-05 22:48:30 -05:00
Joe Gregorio
4b9e85e4f7 reverted feedparser to HEAD, i.e. it doesn't need changes to be used with an external http client. Made the changes as suggested by Sam on how to get httplib2 and feedparser working together. Added a 'dict' attribute to httplib2.Response to get it to work as feedparser expects. 2006-11-05 22:00:05 -05:00
Sam Ruby
7ca2f56d49 More bullet proofing 2006-11-05 05:01:24 -05:00
Sam Ruby
4c16d1fd96 Fix typo and cleanup log message 2006-11-04 23:07:49 -05:00
Joe Gregorio
b58d815a0d Fixed very weird bug where we would break on relative 301's, but *only* on the second attempt, i.e. only when reading the cache 301 redirect 2006-11-04 17:19:59 -05:00
Joe Gregorio
681eb117f8 Fixed one bug with passing non-2xx responses to feedparser. Also added a try/except to help debug the problem with 'content' undefined in httplib2. 2006-11-04 16:58:03 -05:00
Sam Ruby
13d9211b4d Bullet-proofing 2006-11-04 13:03:54 -05:00
Sam Ruby
eb0f29963b Don't indiscriminately blast config information into the author name 2006-11-04 12:35:28 -05:00
Joe Gregorio
4569dba5e2 Moved httplib2 directory 2006-11-04 11:31:52 -05:00
Joe Gregorio
b2ccc8c1ff added 304 checking before calling spiderFeed() 2006-11-03 11:40:16 -05:00
Joe Gregorio
217e850e41 Still having problems with channel_name. 2006-11-02 14:48:47 -05:00
Joe Gregorio
58bb4b6e05 Seems to be working now 2006-11-02 13:29:01 -05:00
Joe Gregorio
b9604d8330 Different approach to threading 2006-11-02 11:59:25 -05:00
Sam Ruby
405290aaab additional logging and installation information 2006-10-28 20:18:30 -04:00
Sam Ruby
dc6483dfbe More logging 2006-10-27 10:23:09 -04:00
Sam Ruby
d3838e1a5a Support rss images 2006-10-25 14:09:37 -04:00
Sam Ruby
2529bdd36a Add xml:lang to list of scrubbable attributes 2006-10-25 12:20:28 -04:00
Sam Ruby
fdaf129f9b Support dc:description 2006-10-25 10:09:11 -04:00