Python通过HTTP协议定期抓取文件

#!usr/bin/python

import urllib2,time;
class ErrorHandler(urllib2.HTTPDefaultErrorHandler):
    def http_error_default(self, req, fp, code, msg, headers):
        result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result

URL='http://www.ibm.com/developerworks/js/ajax1.js'
req=urllib2.Request(URL)
mgr=urllib2.build_opener(ErrorHandler())

while True:
    ns=mgr.open(req)
    if(ns.headers.has_key('last-modified')):
        modified=ns.headers.get('last-modified')
    if(ns.code==304):
        print '''
          ==============================
              NOT MODIFIED
          ==============================
        '''
    elif(ns.code==200):
        print ns.read()
    else:
        print 'there is an error';
        
    if(not locals().has_key('modified')):
        modified=time.time();
    req.add_header('If-Modified-Since',modified)
    time.sleep(10)

本文链接 http://tec.5lulu.com/code/wcn6hbw2tw8ic9.html

我来评分 :6
0

转载注明:转自5lulu技术库

本站遵循:署名-非商业性使用-禁止演绎 3.0 共享协议

www.5lulu.com