
How to handle exceptions in a Python spider

小樊
2024-12-12 04:51:47
Category: Programming Languages

When developing a Python spider, exception handling is key to keeping the program running stably. The following are some common exception-handling techniques:

  1. Use try-except: wrap code that might raise an exception in a try-except block so it can be caught and handled.

    import requests
    
    try:
        response = requests.get('http://example.com')
        response.raise_for_status()  # Raises HTTPError for 4xx/5xx status codes
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error: {e}")
    except requests.exceptions.RequestException as e:
        print(f"Request Exception: {e}")
    except Exception as e:
        print(f"Unexpected Error: {e}")
    else:
        print("Request successful")
        # Process the successful response
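
     A common refinement of this pattern in real spiders is retrying transient failures such as timeouts before giving up. Below is a minimal sketch of that idea; fetch_with_retries, max_retries, and backoff are hypothetical names used for illustration, not part of requests.

    import time
    import requests
    
    def fetch_with_retries(url, max_retries=3, backoff=2):
        # Hypothetical helper: retry transient failures with exponential backoff.
        for attempt in range(1, max_retries + 1):
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
                return response
            except requests.exceptions.RequestException as e:
                print(f"Attempt {attempt} failed: {e}")
                if attempt == max_retries:
                    raise  # Re-raise after the last attempt so the caller sees the failure
                time.sleep(backoff ** attempt)  # Wait longer before each retry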
    
  2. Use the logging module: record exception details with logging so they can be analyzed and debugged later.

    import logging
    import requests
    
    logging.basicConfig(filename='spider.log', level=logging.ERROR)
    
    try:
        response = requests.get('http://example.com')
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        logging.error(f"HTTP Error: {e}")
    except requests.exceptions.RequestException as e:
        logging.error(f"Request Exception: {e}")
    except Exception as e:
        logging.error(f"Unexpected Error: {e}")
    else:
        print("Request successful")
        # Process the successful response
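
     A small variation worth knowing: logging.exception logs at ERROR level and automatically appends the traceback of the current exception, which makes failures much easier to diagnose from the log file.

    import logging
    import requests
    
    logging.basicConfig(filename='spider.log', level=logging.ERROR)
    
    try:
        response = requests.get('http://example.com', timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException:
        # Must be called from inside an except block; the traceback of the
        # active exception is written to spider.log along with the message.
        logging.exception("Request failed")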
    
  3. Use finally: code in a finally block runs whether or not an exception occurred, which makes it the right place to clean up resources.

    import requests
    
    try:
        response = requests.get('http://example.com')
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error: {e}")
    except requests.exceptions.RequestException as e:
        print(f"Request Exception: {e}")
    except Exception as e:
        print(f"Unexpected Error: {e}")
    else:
        print("Request successful")
        # Process the successful response
    finally:
        print("Request completed")
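
     Where finally is most useful is guaranteed resource cleanup. The sketch below assumes a long-lived requests.Session that must be closed whether the request succeeds or fails:

    import requests
    
    session = requests.Session()
    try:
        response = session.get('http://example.com', timeout=10)
        response.raise_for_status()
        print(response.status_code)
    except requests.exceptions.RequestException as e:
        print(f"Request Exception: {e}")
    finally:
        session.close()  # Runs whether the request succeeded or raised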
    
  4. Use asyncio and aiohttp for an asynchronous spider: in async code, try-except blocks catch and handle exceptions the same way.

    import aiohttp
    import asyncio
    
    async def fetch(session, url):
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()
        except aiohttp.ClientError as e:
            print(f"Client Error: {e}")
        except Exception as e:
            print(f"Unexpected Error: {e}")
    
    async def main():
        async with aiohttp.ClientSession() as session:
            html = await fetch(session, 'http://example.com')
            print(html)  # Will be None if fetch caught an exception
    
    # asyncio.run() (Python 3.7+) replaces the manual event-loop management
    # of get_event_loop()/run_until_complete().
    asyncio.run(main())
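
     When fetching many URLs concurrently, another option is to collect exceptions per task with asyncio.gather(..., return_exceptions=True) instead of handling them inside fetch. A minimal sketch (the URL list is illustrative):

    import asyncio
    import aiohttp
    
    async def fetch(session, url):
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.text()
    
    async def main():
        urls = ['http://example.com', 'http://example.org']
        async with aiohttp.ClientSession() as session:
            tasks = [fetch(session, url) for url in urls]
            # return_exceptions=True returns raised exceptions in the result
            # list instead of propagating the first failure immediately.
            results = await asyncio.gather(*tasks, return_exceptions=True)
            for url, result in zip(urls, results):
                if isinstance(result, Exception):
                    print(f"{url} failed: {result}")
                else:
                    print(f"{url} returned {len(result)} characters")
    
    asyncio.run(main())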
    

With these techniques you can handle the various exceptions that typically occur while crawling and keep the spider stable and reliable.
