
How to handle exceptions in a Python spider

小樊
2024-12-12 04:51:47
Category: Programming Languages

When developing a Python spider, exception handling is key to keeping the program running stably. The following are some common exception-handling techniques:

  1. Use try-except: wrap code that might raise an exception in a try-except block so it can be caught and handled.

    import requests
    
    try:
        response = requests.get('http://example.com')
        response.raise_for_status()  # Raises HTTPError for 4xx/5xx status codes
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error: {e}")
    except requests.exceptions.RequestException as e:
        print(f"Request Exception: {e}")
    except Exception as e:
        print(f"Unexpected Error: {e}")
    else:
        print("Request successful")
        # Process the successful response
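
     A common refinement of this pattern in real spiders is retrying transient failures such as timeouts before giving up. Below is a minimal sketch of that idea; fetch_with_retries, max_retries, and backoff are hypothetical names used for illustration, not part of requests.

    import time
    import requests
    
    def fetch_with_retries(url, max_retries=3, backoff=2):
        # Hypothetical helper: retry transient failures with exponential backoff.
        for attempt in range(1, max_retries + 1):
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
                return response
            except requests.exceptions.RequestException as e:
                print(f"Attempt {attempt} failed: {e}")
                if attempt == max_retries:
                    raise  # Re-raise after the last attempt so the caller sees the failure
                time.sleep(backoff ** attempt)  # Wait longer before each retry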
    
  2. Use the logging module: record exception details with logging so they can be analyzed and debugged later.

    import logging
    import requests
    
    logging.basicConfig(filename='spider.log', level=logging.ERROR)
    
    try:
        response = requests.get('http://example.com')
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        logging.error(f"HTTP Error: {e}")
    except requests.exceptions.RequestException as e:
        logging.error(f"Request Exception: {e}")
    except Exception as e:
        logging.error(f"Unexpected Error: {e}")
    else:
        print("Request successful")
        # Process the successful response
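
     A small variation worth knowing: logging.exception logs at ERROR level and automatically appends the traceback of the current exception, which makes failures much easier to diagnose from the log file.

    import logging
    import requests
    
    logging.basicConfig(filename='spider.log', level=logging.ERROR)
    
    try:
        response = requests.get('http://example.com', timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException:
        # Must be called from inside an except block; the traceback of the
        # active exception is written to spider.log along with the message.
        logging.exception("Request failed")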
    
  3. Use finally: code in a finally block runs whether or not an exception occurred, which makes it the right place to clean up resources.

    import requests
    
    try:
        response = requests.get('http://example.com')
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error: {e}")
    except requests.exceptions.RequestException as e:
        print(f"Request Exception: {e}")
    except Exception as e:
        print(f"Unexpected Error: {e}")
    else:
        print("Request successful")
        # Process the successful response
    finally:
        print("Request completed")
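
     Where finally is most useful is guaranteed resource cleanup. The sketch below assumes a long-lived requests.Session that must be closed whether the request succeeds or fails:

    import requests
    
    session = requests.Session()
    try:
        response = session.get('http://example.com', timeout=10)
        response.raise_for_status()
        print(response.status_code)
    except requests.exceptions.RequestException as e:
        print(f"Request Exception: {e}")
    finally:
        session.close()  # Runs whether the request succeeded or raised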
    
  4. Use asyncio and aiohttp for an asynchronous spider: in async code, try-except blocks catch and handle exceptions the same way.

    import aiohttp
    import asyncio
    
    async def fetch(session, url):
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()
        except aiohttp.ClientError as e:
            print(f"Client Error: {e}")
        except Exception as e:
            print(f"Unexpected Error: {e}")
    
    async def main():
        async with aiohttp.ClientSession() as session:
            html = await fetch(session, 'http://example.com')
            print(html)  # Will be None if fetch caught an exception
    
    # asyncio.run() (Python 3.7+) replaces the manual event-loop management
    # of get_event_loop()/run_until_complete().
    asyncio.run(main())
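
     When fetching many URLs concurrently, another option is to collect exceptions per task with asyncio.gather(..., return_exceptions=True) instead of handling them inside fetch. A minimal sketch (the URL list is illustrative):

    import asyncio
    import aiohttp
    
    async def fetch(session, url):
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.text()
    
    async def main():
        urls = ['http://example.com', 'http://example.org']
        async with aiohttp.ClientSession() as session:
            tasks = [fetch(session, url) for url in urls]
            # return_exceptions=True returns raised exceptions in the result
            # list instead of propagating the first failure immediately.
            results = await asyncio.gather(*tasks, return_exceptions=True)
            for url, result in zip(urls, results):
                if isinstance(result, Exception):
                    print(f"{url} failed: {result}")
                else:
                    print(f"{url} returned {len(result)} characters")
    
    asyncio.run(main())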
    

With these techniques you can handle the various exceptions that typically occur while crawling and keep the spider stable and reliable.
