To scrape a web page and extract data with Python's requests library, you can follow the steps below.
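
Install the requests library:

pip install requests

Send a GET request and check the status code:

import requests

url = 'https://example.com'  # replace with the URL you want to scrape
response = requests.get(url)
if response.status_code == 200:
    print('Request succeeded')
else:
    print('Request failed, status code:', response.status_code)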
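
A request can also hang or fail before any status code comes back. The sketch below wraps the request in a helper that sets a timeout and treats HTTP and network errors the same way. The helper name fetch_html, the 10-second timeout, and the User-Agent string are assumptions made for illustration, not part of the steps above.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the page body as text, or None if the request fails.

    fetch_html is a hypothetical helper name; the timeout and header
    values are assumptions chosen for illustration.
    """
    headers = {'User-Agent': 'example-scraper/0.1'}  # assumed value
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
        response.raise_for_status()
    except requests.RequestException as exc:
        # RequestException is the base class for errors raised by requests
        print('Request failed:', exc)
        return None
    return response.text
```

A caller would write html = fetch_html('https://example.com') and check that html is not None before parsing it.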
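
Install BeautifulSoup to parse the HTML:

pip install beautifulsoup4

Parse the response and print the text of every <p> element:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
paragraphs = soup.find_all('p')
for p in paragraphs:
    print(p.get_text())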
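
find_all works on any HTML string, so the extraction step can be tried offline before pointing it at a live page. A minimal sketch, assuming a made-up HTML snippet and a hypothetical helper extract_paragraphs that trims whitespace and skips empty paragraphs:

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration
SAMPLE_HTML = """
<html><body>
  <p> First paragraph. </p>
  <p></p>
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""


def extract_paragraphs(html):
    """Return the non-empty text of every <p> element in html."""
    soup = BeautifulSoup(html, 'html.parser')
    texts = []
    for p in soup.find_all('p'):
        # get_text() joins the text of nested tags; strip() trims the ends
        text = p.get_text().strip()
        if text:
            texts.append(text)
    return texts


paragraphs = extract_paragraphs(SAMPLE_HTML)
print(paragraphs)
```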

To extract the text of elements with class="title", use a CSS selector:

titles = soup.select('.title')
for title in titles:
    print(title.get_text())
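
CSS selectors can also combine a tag name with a class, and matched elements expose their attributes as well as their text. The sketch below assumes a made-up HTML list in which each title is a link; the markup and the a.title selector are illustrative only and will differ on a real site.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration
SAMPLE_HTML = """
<ul>
  <li><a class="title" href="/posts/1">First post</a></li>
  <li><a class="title" href="/posts/2">Second post</a></li>
  <li><a class="other" href="/about">About</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# 'a.title' matches only <a> elements whose class list contains "title"
links = [
    {'text': a.get_text().strip(), 'href': a.get('href')}
    for a in soup.select('a.title')
]

# select_one returns the first match, or None when nothing matches
first = soup.select_one('a.title')
missing = soup.select_one('.no-such-class')
```

Because select_one returns None when nothing matches, check the result before calling get_text() on it.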

Save the extracted titles to a text file, one per line:

title_list = [title.get_text() for title in titles]
with open('titles.txt', 'w', encoding='utf-8') as f:
    for title in title_list:
        f.write(title + '\n')
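
If the data will be opened in a spreadsheet or loaded by another program, CSV may be more convenient than plain text. This is one possible variation, not part of the steps above; the function name save_titles_csv and the column names are assumptions.

```python
import csv


def save_titles_csv(titles, path):
    """Write one title per row to a CSV file, preceded by a header row."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['index', 'title'])  # assumed column names
        for i, title in enumerate(titles, start=1):
            # csv.writer quotes fields that contain commas or quotes
            writer.writerow([i, title])
```

With the variables from the step above, the call would be save_titles_csv(title_list, 'titles.csv').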
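
This is only a simple example. A real scraper often has to handle more complex HTML structures and content that is loaded dynamically, which requests alone will not see because it does not execute JavaScript. You may need to learn more about requests and BeautifulSoup to adapt this to your needs.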