Python requests模块基础使用方法实例及高级应用(自动登陆,抓取网页源码)实例详解

发布时间：2020-09-16 13:58:00 阅读：150 作者：返回主页小L 栏目：开发技术

Python开发者专用服务器限时活动，0元免费领，库存有限，领完即止！点击查看>>

1、Python requests模块说明

requests是使用Apache2 licensed 许可证的HTTP库。

用python编写。

比urllib2模块更简洁。

Request支持HTTP连接保持和连接池，支持使用cookie保持会话，支持文件上传，支持自动响应内容的编码，支持国际化的URL和POST数据自动编码。

在python内置模块的基础上进行了高度的封装，从而使得python进行网络请求时，变得人性化，使用Requests可以轻而易举的完成浏览器可有的任何操作。

现代，国际化，友好。

requests会自动实现持久连接keep-alive

2、Python requests模块基础入门

1）导入模块

import requests

2）发送请求的简洁

示例代码：获取一个网页（个人github）

import requests
r = requests.get('https://github.com/Ranxf')    # 最基本的不带参数的get请求
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})   # 带参数的get请求

我们还可以使用requests模块其它请求方法

1 requests.get(‘https://github.com/timeline.json') # GET请求

2 requests.post(“http://httpbin.org/post”) # POST请求

3 requests.put(“http://httpbin.org/put”) # PUT请求

4 requests.delete(“http://httpbin.org/delete”) # DELETE请求

5 requests.head(“http://httpbin.org/get”) # HEAD请求

6 requests.options(“http://httpbin.org/get” ) # OPTIONS请求

3）为url传递参数

>>> url_params = {'key':'value'}    #  字典传递参数，如果值为None的键不会被添加到url中
>>> r = requests.get('your url',params = url_params)
>>> print(r.url)

　　your url?key=value

4）响应的内容

r.encoding #获取当前的编码

r.encoding = 'utf-8' #设置编码

r.text #以encoding解析返回内容。字符串方式的响应体，会自动根据响应头部的字符编码进行解码。

r.content #以字节形式（二进制）返回。字节方式的响应体，会自动为你解码 gzip 和 deflate 压缩。

r.headers #以字典对象存储服务器响应头，但是这个字典比较特殊，字典键不区分大小写，若键不存在则返回None

r.status_code #响应状态码

r.raw #返回原始响应体，也就是 urllib 的 response 对象，使用 r.raw.read()

r.ok # 查看r.ok的布尔值便可以知道是否登陆成功

#*特殊方法*#

r.json() #Requests中内置的JSON解码器，以json形式返回,前提返回的内容确保是json格式的，不然解析出错会抛异常

r.raise_for_status() #失败请求(非200响应)抛出异常

post发送json请求：

import requests
import json
 
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps({'some': 'data'}))

print(r.json())

5）定制头和cookie信息

header = {'user-agent': 'my-app/0.0.1''}
cookie = {'key':'value'}
 r = requests.get/post('your url',headers=header,cookies=cookie) 
data = {'some': 'data'}
headers = {'content-type': 'application/json',
      'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
 
r = requests.post('https://api.github.com/some/endpoint', data=data, headers=headers)
print(r.text)

6）响应状态码

使用requests方法后，会返回一个response对象，其存储了服务器响应的内容，如上实例中已经提到的 r.text、r.status_code……

获取文本方式的响应体实例：当你访问 r.text 之时，会使用其响应的文本编码进行解码，并且你可以修改其编码让 r.text 使用自定义的编码进行解码。

r = requests.get('http://www.itwhy.org')
print(r.text, '\n{}\n'.format('*'*79), r.encoding)
r.encoding = 'GBK'
print(r.text, '\n{}\n'.format('*'*79), r.encoding)

示例代码：

import requests

r = requests.get('https://github.com/Ranxf')    # 最基本的不带参数的get请求
print(r.status_code)                # 获取返回状态
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})   # 带参数的get请求
print(r1.url)
print(r1.text)    # 打印解码后的返回数据

运行结果：

/usr/bin/python3.5 /home/rxf/python3_1000/1000/python3_server/python3_requests/demo1.py

200

http://dict.baidu.com/s?wd=python

…………

Process finished with exit code 0

r.status_code #如果不是200，可以使用 r.raise_for_status() 抛出异常

7）响应

r.headers #返回字典类型,头信息

r.requests.headers #返回发送到服务器的头信息

r.cookies #返回cookie

r.history #返回重定向信息,当然可以在请求是加上allow_redirects = false 阻止重定向

8）超时

r = requests.get('url',timeout=1)      #设置秒数超时，仅对于连接有效

9)会话对象，能够跨请求保持某些参数

s = requests.Session()
s.auth = ('auth','passwd')
s.headers = {'key':'value'}
r = s.get('url')
r1 = s.get('url1')

10）代理

proxies = {'http':'ip1','https':'ip2' }
requests.get('url',proxies=proxies)

汇总：

# HTTP请求类型
# get类型
r = requests.get('https://github.com/timeline.json')
# post类型
r = requests.post("http://m.ctrip.com/post")
# put类型
r = requests.put("http://m.ctrip.com/put")
# delete类型
r = requests.delete("http://m.ctrip.com/delete")
# head类型
r = requests.head("http://m.ctrip.com/head")
# options类型
r = requests.options("http://m.ctrip.com/get")

# 获取响应内容
print(r.content) #以字节的方式去显示，中文显示为字符
print(r.text) #以文本的方式去显示

#URL传递参数
payload = {'keyword': '香港', 'salecityid': '2'}
r = requests.get("http://m.ctrip.com/webapp/tourvisa/visa_list", params=payload) 
print（r.url） #示例为http://m.ctrip.com/webapp/tourvisa/visa_list?salecityid=2&keyword=香港

#获取/修改网页编码
r = requests.get('https://github.com/timeline.json')
print （r.encoding）


#json处理
r = requests.get('https://github.com/timeline.json')
print（r.json()） # 需要先import json  

# 定制请求头
url = 'http://m.ctrip.com'
headers = {'User-Agent' : 'Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19'}
r = requests.post(url, headers=headers)
print （r.request.headers)

#复杂post请求
url = 'http://m.ctrip.com'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload)) #如果传递的payload是string而不是dict，需要先调用dumps方法格式化一下

# post多部分编码文件
url = 'http://m.ctrip.com'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)

# 响应状态码
r = requests.get('http://m.ctrip.com')
print(r.status_code)
  
# 响应头
r = requests.get('http://m.ctrip.com')
print (r.headers)
print (r.headers['Content-Type'])
print (r.headers.get('content-type')) #访问响应头部分内容的两种方式
  
# Cookies
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name']  #读取cookies
  
url = 'http://m.ctrip.com/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies) #发送cookies

#设置超时时间
r = requests.get('http://m.ctrip.com', timeout=0.001)

#设置访问代理
proxies = {
      "http": "http://10.10.1.10:3128",
      "https": "http://10.10.1.100:4444",
     }
r = requests.get('http://m.ctrip.com', proxies=proxies)


#如果代理需要用户名和密码，则需要这样：
proxies = {
  "http": "http://user:pass@10.10.1.10:3128/",
}

# HTTP请求类型
# get类型
r = requests.get('https://github.com/timeline.json')
# post类型
r = requests.post("http://m.ctrip.com/post")
# put类型
r = requests.put("http://m.ctrip.com/put")
# delete类型
r = requests.delete("http://m.ctrip.com/delete")
# head类型
r = requests.head("http://m.ctrip.com/head")
# options类型
r = requests.options("http://m.ctrip.com/get")

# 获取响应内容
print(r.content) #以字节的方式去显示，中文显示为字符
print(r.text) #以文本的方式去显示

#URL传递参数
payload = {'keyword': '香港', 'salecityid': '2'}
r = requests.get("http://m.ctrip.com/webapp/tourvisa/visa_list", params=payload) 
print（r.url） #示例为http://m.ctrip.com/webapp/tourvisa/visa_list?salecityid=2&keyword=香港

#获取/修改网页编码
r = requests.get('https://github.com/timeline.json')
print （r.encoding）


#json处理
r = requests.get('https://github.com/timeline.json')
print（r.json()） # 需要先import json  

# 定制请求头
url = 'http://m.ctrip.com'
headers = {'User-Agent' : 'Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19'}
r = requests.post(url, headers=headers)
print （r.request.headers)

#复杂post请求
url = 'http://m.ctrip.com'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload)) #如果传递的payload是string而不是dict，需要先调用dumps方法格式化一下

# post多部分编码文件
url = 'http://m.ctrip.com'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)

# 响应状态码
r = requests.get('http://m.ctrip.com')
print(r.status_code)
  
# 响应头
r = requests.get('http://m.ctrip.com')
print (r.headers)
print (r.headers['Content-Type'])
print (r.headers.get('content-type')) #访问响应头部分内容的两种方式
  
# Cookies
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name']  #读取cookies
  
url = 'http://m.ctrip.com/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies) #发送cookies

#设置超时时间
r = requests.get('http://m.ctrip.com', timeout=0.001)

#设置访问代理
proxies = {
      "http": "http://10.10.1.10:3128",
      "https": "http://10.10.1.100:4444",
     }
r = requests.get('http://m.ctrip.com', proxies=proxies)


#如果代理需要用户名和密码，则需要这样：
proxies = {
  "http": "http://user:pass@10.10.1.10:3128/",
}

3、示例代码

GET请求

# 1、无参数实例
 
import requests
 
ret = requests.get('https://github.com/timeline.json')
 
print(ret.url)
print(ret.text)
 
 
 
# 2、有参数实例
 
import requests
 
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
 
print(ret.url)
print(ret.text)

POST请求

# 1、基本POST实例
 
import requests
 
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
 
print(ret.text)
 
 
# 2、发送请求头和数据实例
 
import requests
import json
 
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
 
ret = requests.post(url, data=json.dumps(payload), headers=headers)
 
print(ret.text)
print(ret.cookies)

请求参数

def request(method, url, **kwargs):

"""Constructs and sends a :class:`Request <Request>`.

:param method: method for the new :class:`Request` object.

:param url: URL for the new :class:`Request` object.

:param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.

:param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.

:param json: (optional) json data to send in the body of the :class:`Request`.

:param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.

:param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.

:param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.

``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``

or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string

defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers

to add for the file.

:param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.

:param timeout: (optional) How long to wait for the server to send data

before giving up, as a float, or a :ref:`(connect timeout, read

timeout) <timeouts>` tuple.

:type timeout: float or tuple

:param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.

:type allow_redirects: bool

:param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.

:param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.

:param stream: (optional) if ``False``, the response content will be immediately downloaded.

:param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.

:return: :class:`Response <Response>` object

:rtype: requests.Response

Usage::

>>> import requests

>>> req = requests.request('GET', 'http://httpbin.org/get')

<Response [200]>

参数示例代码

def param_method_url():
  # requests.request(method='get', url='http://127.0.0.1:8000/test/')
  # requests.request(method='post', url='http://127.0.0.1:8000/test/')
  pass


def param_param():
  # - 可以是字典
  # - 可以是字符串
  # - 可以是字节（ascii编码以内）

  # requests.request(method='get',
  # url='http://127.0.0.1:8000/test/',
  # params={'k1': 'v1', 'k2': '水电费'})

  # requests.request(method='get',
  # url='http://127.0.0.1:8000/test/',
  # params="k1=v1&k2=水电费&k3=v3&k3=vv3")

  # requests.request(method='get',
  # url='http://127.0.0.1:8000/test/',
  # params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

  # 错误
  # requests.request(method='get',
  # url='http://127.0.0.1:8000/test/',
  # params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
  pass


def param_data():
  # 可以是字典
  # 可以是字符串
  # 可以是字节
  # 可以是文件对象

  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # data={'k1': 'v1', 'k2': '水电费'})

  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # data="k1=v1; k2=v2; k3=v3; k3=v4"
  # )

  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # data="k1=v1;k2=v2;k3=v3;k3=v4",
  # headers={'Content-Type': 'application/x-www-form-urlencoded'}
  # )

  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # data=open('data_file.py', mode='r', encoding='utf-8'), # 文件内容是：k1=v1;k2=v2;k3=v3;k3=v4
  # headers={'Content-Type': 'application/x-www-form-urlencoded'}
  # )
  pass


def param_json():
  # 将json中对应的数据进行序列化成一个字符串，json.dumps(...)
  # 然后发送到服务器端的body中，并且Content-Type是 {'Content-Type': 'application/json'}
  requests.request(method='POST',
           url='http://127.0.0.1:8000/test/',
           json={'k1': 'v1', 'k2': '水电费'})


def param_headers():
  # 发送请求头到服务器端
  requests.request(method='POST',
           url='http://127.0.0.1:8000/test/',
           json={'k1': 'v1', 'k2': '水电费'},
           headers={'Content-Type': 'application/x-www-form-urlencoded'}
           )


def param_cookies():
  # 发送Cookie到服务器端
  requests.request(method='POST',
           url='http://127.0.0.1:8000/test/',
           data={'k1': 'v1', 'k2': 'v2'},
           cookies={'cook1': 'value1'},
           )
  # 也可以使用CookieJar（字典形式就是在此基础上封装）
  from http.cookiejar import CookieJar
  from http.cookiejar import Cookie

  obj = CookieJar()
  obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
             discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
             port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
          )
  requests.request(method='POST',
           url='http://127.0.0.1:8000/test/',
           data={'k1': 'v1', 'k2': 'v2'},
           cookies=obj)


def param_files():
  # 发送文件
  # file_dict = {
  # 'f1': open('readme', 'rb')
  # }
  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # files=file_dict)

  # 发送文件，定制文件名
  # file_dict = {
  # 'f1': ('test.txt', open('readme', 'rb'))
  # }
  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # files=file_dict)

  # 发送文件，定制文件名
  # file_dict = {
  # 'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
  # }
  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # files=file_dict)

  # 发送文件，定制文件名
  # file_dict = {
  #   'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
  # }
  # requests.request(method='POST',
  #         url='http://127.0.0.1:8000/test/',
  #         files=file_dict)

  pass


def param_auth():
  from requests.auth import HTTPBasicAuth, HTTPDigestAuth

  ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
  print(ret.text)

  # ret = requests.get('http://192.168.1.1',
  # auth=HTTPBasicAuth('admin', 'admin'))
  # ret.encoding = 'gbk'
  # print(ret.text)

  # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
  # print(ret)
  #


def param_timeout():
  # ret = requests.get('http://google.com/', timeout=1)
  # print(ret)

  # ret = requests.get('http://google.com/', timeout=(5, 1))
  # print(ret)
  pass


def param_allow_redirects():
  ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
  print(ret.text)


def param_proxies():
  # proxies = {
  # "http": "61.172.249.96:80",
  # "https": "http://61.185.219.126:3128",
  # }

  # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

  # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
  # print(ret.headers)


  # from requests.auth import HTTPProxyAuth
  #
  # proxyDict = {
  # 'http': '77.75.105.165',
  # 'https': '77.75.105.165'
  # }
  # auth = HTTPProxyAuth('username', 'mypassword')
  #
  # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
  # print(r.text)

  pass


def param_stream():
  ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
  print(ret.content)
  ret.close()

  # from contextlib import closing
  # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
  # # 在此处理响应。
  # for i in r.iter_content():
  # print(i)


def requests_session():
  import requests

  session = requests.Session()

  ### 1、首先登陆任何页面，获取cookie

  i1 = session.get(url="http://dig.chouti.com/help/service")

  ### 2、用户登陆，携带上一次的cookie，后台对cookie中的 gpsd 进行授权
  i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
      'phone': "8615131255089",
      'password': "xxxxxx",
      'oneMonth': ""
    }
  )

  i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589623",
  )
  print(i3.text)

json请求：

#! /usr/bin/python3
import requests
import json


class url_request():
  def __init__(self):
    ''' init '''

if __name__ == '__main__':
  heard = {'Content-Type': 'application/json'}
  payload = {'CountryName': '中国',
        'ProvinceName': '四川省',
        'L1CityName': 'chengdu',
        'L2CityName': 'yibing',
        'TownName': '',
        'Longitude': '107.33393',
        'Latitude': '33.157131',
        'Language': 'CN'}
  r = requests.post("http://www.xxxxxx.com/CityLocation/json/LBSLocateCity", heards=heard, data=payload)
  data = r.json()
  if r.status_code!=200:
    print('LBSLocateCity API Error' + str(r.status_code))
  print(data['CityEntities'][0]['CityID']) # 打印返回json中的某个key的value
  print(data['ResponseStatus']['Ack'])
  print(json.dump(data, indent=4, sort_keys=True, ensure_ascii=False)) # 树形打印json，ensure_ascii必须设为False否则中文会显示为unicode

Xml请求：

#! /usr/bin/python3
import requests

class url_request():
  def __init__(self):
    """init"""

if __name__ == '__main__':
  heards = {'Content-type': 'text/xml'}
  XML = '<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><Request xmlns="http://tempuri.org/"><jme><JobClassFullName>WeChatJSTicket.JobWS.Job.JobRefreshTicket,WeChatJSTicket.JobWS</JobClassFullName><Action>RUN</Action><Param>1</Param><HostIP>127.0.0.1</HostIP><JobInfo>1</JobInfo><NeedParallel>false</NeedParallel></jme></Request></soap:Body></soap:Envelope>'
  url = 'http://jobws.push.mobile.xxxxxxxx.com/RefreshWeiXInTokenJob/RefreshService.asmx'
  r = requests.post(url=url, heards=heards, data=XML)
  data = r.text
  print(data)

状态异常处理

import requests

URL = 'http://ip.taobao.com/service/getIpInfo.php' # 淘宝IP地址库API
try:
  r = requests.get(URL, params={'ip': '8.8.8.8'}, timeout=1)
  r.raise_for_status() # 如果响应状态码不是 200，就主动抛出异常
except requests.RequestException as e:
  print(e)
else:
  result = r.json()
  print(type(result), result, sep='\n')

上传文件

使用request模块，也可以上传文件，文件的类型会自动进行处理：

import requests
 
url = 'http://127.0.0.1:8080/upload'
files = {'file': open('/home/rxf/test.jpg', 'rb')}
#files = {'file': ('report.jpg', open('/home/lyb/sjzl.mpg', 'rb'))}   #显式的设置文件名
 
r = requests.post(url, files=files)
print(r.text)

request更加方便的是，可以把字符串当作文件进行上传：

import requests
 
url = 'http://127.0.0.1:8080/upload'
files = {'file': ('test.txt', b'Hello Requests.')}   #必需显式的设置文件名
 
r = requests.post(url, files=files)
print(r.text)

身份验证

基本身份认证(HTTP Basic Auth)

import requests
from requests.auth import HTTPBasicAuth
 
r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=HTTPBasicAuth('user', 'passwd'))
# r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=('user', 'passwd'))  # 简写
print(r.json())

另一种非常流行的HTTP身份认证形式是摘要式身份认证，Requests对它的支持也是开箱即可用的:

requests.get(URL, auth=HTTPDigestAuth('user', 'pass')

Cookies与会话对象

如果某个响应中包含一些Cookie，你可以快速访问它们：

import requests
 
r = requests.get('http://www.google.com.hk/')
print(r.cookies['NID'])
print(tuple(r.cookies))

要想发送你的cookies到服务器，可以使用 cookies 参数：

import requests
 
url = 'http://httpbin.org/cookies'
cookies = {'testCookies_1': 'Hello_Python3', 'testCookies_2': 'Hello_Requests'}
# 在Cookie Version 0中规定空格、方括号、圆括号、等于号、逗号、双引号、斜杠、问号、@，冒号，分号等特殊符号都不能作为Cookie的内容。
r = requests.get(url, cookies=cookies)
print(r.json())

会话对象让你能够跨请求保持某些参数，最方便的是在同一个Session实例发出的所有请求之间保持cookies，且这些都是自动处理的，甚是方便。

下面就来一个真正的实例，如下是快盘签到脚本：

import requests
 
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Encoding': 'gzip, deflate, compress',
      'Accept-Language': 'en-us;q=0.5,en;q=0.3',
      'Cache-Control': 'max-age=0',
      'Connection': 'keep-alive',
      'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
 
s = requests.Session()
s.headers.update(headers)
# s.auth = ('superuser', '123')
s.get('https://www.kuaipan.cn/account_login.htm')
 
_URL = 'http://www.kuaipan.cn/index.php'
s.post(_URL, params={'ac':'account', 'op':'login'},
    data={'username':'****@foxmail.com', 'userpwd':'********', 'isajax':'yes'})
r = s.get(_URL, params={'ac':'zone', 'op':'taskdetail'})
print(r.json())
s.get(_URL, params={'ac':'common', 'op':'usersign'})

requests模块抓取网页源码并保存到文件示例

这是一个基本的文件保存操作，但这里有几个值得注意的问题：

1.安装requests包，命令行输入pip install requests即可自动安装。很多人推荐使用requests，自带的urllib.request也可以抓取网页源码

2.open方法encoding参数设为utf-8，否则保存的文件会出现乱码。

3.如果直接在cmd中输出抓取的内容，会提示各种编码错误，所以保存到文件查看。

4.with open方法是更好的写法，可以自动操作完毕后释放资源

Python requests模块抽屉自动登录

#! /urs/bin/python3
import requests

'''requests模块抓取网页源码并保存到文件示例'''
html = requests.get("http://www.baidu.com")
with open('test.txt', 'w', encoding='utf-8') as f:
  f.write(html.text)
  
'''读取一个txt文件，每次读取一行，并保存到另一个txt文件中的示例'''
ff = open('testt.txt', 'w', encoding='utf-8')
with open('test.txt', encoding="utf-8") as f:
  for line in f:
    ff.write(line)
    ff.close()

因为在命令行中打印每次读取一行的数据，中文会出现编码错误，所以每次读取一行并保存到另一个文件，这样来测试读取是否正常。（注意open的时候制定encoding编码方式）

Python requests模块自动登陆实例：

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests


# ############## 方式一 ##############
"""
# ## 1、首先登陆任何页面，获取cookie
i1 = requests.get(url="http://dig.chouti.com/help/service")
i1_cookies = i1.cookies.get_dict()

# ## 2、用户登陆，携带上一次的cookie，后台对cookie中的 gpsd 进行授权
i2 = requests.post(
  url="http://dig.chouti.com/login",
  data={
    'phone': "8615131255089",
    'password': "xxooxxoo",
    'oneMonth': ""
  },
  cookies=i1_cookies
)

# ## 3、点赞（只需要携带已经被授权的gpsd即可）
gpsd = i1_cookies['gpsd']
i3 = requests.post(
  url="http://dig.chouti.com/link/vote?linksId=8589523",
  cookies={'gpsd': gpsd}
)

print(i3.text)
"""


# ############## 方式二 ##############
"""
import requests

session = requests.Session()
i1 = session.get(url="http://dig.chouti.com/help/service")
i2 = session.post(
  url="http://dig.chouti.com/login",
  data={
    'phone': "8615131255089",
    'password': "xxooxxoo",
    'oneMonth': ""
  }
)
i3 = session.post(
  url="http://dig.chouti.com/link/vote?linksId=8589523"
)
print(i3.text)

"""

Python requests模块github自动登录

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import requests
from bs4 import BeautifulSoup

# ############## 方式一 ##############
#
# # 1. 访问登陆页面，获取 authenticity_token
# i1 = requests.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 1. 携带authenticity_token和用户名密码等信息，发送用户验证
# form_data = {
# "authenticity_token": authenticity_token,
#   "utf8": "",
#   "commit": "Sign in",
#   "login": "wupeiqi@live.com",
#   'password': 'xxoo'
# }
#
# i2 = requests.post('https://github.com/session', data=form_data, cookies=c1)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = requests.get('https://github.com/settings/repositories', cookies=c1)
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#   if isinstance(child, Tag):
#     project_tag = child.find(name='a', class_='mr-1')
#     size_tag = child.find(name='small')
#     temp = "项目:%s(%s); 项目路径:%s" % (project_tag.get('href'), size_tag.string, project_tag.string, )
#     print(temp)



# ############## 方式二 ##############
# session = requests.Session()
# # 1. 访问登陆页面，获取 authenticity_token
# i1 = session.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 1. 携带authenticity_token和用户名密码等信息，发送用户验证
# form_data = {
#   "authenticity_token": authenticity_token,
#   "utf8": "",
#   "commit": "Sign in",
#   "login": "wupeiqi@live.com",
#   'password': 'xxoo'
# }
#
# i2 = session.post('https://github.com/session', data=form_data)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = session.get('https://github.com/settings/repositories')
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#   if isinstance(child, Tag):
#     project_tag = child.find(name='a', class_='mr-1')
#     size_tag = child.find(name='small')
#     temp = "项目:%s(%s); 项目路径:%s" % (project_tag.get('href'), size_tag.string, project_tag.string, )
#     print(temp)

Python requests模块知乎自动登录

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import time

import requests
from bs4 import BeautifulSoup

session = requests.Session()

i1 = session.get(
  url='https://www.zhihu.com/#signin',
  headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
  }
)

soup1 = BeautifulSoup(i1.text, 'lxml')
xsrf_tag = soup1.find(name='input', attrs={'name': '_xsrf'})
xsrf = xsrf_tag.get('value')

current_time = time.time()
i2 = session.get(
  url='https://cache.yisu.com/upload/information/20200622/113/12088.gif',
  params={'r': current_time, 'type': 'login'},
  headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
  })

with open('zhihu.gif', 'wb') as f:
  f.write(i2.content)

captcha = input('请打开zhihu.gif文件，查看并输入验证码：')
form_data = {
  "_xsrf": xsrf,
  'password': 'xxooxxoo',
  "captcha": 'captcha',
  'email': '424662508@qq.com'
}
i3 = session.post(
  url='https://www.zhihu.com/login/email',
  data=form_data,
  headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
  }
)

i4 = session.get(
  url='https://www.zhihu.com/settings/profile',
  headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
  }
)

soup4 = BeautifulSoup(i4.text, 'lxml')
tag = soup4.find(id='rename-section')
nick_name = tag.find('span',class_='name').string
print(nick_name)

Python requests模块博客园自动登录

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import json
import base64

import rsa
import requests


def js_encrypt(text):
  b64der = 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCp0wHYbg/NOPO3nzMD3dndwS0MccuMeXCHgVlGOoYyFwLdS24Im2e7YyhB0wrUsyYf0/nhzCzBK8ZC9eCWqd0aHbdgOQT6CuFQBMjbyGYvlVYU2ZP7kG9Ft6YV6oc9ambuO7nPZh+bvXH0zDKfi02prknrScAKC0XhadTHT3Al0QIDAQAB'
  der = base64.standard_b64decode(b64der)

  pk = rsa.PublicKey.load_pkcs1_openssl_der(der)
  v1 = rsa.encrypt(bytes(text, 'utf8'), pk)
  value = base64.encodebytes(v1).replace(b'\n', b'')
  value = value.decode('utf8')

  return value


session = requests.Session()

i1 = session.get('https://passport.cnblogs.com/user/signin')
rep = re.compile("'VerificationToken': '(.*)'")
v = re.search(rep, i1.text)
verification_token = v.group(1)

form_data = {
  'input1': js_encrypt('wptawy'),
  'input2': js_encrypt('asdfasdf'),
  'remember': False
}

i2 = session.post(url='https://passport.cnblogs.com/user/signin',
         data=json.dumps(form_data),
         headers={
           'Content-Type': 'application/json; charset=UTF-8',
           'X-Requested-With': 'XMLHttpRequest',
           'VerificationToken': verification_token}
         )

i3 = session.get(url='https://i.cnblogs.com/EditDiary.aspx')

print(i3.text)

Python requests模块拉勾网自动登录

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import requests


# 第一步：访问登陆页,拿到X_Anti_Forge_Token，X_Anti_Forge_Code
# 1、请求url:https://passport.lagou.com/login/login.html
# 2、请求方法:GET
# 3、请求头:
#  User-agent
r1 = requests.get('https://passport.lagou.com/login/login.html',
         headers={
           'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
         },
         )

X_Anti_Forge_Token = re.findall("X_Anti_Forge_Token = '(.*?)'", r1.text, re.S)[0]
X_Anti_Forge_Code = re.findall("X_Anti_Forge_Code = '(.*?)'", r1.text, re.S)[0]
print(X_Anti_Forge_Token, X_Anti_Forge_Code)
# print(r1.cookies.get_dict())
# 第二步：登陆
# 1、请求url:https://passport.lagou.com/login/login.json
# 2、请求方法:POST
# 3、请求头:
#  cookie
#  User-agent
#  Referer:https://passport.lagou.com/login/login.html
#  X-Anit-Forge-Code:53165984
#  X-Anit-Forge-Token:3b6a2f62-80f0-428b-8efb-ef72fc100d78
#  X-Requested-With:XMLHttpRequest
# 4、请求体：
# isValidate:true
# username:15131252215
# password:ab18d270d7126ea65915c50288c22c0d
# request_form_verifyCode:''
# submit:''
r2 = requests.post(
  'https://passport.lagou.com/login/login.json',
  headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Referer': 'https://passport.lagou.com/login/login.html',
    'X-Anit-Forge-Code': X_Anti_Forge_Code,
    'X-Anit-Forge-Token': X_Anti_Forge_Token,
    'X-Requested-With': 'XMLHttpRequest'
  },
  data={
    "isValidate": True,
    'username': '15131255089',
    'password': 'ab18d270d7126ea65915c50288c22c0d',
    'request_form_verifyCode': '',
    'submit': ''
  },
  cookies=r1.cookies.get_dict()
)
print(r2.text)

更多关于Python requests模块基础使用方法请查看下面的相关链接

亿速云「云服务器」，即开即用、新一代英特尔至强铂金CPU、三副本存储NVMe SSD云盘，价格低至29元/月。点击查看>>

向AI问一下细节

Python requests模块基础使用方法实例及高级应用(自动登陆,抓取网页源码)实例详解

1、Python requests模块说明

2、Python requests模块基础入门

1）导入模块

2）发送请求的简洁

3）为url传递参数

4）响应的内容

post发送json请求：

5）定制头和cookie信息

6）响应状态码

7）响应

8）超时

9)会话对象，能够跨请求保持某些参数

10）代理

3、示例代码

GET请求

POST请求

json请求：

Xml请求：

状态异常处理

上传文件

身份验证

Cookies与会话对象

requests模块抓取网页源码并保存到文件示例

猜你喜欢

最新资讯

相关推荐

开发者交流群：

相关标签