Keep a website's crawling restrictions in mind by default: read its robots.txt before you crawl.
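As a minimal sketch of how a crawler can honour robots.txt, the standard-library parser can be fed the file's text and then asked whether a given user agent may fetch a URL. This uses Python 3's urllib.robotparser. The sample rules, the example.com URLs, the user agent names and the build_parser helper are all assumptions made for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse the text of a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Made-up robots.txt content, used here instead of a network request.
SAMPLE_ROBOTS = """\
User-agent: BadCrawler
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /trap
"""

parser = build_parser(SAMPLE_ROBOTS)

# BadCrawler is banned from the whole site.
print(parser.can_fetch("BadCrawler", "http://example.com/"))       # False
# Any other agent may fetch everything except /trap.
print(parser.can_fetch("GoodCrawler", "http://example.com/"))      # True
print(parser.can_fetch("GoodCrawler", "http://example.com/trap"))  # False
# Requested delay in seconds between requests for this agent.
print(parser.crawl_delay("GoodCrawler"))                           # 5
```

Against a live site, calling set_url() with the robots.txt address and then read() would replace the parse() call.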
How do you estimate how many pages a crawl of a website will have to cover?
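The note above does not say how to make this estimate. One possible approach, offered only as an assumption, is to count the URLs listed in the site's sitemap, if it publishes one. The sketch below (Python 3) counts the loc entries in a sitemap document. The sample XML, the example.com URLs and the count_sitemap_urls helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(sitemap_xml):
    """Return the number of <loc> entries in a sitemap (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"))


# Made-up sitemap content, used here instead of a network request.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/page/1</loc></url>
  <url><loc>http://example.com/page/2</loc></url>
  <url><loc>http://example.com/page/3</loc></url>
</urlset>
"""

print(count_sitemap_urls(SAMPLE_SITEMAP))  # 3
```

A sitemap may be incomplete or out of date, so the count is a rough lower bound rather than an exact figure.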
Every website is built on its own mix of technologies. To identify which ones a site uses, try the builtwith module:
pip install builtwith
>>> import builtwith
>>> builtwith.parse('http://www.douban.com')
{u'javascript-frameworks': [u'jQuery'], u'tag-managers': [u'Google Tag Manager'], u'analytics': [u'Piwik']}
# Finding the owner of a website
pip install python-whois
>>> import whois
>>> print(whois.whois('cnblogs.com'))
{
"updated_date": [
"2014-11-12 00:00:00",
"2014-11-12 01:07:15"
],
"status": [
"clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited",
"clientTransferProhibited https://icann.org/epp#clientTransferProhibited"
],
"name": "du yong",
"dnssec": "unsigned",
"city": "Shanghai",
"expiration_date": [
"2021-11-12 00:00:00",
"2021-11-11 04:00:00"
],
"zipcode": "201203",
"domain_name": [
"CNBLOGS.COM",
"cnblogs.com"
],
"country": "CN",
"whois_server": "whois.35.com",
"state": "Shanghai",
"registrar": "35 Technology Co., Ltd.",
"referral_url": "http://www.35.com",
"address": "Room 312, No.22 BOXIA Rd, Pudong New District",
"name_servers": [
"NS3.DNSV4.COM",
"NS4.DNSV4.COM",
"ns3.dnsv4.com",
"ns4.dnsv4.com"
],
"org": "Shanghai Yucheng Information Technology Co. Ltd.",
"creation_date": [
"2003-11-12 00:00:00",
"2003-11-11 04:00:00"
],
"emails": [
"abuse@35.cn",
"dudu.yz@gmail.com"
]
}