How to control crawl speed in a JS crawler

小樊

2024-11-26 16:52:24

Category: Programming Languages

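In JavaScript, you can control a crawler's speed with the setTimeout function, which schedules a function to run after (at least) the specified number of milliseconds. To throttle the crawler, insert a delay after each page is fetched. Because setTimeout is callback-based and returns immediately, it is wrapped in a Promise so that an async function can await the delay before sending the next request.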
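To see why the wrapping matters, here is a small sketch contrasting the plain callback form of setTimeout with an awaited one. It assumes Node.js 16 or later, where the built-in `timers/promises` module exports a promise-returning `setTimeout` (imported here under the local name `delay`). The function names `callbackDemo` and `awaitDemo` are illustrative only.

```javascript
// Sketch: why setTimeout has to be wrapped before it can throttle a crawl loop.
// Assumes Node.js 16+ (for the built-in 'timers/promises' module).
const { setTimeout: delay } = require('timers/promises');

// 1) Callback form: setTimeout only schedules the callback and returns at once,
//    so the statement after it runs immediately instead of waiting.
function callbackDemo() {
  const order = [];
  setTimeout(() => order.push('timer fired'), 100);
  order.push('next statement');
  return order; // contains only 'next statement' when returned
}

// 2) Promise form: awaiting the timer really pauses the async function.
async function awaitDemo(ms) {
  const start = Date.now();
  await delay(ms); // built-in promise version of setTimeout
  return Date.now() - start; // elapsed milliseconds, normally >= ms
}

console.log(callbackDemo());
awaitDemo(200).then((elapsed) => console.log(`waited ${elapsed} ms`));
```

Only the awaited version holds up the code that follows it, which is what a crawl loop needs. A hand-written Promise wrapper around setTimeout does the same job on older Node.js versions that lack `timers/promises`.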

Here is a simple example showing how to use setTimeout to control crawl speed:

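const axios = require('axios');     // HTTP client (npm install axios cheerio)
const cheerio = require('cheerio'); // server-side HTML parser with a jQuery-like API

// Crawl a single page (named fetchPage to avoid shadowing Node.js's global fetch)
async function fetchPage(url) {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);
    // Parse the page here and extract the data you need;
    // 'selector' is a placeholder for a real CSS selector
    console.log($('selector').text());
  } catch (error) {
    console.error(`Error fetching ${url}:`, error.message);
  }
}

// Return a Promise that resolves after ms milliseconds, so the delay can be awaited
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Main program
async function main() {
  const urls = [
    'https://example.com/page1',
    'https://example.com/page2',
    'https://example.com/page3',
    // ...
  ];

  for (const url of urls) {
    await fetchPage(url);
    await sleep(1000); // wait 1 second (1000 ms) before the next request
  }
}

main().catch(console.error);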

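In this example, axios fetches each page and cheerio parses it. After each page is crawled, the sleep function pauses for 1 second before the loop moves on to the next URL. Because every fetch and every sleep is awaited, requests go out strictly one at a time, so consecutive requests start roughly one response time plus 1 second apart. Adjust the delay to make the crawler faster or slower.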
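A fixed sleep(1000) adds the delay on top of however long each request takes, and it also waits once more after the last URL. If you would rather cap the request rate itself, one option is to sleep only for the part of the interval that has not yet elapsed since the previous request started. The following is a minimal sketch of that variation, not part of the original article: `crawlThrottled`, `fakeFetch`, and `intervalMs` are hypothetical names, and the `fetchPage` parameter stands in for any async function that takes a URL (such as the page-fetching function from the example above).

```javascript
// Minimal sketch (hypothetical helper, not from the original article):
// process URLs one at a time, keeping at least intervalMs between the
// *starts* of consecutive requests, with no extra wait after the last URL.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function crawlThrottled(urls, fetchPage, intervalMs) {
  const results = [];
  let lastStart = -Infinity; // no previous request yet, so the first one starts immediately
  for (const url of urls) {
    // Sleep only for the part of the interval that has not already elapsed.
    const wait = lastStart + intervalMs - Date.now();
    if (wait > 0) await sleep(wait);
    lastStart = Date.now();
    results.push(await fetchPage(url));
  }
  return results;
}

// Usage with a hypothetical stub fetcher that records when each request starts.
const starts = [];
async function fakeFetch(url) {
  starts.push(Date.now());
  await sleep(50); // pretend each request takes about 50 ms
  return `ok: ${url}`;
}

crawlThrottled(['page1', 'page2', 'page3'], fakeFetch, 200).then((results) => {
  console.log(results); // [ 'ok: page1', 'ok: page2', 'ok: page3' ]
  // Gaps between consecutive request starts, in ms (roughly 200 each)
  console.log(starts.slice(1).map((t, i) => t - starts[i]));
});
```

With a stub request taking about 50 ms and an interval of 200 ms, consecutive requests should start roughly 200 ms apart, whereas a sleep-after-fetch loop with a 200 ms delay would space them about 250 ms apart.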
