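This article explains how to write a crawler with Storm. The code below is the indexing bolt from the com.digitalpebble.storm.crawler package: a generic bolt that reads the name of an indexing implementation from the configuration, instantiates that class, and delegates every incoming tuple to it.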
package com.digitalpebble.storm.crawler.bolt.indexing;

import java.util.Map;

import org.slf4j.LoggerFactory;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

import com.digitalpebble.storm.crawler.StormConfiguration;
import com.digitalpebble.storm.crawler.util.Configuration;

/**
 * A generic bolt for indexing documents which determines which endpoint to use
 * based on the configuration and delegates the indexing to it.
 */
@SuppressWarnings("serial")
public class IndexerBolt extends BaseRichBolt {

    // transient: both are rebuilt on the worker instead of being serialized
    private transient Configuration config;
    private transient BaseRichBolt endpoint;

    private static final org.slf4j.Logger LOG = LoggerFactory
            .getLogger(IndexerBolt.class);

    /**
     * Gets the implementation to use from the configuration and instantiates
     * it. Storm calls declareOutputFields() when the topology is built, before
     * prepare() runs on a worker, so both methods go through this helper.
     */
    private void createEndpoint() {
        if (endpoint != null)
            return;
        config = StormConfiguration.create();
        String className = config.get("stormcrawler.indexer.class");
        if (className == null) {
            throw new RuntimeException("No configuration found for indexing");
        }
        try {
            // asSubclass() fails fast if the configured class is not a bolt
            Class<? extends BaseRichBolt> implClass = Class.forName(className)
                    .asSubclass(BaseRichBolt.class);
            endpoint = implClass.newInstance();
        } catch (final Exception e) {
            throw new RuntimeException("Couldn't create " + className, e);
        }
    }

    @Override
    public void prepare(Map conf, TopologyContext context,
            OutputCollector collector) {
        createEndpoint();
        endpoint.prepare(conf, context, collector);
    }

    @Override
    public void execute(Tuple tuple) {
        // every tuple is handed straight to the configured endpoint
        endpoint.execute(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        createEndpoint();
        endpoint.declareOutputFields(declarer);
    }
}
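The "pick an implementation by class name, then delegate" pattern that IndexerBolt relies on can be tried without a Storm cluster. The sketch below is a hypothetical, Storm-free illustration: Indexer, CollectingIndexer, DelegatingIndexer and IndexerDemo are invented names, and a plain Map stands in for StormConfiguration. Only the configuration key stormcrawler.indexer.class is taken from the original code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexerDemo {

    // Hypothetical endpoint contract (BaseRichBolt plays this role in IndexerBolt).
    interface Indexer {
        void index(String document);
    }

    // Hypothetical endpoint that just records what it was asked to index.
    static class CollectingIndexer implements Indexer {
        static final List<String> SEEN = new ArrayList<>();

        // Reflective instantiation needs an accessible no-arg constructor.
        public CollectingIndexer() {
        }

        @Override
        public void index(String document) {
            SEEN.add(document);
        }
    }

    // Mirrors IndexerBolt: picks the implementation by name and delegates to it.
    static class DelegatingIndexer implements Indexer {
        static final String CLASS_KEY = "stormcrawler.indexer.class";
        private final Indexer endpoint;

        DelegatingIndexer(Map<String, String> config) {
            String className = config.get(CLASS_KEY);
            if (className == null) {
                throw new IllegalStateException("No configuration found for indexing");
            }
            try {
                // asSubclass() rejects classes that do not implement Indexer.
                endpoint = Class.forName(className)
                        .asSubclass(Indexer.class)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalStateException("Couldn't create " + className, e);
            }
        }

        @Override
        public void index(String document) {
            endpoint.index(document);
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        // Nested classes have binary names such as IndexerDemo$CollectingIndexer,
        // so getName() is used instead of a hand-written string.
        config.put(DelegatingIndexer.CLASS_KEY, CollectingIndexer.class.getName());

        Indexer indexer = new DelegatingIndexer(config);
        indexer.index("http://example.com/");
        System.out.println(CollectingIndexer.SEEN); // prints [http://example.com/]
    }
}
```

Because the implementation is chosen by name at runtime, switching to another indexing backend only means changing a configuration value, not recompiling the delegating class; the trade-off is that a misspelled or incompatible class name only surfaces when the component is created.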
Thanks for reading. That covers how to write a crawler with Storm; how the code behaves in your own setup is best confirmed by trying it out. This is Yisu Cloud (亿速云).