温馨提示×

温馨提示×

您好,登录后才能下订单哦!

密码登录×
登录注册×
其他方式登录
点击 登录注册 即表示同意《亿速云用户服务条款》

如何使用Spark分析网站日志

发布时间:2021-11-10 18:54:33 阅读:146 作者:柒染 栏目:云计算
开发者测试专用服务器限时活动,0元免费领,库存有限,领完即止! 点击查看>>

如何使用Spark分析网站日志,相信很多没有经验的人对此束手无策,为此本文总结了问题出现的原因和解决方法,通过这篇文章希望你能解决这个问题。

郁闷从昨天开始个人网站不断的发出告警504错误,登录机器看了一下是php-fpm报错,这个错误重启php-fpm后,几个小时就告警,快一年了都没什么问题,奇怪

[28-Sep-2016 11:53:19] NOTICE: ready to handle connections[28-Sep-2016 11:53:19] NOTICE: systemd monitor interval set to 10000ms[28-Sep-2016 11:53:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it[28-Sep-2016 13:46:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it[28-Sep-2016 13:49:32] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

以为是这个值设置的太小了,所以修改了配置修改大了值

[28-Sep-2016 15:51:43] NOTICE: fpm is running, pid 28179[28-Sep-2016 15:51:43] NOTICE: ready to handle connections[28-Sep-2016 15:51:43] NOTICE: systemd monitor interval set to 10000ms[28-Sep-2016 15:52:12] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 7 total children[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it[28-Sep-2016 16:53:05] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it[28-Sep-2016 16:55:17] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it

结果后来还是一样,几个小时之后再次504告警,再看nginx的日志,发现一些奇怪的ip访问量非常大。。。有怀疑是有恶意ip的访问,看来有必要查查访问日志中的ip访问量

root@iZ28bhfjhgkZ:/var/log/nginx# vim access.log121.42.53.180 - - [25/Sep/2016:06:26:29 +0800] "POST /wp-cron.php?doing_wp_cron=1474755989.0131719112396240234375 HTTP/1.0" 499 0 "-" "WordPress/4.3.1; http://zhwen.org"182.92.148.207 - - [25/Sep/2016:06:26:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"203.208.60.226 - - [25/Sep/2016:06:28:55 +0800] "GET /?p=675 HTTP/1.1" 200 8204 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/themes/sparkling/inc/css/font-awesome.min.css?ver=4.3.1 HTTP/1.1" 200 26711 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70 HTTP/1.1" 200 374 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"203.208.60.226 - - [25/Sep/2016:06:28:58 +0800] "GET /wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.3.1 HTTP/1.1" 200 771 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"121.43.107.174 - - [25/Sep/2016:06:29:18 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"115.28.189.208 - - [25/Sep/2016:06:29:33 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"42.156.139.59 - - [25/Sep/2016:06:30:58 +0800] "GET /?paged=14 HTTP/1.1" 200 11164 "-" "YisouSpider"182.92.148.207 - - [25/Sep/2016:06:31:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /?p=articles/cscope-tags HTTP/1.1" 200 10681 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko)"61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /apple-touch-icon-precomposed.png HTTP/1.1" 404 151 "-" "Safari/12602.1.50.0.10 CFNetwork/807.0.4 Darwin/16.0.0 (x86_64)"

所以对访问日志的ip做了一个简单统计:
1)先把ip取出来(为了减少数据量,其实也可以直接压缩后下载到本地),再下载到本地
root@iZ28bhfjhgkZ:/var/log/nginx# cat access.log|awk ‘{print $1}’ > tt

在sparkshell中执行下面的代码:

val line = sc.textFile("/data1/data/t1")line.flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).map(e => (e._2, e._1)).reduceByKey(_+","+_).sortByKey(true,1).saveAsTextFile("/data1/data/t3")

2)最后的结果t3的内容如下,发现这几个ip的访问量非常大,尤其

191.96.249.53。。。。。(855,182.92.148.207)(3100,121.8.136.75)(3889,61.135.169.81)(53513,191.96.249.53)

3)再搞一个iptables限制,搞定。spark做这种统计分析还是非常简单的,就是一行代码搞定分析。

root@iZ28bhfjhgkZ:/var/log# iptables -LChain INPUT (policy ACCEPT)target     prot opt source               destination         Chain FORWARD (policy ACCEPT)target     prot opt source               destination         Chain OUTPUT (policy ACCEPT)target     prot opt source               destination         root@iZ28bhfjhgkZ:/var/log# iptables -A INPUT -s 191.96.249.53 -j DROProot@iZ28bhfjhgkZ:/var/log# iptables -LChain INPUT (policy ACCEPT)target     prot opt source               destination         DROP       all  --  DEDICATED.SERVER     anywhere            Chain FORWARD (policy ACCEPT)target     prot opt source               destination         Chain OUTPUT (policy ACCEPT)target     prot opt source               destination         root@iZ28bhfjhgkZ:/var/log# 

看完上述内容,你们掌握如何使用Spark分析网站日志的方法了吗?如果还想学到更多技能或想了解更多相关内容,欢迎关注亿速云行业资讯频道,感谢各位的阅读!

亿速云「云服务器」,即开即用、新一代英特尔至强铂金CPU、三副本存储NVMe SSD云盘,价格低至29元/月。点击查看>>

向AI问一下细节

免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。

原文链接:https://my.oschina.net/u/3422238/blog/4377482

AI

开发者交流群×