This article shows how to use Perl to fetch information from a web page automatically.
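Getting information from a web page with Perl
The following Perl script requests a page over HTTP and extracts information from the response, here the page title: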
#!/usr/bin/perl -w
# Perl pragma to restrict unsafe constructs
use strict;
# load the LWP::UserAgent module
use LWP::UserAgent;

# main function
sub main {
    # get params
    # @_
    # Within a subroutine, the array @_ contains the parameters passed to that subroutine.
    # Inside a subroutine, @_ is the default array for shift and pop (push and unshift always need an explicit array).
    # Take the URL from the first parameter; fall back to the example URL when none is given.
    my $url = shift(@_) || 'http://www.taobao.com';

    # create LWP::UserAgent object
    my $ua = LWP::UserAgent->new;
    # set the timeout, in seconds
    $ua->timeout(20);
    # set User-Agent header
    $ua->agent("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 2.0.50727)");
    # send the URL with the GET method and store the response in $resp
    my $resp = $ua->get($url);

    # check response
    if ($resp->is_success) {
        # get the response content (HTML source), with content encoding and charset decoded
        my $content = $resp->decoded_content;
        # use a regex to get the page title from $content
        # (.*?) matches the title text non-greedily, so the match stops at the first </title>.
        # The bracketing construct ( ... ) creates a capture group. After a successful match, $1 holds the text of the first group, $2 the second, and so on; within the same pattern, refer back to a group with \1 or \g1.
        if ( $content =~ m{<title>(.*?)</title>}si ) {
            my $head = $1;
            print "found page title : $head\n";
        } else {
            print "no page title for url : $url\n";
        }
    } else {
        # display status information and exit
        die $resp->status_line;
    }
}

# pass params to main function
# @ARGV
# The array @ARGV contains the command-line arguments intended for the script.
main(@ARGV);
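
The title-matching step can be tried on its own, without any network access. The following is a minimal sketch rather than part of the original script: the helper name extract_title and the sample HTML are assumptions made up for illustration. It uses a non-greedy capture group, so the match ends at the first closing title tag even when the page contains a second title element (for example inside an inline SVG), and it trims the whitespace around the captured text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return the text of the
# first <title> element in $html, or nothing when there is no title.
sub extract_title {
    my ($html) = @_;
    return unless defined $html;

    # [^>]* tolerates attributes on the tag; (.*?) is non-greedy, so the match
    # stops at the first </title>; /s lets "." match newlines, /i ignores case.
    if ( $html =~ m{<title\b[^>]*>(.*?)</title>}si ) {
        my $title = $1;
        $title =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        return $title;
    }
    return;
}

# Made-up sample page with two <title> elements (the second inside an SVG).
my $html = <<'HTML';
<html><head><title>
  Example Page
</title></head>
<body><svg><title>Icon</title></svg></body></html>
HTML

my $title = extract_title($html);
print defined $title ? "title: $title\n" : "no title\n";
```

Run as written, this should print "title: Example Page". With a greedy (.*) and the /s modifier, the same input would instead capture everything up to the last closing title tag, including the SVG markup in between.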