Web Basics with LWP
Introduction
LWP (short for "Library for WWW in Perl") is a very popular group of Perl modules for accessing data on the Web. Like most Perl module distributions, each of LWP's component modules comes with documentation that fully describes its interface. However, LWP contains so many modules that it is hard to find the documentation for even apparently elementary tasks.
A full introduction to using LWP would take a whole book, and one has just come off the press: Perl and LWP. This article offers a few examples that will help you carry out common tasks with LWP.
Getting Pages with LWP::Simple
If you just want to get the document at a particular URL, the simplest way to do it is to use the functions of the LWP::Simple module.
In a Perl script, you can do this by calling get($url). It will try to fetch the content of that URL. If all goes well, the function returns that content; if there is any error, it returns undef.
my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';
# Just an example: the listing for the latest /Fresh Air/ show
use LWP::Simple;
my $content = get $url;
die "Couldn't get $url" unless defined $content;
# Then do something with $content, for example:
if ($content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
} else {
    print "Fresh Air is apparently jazzless today.\n";
}
A handy variant of get is getprint, which is useful for viewing a page's content from a Perl one-liner. If getprint can get the page whose URL you provide, it sends the content to STDOUT; otherwise, it complains to STDERR.
% perl -MLWP::Simple -e "getprint 'http://cpan.org/RECENT'"
That URL is a plain-text file. It lists the files added to CPAN in the last two weeks. You can easily turn it into a shell command that, for example, mails you the list of new Acme:: modules:
[CODE]
% perl -MLWP::Simple -e "getprint 'http://cpan.org/RECENT'" \
  | grep "/by-module/Acme" | mail -s "New Acme modules! Joy!" $USER
[/CODE]
LWP::Simple has a few other useful functions, including one for running a HEAD request on a URL (useful for checking links or getting a document's last-modified time) and two for saving or mirroring a URL to a local file. See the LWP::Simple documentation for the full details, or Chapter 2, "Web Basics", of Perl and LWP for more examples.
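As a rough sketch of those other LWP::Simple functions, the following script calls head, getstore, and mirror. So that it can run without network access, it points them at a temporary local file through a file:// URL; that local file, the URI::file helper, and the variable names are assumptions made for illustration, and in practice you would pass an http:// URL instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use LWP::Simple;    # exports head, getstore, mirror, is_success
use URI::file;

# Assumption: a throwaway local file stands in for a remote document,
# so the sketch runs offline. Swap in an http:// URL for real use.
my $dir    = tempdir(CLEANUP => 1);
my $source = File::Spec->catfile($dir, 'recent.txt');
open my $out, '>', $source or die "Can't write $source: $!";
print {$out} "authors/id/A/AC/ACME/Acme-Example-0.01.tar.gz\n";
close $out or die "Can't close $source: $!";
my $url = URI::file->new($source)->as_string;

# head() in list context returns
# (content type, length, modification time, expiry time, server).
my ($type, $length, $mtime) = head($url)
    or die "HEAD request for $url failed";

# getstore() saves the document to a file and returns the status code.
my $copy   = File::Spec->catfile($dir, 'copy.txt');
my $status = getstore($url, $copy);
die "getstore failed with status $status" unless is_success($status);

# mirror() fetches again only if the document is newer than the copy.
my $mirror_status = mirror($url, $copy);

print "type=$type length=$length\n";
print "getstore=$status mirror=$mirror_status\n";
```

When the local copy is already up to date, mirror typically reports status 304 (Not Modified) and leaves the file alone.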
The Basics of the LWP Class Model
LWP::Simple's functions are handy for simple cases, but they do not support cookies or authorization; they do not let you set header lines in the HTTP request; and, most importantly, they do not let you read header lines in the HTTP response (notably the full text of the HTTP error message when something goes wrong). To get at all of those capabilities, you have to use the full LWP class model.
LWP contains many classes, but the two you have to understand are LWP::UserAgent and HTTP::Response. LWP::UserAgent is a class for "virtual browsers", which you use for performing requests. HTTP::Response is a class for the responses (or error messages) that you get back from those requests.
The basic idiom when working with LWP is $response = $browser->get($url), or in full:
use LWP 5.64;    # Loads all the important LWP classes, and makes
                 # sure the installed version is reasonably recent.
my $browser = LWP::UserAgent->new;
...
# Used below, the URL to which the request will be made:
my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';
my $response = $browser->get($url);
die "Can't get $url - ", $response->status_line
    unless $response->is_success;
die "Hey, I was expecting HTML, not ", $response->content_type
    unless $response->content_type eq 'text/html';
    # or whatever content-type you are equipped to deal with
# Otherwise, process the content somehow:
if ($response->content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
} else {
    print "Fresh Air is apparently jazzless today.\n";
}
Compared with the previous example, this one involves two objects: $browser, which holds an object of class LWP::UserAgent, and $response, which is of class HTTP::Response. You usually need only one $browser object; but every time you make a request, you get back a new HTTP::Response object, which has some interesting attributes:
* A status code indicating the success or failure of the request (which you can test with $response->is_success).
* An HTTP status line, which should be informative if there is a failure (you can see it with $response->status_line, which returns something like "404 Not Found").
* A MIME content-type, such as "text/html", "image/gif", or "application/xml", which you can see with $response->content_type.
* The actual content of the requested document, in $response->content. For HTML, this is the HTML source; for a GIF, $response->content is the binary GIF data.
* Many other convenient and more specific methods, which are described in the documentation for HTTP::Response and its superclasses, HTTP::Message and HTTP::Headers.
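To get a feel for these attributes without making a request, the sketch below builds an HTTP::Response object by hand and queries it. The status, header, and body used here are made-up illustrative values; a real response would come from $browser->get($url).

```perl
use strict;
use warnings;
use HTTP::Response;

# Assumption: a hand-built response standing in for what
# $browser->get($url) might return for a missing page.
my $response = HTTP::Response->new(
    404, 'Not Found',
    [ 'Content-Type' => 'text/plain' ],
    'Sorry, no such page.',
);

my $ok     = $response->is_success ? 'yes' : 'no';
my $status = $response->status_line;     # status code plus message
my $type   = $response->content_type;    # scalar context: the type only
my $body   = $response->content;

print "success: $ok\n";
print "status:  $status\n";
print "type:    $type\n";
print "length:  ", length($body), " bytes\n";
```

Note that content_type is assigned to a scalar here: in list context it also returns any parameters attached to the header, such as a charset.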
Adding Other HTTP Request Headers
The most commonly used syntax for requests is $response = $browser->get($url), but you can in fact add extra HTTP header lines to the request by appending a list of key-value pairs after the URL, for example:
$response = $browser->get($url, $key1, $value1, $key2, $value2, ...);
Here is how to send Netscape-like headers:
my @ns_headers = (
    'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
    'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, '
              . 'image/pjpeg, image/png, */*',
    'Accept-Charset' => 'iso-8859-1,*,utf-8',
    'Accept-Language' => 'en-US',
);
...
$response = $browser->get($url, @ns_headers);
If you are not going to reuse that array, you can simply do the following:
$response = $browser->get($url,
    'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
    'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, '
              . 'image/pjpeg, image/png, */*',
    'Accept-Charset' => 'iso-8859-1,*,utf-8',
    'Accept-Language' => 'en-US',
);
If you only want to change the User-Agent line, you can change the $browser object's default, "libwww-perl/5.65" (or something similar), to whatever you like, using the agent method of the LWP::UserAgent object:
$browser->agent('Mozilla/4.76 [en] (Win98; U)');
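The following small sketch, which makes no network request, reads the default agent string and then replaces it. The variable names are illustrative only, and the exact default depends on the LWP version you have installed.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;

# Called without arguments, agent() returns the current value.
my $default = $browser->agent;    # something like "libwww-perl/5.65"

# Called with an argument, it sets the value sent as User-Agent.
$browser->agent('Mozilla/4.76 [en] (Win98; U)');
my $current = $browser->agent;

print "default: $default\n";
print "current: $current\n";
```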
Enabling Cookies
A default LWP::UserAgent object acts like a browser with cookie support turned off. There are several ways to turn it on, using the cookie_jar method. A "cookie jar" is an object that represents a small database of all the HTTP cookies the browser knows about. That database can be saved to disk (this is how Netscape works, using its cookies.txt file), or it can live only in memory, in which case all cookies are lost as soon as the program finishes.
To create an empty cookie jar in memory, call the cookie_jar method as follows:
$browser->cookie_jar({});
To keep a copy of the cookies in a file on disk, which will contain all the cookies the browser has accumulated by the time the program finishes, call the cookie_jar method as follows:
use HTTP::Cookies;
$browser->cookie_jar(HTTP::Cookies->new(
    'file' => '/some/where/cookies.lwp',
        # where to read and write cookies
    'autosave' => 1,
        # save the file to disk when done
));
That file will be in an LWP-specific format. If you want to access the cookies in your Netscape cookies file, you can use the HTTP::Cookies::Netscape class:
use HTTP::Cookies;
$browser->cookie_jar(HTTP::Cookies::Netscape->new(
    'file' => 'c:/Program Files/Netscape/Users/DIR-NAME-HERE/cookies.txt',
        # where to read cookies
));
You can add an 'autosave' => 1 line as we did earlier, but at the time of writing it is uncertain whether Netscape might discard some of the cookies that you write back to disk.
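To illustrate what a cookie jar does, here is a minimal offline sketch: it creates an in-memory jar, stores one cookie by hand, and asks the jar which Cookie header it would attach to a request. The host www.example.com and the cookie name and value are hypothetical; normally the jar is filled from the Set-Cookie headers of real responses.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $browser = LWP::UserAgent->new;
$browser->cookie_jar({});    # empty in-memory cookie jar

# Assumption: a hypothetical cookie for www.example.com, stored by hand
# much as the jar would store it after seeing a Set-Cookie header.
$browser->cookie_jar->set_cookie(
    0,                    # cookie version
    'session',            # name
    'abc123',             # value
    '/',                  # path
    'www.example.com',    # domain
    undef,                # port (any)
    1,                    # path was given explicitly
    0,                    # not restricted to secure connections
    3600,                 # lifetime in seconds
    0,                    # do not discard at end of session
);

# Ask the jar which cookies apply to a request for that site.
my $request = HTTP::Request->new(GET => 'http://www.example.com/index.html');
$browser->cookie_jar->add_cookie_header($request);
my $cookie_header = $request->header('Cookie');

print "Cookie: $cookie_header\n";
```

In ordinary use you never call add_cookie_header yourself; the $browser object consults its jar on every request it sends.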
Posting Form Data with POST
Many HTML forms send data to their server using an HTTP POST request, which you can send as follows:
$response = $browser->post($url,
    [
        formkey1 => value1,
        formkey2 => value2,
        ...
    ],
);
Or, if you also need to send HTTP headers:
$response = $browser->post($url,
    [
        formkey1 => value1,
        formkey2 => value2,
        ...
    ],
    headerkey1 => value1,
    headerkey2 => value2,
);
For example, the following program runs a search on AltaVista (by sending some form data with an HTTP POST request) and extracts the number of matches from the text of the response:
use strict;
use warnings;
use LWP 5.64;
my $browser = LWP::UserAgent->new;
my $word = 'tarragon';
my $url = 'http://www.altavista.com/sites/search/web';
my $response = $browser->post($url,
    [ 'q' => $word,    # the search phrase
      'pg' => 'q', 'avkw' => 'tgz', 'kl' => 'XX',
    ]
);
die "$url error: ", $response->status_line
    unless $response->is_success;
die "Weird content type at $url - ", $response->content_type
    unless $response->content_type eq 'text/html';
if ($response->content =~ m{AltaVista found ([0-9,]+) results}) {
    # The matched text will look like: "AltaVista found 2,345 results"
    print "$word: $1\n";
} else {
    print "Couldn't find the match-string in the response\n";
}
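To see what such a POST request contains without contacting any server, you can build one with the POST function of HTTP::Request::Common, which the post method uses internally to construct its request. This is only a sketch: the www.example.com URL, the form fields, and the extra header are placeholders.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Assumption: www.example.com and these form fields are placeholders.
# The arguments mirror those of $browser->post: URL, form data, headers.
my $request = POST 'http://www.example.com/search',
    [ 'q' => 'tarragon', 'pg' => 'q' ],
    'Accept-Language' => 'en-US';

my $method = $request->method;
my $type   = $request->content_type;    # how the form data is encoded
my $body   = $request->content;         # the encoded form data

print "$method ", $request->uri, "\n";
print "Content-Type: $type\n";
print "Body: $body\n";
```

The form pairs end up URL-encoded in the request body (here q=tarragon&pg=q), while the header pairs become ordinary request headers.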
Sending Form Data with GET
Some HTML forms convey their data not by sending it in a POST request, but by making an ordinary GET request with the data appended to the end of the URL. For example, if you go to imdb.com and search for Blade Runner, the URL you see will be the following:
http://us.imdb.com/Tsearch?title=Blade%20Runner&restrict=Movies+and+TV
To run the same search with LWP, do the following, which relies on the URI class:
use URI;
my $url = URI->new('http://us.imdb.com/Tsearch');
    # Creates an object representing the URL
$url->query_form(    # And here the key => value pairs:
    'title'    => 'Blade Runner',
    'restrict' => 'Movies and TV',
);
my $response = $browser->get($url);
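As a quick check of what query_form produces, the following sketch builds the same URL and prints it, without sending any request. The variable names $full and %form are illustrative.

```perl
use strict;
use warnings;
use URI;

my $url = URI->new('http://us.imdb.com/Tsearch');
$url->query_form(
    'title'    => 'Blade Runner',
    'restrict' => 'Movies and TV',
);

# The URI object stringifies to the complete, escaped URL.
my $full = $url->as_string;
print "$full\n";

# Called without arguments, query_form decodes the pairs again.
my %form = $url->query_form;
print "$_ = $form{$_}\n" for sort keys %form;
```

The first line printed is http://us.imdb.com/Tsearch?title=Blade+Runner&restrict=Movies+and+TV. query_form encodes spaces as "+" rather than "%20"; both forms are understood in query strings.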
See Chapter 5, "Forms", of Perl and LWP for a more detailed study of HTML forms, as well as Chapters 6 through 9 for a detailed study of extracting data from HTML.
