Converting relative links to absolute links


The URI class, which we have just looked at, provides a full set of methods for working with the various parts of a URL (for example, $url->scheme to find out what kind of URL it is, $url->host to find out which host it refers to, and so on; see the documentation for the URI classes). The most interesting methods, however, are query_form, covered earlier, and new_abs, which converts a relative link ("../foo.html") into an absolute one ("http://www.perl.com/stuff/foo.html"):



use URI;
$abs = URI->new_abs($maybe_relative, $base);
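As a minimal sketch of how new_abs behaves, the following snippet resolves a few relative links against one base address. The base URL and the list of relative links are made up for illustration and are not taken from any real page.

```perl
use strict;
use warnings;
use URI;

# Hypothetical base URL, chosen only for illustration.
my $base = 'http://www.perl.com/stuff/bar/index.html';

# The accessor methods mentioned above, applied to the base URL.
my $url = URI->new($base);
print "scheme: ", $url->scheme, "\n";
print "host:   ", $url->host,   "\n";

# Hypothetical links: three relative ones and one that is already absolute.
my @relative = (
    '../foo.html',
    'baz.html',
    '/top.html',
    'http://www.cpan.org/RECENT.html',
);

# new_abs resolves each link against the base; an absolute link is left alone.
my @absolute = map { URI->new_abs($_, $base)->as_string } @relative;
print "$_\n" for @absolute;
```

A link that is already absolute passes through unchanged, so new_abs can be applied to every link on a page without first checking which kind it is.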


For example, consider this small program, which extracts the links from the HTML page that lists new modules on CPAN:



use strict;
use warnings;
use LWP 5.64;
my $browser = LWP::UserAgent->new;

# Fetch the page and stop if the request failed
my $url = 'http://www.cpan.org/RECENT.html';
my $response = $browser->get($url);
die "Can't get $url - ", $response->status_line
  unless $response->is_success;

# Print the target of every <A HREF="..."> found in the page
my $html = $response->content;
while ($html =~ m/<A HREF="(.*?)"/g) {
  print "$1\n";
}


When run, it prints something like this:



MIRRORING.FROM
RECENT
RECENT.html
authors/00whois.html
authors/01mailrc.txt.gz
authors/id/A/AA/AASSAD/CHECKSUMS
...


But if you want a list of absolute links, you can use the new_abs method by changing the while loop as follows:



while ($html =~ m/<A HREF="(.*?)"/g) {
  print URI->new_abs($1, $response->base), "\n";
}


(The $response->base method of the response object, an HTTP::Response built on top of HTTP::Message, returns the base address that should be used for converting relative links to absolute ones.)


Now our program prints what we need:



http://www.cpan.org/MIRRORING.FROM
http://www.cpan.org/RECENT
http://www.cpan.org/RECENT.html
http://www.cpan.org/authors/00whois.html
http://www.cpan.org/authors/01mailrc.txt.gz
http://www.cpan.org/authors/id/A/AA/AASSAD/CHECKSUMS
...


See Chapter 4, "URLs", of the book Perl and LWP for more information on URI objects.


Of course, using a regexp to extract addresses is too primitive a method, so more serious programs should use HTML parsing modules such as HTML::LinkExtor or HTML::TokeParser, or perhaps even HTML::TreeBuilder.
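As a rough illustration of the parser-based approach, here is a sketch that uses HTML::LinkExtor instead of a regexp. To keep it runnable without network access, it parses a small hand-written HTML fragment (an assumption made for this example) rather than the real CPAN page. The base address is passed to the constructor so that the module returns absolute links.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hand-written sample page, standing in for the content of a real response.
my $html = <<'HTML';
<html><body>
<A HREF="MIRRORING.FROM">MIRRORING.FROM</A>
<a href = 'authors/00whois.html'>whois</a>
<img src="logo.png">
</body></html>
HTML

# In a real program this would be $response->base.
my $base = 'http://www.cpan.org/RECENT.html';

# With no callback (undef), the parser collects the links itself;
# with a base given, each link is converted to an absolute URI.
my $parser = HTML::LinkExtor->new(undef, $base);
$parser->parse($html);
$parser->eof;

my @links;
for my $link ($parser->links) {
    my ($tag, %attr) = @$link;       # e.g. ('a', href => URI object)
    next unless $tag eq 'a' && defined $attr{href};
    push @links, "$attr{href}";      # stringify the URI object
}
print "$_\n" for @links;
```

Unlike the regexp, the parser does not care about the case of the tag, the quoting style, or the whitespace around the attribute, and it reports the img link separately so it can be skipped.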

Other browser attributes


LWP::UserAgent objects have a number of attributes for controlling their own behavior. Some of them:


* $browser->timeout(15): This method sets the maximum time to wait for a response from the server. If no response arrives within 15 seconds (in this case), the browser gives up on the request.

* $browser->protocols_allowed(['http', 'gopher']): This sets the types of URLs the browser will "speak", in this case HTTP and gopher. If an attempt is made to access a document via any other protocol (for example "ftp:", "mailto:", "news:"), no connection is even attempted, and we get a 500 error with a message like "Access to ftp URIs has been disabled".

* use LWP::ConnCache;
  $browser->conn_cache(LWP::ConnCache->new()): After this is set, the browser object tries to use HTTP/1.1 "Keep-Alive", which speeds up requests by reusing one connection for several requests to the same server.

* $browser->agent('SomeName/1.23 (more info here maybe)'): This sets how our browser identifies itself in the "User-Agent" line of its HTTP requests. By default it sends "libwww-perl/versionnumber", for example "libwww-perl/5.65". You can change it to something more informative:

  $browser->agent('SomeName/3.14 (contact@robotplexus.int)');

  Or, if necessary, you can pretend to be a real browser:

  $browser->agent(
    'Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)');


* push @{$browser->requests_redirectable}, 'POST': This tells our browser to follow redirects in response to POST requests (as most modern browsers do: IE, NN, Opera), even though the HTTP RFC says that this should not normally be done.
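The settings above can be combined on a single browser object. The sketch below does that and then requests an ftp: URL to show the refusal described for protocols_allowed. The FTP host name is hypothetical, and since the protocol is disallowed, no connection to it should be attempted.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use LWP::ConnCache;

my $browser = LWP::UserAgent->new;

$browser->timeout(15);                               # give up after 15 seconds
$browser->protocols_allowed(['http', 'gopher']);     # refuse everything else
$browser->conn_cache(LWP::ConnCache->new);           # try to reuse connections
$browser->agent('SomeName/3.14 (contact@robotplexus.int)');
push @{ $browser->requests_redirectable }, 'POST';   # also follow redirects on POST

# Ask the browser which protocols it is willing to use.
for my $scheme ('http', 'ftp') {
    printf "%s supported: %s\n", $scheme,
        $browser->is_protocol_supported($scheme) ? 'yes' : 'no';
}

# Hypothetical URL; the request is rejected locally because ftp is not allowed.
my $response = $browser->get('ftp://ftp.example.com/README');
print $response->status_line, "\n";
```

Checking is_protocol_supported first lets a program skip disallowed links instead of collecting a 500 response for each of them.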