=head1 NAME

perlfaq9 - Networking ($Revision: 1.28 $, $Date: 2005/12/31 00:54:37 $)

=head1 DESCRIPTION

This section deals with questions related to networking, the internet,
and, in a few cases, the web.

=head2 What is the correct form of response from a CGI script?

(Alan Flavell answers...)

The Common Gateway Interface (CGI) specifies a software interface between
a program ("CGI script") and a web server (HTTPD). It is not specific
to Perl, and has its own FAQs and tutorials, and a usenet group,
comp.infosystems.www.authoring.cgi

The CGI specification is outlined in an informational RFC:
http://www.ietf.org/rfc/rfc3875

Other relevant documentation is listed in: http://www.perl.org/CGI_MetaFAQ.html

These Perl FAQs very selectively cover some CGI issues. However, Perl
programmers are strongly advised to use the CGI.pm module, to take care
of the details for them.

The similarity between CGI response headers (defined in the CGI
specification) and HTTP response headers (defined in the HTTP
specification, RFC2616) is intentional, but can sometimes be confusing.

The CGI specification defines two kinds of script: the "Parsed Header"
script, and the "Non Parsed Header" (NPH) script. Check your server
documentation to see what it supports. "Parsed Header" scripts are
simpler in various respects. The CGI specification allows any of the
usual newline representations in the CGI response (it's the server's
job to create an accurate HTTP response based on it). So "\n" written in
text mode is technically correct, and recommended. NPH scripts are more
tricky: they must put out a complete and accurate set of HTTP
transaction response headers; the HTTP specification calls for records
to be terminated with carriage-return and line-feed, i.e. ASCII \015\012
written in binary mode.

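As a rough illustration of the difference between the two kinds of
script, here is a minimal sketch that builds both responses as strings.
The sub names C<parsed_header_response> and C<nph_response> are made up
for this example, and the NPH response is deliberately bare: a real NPH
script would need to send whatever further headers the HTTP
specification requires, and would call C<binmode(STDOUT)> before
printing.

```perl
use strict;
use warnings;

# Parsed Header script: the server reads these headers and builds the
# real HTTP response itself, so a plain "\n" is enough.
sub parsed_header_response {
    my ($body) = @_;
    return "Content-Type: text/html\n\n" . $body;
}

# NPH script: the script emits the HTTP response directly, so the status
# line and every header line must end in CR LF (\015\012).
sub nph_response {
    my ($body) = @_;
    my $crlf = "\015\012";
    return join($crlf,
        "HTTP/1.0 200 OK",
        "Content-Type: text/html",
        "", "") . $body;
}

print parsed_header_response("<p>Hello</p>\n");
```

In everyday use CGI.pm takes care of these details, so hand-built
headers like these are mainly useful for understanding what the server
expects.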
Using CGI.pm gives excellent platform independence, including EBCDIC
systems. CGI.pm selects an appropriate newline representation
($CGI::CRLF) and sets binmode as appropriate.

=head2 My CGI script runs from the command line but not the browser. (500 Server Error)

Several things could be wrong. You can go through the "Troubleshooting
Perl CGI scripts" guide at

    http://www.perl.org/troubleshooting_CGI.html

If, after that, you can demonstrate that you've read the FAQs and that
your problem isn't something simple that can be easily answered, you'll
probably receive a courteous and useful reply to your question if you
post it on comp.infosystems.www.authoring.cgi (if it's something to do
with HTTP or the CGI protocols). Questions posted to comp.lang.perl.misc
that appear to be Perl questions but are really CGI ones are not so
well received.

The useful FAQs, related documents, and troubleshooting guides are
listed in the CGI Meta FAQ:

    http://www.perl.org/CGI_MetaFAQ.html

=head2 How can I get better error messages from a CGI program?

Use the CGI::Carp module. It replaces C<warn> and C<die>, plus the
normal Carp module's C<carp>, C<croak>, and C<confess> functions with
more verbose and safer versions. It still sends them to the normal
server error log.

    use CGI::Carp;
    warn "This is a complaint";
    die "But this one is serious";

The following use of CGI::Carp also redirects errors to a file of your
choice. It is placed in a BEGIN block so that compile-time warnings are
caught as well:

    BEGIN {
        use CGI::Carp qw(carpout);
        # Open the log in append mode so earlier entries are kept.
        open(LOG, '>>', '/var/local/cgi-logs/mycgi-log')
            or die "Unable to append to mycgi-log: $!\n";
        carpout(*LOG);
    }

You can even arrange for fatal errors to go back to the client browser,
which is nice for your own debugging, but might confuse the end user.

    use CGI::Carp qw(fatalsToBrowser);
    die "Bad error here";

Even if the error happens before you get the HTTP header out, the module
will try to take care of this to avoid the dreaded server 500 errors.
Normal warnings still go out to the server error log (or wherever
you've sent them with C<carpout>) with the application name and date
stamp prepended.

=head2 How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use HTML::Parser
from CPAN. Another mostly correct way is to use HTML::FormatText which
not only removes HTML but also attempts to do a little simple
formatting of the resulting plain text.

Many folks attempt a simple-minded regular expression approach, like
C<< s/<.*?>//g >>, but that fails in many cases because the tags
may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus, folks forget to convert
entities, such as C<&lt;>.

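To make those failure modes concrete, here is a small sketch that wraps
the naive substitution in a sub (C<naive_strip> is a made-up name) and
feeds it three inputs invented for this example.

```perl
use strict;
use warnings;

# The naive approach from the text: drop everything between "<" and the
# next ">" that can be reached without crossing a newline.
sub naive_strip {
    my ($html) = @_;
    $html =~ s/<.*?>//g;
    return $html;
}

# A quoted ">" ends the match too early and leaves debris behind.
print naive_strip('<IMG SRC="foo.gif" ALT="A > B">caption'), "\n";

# Without /s, "." does not match a newline, so a tag that spans two
# lines is not removed at all.
print naive_strip(qq{<IMG SRC="foo.gif"\n     ALT="logo">caption}), "\n";

# Entities are passed through untouched.
print naive_strip('<B>1 &lt; 2</B>'), "\n";
```

The first call leaves the fragment C<< B">caption >> behind (with a
leading space), the second returns its input unchanged, and the third
still contains the literal C<&lt;>.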
Here's one "simple-minded" approach, that works for most files:

    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

If you want a more complete solution, see the 3-stage striphtml
program at

    http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz

Here are some tricky cases that you should think about when picking
a solution:

    <IMG SRC = "foo.gif" ALT = "A > B">

    <IMG SRC = "foo.gif"
         ALT = "A > B">

    <!-- <A comment> -->

    <script>if (a<b && a>c)</script>

    <# Just data #>

    <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

If HTML comments include other tags, those solutions would also break
on text like this:

    <!-- This section commented out.
        <B>You can't see me!</B>
    -->

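A quick way to see how the quote-aware substitution shown earlier in
this answer copes with these cases is to wrap it in a sub and try it.
This is only a sketch: C<strip_tags> is a made-up name, and the comment
input is a shortened version of the example above.

```perl
use strict;
use warnings;

# The quote-aware substitution from this answer, wrapped in a sub.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

# Quoted angle brackets and tags split across lines are handled.
print strip_tags('<IMG SRC = "foo.gif" ALT = "A > B">caption'), "\n";
print strip_tags(qq{<IMG SRC = "foo.gif"\n     ALT = "A > B">caption}), "\n";

# A comment that contains a tag is not: the match ends at the first
# unquoted ">", so the commented-out text and the closing "-->" survive.
print strip_tags(qq{<!-- hidden\n<B>You can't see me!</B>\n-->}), "\n";
```

The first two calls return just C<caption>. The last one returns
C<You can't see me!> followed by a line holding C<< --> >>, which is
exactly the kind of breakage a real parser such as HTML::Parser avoids.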
=head2 How do I extract URLs?

You can easily extract all sorts of URLs from HTML with
C<HTML::SimpleLinkExtor> which handles anchors, images, objects,
frames, and many other tags that can contain a URL. If you need
anything more complex, you can create your own subclass of
C<HTML::LinkExtor> or C<HTML::Parser>. You might even use
C<HTML::SimpleLinkExtor> as an example for something specifically
suited to your needs.

You can use URI::Find to extract URLs from an arbitrary text document.

Less complete solutions involving regular expressions can save
you a lot of processing time if you know that the input is simple. One
solution from Tom Christiansen runs 100 times faster than most
module-based approaches but only extracts URLs from anchors where the
first attribute is HREF and there are no other attributes.

    #!/usr/bin/perl -n00
    # qxurl - Tom Christiansen
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;

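To see that limitation in practice, the same pattern can be wrapped in
a sub and run against a few anchors. This is a minimal sketch: the name
C<extract_hrefs> and the sample HTML are invented for the example.

```perl
use strict;
use warnings;

# Collect the HREF values matched by the pattern from qxurl.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{
            < \s*
              A \s+ HREF \s* = \s* (["']) (.*?) \1
            \s* >
        }gsix) {
        push @urls, $2;
    }
    return @urls;
}

my $html = <<'HTML';
<a href="http://www.perl.org/">Perl</a>
<A HREF='/relative/path'>relative</A>
<a class="nav" href="/missed">HREF is not the first attribute</a>
HTML

print "$_\n" for extract_hrefs($html);
```

Only the first two URLs are printed. The third anchor is skipped
silently because its HREF is not the first attribute, which is the
trade-off you accept in exchange for the speed.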
=head2 How do I download a file from the user's machine? How do I open a file on another machine?

In this case, download means to use the file upload feature of HTML
forms. You allow the web surfer to specify a file to send to your web
server. To you it looks like a download, and to the user it looks
like an upload. No matter what you call it, you do it with what's
known as B<multipart/form-data> encoding. The CGI.pm module (which
comes with Perl as part of the Standard Library) supports this in the
start_multipart_form() method, which isn't the same as the startform()
method.

See the section in the CGI.pm documentation on file uploads for code
examples and details.

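For orientation, here is a minimal sketch of both halves of an upload
using CGI.pm's object interface. It assumes CGI.pm is installed (newer
Perl releases may require installing it from CPAN). The field name
C<uploaded_file> and the subs C<upload_form> and C<save_upload> are
made up for this example, and a real script would also validate the
destination path and limit the upload size.

```perl
use strict;
use warnings;
use CGI;

# Build the upload form. 'uploaded_file' is a made-up field name.
sub upload_form {
    my ($q) = @_;
    return join "\n",
        $q->start_multipart_form(),
        $q->filefield(-name => 'uploaded_file'),
        $q->submit(-value => 'Send'),
        $q->end_form();
}

# Copy an uploaded file to $dest. Returns false if no file was sent.
sub save_upload {
    my ($q, $dest) = @_;
    my $fh = $q->upload('uploaded_file') or return;
    open(my $out, '>', $dest) or die "Cannot write $dest: $!\n";
    binmode $out;
    my $buffer;
    while (read($fh, $buffer, 8192)) {
        print {$out} $buffer;
    }
    close($out) or die "Cannot close $dest: $!\n";
    return 1;
}

my $q = CGI->new;
print $q->header('text/html'), upload_form($q);
```

The form is what makes the browser send the file with
multipart/form-data encoding, and C<upload()> hands back a filehandle
for the received data. The CGI.pm documentation remains the
authoritative reference for the details.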
=head2 How do I make a pop-up menu in HTML?

Use the B<< <SELECT> >> and B<< <OPTION> >> tags. The CGI.pm