=head1 NAME

perlfaq9 - Networking ($Revision: 1.28 $, $Date: 2005/12/31 00:54:37 $)

=head1 DESCRIPTION

This section deals with questions related to networking, the internet,
and a few on the web.

=head2 What is the correct form of response from a CGI script?

(Alan Flavell <[email protected]> answers...)

The Common Gateway Interface (CGI) specifies a software interface between
a program ("CGI script") and a web server (HTTPD). It is not specific
to Perl, and has its own FAQs and tutorials, and a usenet group,
comp.infosystems.www.authoring.cgi.

The CGI specification is outlined in an informational RFC:
http://www.ietf.org/rfc/rfc3875

Other relevant documentation is listed in: http://www.perl.org/CGI_MetaFAQ.html

These Perl FAQs cover CGI issues only very selectively. However, Perl
programmers are strongly advised to use the CGI.pm module, to take care
of the details for them.

The similarity between CGI response headers (defined in the CGI
specification) and HTTP response headers (defined in the HTTP
specification, RFC2616) is intentional, but can sometimes be confusing.

The CGI specification defines two kinds of script: the "Parsed Header"
script, and the "Non Parsed Header" (NPH) script. Check your server
documentation to see what it supports. "Parsed Header" scripts are
simpler in various respects. The CGI specification allows any of the
usual newline representations in the CGI response (it's the server's
job to create an accurate HTTP response based on it). So "\n" written in
text mode is technically correct, and recommended. NPH scripts are
trickier: they must put out a complete and accurate set of HTTP
transaction response headers; the HTTP specification calls for records
to be terminated with carriage-return and line-feed, i.e. ASCII \015\012
written in binary mode.

Using CGI.pm gives excellent platform independence, including EBCDIC
systems. CGI.pm selects an appropriate newline representation
($CGI::CRLF) and sets binmode as appropriate.
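
To make the difference between the two kinds of script concrete, here is
a minimal sketch that builds both kinds of response by hand, without
CGI.pm. The helper names C<parsed_header_response> and C<nph_response>
are hypothetical and exist only for this illustration. The NPH variant
emits just a status line and one header; a real NPH script would likely
need more headers than that, so treat it as an outline rather than a
complete implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "Parsed Header" CGI response.
# A plain "\n" is enough here, because the server rewrites this
# into a proper HTTP response.
sub parsed_header_response {
    my ($type, $body) = @_;
    return "Content-Type: $type\n\n" . $body;
}

# Hypothetical helper: build a minimal NPH response. The script itself
# supplies the status line and ends every header line with CR LF.
sub nph_response {
    my ($type, $body) = @_;
    my $crlf = "\015\012";
    return "HTTP/1.0 200 OK$crlf"
         . "Content-Type: $type$crlf"
         . $crlf
         . $body;
}

# Parsed Header case: text mode is fine.
print parsed_header_response('text/plain', "Hello, world\n");
```

An NPH script would call C<binmode(STDOUT)> before printing the result
of C<nph_response>, so that the \015\012 pairs reach the client
unchanged.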

=head2 My CGI script runs from the command line but not the browser. (500 Server Error)

Several things could be wrong. You can go through the "Troubleshooting
Perl CGI scripts" guide at

    http://www.perl.org/troubleshooting_CGI.html

If, after that, you can demonstrate that you've read the FAQs and that
your problem isn't something simple that can be easily answered, you'll
probably receive a courteous and useful reply to your question if you
post it on comp.infosystems.www.authoring.cgi (if it's something to do
with HTTP or the CGI protocols). Questions posted to comp.lang.perl.misc
that appear to be Perl questions but are really CGI questions are not
so well received.

The useful FAQs, related documents, and troubleshooting guides are
listed in the CGI Meta FAQ:

    http://www.perl.org/CGI_MetaFAQ.html

=head2 How can I get better error messages from a CGI program?

Use the CGI::Carp module. It replaces C<warn> and C<die>, plus the
normal Carp module's C<carp>, C<croak>, and C<confess> functions, with
more verbose and safer versions. It still sends them to the normal
server error log.

    use CGI::Carp;
    warn "This is a complaint";
    die "But this one is serious";

You can also use CGI::Carp to redirect errors to a file of your choice.
Place the code in a BEGIN block to catch compile-time warnings as well:

    BEGIN {
        use CGI::Carp qw(carpout);
        open(LOG, ">>/var/local/cgi-logs/mycgi-log")
            or die "Unable to append to mycgi-log: $!\n";
        carpout(*LOG);
    }

You can even arrange for fatal errors to go back to the client browser,
which is nice for your own debugging, but might confuse the end user.

    use CGI::Carp qw(fatalsToBrowser);
    die "Bad error here";

Even if the error happens before you get the HTTP header out, the module
will try to take care of this to avoid the dreaded server 500 errors.
Normal warnings still go out to the server error log (or wherever
you've sent them with C<carpout>) with the application name and date
stamp prepended.

=head2 How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use HTML::Parser
from CPAN. Another mostly correct
way is to use HTML::FormatText, which not only removes HTML but also
attempts to do a little simple formatting of the resulting plain text.

Many folks attempt a simple-minded regular expression approach, like
C<< s/<.*?>//g >>, but that fails in many cases because the tags
may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus, folks forget to convert
entities, such as C<&lt;>.

Here's one "simple-minded" approach that works for most files:

    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

If you want a more complete solution, see the 3-stage striphtml
program in
http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz.

Here are some tricky cases that you should think about when picking
a solution:

    <IMG SRC = "foo.gif" ALT = "A > B">

    <IMG SRC = "foo.gif"
         ALT = "A > B">

    <!-- <A comment> -->

    <script>if (a<b && a>c)</script>

    <# Just data #>

    <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

If HTML comments include other tags, those solutions would also break
on text like this:

    <!-- This section commented out.
        <B>You can't see me!</B>
    -->
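
To see why the quoted angle-bracket case matters, the following sketch
runs both regular expressions from this answer over the first tricky
case. The sub names C<strip_naive> and C<strip_quote_aware> are made up
for this illustration. Neither is a substitute for a real parser; the
point is only to compare how the two patterns treat a C<< > >> inside
an attribute value.

```perl
use strict;
use warnings;

# The simple-minded pattern: stops at the first ">" it meets,
# even when that ">" sits inside a quoted attribute value.
sub strip_naive {
    my ($html) = @_;
    $html =~ s/<.*?>//g;
    return $html;
}

# The quote-aware pattern from this answer: skips over quoted strings
# inside the tag, so a ">" within quotes does not end the match.
sub strip_quote_aware {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

my $html = qq{before <IMG SRC = "foo.gif" ALT = "A > B"> after};

print strip_naive($html), "\n";        # leaves part of the tag behind
print strip_quote_aware($html), "\n";  # removes the whole tag
```

The first call leaves the tail of the tag (C<< B"> >>) in the output,
while the second removes the tag completely.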

=head2 How do I extract URLs?

You can easily extract all sorts of URLs from HTML with
C<HTML::SimpleLinkExtor>, which handles anchors, images, objects,
frames, and many other tags that can contain a URL. If you need
anything more complex, you can create your own subclass of
C<HTML::LinkExtor> or C<HTML::Parser>. You might even use
C<HTML::SimpleLinkExtor> as an example for something specifically
suited to your needs.

You can use URI::Find to extract URLs from an arbitrary text document.

Less complete solutions involving regular expressions can save
you a lot of processing time if you know that the input is simple. One
solution from Tom Christiansen runs 100 times faster than most
module-based approaches but only extracts URLs from anchors where the first
attribute is HREF and there are no other attributes.

    #!/usr/bin/perl -n00
    # qxurl - [email protected]
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;
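
The same pattern can be wrapped in a sub when the HTML is already in a
string rather than arriving on standard input. This is only a sketch:
the name C<extract_hrefs> is hypothetical, and the sub inherits the
limitation described above, so it should be given only anchors whose
sole attribute is HREF.

```perl
use strict;
use warnings;

# Hypothetical helper: return the HREF values of simple anchors.
# Uses the qxurl pattern, so it assumes HREF is the only attribute.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix) {
        push @urls, $2;    # $1 is the quote character, $2 the URL
    }
    return @urls;
}

my $page = qq{<a href="http://www.perl.org/">Perl</a> and <A HREF='faq.html'>FAQ</A>};
print "$_\n" for extract_hrefs($page);
```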

=head2 How do I download a file from the user's machine? How do I open a file on another machine?

In this case, download means to use the file upload feature of HTML
forms. You allow the web surfer to specify a file to send to your web
server. To you it looks like a download, and to the user it looks
like an upload. No matter what you call it, you do it with what's
known as B<multipart/form-data> encoding. The CGI.pm module (which
comes with Perl as part of the Standard Library) supports this in the
start_multipart_form() method, which isn't the same as the startform()
method.

See the section in the CGI.pm documentation on file uploads for code
examples and details.
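
For orientation, here is a rough sketch of the kind of form that asks
the browser to use B<multipart/form-data> encoding, written out by hand
instead of through CGI.pm. The sub name C<upload_form>, the action URL,
and the field name are all placeholders invented for this example, and
the sub does no escaping of its arguments. Reading the uploaded file on
the server side is covered in the CGI.pm documentation and is not shown
here.

```perl
use strict;
use warnings;

# Hypothetical helper: return an HTML form with a file-selection field.
# The enctype attribute is what requests multipart/form-data encoding.
sub upload_form {
    my ($action, $field) = @_;
    return <<"HTML";
<form method="post" action="$action" enctype="multipart/form-data">
  <input type="file" name="$field">
  <input type="submit" value="Send file">
</form>
HTML
}

# Placeholder URL and field name, for illustration only.
print "Content-Type: text/html\n\n",
      upload_form('/cgi-bin/upload.pl', 'upload');
```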

=head2 How do I make a pop-up menu in HTML?

Use the B<< <SELECT> >> and B<< <OPTION> >> tags. The CGI.pm