perlfaq9.pod@ 3397

Last change on this file since 3397 was 3181, checked in by bird, 19 years ago
perl 5.8.8
File size: 23.9 KB

=head1 NAME

perlfaq9 - Networking ($Revision: 1.28 $, $Date: 2005/12/31 00:54:37 $)

=head1 DESCRIPTION

This section deals with questions related to networking, the internet,
and a few on the web.

=head2 What is the correct form of response from a CGI script?

(Alan Flavell <[email protected]> answers...)

The Common Gateway Interface (CGI) specifies a software interface between
a program ("CGI script") and a web server (HTTPD). It is not specific
to Perl, and has its own FAQs and tutorials, and a Usenet group,
comp.infosystems.www.authoring.cgi.

The CGI specification is outlined in an informational RFC:
http://www.ietf.org/rfc/rfc3875

Other relevant documentation is listed in: http://www.perl.org/CGI_MetaFAQ.html

These Perl FAQs very selectively cover some CGI issues. However, Perl
programmers are strongly advised to use the CGI.pm module, to take care
of the details for them.

The similarity between CGI response headers (defined in the CGI
specification) and HTTP response headers (defined in the HTTP
specification, RFC2616) is intentional, but can sometimes be confusing.

The CGI specification defines two kinds of script: the "Parsed Header"
script, and the "Non Parsed Header" (NPH) script. Check your server
documentation to see what it supports. "Parsed Header" scripts are
simpler in various respects. The CGI specification allows any of the
usual newline representations in the CGI response (it's the server's
job to create an accurate HTTP response based on it). So "\n" written in
text mode is technically correct, and recommended. NPH scripts are
trickier: they must put out a complete and accurate set of HTTP
transaction response headers; the HTTP specification calls for records
to be terminated with carriage-return and line-feed, i.e., ASCII \015\012
written in binary mode.
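
As a rough sketch of the two response forms described above, the
following builds each kind of response as a string using only core
Perl. The sub names C<parsed_header_response> and C<nph_response>, the
C<text/plain> content type, and the C<HTTP/1.0 200 OK> status line are
illustrative assumptions, not anything mandated by this FAQ.

```perl
use strict;
use warnings;

# "Parsed Header" form: CGI headers, a blank line, then the body.
# Plain "\n" is enough here; the server builds the real HTTP response.
sub parsed_header_response {
    my ($body) = @_;
    return "Content-Type: text/plain\n\n" . $body;
}

# NPH form: the script itself emits the complete HTTP response, so
# every header record ends in CR LF, written as \015\012.
sub nph_response {
    my ($body) = @_;
    my $crlf = "\015\012";
    return join($crlf,
        'HTTP/1.0 200 OK',                     # status line (assumed)
        'Content-Type: text/plain',
        'Content-Length: ' . length($body),    # assumes a byte string
        '', '',                                # ends headers with CRLF CRLF
    ) . $body;
}
```

An NPH script would call C<binmode STDOUT> before printing the result
of C<nph_response>, so that the \015\012 sequences reach the client
unaltered.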

Using CGI.pm gives excellent platform independence, including EBCDIC
systems. CGI.pm selects an appropriate newline representation
($CGI::CRLF) and sets binmode as appropriate.

=head2 My CGI script runs from the command line but not the browser. (500 Server Error)

Several things could be wrong. You can go through the "Troubleshooting
Perl CGI scripts" guide at

    http://www.perl.org/troubleshooting_CGI.html

If, after that, you can demonstrate that you've read the FAQs and that
your problem isn't something simple that can be easily answered, you'll
probably receive a courteous and useful reply to your question if you
post it on comp.infosystems.www.authoring.cgi (if it's something to do
with HTTP or the CGI protocols). Questions that appear to be Perl
questions but are really CGI ones that are posted to comp.lang.perl.misc
are not so well received.

The useful FAQs, related documents, and troubleshooting guides are
listed in the CGI Meta FAQ:

    http://www.perl.org/CGI_MetaFAQ.html

=head2 How can I get better error messages from a CGI program?

Use the CGI::Carp module. It replaces C<warn> and C<die>, plus the
normal Carp module's C<carp>, C<croak>, and C<confess> functions, with
more verbose and safer versions. It still sends them to the normal
server error log.

    use CGI::Carp;
    warn "This is a complaint";
    die "But this one is serious";

The following use of CGI::Carp also redirects errors to a file of your choice,
placed in a BEGIN block to catch compile-time warnings as well:

    BEGIN {
        use CGI::Carp qw(carpout);
        open(LOG, ">>/var/local/cgi-logs/mycgi-log")
            or die "Unable to append to mycgi-log: $!\n";
        carpout(*LOG);
    }

You can even arrange for fatal errors to go back to the client browser,
which is nice for your own debugging, but might confuse the end user.

    use CGI::Carp qw(fatalsToBrowser);
    die "Bad error here";

Even if the error happens before you get the HTTP header out, the module
will try to take care of this to avoid the dreaded server 500 errors.
Normal warnings still go out to the server error log (or wherever
you've sent them with C<carpout>) with the application name and date
stamp prepended.

=head2 How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use HTML::Parser
from CPAN. Another mostly correct way is to use HTML::FormatText, which
not only removes HTML but also attempts to do a little simple
formatting of the resulting plain text.
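
A minimal sketch of the HTML::Parser route might look like the
following. It assumes HTML::Parser is installed from CPAN; the sub name
C<strip_html> is made up for illustration, and skipping the contents of
C<script> and C<style> elements is a design choice of this sketch, not
something the module does on its own.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Return only the text content of $html, with entities decoded.
sub strip_html {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes each text segment with entities already decoded
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    # Suppress both the tags and the content of these elements
    $parser->ignore_elements(qw(script style));

    $parser->parse($html);
    $parser->eof;
    return $text;
}
```

Since no comment handler is registered, HTML comments are simply
dropped, and quoted angle-brackets inside attributes are handled by
the parser rather than by a pattern.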

Many folks attempt a simple-minded regular expression approach, like
C<< s/<.*?>//g >>, but that fails in many cases because the tags
may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus, folks forget to convert
entities--like C<&lt;> for example.
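
To see those failures concretely, here is a small sketch that wraps
the naive substitution in a sub (the name C<naive_strip> is only for
illustration) so you can feed it awkward input:

```perl
use strict;
use warnings;

# The simple-minded approach: delete anything that looks like <...>.
sub naive_strip {
    my ($html) = @_;
    (my $text = $html) =~ s/<.*?>//g;
    return $text;
}

# A quoted ">" ends the match too early, leaving attribute debris.
my $quoted  = naive_strip('<IMG SRC = "foo.gif" ALT = "A > B">');

# Without /s, "." will not cross the newline inside the opening tag,
# so that tag survives untouched.
my $wrapped = naive_strip("<B\n>bold</B>");

# Entities are left undecoded.
my $entity  = naive_strip('a &lt; b');
```

Here C<$quoted> ends up as C< BE<34>E<gt>>, C<$wrapped> still contains
the opening tag, and C<$entity> is returned unchanged.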

Here's one "simple-minded" approach, that works for most files:

    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

If you want a more complete solution, see the 3-stage striphtml
program in
http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz
.

Here are some tricky cases that you should think about when picking
a solution:

    <IMG SRC = "foo.gif" ALT = "A > B">

    <IMG SRC = "foo.gif"
         ALT = "A > B">

    <!-- <A comment> -->

    <script>if (a<b && a>c)</script>

    <# Just data #>

    <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

If HTML comments include other tags, those solutions would also break
on text like this:

    <!-- This section commented out.
        <B>You can't see me!</B>
    -->

=head2 How do I extract URLs?

You can easily extract all sorts of URLs from HTML with
C<HTML::SimpleLinkExtor> which handles anchors, images, objects,
frames, and many other tags that can contain a URL. If you need
anything more complex, you can create your own subclass of
C<HTML::LinkExtor> or C<HTML::Parser>. You might even use
C<HTML::SimpleLinkExtor> as an example for something specifically
suited to your needs.
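
As a starting point for the module-based route, here is a small sketch
that uses C<HTML::LinkExtor> directly rather than subclassing it. It
assumes the HTML-Parser distribution is installed from CPAN, and the
sub name C<extract_links> is hypothetical.

```perl
use strict;
use warnings;
use HTML::LinkExtor ();

# Return every link-bearing attribute value found in $html.
sub extract_links {
    my ($html) = @_;

    # With no callback given, found links accumulate inside the object.
    my $extor = HTML::LinkExtor->new;
    $extor->parse($html);
    $extor->eof;

    my @urls;
    for my $link ($extor->links) {
        # Each entry is [ $tag, $attr => $url, ... ]
        my ($tag, %attrs) = @$link;
        push @urls, values %attrs;
    }
    return @urls;
}
```

A tag that carries several link attributes contributes them in hash
order, so sort the result if the order matters to you.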

You can use URI::Find to extract URLs from an arbitrary text document.

Less complete solutions involving regular expressions can save
you a lot of processing time if you know that the input is simple. One
solution from Tom Christiansen runs 100 times faster than most
module-based approaches but only extracts URLs from anchors where the first
attribute is HREF and there are no other attributes.

    #!/usr/bin/perl -n00
    # qxurl - [email protected]
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;

=head2 How do I download a file from the user's machine? How do I open a file on another machine?

In this case, download means to use the file upload feature of HTML
forms. You allow the web surfer to specify a file to send to your web
server. To you it looks like a download, and to the user it looks
like an upload. No matter what you call it, you do it with what's
known as B<multipart/form-data> encoding. The CGI.pm module (which
comes with Perl as part of the Standard Library) supports this in the
start_multipart_form() method, which isn't the same as the startform()
method.

See the section in the CGI.pm documentation on file uploads for code
examples and details.

=head2 How do I make a pop-up menu in HTML?

Use the B<< <SELECT> >> and B<< <OPTION> >> tags. The CGI.pm