Pod::Simple::Subclassing -- write a formatter as a Pod::Simple subclass
package Pod::SomeFormatter;
use Pod::Simple;
@ISA = qw(Pod::Simple);
$VERSION = '1.01';
use strict;
sub _handle_element_start {
    my($parser, $element_name, $attr_hash_r) = @_;
    ...
}
sub _handle_element_end {
    my($parser, $element_name) = @_;
    ...
}
sub _handle_text {
    my($parser, $text) = @_;
    ...
}
1;
This document is about using Pod::Simple to write a Pod processor, generally a Pod formatter. If you just want to know about using an existing Pod formatter, instead see its documentation and see also the docs in Pod::Simple.
The zeroth step in writing a Pod formatter is to make sure that there isn't already a decent one on CPAN. See http://search.cpan.org/, and run a search on the name of the format you want to render to. Also consider joining the Pod People list http://lists.perl.org/showlist.cgi?name=pod-people and asking whether anyone has a formatter for that format -- maybe someone cobbled one together but just hasn't released it.
The first step in writing a Pod processor is to read perlpodspec, which contains information on writing a Pod parser (which has been largely taken care of by Pod::Simple), but also a lot of requirements and recommendations for writing a formatter.
The second step is to actually learn the format you're planning to format to -- or at least as much as you need to know to represent Pod, which probably isn't much.
The third step is to pick which of Pod::Simple's interfaces you want to use. The basic interface via Pod::Simple or Pod::Simple::Methody is event-based, sort of like HTML::Parser's interface or XML::Parser's "Handlers" interface. Pod::Simple::PullParser provides a token-stream interface, sort of like HTML::TokeParser's interface, and Pod::Simple::SimpleTree provides a simple tree interface, rather like XML::Parser's "Tree" interface. Users familiar with XML handling will find one of these styles relatively familiar; but if you would be even more at home with XML, there are classes that produce an XML representation of the Pod stream, notably Pod::Simple::XMLOutStream; you can feed the output of such a class to whatever XML parsing system you are most at home with.
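To give a feel for the token-stream style, here is a minimal sketch that walks a small Pod string with Pod::Simple::PullParser and notes each token it sees. The sample Pod text and the @seen list are only illustrative assumptions; a real formatter would act on the tokens rather than list them.

```perl
use strict;
use warnings;
use Pod::Simple::PullParser;

# Illustrative sample input; any Pod source would do.
my $pod = "=pod\n\nHello B<world>.\n\n=cut\n";

my $puller = Pod::Simple::PullParser->new;
$puller->set_source(\$pod);    # a scalar reference is read as Pod source text

my @seen;                      # one short description per token
while (my $token = $puller->get_token) {
    if    ($token->is_start) { push @seen, 'start:' . $token->tagname }
    elsif ($token->is_end)   { push @seen, 'end:'   . $token->tagname }
    elsif ($token->is_text)  { push @seen, 'text:'  . $token->text    }
}

print "$_\n" for @seen;
```

The same start/end/text distinction shows up in the event-based interface as three separate method calls, so choosing between the two is mostly a matter of which control flow you prefer.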
The last step is to write your code based on how the events (or tokens, or tree-nodes, or the XML, or however you're parsing) will map to constructs in the output format. Also be sure to consider how to escape text nodes containing arbitrary text, and also what to do with text nodes that represent preformatted text (from verbatim sections).
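As a rough sketch of that mapping, the following toy subclass turns a few events into HTML-like tags and escapes text nodes on the way out. The class name My::Sketch::HTMLish, the %tag_for table, and the sketch_out hash key are all made up for this illustration; a real formatter would cover many more elements and follow the recommendations in perlpodspec.

```perl
package My::Sketch::HTMLish;    # hypothetical class name
use strict;
use warnings;
use Pod::Simple;
our @ISA = ('Pod::Simple');

# Hypothetical mapping from Pod element names to output tags.
my %tag_for = (Para => 'p', B => 'b', I => 'i', C => 'code', Verbatim => 'pre');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    my $tag = $tag_for{$element_name};
    $parser->{sketch_out} .= "<$tag>" if defined $tag;
}

sub _handle_element_end {
    my ($parser, $element_name) = @_;
    my $tag = $tag_for{$element_name};
    $parser->{sketch_out} .= "</$tag>" if defined $tag;
    # Put each block-level element on its own output line.
    $parser->{sketch_out} .= "\n"
        if $element_name eq 'Para' or $element_name eq 'Verbatim';
}

sub _handle_text {
    my ($parser, $text) = @_;
    # Escape the characters that are special in the target format.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $parser->{sketch_out} .= $text;
}

package main;

my $formatter = My::Sketch::HTMLish->new;
$formatter->{sketch_out} = '';    # assumption: output kept in the object hash
$formatter->parse_string_document(
    "=pod\n\nUse C<< a < b >> for B<less than>.\n\n  if (a < b) { ... }\n"
);
my $out = $formatter->{sketch_out};
print $out;
```

Note that the verbatim text goes through the same escaping as ordinary text here; what differs is only the surrounding element, which is where a formatter would switch to a whitespace-preserving construct.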
TODO intro... mention that events are supplied for implicits, like for missing >'s
In the following section, we use XML to represent the event structure associated with a particular construct. That is, TODO
$parser->_handle_element_start( element_name, attr_hashref )
$parser->_handle_element_end( element_name )
$parser->_handle_text( text_string )
TODO describe
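One way to see these three calls in action is a subclass that simply records each call in an XML-like form, similar to the notation used in this document. This is only a sketch: the class name My::Sketch::EventLog and the sketch_events key are invented for the example, and skipping attribute names that start with "~" is an assumption about which attributes are worth showing.

```perl
package My::Sketch::EventLog;    # hypothetical class name
use strict;
use warnings;
use Pod::Simple;
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    my $attrs = '';
    for my $key (sort keys %{ $attr_hash_r || {} }) {
        next if $key =~ /^~/;    # assumption: skip internal-looking "~" keys
        $attrs .= ' ' . $key . '="' . $attr_hash_r->{$key} . '"';
    }
    push @{ $parser->{sketch_events} }, "<$element_name$attrs>";
}

sub _handle_element_end {
    my ($parser, $element_name) = @_;
    push @{ $parser->{sketch_events} }, "</$element_name>";
}

sub _handle_text {
    my ($parser, $text) = @_;
    push @{ $parser->{sketch_events} }, $text;
}

package main;

my $logger = My::Sketch::EventLog->new;
$logger->parse_string_document("=pod\n\nHello I<there>.\n");
my @events = @{ $logger->{sketch_events} || [] };
print "$_\n" for @events;
```

Running it prints one line per event, which makes it easy to compare what your own Pod input produces against the structures described in the rest of this document.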
Parsing a document produces this event structure:
<Document start_line="543">
...all events...
</Document>
The value of the start_line attribute will be the line number of the first Pod directive in the document.
If there is no Pod in the given document, then the event structure will be this:
<Document contentless="1" start_line="543">
</Document>
In that case, the value of the start_line attribute will not be meaningful; under current implementations, it will probably be the line number of the last line in the file.
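A processor that wants to treat Pod-less input specially can look for the contentless attribute on the Document start event. The sketch below does just that; My::Sketch::ContentCheck, the sketch_contentless key, and the is_contentless helper are hypothetical names introduced for the example.

```perl
package My::Sketch::ContentCheck;    # hypothetical class name
use strict;
use warnings;
use Pod::Simple;
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    if ($element_name eq 'Document') {
        # Remember whether the Document event was flagged as contentless.
        $parser->{sketch_contentless} = $attr_hash_r->{contentless} ? 1 : 0;
    }
}

sub _handle_element_end { }    # nothing to do for this check
sub _handle_text        { }    # nothing to do for this check

package main;

# Hypothetical helper: parse a string and report the contentless flag.
sub is_contentless {
    my ($source) = @_;
    my $checker = My::Sketch::ContentCheck->new;
    $checker->parse_string_document($source);
    return $checker->{sketch_contentless};
}

my $without_pod = is_contentless("print 1;\n");
my $with_pod    = is_contentless("=pod\n\nSome text.\n");
print "without Pod: $without_pod\n";
print "with Pod: $with_pod\n";
```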
Parsing a plain (non-verbatim, non-directive, non-data) paragraph in a Pod document produces this event structure:
<Para start_line="543">
...all events in this paragraph...
</Para>
The value of the start_line attribute will be the line number of the start of the paragraph.
For example, parsing this paragraph of Pod:
The value of the I<start_line> attribute will be the
line number of the start of the paragraph.
produces this event structure:
<Para start_line="129">
The value of the
<I>
start_line
</I>
attribute will be the line number of the start of the
paragraph.
</Para>
Parsing a B<...> formatting code (or of course any of its semantically identical syntactic variants B<< ... >>, or B<<<< ... >>>>, etc.) produces this event structure:
<B>
...stuff...
</B>
Currently, there are no attributes conveyed.
Parsing a C, F, or I code produces the same structure, with only a different element name.
If your parser object has been set to accept other formatting codes, then they will be presented like these B/C/F/I codes -- i.e., without any attributes.
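For instance, a parser can be told to accept an extra code with the accept_codes method, after which that code shows up as an ordinary element. The sketch below accepts a made-up Q<...> code and records the element names it sees; the class name My::Sketch::CodeLog, the sketch_starts key, and the choice of Q are assumptions for illustration only.

```perl
package My::Sketch::CodeLog;    # hypothetical class name
use strict;
use warnings;
use Pod::Simple;
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    # Record just the element name of every start event.
    push @{ $parser->{sketch_starts} }, $element_name;
}

package main;

my $logger = My::Sketch::CodeLog->new;
$logger->accept_codes('Q');    # Q is a made-up code for this example
$logger->parse_string_document("=pod\n\nA Q<quoted> word and a B<bold> one.\n");

my @starts = @{ $logger->{sketch_starts} || [] };
print join(' ', @starts), "\n";
```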
Normally, parsing an S<...> sequence produces this event structure, just as if it were a B/C/F/I code:
<S>
...stuff...
</S>
However, Pod::Simple (and presumably all derived parsers) offers the nbsp_for_S option which, if enabled, will suppress all S events, and instead change all spaces in the content to non-breaking spaces. This is intended for formatters that output to a format that has no code that means the same as S<...>, but which has a code/character that means non-breaking space.
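Here is a small sketch of turning that option on. It records element names and text so you can confirm that no S event arrives and that the ordinary space inside the S<...> content is gone. The class name My::Sketch::NbspLog and the sketch_* keys are hypothetical.

```perl
package My::Sketch::NbspLog;    # hypothetical class name
use strict;
use warnings;
use Pod::Simple;
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    push @{ $parser->{sketch_starts} }, $element_name;
}

sub _handle_text {
    my ($parser, $text) = @_;
    $parser->{sketch_text} .= $text;
}

package main;

my $logger = My::Sketch::NbspLog->new;
$logger->{sketch_text} = '';
$logger->nbsp_for_S(1);    # enable the option before parsing
$logger->parse_string_document("=pod\n\nRun S<perl -w> now.\n");

my @starts = @{ $logger->{sketch_starts} || [] };
my $text   = $logger->{sketch_text};

my $saw_S         = grep({ $_ eq 'S' } @starts) ? 1 : 0;
my $kept_ordinary = ($text =~ /perl -w/) ? 1 : 0;    # plain space still there?
print "S events seen: $saw_S\n";
print "ordinary space kept inside S content: $kept_ordinary\n";
```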
Normally, parsing an X<...> sequence produces this event structure, just as if it were a B/C/F/I code:
<X>
...stuff...
</X>
However, Pod::Simple (and presumably all derived parsers) offers the nix_X_codes option which, if enabled, will suppress all X events and ignore their content. For formatters/processors that don't use X events, this is presumably quite useful.
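The following sketch enables that option and checks that neither an X event nor the index text reaches the handlers. As before, the class name My::Sketch::NixLog and the sketch_* keys are invented for the example.

```perl
package My::Sketch::NixLog;    # hypothetical class name
use strict;
use warnings;
use Pod::Simple;
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    push @{ $parser->{sketch_starts} }, $element_name;
}

sub _handle_text {
    my ($parser, $text) = @_;
    $parser->{sketch_text} .= $text;
}

package main;

my $logger = My::Sketch::NixLog->new;
$logger->{sketch_text} = '';
$logger->nix_X_codes(1);    # enable the option before parsing
$logger->parse_string_document("=pod\n\nIndexed X<index entry> text.\n");

my @starts = @{ $logger->{sketch_starts} || [] };
my $text   = $logger->{sketch_text};

my $saw_X       = grep({ $_ eq 'X' } @starts) ? 1 : 0;
my $saw_content = ($text =~ /index entry/) ? 1 : 0;
print "X events seen: $saw_X\n";
print "X content seen: $saw_content\n";
```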