Module trac.util.html

Classes
  TracHTMLSanitizer
Sanitize HTML constructs in user-supplied HTML that are potential vectors for phishing or XSS attacks.
  Deuglifier
Helper base class for cleaning up HTML riddled with <FONT COLOR=...> tags by replacing them with appropriate <span class="..."> elements.
  FormTokenInjector
Identify and protect forms from CSRF attacks.
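The Deuglifier description amounts to a pattern-driven rewrite of <font> wrappers into class-based <span> elements. The sketch below illustrates only that idea with a single regular expression; it is not Trac's implementation. The COLOR_CLASSES table and the deuglify function are hypothetical, and unlike a real cleanup pass it does not handle nested <font> tags.

```python
import re

# Hypothetical color-to-class table; Deuglifier is a base class meant to be
# specialized, so these rules are illustrative only.
COLOR_CLASSES = {"red": "error", "green": "success"}

# Matches <font color=...>body</font>, case-insensitively, quoted or not.
_FONT_RE = re.compile(
    r'<font\s+color="?(?P<color>[#\w]+)"?\s*>(?P<body>.*?)</font>',
    re.IGNORECASE | re.DOTALL,
)


def deuglify(markup):
    """Replace known <font color=...> wrappers with <span class="...">."""
    def repl(m):
        cls = COLOR_CLASSES.get(m.group("color").lower())
        if cls is None:
            return m.group(0)  # unknown color: leave the original markup alone
        return '<span class="%s">%s</span>' % (cls, m.group("body"))
    return _FONT_RE.sub(repl, markup)
```

For example, deuglify('<FONT COLOR="red">Build failed</FONT>') returns '<span class="error">Build failed</span>', while a color missing from the table is left untouched.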
Functions
  escape(str, quotes=True) -> Markup
Create a Markup instance from a string and escape special characters it may contain (<, >, & and ").
  unescape(text) -> unicode
Reverse-escapes &, <, >, and " and returns a unicode object.
  stripentities(text, keepxmlentities=False) -> unicode
Return a copy of the given text with any character or numeric entities replaced by the equivalent Unicode characters.
  striptags(text) -> unicode
Return a copy of the text with any XML/HTML tags removed.
  plaintext(text, keeplinebreaks=True)
Extract the text elements from (X)HTML content.
  find_element(frag, attr=None, cls=None, tag=None)
Return the first element in the fragment having the given attribute, class or tag, using a preorder depth-first search.
  is_safe_origin(safe_origins, uri, req=None)
Whether the given URI is a safe cross-origin.
  to_fragment(input)
Convert input to a Fragment object.
  valid_html_bytes(bytes)
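The find_element summary describes a preorder depth-first search: each node is tested before its children, and a child's whole subtree is searched before its next sibling. A minimal sketch of that traversal follows, assuming a hypothetical Element dataclass in place of Trac's real Element and Fragment types; only the search order and the three match criteria are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Element:
    """Hypothetical stand-in for a Trac Element, not the real class."""
    tag: str
    attrib: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Element", str]] = field(default_factory=list)


def find_element(frag, attr=None, cls=None, tag=None) -> Optional[Element]:
    """Return the first Element matching attr, cls or tag, in preorder DFS."""
    if not isinstance(frag, Element):
        return None  # text nodes never match
    # Preorder: test the current node before visiting any of its children.
    if attr is not None and attr in frag.attrib:
        return frag
    if cls is not None and cls in frag.attrib.get("class", "").split():
        return frag
    if tag is not None and frag.tag == tag:
        return frag
    # Depth-first: fully search each child subtree before the next sibling.
    for child in frag.children:
        found = find_element(child, attr, cls, tag)
        if found is not None:
            return found
    return None
```

Because the root is tested first, searching a <div> tree with tag="div" returns the root itself; a class match checks whole words of the class attribute, so cls="warn" matches class="note warn".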
Variables
  html = ElementFactory()
  tag = ElementFactory()
Function Details

escape(str, quotes=True)

Create a Markup instance from a string and escape special characters it may contain (<, >, & and ").

>>> escape('"1 < 2"')
Markup(u'&#34;1 &lt; 2&#34;')
>>> escape(['"1 < 2"'])
Markup(u"['&#34;1 &lt; 2&#34;']")

If the quotes parameter is set to False, the " character is left as is. Escaping quotes is generally only required for strings that are to be used in attribute values.

>>> escape('"1 < 2"', quotes=False)
Markup(u'"1 &lt; 2"')
>>> escape(['"1 < 2"'], quotes=False)
Markup(u'[\'"1 &lt; 2"\']')

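However, escape behaves slightly differently with Markup and Fragment instances: they are not escaped again, and the quotes parameter has no effect on them. A Markup instance is returned unchanged, and a Fragment is converted to its Markup rendering.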

>>> escape(Markup('"1 < 2 &#39;"'))
Markup(u'"1 < 2 &#39;"')
>>> escape(Markup('"1 < 2 &#39;"'), quotes=False)
Markup(u'"1 < 2 &#39;"')
>>> escape(tag.b('"1 < 2"'))
Markup(u'<b>"1 &lt; 2"</b>')
>>> escape(tag.b('"1 < 2"'), quotes=False)
Markup(u'<b>"1 &lt; 2"</b>')
Parameters:
  • str - the string to escape; if not a string, it is assumed that the input can be converted to a string
  • quotes - if True, double quote characters are escaped in addition to the other special characters
Returns: Markup
the escaped Markup string
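The escaping rules above can be sketched in plain Python 3 without Trac installed. This is a minimal illustration, not Trac's implementation: Markup here is a hypothetical bare str subclass, and Fragment handling is left out.

```python
class Markup(str):
    """Hypothetical minimal Markup: a str subclass marking text as safe HTML."""

    def __repr__(self):
        return "Markup(%s)" % str.__repr__(self)


def escape(s, quotes=True):
    """Escape &, <, > (and " if quotes is true); Markup passes through."""
    if isinstance(s, Markup):
        return s  # already safe HTML: never escape it a second time
    text = str(s)  # non-strings are converted to a string first
    # Replace '&' first so the '&' in '&lt;' is not escaped again.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quotes:
        text = text.replace('"', "&#34;")
    return Markup(text)
```

The pass-through rule is what makes escaping idempotent: since escape returns Markup, escape(escape(x)) equals escape(x), so text is never double-escaped into &amp;lt;.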

unescape(text)

Reverse-escapes &, <, >, and " and returns a unicode object.

>>> unescape(Markup('1 &lt; 2'))
u'1 < 2'

If the provided text object is not a Markup instance, it is returned unchanged.

>>> unescape('1 &lt; 2')
'1 &lt; 2'
Parameters:
  • text - the text to unescape
Returns: unicode
the unescaped string
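A minimal Python 3 sketch of the same reversal, assuming a hypothetical Markup str subclass rather than Trac's class. Only Markup input is decoded, and &amp; is decoded last so that a literal "&lt;" that was itself escaped (&amp;lt;) comes back as "&lt;" rather than "<".

```python
class Markup(str):
    """Hypothetical stand-in for Markup: a str subclass marking safe HTML."""


def unescape(text):
    """Reverse the &, <, > and " escapes, but only for Markup input."""
    if not isinstance(text, Markup):
        return text  # plain strings are returned unchanged
    # Decode '&amp;' last so that '&amp;lt;' becomes '&lt;', not '<'.
    return (str(text).replace("&#34;", '"').replace("&gt;", ">")
            .replace("&lt;", "<").replace("&amp;", "&"))
```

The result is a plain str, not Markup, which mirrors the documented unicode return type: once unescaped, the text is no longer safe to emit as HTML.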

stripentities(text, keepxmlentities=False)

Return a copy of the given text with any character or numeric entities replaced by the equivalent Unicode characters.

>>> stripentities('1 &lt; 2')
u'1 < 2'
>>> stripentities('more &hellip;')
u'more \u2026'
>>> stripentities('&#8230;')
u'\u2026'
>>> stripentities('&#x2026;')
u'\u2026'
>>> stripentities(Markup(u'\u2026'))
u'\u2026'

If the keepxmlentities parameter is true, the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are left intact.

>>> stripentities('1 &lt; 2 &hellip;', keepxmlentities=True)
u'1 &lt; 2 \u2026'
Returns: unicode
a unicode instance with entities removed
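The entity rules above can be approximated with the standard library's html.entities.name2codepoint table and one regular expression. This is a sketch, not Trac's code: it requires the trailing semicolon, adds &apos; by hand (the HTML 4 table lacks it), and leaves unknown named entities exactly as written, which may differ from the real function.

```python
import re
from html.entities import name2codepoint

# Decimal (&#8230;), hexadecimal (&#x2026;) or named (&hellip;) references.
_ENTITY_RE = re.compile(r"&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));")
_XML_ENTITIES = {"amp", "apos", "gt", "lt", "quot"}
_NAMED = dict(name2codepoint, apos=0x27)  # HTML 4 names plus XML's &apos;


def stripentities(text, keepxmlentities=False):
    """Replace character and numeric entities with Unicode characters."""
    def repl(m):
        dec, hexa, name = m.groups()
        if dec is not None:
            return chr(int(dec))
        if hexa is not None:
            return chr(int(hexa, 16))
        if keepxmlentities and name in _XML_ENTITIES:
            return m.group(0)  # leave the core XML entities intact
        if name in _NAMED:
            return chr(_NAMED[name])
        return m.group(0)  # unknown name: keep it as written (sketch choice)
    return _ENTITY_RE.sub(repl, str(text))
```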

striptags(text)

Return a copy of the text with any XML/HTML tags removed.

>>> striptags('<span>Foo</span> bar')
u'Foo bar'
>>> striptags('<span class="bar">Foo</span>')
u'Foo'
>>> striptags('Foo<br />')
u'Foo'

HTML/XML comments are stripped, too:

>>> striptags('<!-- <blub>hehe</blah> -->test')
u'test'
Parameters:
  • text - the string to remove tags from
Returns: unicode
a unicode instance with all tags removed
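The behaviour shown in the examples, including comment removal, can be sketched with two regular expressions. Comments are stripped first so that tag-like text inside them disappears with the comment. This is a naive illustration under that assumption, not Trac's implementation: a ">" inside a quoted attribute value would end the tag early.

```python
import re

# Comments first, so tags inside a comment are removed along with it.
_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG_RE = re.compile(r"<[^>]*>")  # naive: any <...> run counts as a tag


def striptags(text):
    """Drop HTML/XML comments and tags, keeping the text between them."""
    return _TAG_RE.sub("", _COMMENT_RE.sub("", str(text)))
```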

plaintext(text, keeplinebreaks=True)

Extract the text elements from (X)HTML content.

>>> plaintext('<b>1 &lt; 2</b>')
u'1 < 2'
>>> plaintext(tag('1 ', tag.b('<'), ' 2'))
u'1 < 2'
>>> plaintext('''<b>1
... &lt;
... 2</b>''', keeplinebreaks=False)
u'1 < 2'
Parameters:
  • text - unicode or Fragment
  • keeplinebreaks - if True (the default), line breaks are kept; if False, they are replaced with spaces
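For string input, plaintext can be read as "strip tags, decode entities, then optionally flatten line breaks". The sketch below assumes exactly that pipeline, uses the standard library's html.unescape as a stand-in for stripentities, and does not handle Fragment input.

```python
import re
from html import unescape as html_unescape  # stdlib stand-in for stripentities

_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG_RE = re.compile(r"<[^>]*>")


def plaintext(text, keeplinebreaks=True):
    """Strip tags, decode entities, and optionally turn newlines into spaces."""
    text = _TAG_RE.sub("", _COMMENT_RE.sub("", text))
    text = html_unescape(text)
    if not keeplinebreaks:
        text = text.replace("\n", " ")
    return text
```

Decoding entities only after tags are removed matters: "&lt;b&gt;" in the source is text, and must survive as the literal "<b>" rather than being mistaken for a tag.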