Module trac.util.text

Classes
  unicode_passwd
Conceal the actual content of the string when repr is called.
  UnicodeTextWrapper
Functions
 
to_unicode(text, charset=None)
Convert input to a unicode object.

exception_to_unicode(e, traceback=False)
Convert an Exception to a unicode object.

path_to_unicode(path)
Convert a filesystem path to unicode, using the filesystem encoding.

stripws(text, leading=True, trailing=True)
Strips unicode white-spaces and ZWSPs from text.

strip_line_ws(text, leading=True, trailing=True)
Strips unicode white-spaces and ZWSPs from each line of text.

javascript_quote(text)
Quote strings for inclusion in single or double quote delimited JavaScript strings.

to_js_string(text)
Embed the given string in a double quote delimited JavaScript string (conforming to the JSON spec).

unicode_quote(value, safe='/')
A unicode aware version of urllib.quote.

unicode_quote_plus(value, safe='')
A unicode aware version of urllib.quote_plus.

unicode_unquote(value)
A unicode aware version of urllib.unquote. Returns unicode.

unicode_urlencode(params, safe='')
A unicode aware version of urllib.urlencode.

quote_query_string(text)
Quote strings for use in a query string.

to_utf8(text, charset='latin1')
Convert input to a UTF-8 str object.

stream_encoding(stream)
Return the appropriate encoding for the given stream.

console_print(out, *args, **kwargs)
Output the given arguments to the console, encoding the output as appropriate.

printout(*args, **kwargs)
Do a console_print on sys.stdout.

printerr(*args, **kwargs)
Do a console_print on sys.stderr.

raw_input(prompt)
Read one line from the console and convert it to unicode as appropriate.

getpreferredencoding()
Return the encoding, retrieved ahead of time, according to user preference.

text_width(text, ambiwidth=1)
Determine the column width of text in Unicode characters.

print_table(data, headers=None, sep=' ', out=None, ambiwidth=None)
Print data according to a tabular layout.

shorten_line(text, maxlen=75)
Truncates text to length less than or equal to maxlen characters.

wrap(t, cols=75, initial_indent='', subsequent_indent='', linesep='\n', ambiwidth=1)
Wraps the single paragraph in t, which contains unicode characters.

obfuscate_email_address(address)
Replace anything looking like an e-mail address ('@something') with a trailing ellipsis ('@…').

is_obfuscated(word)
Returns True if the word looks like an obfuscated e-mail address.

breakable_path(path)
Make a path breakable after path separators, and conversely, avoid breaking at spaces.

normalize_whitespace(text, to_space=u' ', remove=u'')
Normalize whitespace in a string, by replacing special spaces by normal spaces and removing zero-width spaces.

unquote_label(txt)
Remove (one level of) enclosing single or double quotes.

cleandoc(message)
Removes uniform indentation and leading/trailing whitespace.

pretty_size(size, format='%.1f')
Pretty print content size information with appropriate unit.

expandtabs(s, tabstop=8, ignoring=None)
Expand tab characters '\t' into spaces.

fix_eol(text, eol)
Fix end-of-lines in a text.

unicode_to_base64(text, strip_newlines=True)
Safe conversion of text to base64 representation using utf-8 bytes.

unicode_from_base64(text)
Safe conversion of text to unicode based on utf-8 bytes.

levenshtein_distance(lhs, rhs)
Return the Levenshtein distance between two strings.

sub_vars(text, args)
Substitute $XYZ-style variables in a string with provided values.
Variables
  CRLF = '\r\n'
  empty = u''
  sub_vars_re = re.compile(r'\$([A-Z_][A-Z0-9_]*)')
  __package__ = 'trac.util'
  c = u''
  i = 8233

Imports: __builtin__, locale, os, re, sys, textwrap, quote, quote_plus, unquote, east_asian_width


Function Details

to_unicode(text, charset=None)

Convert input to a unicode object.

For a str object, the bytes are first decoded using the given charset encoding (or UTF-8 if none is specified). If that fails, decoding falls back to the latin1 encoding, which may or may not be correct, but at least preserves the original byte sequence by mapping each byte to the corresponding unicode code point in the range U+0000 to U+00FF.

For anything else, a simple unicode() conversion is attempted, with special care taken with Exception objects.
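
The decode-then-fall-back order described above can be illustrated with a minimal sketch. It is written for Python 3 (bytes/str), whereas the module documented here targets Python 2 (str/unicode), and the name decode_with_fallback is hypothetical; this is not the module's implementation, and the special handling of Exception objects is left out.

```python
def decode_with_fallback(data, charset=None):
    """Decode bytes with charset (UTF-8 if None), falling back to Latin-1."""
    if isinstance(data, bytes):
        try:
            return data.decode(charset or "utf-8")
        except UnicodeDecodeError:
            # Latin-1 maps every byte to U+0000..U+00FF, so this cannot fail.
            return data.decode("latin1")
    # Anything else: plain text conversion.
    return str(data)


print(decode_with_fallback(b"caf\xc3\xa9"))  # valid UTF-8
print(decode_with_fallback(b"caf\xe9"))      # invalid UTF-8, Latin-1 fallback
```

Both calls yield the same text, because the second byte string is not valid UTF-8 and is therefore read as Latin-1.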

exception_to_unicode(e, traceback=False)

Convert an Exception to a unicode object.

In addition to the output of to_unicode, this representation of the exception also contains the class name and, optionally, the traceback.

stripws(text, leading=True, trailing=True)

Strips unicode white-spaces and ZWSPs from text.
Parameters:
  • leading - strips leading spaces from text unless leading is False.
  • trailing - strips trailing spaces from text unless trailing is False.
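
As an illustration of the leading and trailing flags, here is a minimal Python 3 sketch. The name strip_ws and the regular expressions are assumptions made for this example, not the module's own code. The zero-width space (U+200B) is handled explicitly because Python does not classify it as whitespace.

```python
import re

# Runs of Unicode whitespace and/or ZWSP at the start or end of the text.
_LEADING = re.compile(r"\A[\s\u200b]+")
_TRAILING = re.compile(r"[\s\u200b]+\Z")


def strip_ws(text, leading=True, trailing=True):
    """Strip Unicode whitespace and ZWSPs from either end of text."""
    if leading:
        text = _LEADING.sub("", text)
    if trailing:
        text = _TRAILING.sub("", text)
    return text


print(repr(strip_ws("\u200b  hi \u3000\u200b")))
print(repr(strip_ws(" hi ", leading=False)))
```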

strip_line_ws(text, leading=True, trailing=True)

Strips unicode white-spaces and ZWSPs from each line of text.
Parameters:
  • leading - strips leading spaces from text unless leading is False.
  • trailing - strips trailing spaces from text unless trailing is False.

unicode_quote(value, safe='/')

A unicode aware version of urllib.quote.
Parameters:
  • value - anything that converts to a str. If unicode input is given, it will be UTF-8 encoded.
  • safe - as in quote, the characters that would otherwise be quoted but shouldn't here (defaults to '/')

unicode_quote_plus(value, safe='')

A unicode aware version of urllib.quote_plus.
Parameters:
  • value - anything that converts to a str. If unicode input is given, it will be UTF-8 encoded.
  • safe - as in quote_plus, the characters that would otherwise be quoted but shouldn't here (defaults to '', as in the signature)

unicode_unquote(value)

A unicode aware version of urllib.unquote.
Parameters:
  • value - UTF-8 encoded str value (for example, as obtained by unicode_quote).
Returns: unicode
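
For comparison, the sketch below shows the quoting behaviour these three functions wrap, using the Python 3 standard library (urllib.parse), which already UTF-8 encodes text input. It calls the standard functions directly, not the trac.util.text wrappers, and the sample path is made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote

path = "/wiki/Caf\u00e9 au lait"

quoted = quote(path, safe="/")      # '/' is kept, space becomes %20
plus = quote_plus(path, safe="")    # '/' is quoted, space becomes '+'

print(quoted)
print(plus)
print(unquote(quoted) == path)      # quoting round-trips
```

Note how the different safe defaults matter: with safe='/' path separators survive, while with safe='' they are percent-encoded.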

unicode_urlencode(params, safe='')

A unicode aware version of urllib.urlencode.

Values set to empty are converted to the key alone, without the equal sign.
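
The key-alone rule can be sketched as follows in Python 3. The function name urlencode_sketch and the EMPTY sentinel are hypothetical: EMPTY stands in for the module's empty variable, and whether the real function recognizes it by identity or by equality is not stated here. The sketch accepts a sequence of (key, value) pairs only.

```python
from urllib.parse import quote_plus

EMPTY = object()  # hypothetical marker standing in for the module's `empty`


def urlencode_sketch(params, safe=""):
    """Join (key, value) pairs; a value of EMPTY yields the key alone."""
    parts = []
    for key, value in params:
        if value is EMPTY:
            parts.append(quote_plus(str(key), safe))
        else:
            parts.append(quote_plus(str(key), safe) + "=" +
                         quote_plus(str(value), safe))
    return "&".join(parts)


print(urlencode_sketch([("q", "caf\u00e9 noir"), ("debug", EMPTY)]))
```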

to_utf8(text, charset='latin1')

Convert input to a UTF-8 str object.

If the input is not a unicode object, we assume the encoding is already UTF-8, ISO Latin-1, or as specified by the optional charset parameter.

console_print(out, *args, **kwargs)

Output the given arguments to the console, encoding the output as appropriate.
Parameters:
  • kwargs - newline controls whether a newline will be appended (defaults to True)

getpreferredencoding()

Return the encoding, retrieved ahead of time, according to user preference.

Use this instead of locale.getpreferredencoding(), which is not thread-safe.

text_width(text, ambiwidth=1)

Determine the column width of text in Unicode characters.

Characters in the East Asian Fullwidth (F) or East Asian Wide (W) categories have a column width of 2. The other characters, in the East Asian Halfwidth (H) or East Asian Narrow (Na) categories, have a column width of 1.

The ambiwidth parameter sets the column width of East Asian Ambiguous (A) characters. If 1, they have the same width as US-ASCII characters, which is what most users expect. If 2, they have twice the width of US-ASCII characters, which is what CJK users expect.

cf. http://www.unicode.org/reports/tr11/.
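
The width rules above can be sketched with unicodedata.east_asian_width, which appears in this module's imports. This is a minimal Python 3 illustration, with the hypothetical name column_width, and not necessarily how text_width itself is written.

```python
from unicodedata import east_asian_width


def column_width(text, ambiwidth=1):
    """Sum per-character column widths using East Asian Width categories."""
    # Fullwidth and Wide always take two columns; Ambiguous only if asked.
    double = {"F", "W", "A"} if ambiwidth == 2 else {"F", "W"}
    return sum(2 if east_asian_width(ch) in double else 1 for ch in text)


print(column_width("abc"))                  # narrow characters
print(column_width("\u65e5\u672c\u8a9e"))   # three wide CJK characters
print(column_width("\u03b1", ambiwidth=2))  # Greek alpha is Ambiguous
```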

print_table(data, headers=None, sep=' ', out=None, ambiwidth=None)

Print data according to a tabular layout.
Parameters:
  • data - a sequence of rows; assume all rows are of equal length.
  • headers - an optional row containing column headers; must be of the same length as each row in data.
  • sep - column separator
  • out - output file descriptor (None means use sys.stdout)
  • ambiwidth - column width of East Asian Ambiguous (A) characters. If None, ambiwidth is detected from the locale settings. Otherwise, the value is passed to the ambiwidth parameter of text_width.

shorten_line(text, maxlen=75)

Truncates text to length less than or equal to maxlen characters.

This tries to be (a bit) clever and attempts to find a proper word boundary for doing so.
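
One way to cut at a word boundary is sketched below in Python 3. The name shorten, the ' ...' suffix, and the choice of the last space before the limit as the boundary are all assumptions made for this example; the documentation above only promises a result of at most maxlen characters.

```python
def shorten(text, maxlen=75, suffix=" ..."):
    """Truncate text to at most maxlen characters, preferring a space."""
    if len(text) <= maxlen:
        return text
    limit = maxlen - len(suffix)
    # Look for the last space that still leaves room for the suffix.
    cut = text.rfind(" ", 0, limit)
    if cut < 0:
        cut = limit  # no space found: cut mid-word
    return text[:cut] + suffix


print(shorten("The quick brown fox jumps over the lazy dog", maxlen=20))
```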

wrap(t, cols=75, initial_indent='', subsequent_indent='', linesep='\n', ambiwidth=1)

Wraps the single paragraph in t, which contains unicode characters. Every line is at most cols characters long.

The ambiwidth parameter sets the column width of East Asian Ambiguous (A) characters. If 1, they have the same width as US-ASCII characters, which is what most users expect. If 2, they have twice the width of US-ASCII characters, which is what CJK users expect.

is_obfuscated(word)

Returns True if the word looks like an obfuscated e-mail address.

Since: 1.2

unquote_label(txt)

Remove (one level of) enclosing single or double quotes.

Since: 1.0

pretty_size(size, format='%.1f')

Pretty print content size information with appropriate unit.
Parameters:
  • size - number of bytes
  • format - can be used to adjust the precision shown
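
To show how the format parameter affects the result, here is a minimal Python 3 sketch. The name pretty, the 1024-byte step, and the unit labels are assumptions made for illustration; the documentation above does not specify them.

```python
def pretty(size, format="%.1f"):
    """Render a byte count with a unit (assumed: 1024 steps, KB..TB)."""
    if size < 1024:
        return "%d bytes" % size
    value = float(size)
    for unit in ("KB", "MB", "GB", "TB"):
        value /= 1024.0
        # Stop at the first unit that fits, or at the largest one.
        if value < 1024 or unit == "TB":
            return (format + " %s") % (value, unit)


print(pretty(512))
print(pretty(1536))
print(pretty(1048576, format="%.2f"))
```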

expandtabs(s, tabstop=8, ignoring=None)

Expand tab characters '\t' into spaces.
Parameters:
  • tabstop - number of space characters per tab (defaults to the canonical 8)
  • ignoring - if not None, the expansion will be "smart" and go from one tabstop to the next. In addition, this parameter lists characters which can be ignored when computing the indent.

unicode_to_base64(text, strip_newlines=True)

Safe conversion of text to base64 representation using utf-8 bytes.

Strips newlines from output unless strip_newlines is False.
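
The round trip through UTF-8 bytes and base64 can be sketched with the Python 3 standard library. The names to_base64 and from_base64 are hypothetical stand-ins for unicode_to_base64 and unicode_from_base64, and the use of base64.encodebytes (which inserts line breaks) is an assumption chosen so that strip_newlines has a visible effect.

```python
import base64


def to_base64(text, strip_newlines=True):
    """Encode text as UTF-8, then base64; optionally drop line breaks."""
    encoded = base64.encodebytes(text.encode("utf-8")).decode("ascii")
    if strip_newlines:
        encoded = encoded.replace("\n", "")
    return encoded


def from_base64(text):
    """Decode base64 text back to a string via UTF-8."""
    return base64.b64decode(text).decode("utf-8")


print(to_base64("caf\u00e9"))
print(from_base64(to_base64("caf\u00e9")))
```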

sub_vars(text, args)

Substitute $XYZ-style variables in a string with provided values.
Parameters:
  • text - string containing variables to substitute.
  • args - dictionary with keys matching the variables to be substituted. The keys should not be prefixed with the $ character.
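
The substitution can be sketched in Python 3 with the sub_vars_re pattern listed under Variables. The name substitute is hypothetical, and leaving unknown variables untouched is an assumption, since the documentation above does not say what happens to variables missing from args.

```python
import re

# Same pattern as the module's sub_vars_re variable.
sub_vars_re = re.compile(r"\$([A-Z_][A-Z0-9_]*)")


def substitute(text, args):
    """Replace $NAME with args['NAME']; keys carry no '$' prefix."""
    def repl(match):
        key = match.group(1)
        # Assumption: variables without a matching key are left as-is.
        return str(args[key]) if key in args else match.group(0)
    return sub_vars_re.sub(repl, text)


print(substitute("Hello $USER, see $URL/$MISSING",
                 {"USER": "alice", "URL": "http://example.org"}))
```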