re

The Python re module provides support for regular expressions, letting you search, match, and manipulate strings by describing the patterns you are looking for.

Regular expressions are a powerful tool for text processing and can be used for tasks such as validation, parsing, and string manipulation.

Here’s a quick example:

Python
>>> import re
>>> pattern = r"\d+"
>>> re.findall(pattern, "There are 42 apples and 100 oranges")
['42', '100']

Key Features

  • Supports Perl-style regular expressions
  • Provides functions for searching, matching, and replacing text
  • Allows compilation of regular expressions for reuse
  • Supports advanced pattern features like lookaheads and backreferences
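The last feature can be illustrated with a minimal sketch. The sample strings and variable names below are made up for illustration; only the re calls come from the module itself.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured,
# so this finds a word immediately followed by itself.
doubled_word = re.compile(r"\b(\w+) \1\b")
repeated_words = doubled_word.findall("This is is a test of the the parser")
# repeated_words == ['is', 'the']

# Lookahead: (?=...) checks what follows without consuming it,
# so only digits directly before "px" are returned.
pixel_sizes = re.findall(r"\d+(?=px)", "width: 100px; height: 50em; margin: 8px")
# pixel_sizes == ['100', '8']
```

Note that findall() returns the contents of the group when the pattern has exactly one capturing group, which is why the first result lists single words rather than the doubled text.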

Frequently Used Classes and Functions

Object         Type      Description
re.compile()   Function  Compiles a regular expression pattern for reuse
re.match()     Function  Determines if the regex matches at the start of a string
re.search()    Function  Searches for a pattern match anywhere in the string
re.findall()   Function  Finds all non-overlapping matches in a string
re.sub()       Function  Replaces matches with a specified string
re.Match       Class     Contains match details and results
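The difference between re.match() and re.search(), and what a re.Match object holds, may be easier to see side by side. This is a small sketch with an invented sample string:

```python
import re

text = "Order 66 shipped"

# re.match() only tries the pattern at position 0, so it finds nothing here.
at_start = re.match(r"\d+", text)
# at_start is None

# re.search() scans forward and returns a Match for the first hit.
anywhere = re.search(r"\d+", text)
matched_text = anywhere.group()  # '66'
matched_span = anywhere.span()   # (6, 8): start and end offsets in text
```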

Examples

Compile a regular expression pattern for reuse:

Python
>>> pattern = re.compile(r"\b\w+\b")
>>> pattern.findall("This is a test.")
['This', 'is', 'a', 'test']

Search for a pattern match anywhere in the string:

Python
>>> match = re.search(r"\d+", "There are 42 apples")
>>> match.group()
'42'
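When no match is found, re.search() returns None, and calling .group() on it raises an AttributeError. One way to guard against that is sketched below; first_number is a hypothetical helper name, not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"\d+", text)
    # Check for None before touching the Match object.
    if match is None:
        return None
    return match.group()
```

With this helper, first_number("There are 42 apples") gives '42', while first_number("no digits here") gives None instead of raising an error.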

Replace matches with a specified string:

Python
>>> re.sub(r"\d+", "#", "Room 123, Floor 4")
'Room #, Floor #'
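The replacement passed to re.sub() can also refer to captured groups or be a function. The sketch below assumes dates written as MM/DD/YYYY, and the helper name double is invented for illustration:

```python
import re

# \1, \2, \3 in the replacement refer to the captured groups,
# which lets you reorder the parts of each match.
# Assumes the input date is in MM/DD/YYYY form.
iso_date = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", "Due 12/25/2024")
# iso_date == 'Due 2024-12-25'

# A function receives each Match object and returns the replacement text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "Room 123, Floor 4")
# doubled == 'Room 246, Floor 8'
```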

Common Use Cases

  • Validating input data, such as email addresses and phone numbers
  • Parsing and extracting information from text files
  • Replacing or reformatting strings based on patterns
  • Tokenizing text for natural language processing
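Two of these use cases, validation and tokenizing, can be sketched briefly. The phone format (NNN-NNN-NNNN), the helper name is_valid_phone, and the tokenizing rule are assumptions chosen for illustration, not a standard:

```python
import re

# Hypothetical phone format for illustration: NNN-NNN-NNNN.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_valid_phone(value):
    # fullmatch() requires the whole string to match, which is what
    # validation usually needs; search() would accept extra characters.
    return PHONE.fullmatch(value) is not None

# A simple tokenizer: runs of word characters, or single punctuation marks.
tokens = re.findall(r"\w+|[^\w\s]", "Hello, world! It's 9am.")
# tokens == ['Hello', ',', 'world', '!', 'It', "'", 's', '9am', '.']
```

Real natural language tokenizers handle contractions and other edge cases more carefully; this pattern only shows the general idea.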

Real-World Example

Suppose you want to extract all email addresses from a block of text. You can use the re module to accomplish this:

Python
>>> text = "Contact us at support@example.com or sales@example.org"
>>> email_pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
>>> emails = re.findall(email_pattern, text)
>>> emails
['support@example.com', 'sales@example.org']

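In this example, re.findall() returns every substring of the text that matches the pattern. The pattern is a simplified approximation of email syntax: it works well for extraction, but it's not a complete validator for every valid address.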
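If you also need the parts of each address, one possible variation is to add named groups and iterate with finditer(). The group names user and domain are invented for this sketch; the character classes are the same as in the example above.

```python
import re

text = "Contact us at support@example.com or sales@example.org"

# Same character classes as the example pattern, with named groups added.
email_pattern = re.compile(
    r"(?P<user>[a-zA-Z0-9._%+-]+)@(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
)

# finditer() yields Match objects, so each group is available by name.
contacts = [(m["user"], m["domain"]) for m in email_pattern.finditer(text)]
# contacts == [('support', 'example.com'), ('sales', 'example.org')]
```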