daft.functions.regexp

regexp

regexp(expr: Expression, pattern: str | Expression) -> Expression

Check whether each string in a string column matches the given regular expression pattern.

Parameters:

Name     Type              Description                                                               Default
expr     Expression        String expression to search in                                            required
pattern  str | Expression  Regex pattern to search for as string or as a column to pick values from  required

Returns:

Name        Type        Description
Expression  Expression  a Boolean expression indicating whether each value matches the provided pattern

Examples:

>>> import daft
>>> from daft.functions import regexp
>>>
>>> df = daft.from_pydict({"x": ["foo", "bar", "baz"]})
>>> df.with_column("match", regexp(df["x"], "ba.")).collect()
╭────────┬───────╮
│ x      ┆ match │
│ ---    ┆ ---   │
│ String ┆ Bool  │
╞════════╪═══════╡
│ foo    ┆ false │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ bar    ┆ true  │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ baz    ┆ true  │
╰────────┴───────╯
(Showing first 3 of 3 rows)
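The `pattern` parameter accepts either a single string or a column of patterns. The sketch below models that row-wise behaviour in plain Python, and it is not Daft code. `regexp_model` is a hypothetical helper, and it rests on three assumptions that the text above does not confirm: a pattern column supplies exactly one pattern per row, a match may occur anywhere in the string (unanchored search), and a missing value or pattern yields a missing result. Python's `re` module stands in for Daft's regex engine, whose syntax and edge cases may differ.

```python
import re
from typing import List, Optional, Sequence, Union


def regexp_model(
    values: Sequence[Optional[str]],
    pattern: Union[str, Sequence[Optional[str]]],
) -> List[Optional[bool]]:
    """Hypothetical pure-Python model of row-wise regex matching (not Daft code)."""
    if isinstance(pattern, str):
        # A single string pattern is applied to every row.
        patterns = [pattern] * len(values)
    else:
        # A sequence of patterns is paired with the values row by row.
        patterns = list(pattern)
        if len(patterns) != len(values):
            raise ValueError("need exactly one pattern per row")

    results: List[Optional[bool]] = []
    for value, pat in zip(values, patterns):
        if value is None or pat is None:
            # Assumption: a missing value or pattern yields a missing result.
            results.append(None)
        else:
            # Assumption: unanchored search, a match anywhere in the string.
            results.append(re.search(pat, value) is not None)
    return results


# Same data and pattern as the example above.
print(regexp_model(["foo", "bar", "baz"], "ba."))  # [False, True, True]

# One pattern per row, as when `pattern` is a column.
print(regexp_model(["foo", "bar", "baz"], ["f.o", "^z", "az$"]))  # [True, False, True]
```

In Daft itself the per-row form would presumably be written by passing a second column as `pattern`, for example `regexp(df["x"], df["p"])` with a hypothetical pattern column `p`; the signature above allows an `Expression` there.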
Source code in daft/functions/str.py
def regexp(expr: Expression, pattern: str | Expression) -> Expression:
    """Check whether each string in a string column matches the given regular expression pattern.

    Args:
        expr: String expression to search in
        pattern: Regex pattern to search for as string or as a column to pick values from

    Returns:
        Expression: a Boolean expression indicating whether each value matches the provided pattern

    Examples:
        >>> import daft
        >>> from daft.functions import regexp
        >>>
        >>> df = daft.from_pydict({"x": ["foo", "bar", "baz"]})
        >>> df.with_column("match", regexp(df["x"], "ba.")).collect()
        ╭────────┬───────╮
        │ x      ┆ match │
        │ ---    ┆ ---   │
        │ String ┆ Bool  │
        ╞════════╪═══════╡
        │ foo    ┆ false │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ bar    ┆ true  │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ baz    ┆ true  │
        ╰────────┴───────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)

    """
    return Expression._call_builtin_scalar_fn("regexp_match", expr, pattern)