daft.functions.split

split

split(expr: Expression, split_on: str | Expression) -> Expression

Splits each string on the given delimiter string, returning a list of substrings.

Parameters:

Name      Type              Description                                                                                Default
expr      Expression        The expression to split.                                                                   required
split_on  str | Expression  The string on which each string should be split, or a column to pick such patterns from.  required

Returns:

Name        Type        Description
Expression  Expression  A List[String] expression containing the string splits for each string in the column.

Examples:

>>> import daft
>>> from daft.functions import split
>>> df = daft.from_pydict({"data": ["daft.distributed.query", "a.b.c", "1.2.3"]})
>>> df.with_column("split", split(df["data"], ".")).collect()
╭────────────────────────┬────────────────────────────╮
│ data                   ┆ split                      │
│ ---                    ┆ ---                        │
│ String                 ┆ List[String]               │
╞════════════════════════╪════════════════════════════╡
│ daft.distributed.query ┆ [daft, distributed, query] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.b.c                  ┆ [a, b, c]                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2.3                  ┆ [1, 2, 3]                  │
╰────────────────────────┴────────────────────────────╯
(Showing first 3 of 3 rows)
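
For intuition about the two forms of `split_on` (one string for every row, or a column supplying one delimiter per row), the row-wise behavior can be sketched in plain Python. This is only an analogue, not Daft's implementation: `split_rows` is a hypothetical helper, plain lists stand in for columns, and treating a missing value or delimiter as a missing result is an assumption rather than documented behavior.

```python
from typing import List, Optional, Sequence, Union


def split_rows(
    values: Sequence[Optional[str]],
    split_on: Union[str, Sequence[Optional[str]]],
) -> List[Optional[List[str]]]:
    """Hypothetical pure-Python analogue of a row-wise split (not Daft's code)."""
    # A single string is reused for every row; a sequence gives one delimiter per row.
    if isinstance(split_on, str):
        delimiters: List[Optional[str]] = [split_on] * len(values)
    else:
        delimiters = list(split_on)
    if len(delimiters) != len(values):
        raise ValueError("split_on must provide exactly one delimiter per row")

    result: List[Optional[List[str]]] = []
    for value, delimiter in zip(values, delimiters):
        # Assumption: a missing value or a missing delimiter yields a missing result.
        if value is None or delimiter is None:
            result.append(None)
        else:
            # Literal (non-regex) split on the delimiter.
            result.append(value.split(delimiter))
    return result


data = ["daft.distributed.query", "a.b.c", "1.2.3"]
print(split_rows(data, "."))                    # same delimiter for every row
print(split_rows(["a-b", "c,d"], ["-", ","]))   # one delimiter per row
```

With Daft itself, the per-row form corresponds to passing an expression as `split_on` instead of a string literal.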
Source code in daft/functions/str.py
def split(expr: Expression, split_on: str | Expression) -> Expression:
    r"""Splits each string on the given string, into a list of strings.

    Args:
        expr: The expression to split.
        split_on: The string on which each string should be split, or a column to pick such patterns from.

    Returns:
        Expression: A List[String] expression containing the string splits for each string in the column.

    Examples:
        >>> import daft
        >>> from daft.functions import split
        >>> df = daft.from_pydict({"data": ["daft.distributed.query", "a.b.c", "1.2.3"]})
        >>> df.with_column("split", split(df["data"], ".")).collect()
        ╭────────────────────────┬────────────────────────────╮
        │ data                   ┆ split                      │
        │ ---                    ┆ ---                        │
        │ String                 ┆ List[String]               │
        ╞════════════════════════╪════════════════════════════╡
        │ daft.distributed.query ┆ [daft, distributed, query] │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ a.b.c                  ┆ [a, b, c]                  │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ 1.2.3                  ┆ [1, 2, 3]                  │
        ╰────────────────────────┴────────────────────────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    return Expression._call_builtin_scalar_fn("split", expr, split_on)