flox: fast & furious GroupBy reductions for dask.array

NASA-80NSSC18M0156 NASA-80NSSC22K0345

Overview

flox mainly provides strategies for fast GroupBy reductions with dask.array. flox uses the MapReduce paradigm (or a “tree reduction”) to run the GroupBy operation in a parallel-native way, avoiding sort and shuffle operations entirely. It was motivated by:

  1. Dask Dataframe GroupBy blogpost

  2. numpy_groupies in Xarray issue

See a presentation (video, slides) about this package, from the Pangeo Showcase.
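
The MapReduce idea mentioned in this overview can be illustrated with a small, dependency-free sketch. This is not flox's implementation: the function names (chunk_reduce, combine, finalize, grouped_mean) and the toy data are hypothetical, chosen only to show how a grouped mean can be computed chunk by chunk and merged pairwise without ever sorting or shuffling the data.

```python
from collections import defaultdict


def chunk_reduce(labels, values):
    """Map step: reduce one chunk to per-group [sum, count] intermediates."""
    partial = defaultdict(lambda: [0.0, 0])
    for label, value in zip(labels, values):
        partial[label][0] += value
        partial[label][1] += 1
    return dict(partial)


def combine(left, right):
    """Combine step: merge two intermediates group by group."""
    merged = {label: list(stats) for label, stats in left.items()}
    for label, (total, count) in right.items():
        if label in merged:
            merged[label][0] += total
            merged[label][1] += count
        else:
            merged[label] = [total, count]
    return merged


def finalize(intermediate):
    """Finalize step: turn the [sum, count] pairs into means."""
    return {label: total / count for label, (total, count) in intermediate.items()}


def grouped_mean(chunks):
    """Grouped mean over chunks of (labels, values) via a pairwise tree reduction."""
    partials = [chunk_reduce(labels, values) for labels, values in chunks]
    if not partials:
        return {}
    # Merge neighbouring intermediates until a single one remains.
    while len(partials) > 1:
        partials = [
            combine(partials[i], partials[i + 1]) if i + 1 < len(partials) else partials[i]
            for i in range(0, len(partials), 2)
        ]
    return finalize(partials[0])


# Toy data: three chunks, each holding group labels and the matching values.
chunks = [
    (["a", "b", "a"], [1.0, 2.0, 3.0]),
    (["b", "c"], [4.0, 6.0]),
    (["a", "c", "c"], [5.0, 8.0, 10.0]),
]
means = grouped_mean(chunks)
print(means)
```

Each chunk is reduced independently, so the map step parallelizes trivially. The intermediates hold one entry per group rather than one per element, so only these small summaries need to move between workers during the combine steps.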

Why flox?

  1. flox.groupby_reduce() wraps the numpy-groupies package for performant GroupBy reductions on nD arrays.

  2. flox.groupby_reduce() provides parallel-friendly strategies for GroupBy reductions by wrapping numpy-groupies for dask arrays.

  3. flox integrates with xarray to provide more performant GroupBy and resampling operations.

  4. flox.xarray.xarray_reduce()