per_partition_statistics

Catalog.per_partition_statistics(*, use_default_columns: bool = True, exclude_hats_columns: bool = True, exclude_columns: list[str] | None = None, include_columns: list[str] | None = None, only_numeric_columns: bool = False, include_stats: list[str] | None = None, multi_index: bool = False, per_row_group: bool = False, include_pixels: list[HealpixPixel] | None = None) -> DataFrame

Read the footer statistics in the parquet metadata and report the min/max values for each data partition.

Parameters:
use_default_columns : bool, default True

If True, report statistics only for the columns that are loaded by default (as set in the metadata by the catalog provider). Defaults to True.

exclude_hats_columns : bool, default True

Exclude HATS spatial and partitioning fields from the statistics. Defaults to True.

exclude_columns : list[str] or None, default None

Additional columns to exclude from the statistics.

include_columns : list[str] or None, default None

If specified, only return statistics for the column names provided. Defaults to None, which returns all non-HATS columns.

only_numeric_columns : bool, default False

Only return statistics for numeric columns. This prevents the returned DataFrame from converting values to strings.

include_stats : list[str] or None, default None

If specified, only return the listed kinds of statistics (min_value, max_value, null_count, row_count). Defaults to None, which returns all of them.

multi_index : bool, default False

If True, the returned frame has a multi-index, first on pixel, then on column name. If False (the default), the frame is indexed on pixel only, with a separate column for each combination of data column and statistic.

per_row_group : bool, default False

If True, the returned frame contains one row per row group. If False (the default), the statistics are aggregated so that there is only one row per data partition.

include_pixels : list[HealpixPixel] or None, default None

If specified, only return statistics for the pixels indicated. Defaults to None, which returns all pixels.
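
The per_row_group option chooses between one row per row group and one aggregated row per partition. The following is a minimal pure-Python sketch of what such an aggregation could look like, assuming that minima and maxima are combined with min/max and that null and row counts are summed. The record layout, the partition labels, and the `aggregate_row_groups` helper are hypothetical and only illustrate the idea; this is not the library's implementation.

```python
def aggregate_row_groups(row_groups):
    """Collapse per-row-group statistics into one record per (partition, column).

    Assumed rule (not taken from the library): min of mins, max of maxes,
    and summed null_count / row_count.
    """
    merged = {}
    for rg in row_groups:
        key = (rg["partition"], rg["column"])
        if key not in merged:
            # First row group seen for this partition/column: copy it as the seed.
            merged[key] = dict(rg)
            continue
        agg = merged[key]
        agg["min_value"] = min(agg["min_value"], rg["min_value"])
        agg["max_value"] = max(agg["max_value"], rg["max_value"])
        agg["null_count"] += rg["null_count"]
        agg["row_count"] += rg["row_count"]
    return merged


# Hypothetical statistics for two row groups in one partition and one in another.
row_groups = [
    {"partition": "Order: 1, Pixel: 44", "column": "mag",
     "min_value": 14.2, "max_value": 19.8, "null_count": 0, "row_count": 100},
    {"partition": "Order: 1, Pixel: 44", "column": "mag",
     "min_value": 13.5, "max_value": 18.1, "null_count": 3, "row_count": 80},
    {"partition": "Order: 1, Pixel: 45", "column": "mag",
     "min_value": 15.0, "max_value": 17.0, "null_count": 1, "row_count": 50},
]

partition_stats = aggregate_row_groups(row_groups)
```

With per_row_group=True the frame would correspond to the three input records; with the default it would correspond to the two merged records.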

Returns:
pd.DataFrame

DataFrame with granular per-pixel statistics.
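
As a usage sketch, the helper below requests only the minimum and maximum of selected numeric columns. It uses only keyword arguments from the signature above; the `numeric_ranges` helper name and the column names in the usage note are hypothetical, and `catalog` is assumed to be an already-opened Catalog object.

```python
def numeric_ranges(catalog, columns):
    """Return per-partition min/max statistics for the given columns.

    `catalog` is assumed to expose per_partition_statistics with the
    keyword arguments documented on this page.
    """
    return catalog.per_partition_statistics(
        include_columns=columns,        # restrict to the named columns
        only_numeric_columns=True,      # avoid values being converted to strings
        include_stats=["min_value", "max_value"],
    )
```

For example, `numeric_ranges(catalog, ["ra", "dec"])` would return a frame indexed on pixel, assuming the catalog has columns with those names.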