lsdb.loaders.dataframe.from_dataframe_utils

Functions

_generate_dask_dataframe(...)

Create the Dask DataFrame from the list of HEALPix pixel DataFrames.

_convert_dtypes_to_pyarrow(→ pandas.DataFrame)

Transform the columns (and index) of a Pandas DataFrame to pyarrow types.

_format_margin_partition_dataframe(...)

Finalize the dataframe for a margin catalog partition.

_extra_property_dict(est_size_bytes)

Create a dictionary of additional fields to store in the properties file.

_has_named_index(→ bool)

Heuristic to determine if a dataframe has some meaningful index.

Module Contents

_generate_dask_dataframe(pixel_dfs: list[nested_pandas.NestedFrame], pixels: list[hats.pixel_math.HealpixPixel], use_pyarrow_types: bool = True) → tuple[lsdb.nested.NestedFrame, int]

Create the Dask DataFrame from the list of HEALPix pixel DataFrames.

Parameters:
  • pixel_dfs (List[npd.NestedFrame]) – The list of HEALPix pixel DataFrames

  • pixels (List[HealpixPixel]) – The list of HEALPix pixels in the catalog

  • use_pyarrow_types (bool) – If True, use pyarrow types. Defaults to True.

Returns:

The catalog’s Dask DataFrame and its total number of rows.

_convert_dtypes_to_pyarrow(df: pandas.DataFrame) → pandas.DataFrame

Transform the columns (and index) of a Pandas DataFrame to pyarrow types.

Parameters:

df (pd.DataFrame) – A Pandas DataFrame

Returns:

A new DataFrame, with columns of pyarrow types. The return value is a shallow copy of the initial DataFrame to avoid copying the data.

_format_margin_partition_dataframe(dataframe: nested_pandas.NestedFrame) → nested_pandas.NestedFrame

Finalize the dataframe for a margin catalog partition.

Parameters:

dataframe (pd.DataFrame) – The partition dataframe

Returns:

The dataframe for a margin partition with the data points and the respective pixel information.

_extra_property_dict(est_size_bytes: int)

Create a dictionary of additional fields to store in the properties file.

_has_named_index(dataframe: nested_pandas.NestedFrame) → bool

Heuristic to determine if a dataframe has some meaningful index.

This rejects dataframes whose single index has no name, and dataframes whose multi-index names are empty or all None (e.g. [] or [None]).
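A minimal sketch of such a heuristic, assuming plain pandas DataFrames stand in for `NestedFrame`. The helper name `has_named_index_sketch` is hypothetical, and this is not necessarily how lsdb implements the check.

```python
import pandas as pd


def has_named_index_sketch(dataframe: pd.DataFrame) -> bool:
    """Hypothetical stand-in: True if the index carries at least one name."""
    # Index.names is list-like; a single unnamed index yields [None].
    names = list(dataframe.index.names)
    # Reject empty name lists and lists where every entry is None.
    return any(name is not None for name in names)


# Made-up frames, used only for illustration.
unnamed = pd.DataFrame({"ra": [10.0, 20.0]})
named = unnamed.rename_axis("object_id")
multi = pd.DataFrame(
    {"mag": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples([(0, 1), (0, 2)], names=[None, None]),
)

print(has_named_index_sketch(unnamed))  # False
print(has_named_index_sketch(named))    # True
print(has_named_index_sketch(multi))    # False
```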