lsdb.loaders.dataframe.margin_catalog_generator

Classes

MarginCatalogGenerator

Creates a HATS-formatted margin catalog.

Module Contents

class MarginCatalogGenerator(catalog: lsdb.Catalog, margin_order: int = -1, margin_threshold: float | None = 5.0, use_pyarrow_types: bool = True, **kwargs)

Creates a HATS-formatted margin catalog.

dataframe: nested_pandas.NestedFrame
hc_structure
margin_threshold = 5.0
margin_order = -1
use_pyarrow_types = True
catalog_info_kwargs
_resolve_margin_order()

Calculate the order of the margin cache to be generated. If the margin order is not provided, it is calculated from the smallest pixel possible for the threshold.

Raises:

ValueError – if the margin order and thresholds are incompatible with the catalog.
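The rule above (pick the smallest pixel that still fits the threshold) can be illustrated with a minimal sketch. It assumes the threshold is expressed in arcseconds and measures pixel size as the average HEALPix pixel size, sqrt(4π / (12·4^order)). The names `pixel_size_arcsec` and `order_for_threshold` are hypothetical, and the library's own size criterion and compatibility checks may differ.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi
MAX_ORDER = 29  # highest order that fits a signed 64-bit nested HEALPix index


def pixel_size_arcsec(order: int) -> float:
    """Average HEALPix pixel size (square root of the pixel area), in arcseconds."""
    npix = 12 * 4**order  # number of pixels on the sphere at this order
    return math.sqrt(4.0 * math.pi / npix) * ARCSEC_PER_RADIAN


def order_for_threshold(threshold_arcsec: float) -> int:
    """Hypothetical helper: the highest order whose pixels are still at least
    as large as the threshold, i.e. the smallest usable pixel."""
    if threshold_arcsec <= 0:
        raise ValueError("threshold must be positive")
    # Walk from the finest order down to the coarsest one.
    for order in range(MAX_ORDER, -1, -1):
        if pixel_size_arcsec(order) >= threshold_arcsec:
            return order
    raise ValueError("threshold is larger than any HEALPix pixel")
```

Under these assumptions, a 5 arcsecond threshold resolves to order 15, whose pixels are roughly 6.4 arcseconds across; order 16 pixels (about 3.2 arcseconds) would be too small.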

create_catalog() → lsdb.catalog.margin_catalog.MarginCatalog | None

Create a margin catalog for another pre-computed catalog.

Only one of the margin order and the margin threshold can be specified. When the margin order is not specified, the threshold decides the outcome: a threshold of zero yields an empty margin catalog, and a threshold of None means the margin is not generated (the method returns None).

Returns:

Margin catalog object or None if the margin is not generated.
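The decision rules described for create_catalog() can be summarized in a small sketch. This is not the library's code: `margin_outcome` is a hypothetical helper, and treating a negative margin order (such as the default of -1) as "not specified" is an assumption.

```python
from __future__ import annotations


def margin_outcome(margin_order: int = -1, margin_threshold: float | None = 5.0) -> str:
    """Hypothetical helper mirroring the documented rules of create_catalog()."""
    if margin_order < 0:  # assumption: a negative order means "not specified"
        if margin_threshold is None:
            return "none"  # no margin is generated; the method returns None
        if margin_threshold == 0:
            return "empty"  # an empty margin catalog is created
    return "computed"  # a non-empty margin catalog is computed
```

With the constructor defaults (`margin_order=-1`, `margin_threshold=5.0`) the sketch reports a computed margin, while `margin_threshold=None` and `margin_threshold=0` map to the two special cases.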

_create_catalog() → lsdb.catalog.margin_catalog.MarginCatalog

Create a non-empty margin catalog

_create_empty_catalog() → lsdb.catalog.margin_catalog.MarginCatalog

Create an empty margin catalog

_get_margins() → tuple[list[hats.pixel_math.HealpixPixel], list[nested_pandas.NestedFrame]]

Generates the list of pixels that have margin data and the dataframes holding the margin data for each partition.

Returns:

A tuple of the list of HealpixPixels corresponding to partitions that have margin data, and a list of the dataframes with the margin data for each partition.

_generate_dask_df_and_map(pixels: list[hats.pixel_math.HealpixPixel], partitions: list[pandas.DataFrame]) → tuple[lsdb.nested.NestedFrame, dict[hats.pixel_math.HealpixPixel, int], int]

Create the Dask DataFrame containing the data points in the margins of the catalog, along with the mapping from each margin HEALPix pixel to its partition.

Parameters:
  • pixels (List[HealpixPixel]) – The list of HEALPix pixels in the catalog with margins

  • partitions (List[pd.DataFrame]) – The list of dataframes containing the margin rows for each partition, aligned with the pixels list

Returns:

Tuple containing the Dask Dataframe, the mapping of margin HEALPix to the respective partitions and the total number of rows.
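The second and third elements of the returned tuple can be sketched with plain Python. The example below leaves out the Dask DataFrame construction entirely and uses stand-ins: (order, pixel) tuples in place of HealpixPixel objects and lists of rows in place of DataFrames. The helper name `build_partition_map` is hypothetical, and using each pixel's list position as its partition index is an assumption.

```python
def build_partition_map(pixels, partitions):
    """Hypothetical helper: map each margin pixel to a partition index and
    count the total number of margin rows."""
    if len(pixels) != len(partitions):
        raise ValueError("pixels and partitions must be aligned")
    # Assumption: the partition index is the pixel's position in the list.
    pixel_to_index = {pixel: index for index, pixel in enumerate(pixels)}
    total_rows = sum(len(partition) for partition in partitions)
    return pixel_to_index, total_rows


# Stand-in data: two margin pixels at order 1, with one and two rows.
pixels = [(1, 4), (1, 7)]
partitions = [
    [{"ra": 10.0, "dec": -5.0}],
    [{"ra": 11.0, "dec": -4.0}, {"ra": 12.0, "dec": -3.5}],
]
pixel_map, total = build_partition_map(pixels, partitions)
```

Here `pixel_map` is `{(1, 4): 0, (1, 7): 1}` and `total` is 3.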

_find_margin_pixel_pairs(pixels: list[hats.pixel_math.HealpixPixel]) → pandas.DataFrame

Calculate the pairs of catalog pixels and their margin pixels

Parameters:

pixels (List[HealpixPixel]) – The list of HEALPix pixels to compute margin pixels for. These include the catalog pixels as well as the negative pixels (those with no catalog data).

Returns:

A pandas DataFrame with the many-to-many mapping between catalog HEALPix pixels and their respective margin pixels.

_create_margins(margin_pairs_df: pandas.DataFrame) → dict[hats.pixel_math.HealpixPixel, pandas.DataFrame]

Compute the margins for all the pixels in the catalog

Parameters:

margin_pairs_df (pd.DataFrame) – A DataFrame containing all the combinations of catalog pixels and respective margin pixels

Returns:

A dictionary mapping each margin pixel to the respective DataFrame.

_create_catalog_info(catalog_name: str | None = None, **kwargs) → hats.catalog.TableProperties

Create the margin catalog info object

Parameters:
  • catalog_name (str) – Name of the PRIMARY catalog being created. This margin catalog will take on a name like <catalog_name>_margin.

  • **kwargs – Arguments to pass to the creation of the catalog info.

Returns:

The margin catalog info object.