lsdb.dask.merge_catalog_functions

Functions

concat_partition_and_margin(→ nested_pandas.NestedFrame)

Concatenates a partition and margin dataframe together

remove_hips_columns(df)

Removes any HIPS Norder, Dir, and Npix columns from a dataframe

align_catalogs(→ hats.pixel_tree.PixelAlignment)

Aligns two catalogs, also using the right catalog's margin if it exists

align_and_apply(→ list[dask.delayed.Delayed])

Aligns catalogs to a given ordering of pixels and applies a function to each set of aligned partitions

perform_align_and_apply_func(num_partitions, func, ...)

Performs the function inside align_and_apply and updates hive columns

filter_by_spatial_index_to_pixel(...)

Filters a catalog dataframe to the points within a specified HEALPix pixel using the spatial index

construct_catalog_args(...)

Constructs the arguments needed to create a catalog from a list of delayed partitions

get_healpix_pixels_from_alignment(...)

Gets the list of primary and join pixels as the HealpixPixel class from a PixelAlignment

generate_meta_df_for_joined_tables(...)

Generates a Dask meta DataFrame that would result from joining two catalogs

generate_meta_df_for_nested_tables(...)

Generates a Dask meta DataFrame that would result from joining two catalogs, adding the right as a nested frame

concat_metas(metas)

Concats the columns of a sequence of dask metas into a single NestedFrame meta

get_partition_map_from_alignment_pixels(...)

Gets a dictionary mapping HEALPix pixel to index of pixel in the pixel_mapping of a PixelAlignment

align_catalog_to_partitions(...)

Aligns the partitions of a Catalog to a list of HEALPix pixels

create_merged_catalog_info(→ hats.catalog.TableProperties)

Creates the catalog info of the resulting catalog from merging two catalogs

Module Contents

concat_partition_and_margin(partition: nested_pandas.NestedFrame, margin: nested_pandas.NestedFrame | None) → nested_pandas.NestedFrame

Concatenates a partition and margin dataframe together

Parameters:
  • partition (npd.NestedFrame) – The partition dataframe

  • margin (npd.NestedFrame) – The margin dataframe

Returns:

The concatenated dataframe with the partition on top and the margin on the bottom

remove_hips_columns(df: nested_pandas.NestedFrame | None)

Removes any HIPS Norder, Dir, and Npix columns from a dataframe

Parameters:

df (npd.NestedFrame) – The catalog dataframe

Returns:

The dataframe with the columns removed

align_catalogs(left: lsdb.catalog.catalog.Catalog, right: lsdb.catalog.catalog.Catalog, add_right_margin: bool = True) → hats.pixel_tree.PixelAlignment

Aligns two catalogs, also using the right catalog’s margin if it exists

Parameters:
  • left (lsdb.Catalog) – The left catalog to align

  • right (lsdb.Catalog) – The right catalog to align

  • add_right_margin (bool) – If True, when using MOCs to align catalogs, adds a border to the right catalog’s MOC to include the margin of the right catalog, if it exists. Defaults to True.

Returns:

The PixelAlignment object from aligning the catalogs

align_and_apply(catalog_mappings: list[tuple[lsdb.catalog.dataset.healpix_dataset.HealpixDataset | None, list[hats.pixel_math.HealpixPixel]]], func: Callable, *args, **kwargs) → list[dask.delayed.Delayed]

Aligns catalogs to a given ordering of pixels and applies a function to each set of aligned partitions

Parameters:
  • catalog_mappings (List[Tuple[HealpixDataset, List[HealpixPixel]]]) – The catalogs and their corresponding ordering of pixels to align the partitions to. A catalog can be None, in which case None will be passed to the function for each partition. Each list of pixels should be the same length. Example input: [(catalog, pixels), (catalog2, pixels2), …]

  • func (Callable) –

    The function to apply to the aligned catalogs. The function should take the aligned partitions of the catalogs as dataframes as the first arguments, followed by the healpix pixel of each partition, the hc_structures of the catalogs, and any additional arguments and keyword arguments. For example:

    def func(
        cat1_partition_df,
        cat2_partition_df,
        cat1_pixel,
        cat2_pixel,
        cat1_hc_structure,
        cat2_hc_structure,
        *args,
        **kwargs
    ):
        ...
    

  • *args – Additional arguments to pass to the function

  • **kwargs – Additional keyword arguments to pass to the function

Returns:

A list of delayed objects, each one representing the result of the function applied to the aligned partitions of the catalogs
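The argument order expected by func can be illustrated with plain Python stand-ins. This is only a sketch of the calling convention: the driver apply_to_aligned is hypothetical and runs eagerly, whereas align_and_apply returns Dask delayed objects, and the lists and strings below stand in for partition dataframes, HealpixPixel objects, and hc_structures.

```python
def count_rows(left_df, right_df, left_pixel, right_pixel,
               left_hc_structure, right_hc_structure, *, label="rows"):
    """Example func: partitions first, then pixels, then hc_structures."""
    # A partition is None when its catalog was passed as None.
    n_left = 0 if left_df is None else len(left_df)
    n_right = 0 if right_df is None else len(right_df)
    return f"{label}: {left_pixel}/{right_pixel} -> {n_left + n_right}"


def apply_to_aligned(partition_lists, pixel_lists, structures, func, *args, **kwargs):
    """Hypothetical eager driver that mimics the documented argument order."""
    results = []
    for i in range(len(pixel_lists[0])):
        partitions = [parts[i] for parts in partition_lists]
        pixels = [pix[i] for pix in pixel_lists]
        results.append(func(*partitions, *pixels, *structures, *args, **kwargs))
    return results


left_parts = [[1, 2, 3], [4]]      # stand-ins for two left partitions
right_parts = [None, None]         # the right catalog is None
left_pixels = ["O1P0", "O1P1"]     # stand-ins for HealpixPixel objects
right_pixels = ["O0P0", "O0P0"]    # both pixel lists have the same length

results = apply_to_aligned(
    [left_parts, right_parts],
    [left_pixels, right_pixels],
    ["left_structure", "right_structure"],  # placeholder hc_structures
    count_rows,
    label="n",
)
```

Each entry of results corresponds to one set of aligned partitions, in the order given by the pixel lists.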

perform_align_and_apply_func(num_partitions, func, *args, **kwargs)

Performs the function inside align_and_apply and updates hive columns

filter_by_spatial_index_to_pixel(dataframe: nested_pandas.NestedFrame, order: int, pixel: int) → nested_pandas.NestedFrame

Filters a catalog dataframe to the points within a specified HEALPix pixel using the spatial index

Parameters:
  • dataframe (npd.NestedFrame) – The dataframe to filter

  • order (int) – The order of the HEALPix pixel to filter to

  • pixel (int) – The pixel number in NESTED numbering of the HEALPix pixel to filter to

Returns:

The filtered dataframe with only the rows that are within the specified HEALPix pixel
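Filtering by spatial index works because, in NESTED numbering, all descendants of a pixel occupy one contiguous range of pixel numbers at a higher order: each extra order appends two bits. The sketch below assumes the spatial index is the NESTED pixel number at order 29 (treat this constant as an assumption) and uses plain integers instead of a NestedFrame index. The helper names are illustrative and not part of lsdb.

```python
SPATIAL_INDEX_ORDER = 29  # assumed order of the spatial index


def index_range_for_pixel(order: int, pixel: int) -> tuple[int, int]:
    """Half-open [lower, upper) range of spatial index values inside a pixel."""
    shift = 2 * (SPATIAL_INDEX_ORDER - order)  # two bits per HEALPix order
    return pixel << shift, (pixel + 1) << shift


def filter_indices_to_pixel(indices, order, pixel):
    """Keep only the index values that fall inside the given pixel."""
    lower, upper = index_range_for_pixel(order, pixel)
    return [i for i in indices if lower <= i < upper]


lower, upper = index_range_for_pixel(order=1, pixel=5)
indices = [lower - 1, lower, upper - 1, upper]
kept = filter_indices_to_pixel(indices, order=1, pixel=5)
```

Only the two middle values are kept; the first belongs to pixel 4 and the last to pixel 6 at order 1.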

construct_catalog_args(partitions: list[dask.delayed.Delayed], meta_df: nested_pandas.NestedFrame, alignment: hats.pixel_tree.PixelAlignment) → tuple[nested_dask.NestedFrame, lsdb.types.DaskDFPixelMap, hats.pixel_tree.PixelAlignment]

Constructs the arguments needed to create a catalog from a list of delayed partitions

Parameters:
  • partitions (List[Delayed]) – The list of delayed partitions to create the catalog from

  • meta_df (npd.NestedFrame) – The dask meta schema for the partitions

  • alignment (PixelAlignment) – The alignment used to create the delayed partitions

Returns:

A tuple of (ddf, partition_map, alignment) with the dask dataframe, the partition map, and the alignment needed to create the catalog

get_healpix_pixels_from_alignment(alignment: hats.pixel_tree.PixelAlignment) → tuple[list[hats.pixel_math.HealpixPixel], list[hats.pixel_math.HealpixPixel]]

Gets the list of primary and join pixels as the HealpixPixel class from a PixelAlignment

Parameters:

alignment (PixelAlignment) – The PixelAlignment to get pixels from

Returns:

A tuple of (primary_pixels, join_pixels) with lists of HealpixPixel objects

generate_meta_df_for_joined_tables(catalogs: Sequence[lsdb.catalog.catalog.Catalog], suffixes: Sequence[str], extra_columns: pandas.DataFrame | None = None, index_name: str = SPATIAL_INDEX_COLUMN, index_type: numpy.typing.DTypeLike | None = None) → nested_pandas.NestedFrame

Generates a Dask meta DataFrame that would result from joining two catalogs

Creates an empty dataframe with the columns of each catalog appended with a suffix. Allows specifying extra columns that should also be added, and the name of the index of the resulting dataframe.

Parameters:
  • catalogs (Sequence[lsdb.Catalog]) – The catalogs to merge together

  • suffixes (Sequence[str]) – The column suffixes to apply to each catalog

  • extra_columns (pd.DataFrame) – Any additional columns to add to the merged catalogs

  • index_name (str) – The name of the index in the resulting DataFrame

  • index_type (npt.DTypeLike) – The type of the index in the resulting DataFrame. Default: type of index in the first catalog

Returns:

An empty dataframe with the columns of each catalog with their respective suffix, and any extra columns specified, with the index name set.
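The column naming described above can be sketched without any dataframe library. The helper below is hypothetical and only computes the column names such a meta would carry; placing the extra columns after the catalog columns is an assumption, and the real function returns an empty NestedFrame with dtypes and a named index.

```python
def suffixed_meta_columns(catalog_columns, suffixes, extra_columns=()):
    """Return the column names of a joined meta (illustrative helper only)."""
    names = []
    for columns, suffix in zip(catalog_columns, suffixes):
        # Every column of a catalog receives that catalog's suffix.
        names.extend(f"{name}{suffix}" for name in columns)
    # Assumption: extra columns follow the catalog columns, without a suffix.
    names.extend(extra_columns)
    return names


meta_columns = suffixed_meta_columns(
    [["ra", "dec", "mag"], ["ra", "dec"]],  # made-up columns of two catalogs
    ["_left", "_right"],
    extra_columns=["_dist_arcsec"],         # made-up extra column
)
```

Here the suffixes keep the two ra and dec columns distinct in the joined result.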

generate_meta_df_for_nested_tables(catalogs: Sequence[lsdb.catalog.catalog.Catalog], nested_catalog: lsdb.catalog.catalog.Catalog, nested_column_name: str, join_column_name: str, extra_columns: pandas.DataFrame | None = None, index_name: str = SPATIAL_INDEX_COLUMN, index_type: numpy.typing.DTypeLike | None = None) → nested_pandas.NestedFrame

Generates a Dask meta DataFrame that would result from joining two catalogs, adding the right as a nested frame

Creates an empty dataframe with the columns of the left catalog, and a nested column with the right catalog. Allows specifying extra columns that should also be added, and the name of the index of the resulting dataframe.

Parameters:
  • catalogs (Sequence[lsdb.Catalog]) – The catalogs to merge together

  • nested_catalog (Catalog) – The catalog to add as a nested column

  • nested_column_name (str) – The name of the nested column

  • join_column_name (str) – The name of the column in the right catalog to join on

  • extra_columns (pd.DataFrame) – Any additional columns to add to the merged catalogs

  • index_name (str) – The name of the index in the resulting DataFrame

  • index_type (npt.DTypeLike) – The type of the index in the resulting DataFrame

Returns:

An empty dataframe with the right catalog joined to the left as a nested column, and any extra columns specified, with the index name set.

concat_metas(metas: Sequence[nested_pandas.NestedFrame | dict])

Concats the columns of a sequence of dask metas into a single NestedFrame meta

Parameters:

metas (Sequence[dict | DataFrame]) – A collection of dask meta inputs

Returns:

(npd.NestedFrame) An empty NestedFrame with the columns of the input metas concatenated together in the order of the input sequence.

get_partition_map_from_alignment_pixels(join_pixels: pandas.DataFrame) → lsdb.types.DaskDFPixelMap

Gets a dictionary mapping HEALPix pixel to index of pixel in the pixel_mapping of a PixelAlignment

Parameters:

join_pixels (pd.DataFrame) – The pixel_mapping from a PixelAlignment object

Returns:

A dictionary mapping HEALPix pixel to the index that the pixel occurs in the pixel_mapping table
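As a rough illustration of the returned mapping, the sketch below builds the same kind of dictionary from a list of (order, pixel) rows. Pixel is a stand-in for hats.pixel_math.HealpixPixel, and the rows stand in for the aligned pixel columns of a pixel_mapping table; both are assumptions made to keep the example self-contained.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    """Stand-in for hats.pixel_math.HealpixPixel."""
    order: int
    pixel: int


def partition_map_from_rows(aligned_rows):
    """Map each aligned pixel to its row position in the mapping table."""
    return {Pixel(order, pixel): i for i, (order, pixel) in enumerate(aligned_rows)}


aligned_rows = [(1, 4), (1, 5), (2, 24)]  # made-up aligned (order, pixel) rows
partition_map = partition_map_from_rows(aligned_rows)
```

Looking up partition_map[Pixel(2, 24)] then gives the position of the partition that holds that pixel's data.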

align_catalog_to_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset | None, pixels: list[hats.pixel_math.HealpixPixel]) → list[dask.delayed.Delayed | None]

Aligns the partitions of a Catalog to a list of HEALPix pixels

Parameters:
  • catalog (HealpixDataset | None) – The catalog to align

  • pixels (list[HealpixPixel]) – The list of HealpixPixels specifying the order of partitions

Returns:

A list of dask delayed objects, each one representing the data in a HEALPix pixel, in the order the pixels appear in the input list
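The reordering can be pictured as a lookup per requested pixel. In this sketch a dictionary from (order, pixel) tuples to strings stands in for a catalog and its delayed partitions, and align_to_pixels is a hypothetical helper. Returning None for a pixel the catalog does not contain is an assumption suggested by the Delayed | None return type.

```python
def align_to_pixels(partitions_by_pixel, pixels):
    """Return one partition (or None) per requested pixel, in request order."""
    if partitions_by_pixel is None:
        # No catalog: every aligned slot is None.
        return [None] * len(pixels)
    return [partitions_by_pixel.get(pixel) for pixel in pixels]


catalog_partitions = {(0, 1): "partition_a", (0, 2): "partition_b"}
requested = [(0, 2), (0, 1), (0, 2), (0, 7)]  # a pixel may repeat or be absent

aligned = align_to_pixels(catalog_partitions, requested)
aligned_without_catalog = align_to_pixels(None, requested)
```

The output always has the same length as the list of pixels, which is what lets several catalogs be zipped together partition by partition.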

create_merged_catalog_info(left_info: hats.catalog.TableProperties, right_info: hats.catalog.TableProperties, updated_name: str, suffixes: tuple[str, str]) → hats.catalog.TableProperties

Creates the catalog info of the resulting catalog from merging two catalogs

Updates the ra and dec column names and any default columns by adding the correct suffixes, updates the catalog name, and sets the total rows to 0.

Parameters:
  • left_info (TableProperties) – The catalog_info of the left catalog

  • right_info (TableProperties) – The catalog_info of the right catalog

  • updated_name (str) – The updated name of the catalog

  • suffixes (tuple[str, str]) – The suffixes of the catalogs in the merged result
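The updates described above can be sketched with a small dataclass standing in for TableProperties. The Info class and merged_info helper are hypothetical. Basing the result on the left catalog's info, and combining both catalogs' default columns, are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Info:
    """Hypothetical stand-in for hats.catalog.TableProperties."""
    catalog_name: str
    ra_column: str
    dec_column: str
    total_rows: int
    default_columns: tuple = ()


def merged_info(left_info, right_info, updated_name, suffixes):
    """Sketch: rename, suffix the coordinate and default columns, reset rows."""
    left_suffix, right_suffix = suffixes
    defaults = tuple(c + left_suffix for c in left_info.default_columns)
    defaults += tuple(c + right_suffix for c in right_info.default_columns)
    return replace(
        left_info,  # assumption: the merged info starts from the left catalog
        catalog_name=updated_name,
        ra_column=left_info.ra_column + left_suffix,
        dec_column=left_info.dec_column + left_suffix,
        total_rows=0,  # the merged row count is not known ahead of time
        default_columns=defaults,
    )


left = Info("objects", "ra", "dec", 1000, ("ra", "dec"))
right = Info("sources", "ra", "dec", 5000, ("mag",))
merged = merged_info(left, right, "objects_x_sources", ("_obj", "_src"))
```

The inputs are left unchanged; merged is a new object with the suffixed names.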