lsdb.dask.merge_catalog_functions
Functions
concat_partition_and_margin | Concatenates a partition and margin dataframe together
remove_hips_columns | Removes any HIPS Norder, Dir, and Npix columns from a dataframe
align_catalogs | Aligns two catalogs, also using the right catalog's margin if it exists
align_and_apply | Aligns catalogs to a given ordering of pixels and applies a function to each set of aligned partitions
perform_align_and_apply_func | Performs the function inside align_and_apply and updates hive columns
filter_by_spatial_index_to_pixel | Filters a catalog dataframe to the points within a specified HEALPix pixel using the spatial index
construct_catalog_args | Constructs the arguments needed to create a catalog from a list of delayed partitions
get_healpix_pixels_from_alignment | Gets the list of primary and join pixels as the HealpixPixel class from a PixelAlignment
generate_meta_df_for_joined_tables | Generates a Dask meta DataFrame that would result from joining two catalogs
generate_meta_df_for_nested_tables | Generates a Dask meta DataFrame that would result from joining two catalogs, adding the right as a nested frame
concat_metas | Concats the columns of a sequence of dask metas into a single NestedFrame meta
get_partition_map_from_alignment_pixels | Gets a dictionary mapping HEALPix pixel to index of pixel in the pixel_mapping of a PixelAlignment
align_catalog_to_partitions | Aligns the partitions of a Catalog to a dataframe with HEALPix pixels in each row
create_merged_catalog_info | Creates the catalog info of the resulting catalog from merging two catalogs
Module Contents
- concat_partition_and_margin(partition: nested_pandas.NestedFrame, margin: nested_pandas.NestedFrame | None) -> nested_pandas.NestedFrame
Concatenates a partition and margin dataframe together
- Parameters:
partition (npd.NestedFrame) – The partition dataframe
margin (npd.NestedFrame) – The margin dataframe
- Returns:
The concatenated dataframe with the partition on top and the margin on the bottom
- remove_hips_columns(df: nested_pandas.NestedFrame | None)
Removes any HIPS Norder, Dir, and Npix columns from a dataframe
- Parameters:
df (npd.NestedFrame) – The catalog dataframe
- Returns:
The dataframe with the columns removed
- align_catalogs(left: lsdb.catalog.catalog.Catalog, right: lsdb.catalog.catalog.Catalog, add_right_margin: bool = True) -> hats.pixel_tree.PixelAlignment
Aligns two catalogs, also using the right catalog’s margin if it exists
- Parameters:
left (lsdb.Catalog) – The left catalog to align
right (lsdb.Catalog) – The right catalog to align
add_right_margin (bool) – If True, when using MOCs to align catalogs, adds a border to the right catalog’s moc to include the margin of the right catalog, if it exists. Defaults to True.
- Returns:
The PixelAlignment object from aligning the catalogs
- align_and_apply(catalog_mappings: list[tuple[lsdb.catalog.dataset.healpix_dataset.HealpixDataset | None, list[hats.pixel_math.HealpixPixel]]], func: Callable, *args, **kwargs) -> list[dask.delayed.Delayed]
Aligns catalogs to a given ordering of pixels and applies a function to each set of aligned partitions
- Parameters:
catalog_mappings (List[Tuple[HealpixDataset, List[HealpixPixel]]]) – The catalogs and their corresponding ordering of pixels to align the partitions to. A catalog can be None, in which case None will be passed to the function for each partition. Each list of pixels should be the same length. Example input: [(catalog, pixels), (catalog2, pixels2), …]
func (Callable) – The function to apply to the aligned catalogs. The function should take the aligned partitions of the catalogs as dataframes as the first arguments, followed by the HEALPix pixel of each partition, the hc_structures of the catalogs, and any additional arguments and keyword arguments. For example:
    def func(
        cat1_partition_df,
        cat2_partition_df,
        cat1_pixel,
        cat2_pixel,
        cat1_hc_structure,
        cat2_hc_structure,
        *args,
        **kwargs
    ):
        ...
*args – Additional arguments to pass to the function
**kwargs – Additional keyword arguments to pass to the function
- Returns:
A list of delayed objects, each one representing the result of the function applied to the aligned partitions of the catalogs
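The calling convention described above can be sketched without Dask. In this minimal stand-in, plain dicts play the role of catalogs (mapping a pixel to its partition rows), the catalogs themselves are passed where the real function passes hc_structures, and everything runs eagerly. The names align_and_apply_sketch, FakeCatalog, and count_rows are hypothetical and are not part of lsdb; the sketch only illustrates the argument order that func receives.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-ins: a "catalog" is a dict mapping (order, pixel) -> partition rows.
Pixel = tuple[int, int]
FakeCatalog = dict[Pixel, list[dict[str, Any]]]


def align_and_apply_sketch(
    catalog_mappings: list[tuple[Optional[FakeCatalog], list[Pixel]]],
    func: Callable,
    *args,
    **kwargs,
) -> list[Any]:
    """Eagerly mimic the documented calling convention (no Dask involved)."""
    lengths = {len(pixels) for _, pixels in catalog_mappings}
    if len(lengths) != 1:
        raise ValueError("Each list of pixels should be the same length")
    catalogs = [catalog for catalog, _ in catalog_mappings]
    results = []
    for aligned_pixels in zip(*(pixels for _, pixels in catalog_mappings)):
        # A catalog of None contributes None for every partition.
        partitions = [
            None if catalog is None else catalog.get(pixel)
            for catalog, pixel in zip(catalogs, aligned_pixels)
        ]
        # Partitions first, then pixels, then per-catalog structures, then extras.
        results.append(func(*partitions, *aligned_pixels, *catalogs, *args, **kwargs))
    return results


def count_rows(left_df, right_df, left_pixel, right_pixel, left_cat, right_cat, label=""):
    """Example func following the documented argument order."""
    n_right = 0 if right_df is None else len(right_df)
    return (label, left_pixel, right_pixel, len(left_df), n_right)


left = {(1, 0): [{"id": 1}, {"id": 2}], (1, 1): [{"id": 3}]}
right = {(0, 0): [{"id": 10}]}
row_counts = align_and_apply_sketch(
    [(left, [(1, 0), (1, 1)]), (right, [(0, 0), (0, 0)])],
    count_rows,
    label="demo",
)
```

Here the coarser right pixel (0, 0) is repeated so that both pixel lists have the same length, which is the shape of input the parameter description asks for.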
- perform_align_and_apply_func(num_partitions, func, *args, **kwargs)
Performs the function inside align_and_apply and updates hive columns
- filter_by_spatial_index_to_pixel(dataframe: nested_pandas.NestedFrame, order: int, pixel: int) -> nested_pandas.NestedFrame
Filters a catalog dataframe to the points within a specified HEALPix pixel using the spatial index
- Parameters:
dataframe (npd.NestedFrame) – The dataframe to filter
order (int) – The order of the HEALPix pixel to filter to
pixel (int) – The pixel number in NESTED numbering of the HEALPix pixel to filter to
- Returns:
The filtered dataframe with only the rows that are within the specified HEALPix pixel
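The idea behind filtering on a spatial index can be illustrated with plain integers. The sketch below assumes the spatial index of a row is the NESTED HEALPix pixel number of the point at a fixed high order (29 is used here as an assumption). Under that assumption, a pixel at a lower order covers one contiguous range of index values, so the filter becomes a range check. The function names are hypothetical and this is not the lsdb implementation.

```python
SPATIAL_INDEX_ORDER = 29  # assumed order of the spatial index (illustrative)


def pixel_index_range(
    order: int, pixel: int, index_order: int = SPATIAL_INDEX_ORDER
) -> tuple[int, int]:
    """Half-open [low, high) range of index values inside a NESTED pixel."""
    if not 0 <= order <= index_order:
        raise ValueError("order must be between 0 and index_order")
    # In NESTED numbering each additional order appends 2 bits to the pixel number.
    shift = 2 * (index_order - order)
    return pixel << shift, (pixel + 1) << shift


def filter_indices_to_pixel(indices: list[int], order: int, pixel: int) -> list[int]:
    """Keep only the index values that fall inside the given pixel."""
    low, high = pixel_index_range(order, pixel)
    return [i for i in indices if low <= i < high]


low, high = pixel_index_range(order=1, pixel=3)
candidates = [0, low, low + 5, high - 1, high]
kept = filter_indices_to_pixel(candidates, order=1, pixel=3)
```

On a dataframe indexed by such values, the same range check would be applied to the index rather than to a Python list.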
- construct_catalog_args(partitions: list[dask.delayed.Delayed], meta_df: nested_pandas.NestedFrame, alignment: hats.pixel_tree.PixelAlignment) -> tuple[nested_dask.NestedFrame, lsdb.types.DaskDFPixelMap, hats.pixel_tree.PixelAlignment]
Constructs the arguments needed to create a catalog from a list of delayed partitions
- Parameters:
partitions (List[Delayed]) – The list of delayed partitions to create the catalog from
meta_df (npd.NestedFrame) – The dask meta schema for the partitions
alignment (PixelAlignment) – The alignment used to create the delayed partitions
- Returns:
A tuple of (ddf, partition_map, alignment) with the dask dataframe, the partition map, and the alignment needed to create the catalog
- get_healpix_pixels_from_alignment(alignment: hats.pixel_tree.PixelAlignment) -> tuple[list[hats.pixel_math.HealpixPixel], list[hats.pixel_math.HealpixPixel]]
Gets the list of primary and join pixels as the HealpixPixel class from a PixelAlignment
- Parameters:
alignment (PixelAlignment) – The PixelAlignment to get pixels from
- Returns:
A tuple of (primary_pixels, join_pixels) with lists of HealpixPixel objects
- generate_meta_df_for_joined_tables(catalogs: Sequence[lsdb.catalog.catalog.Catalog], suffixes: Sequence[str], extra_columns: pandas.DataFrame | None = None, index_name: str = SPATIAL_INDEX_COLUMN, index_type: numpy.typing.DTypeLike | None = None) -> nested_pandas.NestedFrame
Generates a Dask meta DataFrame that would result from joining two catalogs
Creates an empty dataframe with the columns of each catalog appended with a suffix. Allows specifying extra columns that should also be added, and the name of the index of the resulting dataframe.
- Parameters:
catalogs (Sequence[lsdb.Catalog]) – The catalogs to merge together
suffixes (Sequence[str]) – The column suffixes to apply to each catalog
extra_columns (pd.DataFrame) – Any additional columns to add to the merged catalogs
index_name (str) – The name of the index in the resulting DataFrame
index_type (npt.DTypeLike) – The type of the index in the resulting DataFrame. Default: type of index in the first catalog
- Returns:
An empty dataframe with the columns of each catalog with their respective suffix, and any extra columns specified, with the index name set.
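The suffixing step can be sketched with plain dicts in place of dataframes. The example below represents each catalog's schema as a mapping from column name to dtype string and builds the combined schema. It assumes every column receives its catalog's suffix and that extra columns are appended last. The function joined_meta_sketch and the column names are hypothetical, and the real function returns an empty NestedFrame rather than a dict.

```python
from typing import Optional, Sequence


def joined_meta_sketch(
    catalog_dtypes: Sequence[dict[str, str]],
    suffixes: Sequence[str],
    extra_columns: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Build a column -> dtype mapping with each catalog's columns suffixed."""
    if len(catalog_dtypes) != len(suffixes):
        raise ValueError("Provide one suffix per catalog")
    meta: dict[str, str] = {}
    for dtypes, suffix in zip(catalog_dtypes, suffixes):
        for name, dtype in dtypes.items():
            meta[name + suffix] = dtype
    if extra_columns:
        # Extra columns keep their names as given (assumption of this sketch).
        meta.update(extra_columns)
    return meta


left_dtypes = {"ra": "float64", "dec": "float64", "id": "int64"}
right_dtypes = {"ra": "float64", "dec": "float64", "mag": "float32"}
meta = joined_meta_sketch(
    [left_dtypes, right_dtypes],
    ["_left", "_right"],
    extra_columns={"separation": "float64"},  # illustrative extra column
)
```

Because both catalogs carry ra and dec columns, the suffixes are what keep the resulting column names unique.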
- generate_meta_df_for_nested_tables(catalogs: Sequence[lsdb.catalog.catalog.Catalog], nested_catalog: lsdb.catalog.catalog.Catalog, nested_column_name: str, join_column_name: str, extra_columns: pandas.DataFrame | None = None, index_name: str = SPATIAL_INDEX_COLUMN, index_type: numpy.typing.DTypeLike | None = None) -> nested_pandas.NestedFrame
Generates a Dask meta DataFrame that would result from joining two catalogs, adding the right as a nested frame
Creates an empty dataframe with the columns of the left catalog, and a nested column with the right catalog. Allows specifying extra columns that should also be added, and the name of the index of the resulting dataframe.
- Parameters:
catalogs (Sequence[lsdb.Catalog]) – The catalogs to merge together
nested_catalog (Catalog) – The catalog to add as a nested column
nested_column_name (str) – The name of the nested column
join_column_name (str) – The name of the column in the right catalog to join on
extra_columns (pd.DataFrame) – Any additional columns to add to the merged catalogs
index_name (str) – The name of the index in the resulting DataFrame
index_type (npt.DTypeLike) – The type of the index in the resulting DataFrame
- Returns:
An empty dataframe with the right catalog joined to the left as a nested column, and any extra columns specified, with the index name set.
- concat_metas(metas: Sequence[nested_pandas.NestedFrame | dict])
Concats the columns of a sequence of dask metas into a single NestedFrame meta
- Parameters:
metas (Sequence[dict | DataFrame]) – A collection of dask meta inputs
- Returns:
(npd.NestedFrame) An empty NestedFrame with the columns of the input metas concatenated together in the order of the input sequence.
- get_partition_map_from_alignment_pixels(join_pixels: pandas.DataFrame) -> lsdb.types.DaskDFPixelMap
Gets a dictionary mapping HEALPix pixel to index of pixel in the pixel_mapping of a PixelAlignment
- Parameters:
join_pixels (pd.DataFrame) – The pixel_mapping from a PixelAlignment object
- Returns:
A dictionary mapping HEALPix pixel to the index that the pixel occurs in the pixel_mapping table
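The shape of the returned mapping can be illustrated with a small sketch. It assumes the pixels have already been read out of the pixel_mapping table as (order, pixel) pairs, one per row, and uses a NamedTuple as a hypothetical stand-in for HealpixPixel. The function partition_map_sketch is illustrative and is not the lsdb implementation.

```python
from typing import NamedTuple


class PixelSketch(NamedTuple):
    """Hypothetical stand-in for hats.pixel_math.HealpixPixel."""

    order: int
    pixel: int


def partition_map_sketch(rows: list[tuple[int, int]]) -> dict[PixelSketch, int]:
    """Map each pixel to the row position at which it occurs."""
    return {PixelSketch(order, pixel): i for i, (order, pixel) in enumerate(rows)}


partition_map = partition_map_sketch([(0, 4), (1, 20), (1, 21)])
```

A lookup such as partition_map[PixelSketch(1, 20)] then gives the position of the partition that holds that pixel's data.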
- align_catalog_to_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset | None, pixels: list[hats.pixel_math.HealpixPixel]) -> list[dask.delayed.Delayed | None]
Aligns the partitions of a Catalog to a dataframe with HEALPix pixels in each row
- Parameters:
catalog (HealpixDataset | None) – The catalog to align
pixels (list[HealpixPixel]) – The list of HealpixPixels specifying the order of partitions
- Returns:
A list of dask delayed objects, each one representing the data in a HEALPix pixel, in the order the pixels appear in the input list
- create_merged_catalog_info(left_info: hats.catalog.TableProperties, right_info: hats.catalog.TableProperties, updated_name: str, suffixes: tuple[str, str]) -> hats.catalog.TableProperties
Creates the catalog info of the resulting catalog from merging two catalogs
Updates the ra and dec column names and any default columns by adding the correct suffixes, updates the catalog name, and sets the total rows to 0.
- Parameters:
left_info (TableProperties) – The catalog_info of the left catalog
right_info (TableProperties) – The catalog_info of the right catalog
updated_name (str) – The updated name of the catalog
suffixes (tuple[str, str]) – The suffixes of the catalogs in the merged result
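The updates described above can be sketched with a small dataclass standing in for TableProperties. The sketch assumes the result is derived from the left catalog's info, that the left suffix is applied to the ra and dec column names, and that the default columns of both catalogs are suffixed and combined. InfoSketch, its fields, and merged_info_sketch are hypothetical simplifications and are not the hats or lsdb API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class InfoSketch:
    """Hypothetical, simplified stand-in for hats.catalog.TableProperties."""

    catalog_name: str
    ra_column: str
    dec_column: str
    total_rows: int
    default_columns: Optional[list[str]] = None


def merged_info_sketch(
    left: InfoSketch, right: InfoSketch, updated_name: str, suffixes: tuple[str, str]
) -> InfoSketch:
    """Return a copy of the left info with suffixed columns and reset row count."""
    left_suffix, right_suffix = suffixes
    default_columns = None
    if left.default_columns is not None or right.default_columns is not None:
        default_columns = [c + left_suffix for c in (left.default_columns or [])]
        default_columns += [c + right_suffix for c in (right.default_columns or [])]
    return replace(
        left,
        catalog_name=updated_name,
        ra_column=left.ra_column + left_suffix,
        dec_column=left.dec_column + left_suffix,
        total_rows=0,  # the row count of the merged result is not known up front
        default_columns=default_columns,
    )


left_info = InfoSketch("a", "ra", "dec", 100, default_columns=["id"])
right_info = InfoSketch("b", "ra", "dec", 50, default_columns=["mag"])
merged = merged_info_sketch(left_info, right_info, "a_x_b", ("_a", "_b"))
```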