lsdb.core.crossmatch.abstract_crossmatch_algorithm
Classes
AbstractCrossmatchAlgorithm | Abstract class used to write a crossmatch algorithm
Module Contents
- class AbstractCrossmatchAlgorithm(left: nested_pandas.NestedFrame, right: nested_pandas.NestedFrame, left_order: int, left_pixel: int, right_order: int, right_pixel: int, left_catalog_info: hats.catalog.TableProperties, right_catalog_info: hats.catalog.TableProperties, right_margin_catalog_info: hats.catalog.TableProperties | None)
Bases: abc.ABC
Abstract class used to write a crossmatch algorithm
To define a custom crossmatch algorithm, write a class that subclasses AbstractCrossmatchAlgorithm and override the perform_crossmatch method.
The method should perform a crossmatch on two pandas DataFrames, one partition from each catalog. It should return two 1-D numpy arrays of equal length with the indices of the matching rows from the left and right dataframes, and a dataframe of the same length with any extra columns generated by the crossmatch algorithm. These columns, with their respective data types, are declared in AbstractCrossmatchAlgorithm.extra_columns by means of an empty pandas dataframe. For example, the KdTreeCrossmatch algorithm outputs a “_dist_arcsec” column with the distance between data points. Its extra_columns attribute is specified as follows:
pd.DataFrame({"_dist_arcsec": pd.Series(dtype=np.dtype("float64"))})
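As a minimal sketch of how such a schema might be used, the snippet below builds the empty extra_columns frame from the line above and compares a result frame against it. The helper matches_schema and the sample distances are hypothetical, introduced only for illustration; they are not part of lsdb.

```python
import numpy as np
import pandas as pd

# Empty frame that only declares column names and dtypes.
extra_columns = pd.DataFrame({"_dist_arcsec": pd.Series(dtype=np.dtype("float64"))})


def matches_schema(result: pd.DataFrame, schema: pd.DataFrame) -> bool:
    """Hypothetical helper: True if `result` has the same columns and dtypes as `schema`."""
    return list(result.columns) == list(schema.columns) and all(
        result[col].dtype == schema[col].dtype for col in schema.columns
    )


# Made-up distances standing in for the extra output of a crossmatch.
result_extra = pd.DataFrame({"_dist_arcsec": np.array([0.12, 0.87, 1.40])})
schema_ok = matches_schema(result_extra, extra_columns)
```

The empty frame carries no rows, so it costs nothing to construct while still describing the output columns before any computation runs.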
The class will have been initialized with the following parameters, which the crossmatch method should use:
left: nested_pandas.NestedFrame,
right: nested_pandas.NestedFrame,
left_order: int,
left_pixel: int,
right_order: int,
right_pixel: int,
left_catalog_info: hats.catalog.TableProperties,
right_catalog_info: hats.catalog.TableProperties,
right_margin_catalog_info: hats.catalog.TableProperties | None
You may add any additional keyword arguments to the perform_crossmatch method definition, and the user will be able to pass them in as kwargs to the Catalog.crossmatch method. Any additional keyword arguments must also be accepted by the AbstractCrossmatchAlgorithm.validate classmethod, so override that method as well.
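The subclassing pattern described above can be sketched without lsdb installed. In the example below, SketchCrossmatchBase is a simplified stand-in for the real base class (it is not lsdb's implementation and takes far fewer constructor arguments), and MaxDiffCrossmatch, the "x" column, the "_diff" column, and the max_diff keyword are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class SketchCrossmatchBase(ABC):
    """Simplified stand-in for the base class (NOT lsdb's implementation)."""

    extra_columns = None

    def __init__(self, left: pd.DataFrame, right: pd.DataFrame):
        self.left = left
        self.right = right

    @abstractmethod
    def perform_crossmatch(self):
        """Return (left_idx, right_idx, extra_columns_frame)."""

    @classmethod
    def validate(cls, left, right):
        """Cheap checks that run before any data is processed."""


class MaxDiffCrossmatch(SketchCrossmatchBase):
    """Toy algorithm: pair rows whose 'x' values differ by at most max_diff."""

    extra_columns = pd.DataFrame({"_diff": pd.Series(dtype=np.dtype("float64"))})

    def perform_crossmatch(self, max_diff: float = 1.0):
        lx = self.left["x"].to_numpy(dtype=float)
        rx = self.right["x"].to_numpy(dtype=float)
        diff = np.abs(lx[:, None] - rx[None, :])  # all pairwise differences
        left_idx, right_idx = np.nonzero(diff <= max_diff)
        extra = pd.DataFrame({"_diff": diff[left_idx, right_idx]})
        return left_idx, right_idx, extra

    @classmethod
    def validate(cls, left, right, max_diff: float = 1.0):
        # The extra keyword argument is mirrored here so it can be checked early.
        super().validate(left, right)
        if max_diff < 0:
            raise ValueError("max_diff must be non-negative")


left = pd.DataFrame({"x": [0.0, 5.0, 10.0]})
right = pd.DataFrame({"x": [0.4, 9.0, 20.0]})
MaxDiffCrossmatch.validate(left, right, max_diff=1.0)
left_idx, right_idx, extra = MaxDiffCrossmatch(left, right).perform_crossmatch(max_diff=1.0)
```

Keeping the same keyword in both perform_crossmatch and validate is what lets a bad value fail early instead of surfacing during the computation.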
- extra_columns: pandas.DataFrame | None = None
The metadata for the columns generated by the crossmatch algorithm
- crossmatch_nested(nested_column_name, **kwargs) -> nested_pandas.NestedFrame
Perform a crossmatch
- abstract perform_crossmatch() -> tuple[numpy.ndarray, numpy.ndarray, pandas.DataFrame]
Performs a crossmatch to get the indices of the matching rows and any extra columns
Any additional keyword arguments needed can be added to this method in the subclass, and the user will be able to pass them through the Catalog.crossmatch method.
- Returns:
A tuple of:
- a numpy array with the indices of the matching rows from the left table
- a numpy array with the indices of the matching rows from the right table
- a pandas dataframe with any additional columns generated by the algorithm
All three must have the same length.
- Return type:
tuple[numpy.ndarray, numpy.ndarray, pandas.DataFrame]
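To make the return contract concrete, here is a brute-force nearest-neighbour sketch that produces the three outputs for two small tables. It is an illustration only and says nothing about how KdTreeCrossmatch works internally. The function name brute_force_crossmatch, the "ra" and "dec" column names (assumed to be in degrees), and the radius_arcsec keyword are assumptions, and the right table is assumed to be non-empty.

```python
import numpy as np
import pandas as pd


def brute_force_crossmatch(left: pd.DataFrame, right: pd.DataFrame, radius_arcsec: float = 1.0):
    """For each left row, match the nearest right row within radius_arcsec."""
    ra1 = np.radians(left["ra"].to_numpy(dtype=float))[:, None]
    dec1 = np.radians(left["dec"].to_numpy(dtype=float))[:, None]
    ra2 = np.radians(right["ra"].to_numpy(dtype=float))[None, :]
    dec2 = np.radians(right["dec"].to_numpy(dtype=float))[None, :]

    # Haversine formula for the angular separation of every left/right pair.
    hav = np.sin((dec2 - dec1) / 2) ** 2 + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2
    sep_arcsec = np.degrees(2 * np.arcsin(np.sqrt(np.clip(hav, 0.0, 1.0)))) * 3600.0

    nearest = sep_arcsec.argmin(axis=1)  # closest right row for each left row
    nearest_sep = sep_arcsec[np.arange(len(left)), nearest]
    keep = nearest_sep <= radius_arcsec  # drop left rows with no match in range

    left_idx = np.flatnonzero(keep)
    right_idx = nearest[keep]
    extra = pd.DataFrame({"_dist_arcsec": nearest_sep[keep]})
    return left_idx, right_idx, extra


left = pd.DataFrame({"ra": [10.0, 20.0, 30.0], "dec": [0.0, 0.0, 0.0]})
right = pd.DataFrame({"ra": [20.0 + 0.5 / 3600.0, 10.0, 50.0], "dec": [0.0, 0.0, 0.0]})
left_idx, right_idx, extra = brute_force_crossmatch(left, right, radius_arcsec=1.0)
```

The third left row has no neighbour within the radius, so it simply does not appear in any of the three outputs, which keeps their lengths equal.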
- classmethod validate(left: lsdb.catalog.Catalog, right: lsdb.catalog.Catalog)
Validate the metadata and arguments.
This method will be called once, after the algorithm object has been initialized, during the lazy construction of the execution graph. This can be used to catch simple errors without waiting for an expensive .compute() call. This must accept any additional arguments the crossmatch method accepts.
- classmethod _append_extra_columns(dataframe: nested_pandas.NestedFrame, extra_columns: pandas.DataFrame | None = None)
Adds crossmatch extra columns to the resulting DataFrame.
- _create_crossmatch_df(left_idx: numpy.typing.NDArray[numpy.int64], right_idx: numpy.typing.NDArray[numpy.int64], extra_cols: pandas.DataFrame, suffixes: tuple[str, str]) -> nested_pandas.NestedFrame
Creates a df containing the crossmatch result from matching indices and additional columns
- Parameters:
left_idx (np.ndarray) – indices of the matching rows from the left table
right_idx (np.ndarray) – indices of the matching rows from the right table
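extra_cols (pd.DataFrame) – dataframe containing additional columns from crossmatching
suffixes (tuple[str, str]) – suffixes appended to the column names of the left and right tables, respectively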
- Returns:
A dataframe with the matching rows from the left and right table concatenated together, with the additional columns added
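One plausible way to assemble such a frame with plain pandas is sketched below. This illustrates the idea only and is not lsdb's implementation of _create_crossmatch_df. The function name assemble_crossmatch_df and the sample columns are hypothetical, and the sketch assumes the indices are positional and that suffixes are appended to every column name.

```python
import numpy as np
import pandas as pd


def assemble_crossmatch_df(left, right, left_idx, right_idx, extra_cols, suffixes):
    """Concatenate matched left/right rows side by side and append the extra columns."""
    # Select the matched rows by position and reset the index so the rows align.
    left_rows = left.iloc[left_idx].add_suffix(suffixes[0]).reset_index(drop=True)
    right_rows = right.iloc[right_idx].add_suffix(suffixes[1]).reset_index(drop=True)
    return pd.concat([left_rows, right_rows, extra_cols.reset_index(drop=True)], axis=1)


left = pd.DataFrame({"id": [1, 2, 3], "ra": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"id": [7, 8], "ra": [20.0001, 10.0]})
result = assemble_crossmatch_df(
    left,
    right,
    left_idx=np.array([0, 1]),
    right_idx=np.array([1, 0]),
    extra_cols=pd.DataFrame({"_dist_arcsec": [0.0, 0.36]}),
    suffixes=("_left", "_right"),
)
```

Because all three inputs have the same length and are re-indexed from zero, the column-wise concatenation lines up each matched pair with its extra values.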
- _create_nested_crossmatch_df(left_idx: numpy.typing.NDArray[numpy.int64], right_idx: numpy.typing.NDArray[numpy.int64], extra_cols: pandas.DataFrame, nested_column_name: str) -> nested_pandas.NestedFrame
Creates a df containing the crossmatch result from matching indices and additional columns
- Parameters:
left_idx (np.ndarray) – indices of the matching rows from the left table
right_idx (np.ndarray) – indices of the matching rows from the right table
extra_cols (pd.DataFrame) – dataframe containing additional columns from crossmatching
- Returns:
A dataframe with the matching rows from the left and right table concatenated together, with the additional columns added