lsdb.io.to_association#
Functions#
- perform_write: Writes a pandas dataframe to a single parquet file and returns the total count.
- to_association: Writes a crossmatching product to disk, in HATS association table format.
- write_partitions: Saves catalog partitions as parquet to disk and computes the sparse count histograms.
- _check_catalogs_and_columns: Helper function to validate user-provided catalog and column arguments.
Module Contents#
- perform_write(df: nested_pandas.NestedFrame, hp_pixel: hats.pixel_math.HealpixPixel, base_catalog_dir: str | pathlib.Path | upath.UPath, **kwargs) int [source]#
Writes a pandas dataframe to a single parquet file and returns the total count for the partition as well as a count histogram at the specified order.
- Parameters:
df (npd.NestedFrame) – dataframe to write to file
hp_pixel (HealpixPixel) – HEALPix pixel of file to be written
base_catalog_dir (path-like) – Location of the base catalog directory to write to
histogram_order (int) – Order of the count histogram
**kwargs – other kwargs to pass to pq.write_table method
- Returns:
The total number of points on the partition and the sparse count histogram at the specified order.
- to_association(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, *, base_catalog_path: str | pathlib.Path | upath.UPath, catalog_name: str | None = None, primary_catalog_dir: str | pathlib.Path | upath.UPath | None = None, primary_column_association: str | None = None, primary_id_column: str | None = None, join_catalog_dir: str | pathlib.Path | upath.UPath | None = None, join_column_association: str | None = None, join_to_primary_id_column: str | None = None, join_id_column: str | None = None, overwrite: bool = False, **kwargs)[source]#
Writes a crossmatching product to disk, in HATS association table format. The output catalog comprises partition parquet files and respective metadata.
The column name arguments should reflect the column names on the corresponding primary and join OBJECT catalogs, so that the association table can be used to perform equijoins on the two sides and recreate the crossmatch.
To configure the appropriate column names, consider two tables that do not share an identifier space (e.g. two surveys), and the way you could go about joining them together with an association table:
```
TABLE GAIA_SOURCE { DESIGNATION <primary key> }
TABLE SDSS        { SDSS_ID     <primary key> }
```

And a SQL query to join them with an association table would look like:

```sql
SELECT g.DESIGNATION AS gaia_id, s.SDSS_ID AS sdss_id
FROM GAIA_SOURCE g
JOIN association_table a ON a.primary_id_column = g.DESIGNATION
JOIN SDSS s ON a.join_id_column = s.SDSS_ID
```
Consider instead an object table, joining to a detection table:
```
TABLE OBJECT    { ID           <primary key> }
TABLE DETECTION { DETECTION_ID <primary key>
                  OBJECT_ID    <foreign key> }
```
And a SQL query to join them would look like:
```sql
SELECT o.ID AS object_id, d.DETECTION_ID AS detection_id
FROM OBJECT o
JOIN DETECTION d ON o.ID = d.OBJECT_ID
```
This is important, as there are three different column names, but really only two meaningful identifiers. For this example, the arguments for this method would be as follows:
```
primary_id_column = "ID"
join_to_primary_id_column = "OBJECT_ID"
join_id_column = "DETECTION_ID"
```
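The object-to-detection join above can be reproduced with an explicit association table in pandas. The following is an illustrative sketch with made-up toy data; the column names follow the example, and everything else (values, variable names) is an assumption for demonstration only:

```python
import pandas as pd

# Toy tables following the example schemas.
objects = pd.DataFrame({"ID": [1, 2]})
detections = pd.DataFrame(
    {"DETECTION_ID": [100, 101, 102], "OBJECT_ID": [1, 1, 2]}
)

# An association table relating the two identifier spaces, with one
# column per side, in the spirit of what to_association writes out.
association = detections[["OBJECT_ID", "DETECTION_ID"]].rename(
    columns={"OBJECT_ID": "primary_id_column", "DETECTION_ID": "join_id_column"}
)

# Equivalent of:
#   SELECT o.ID, d.DETECTION_ID FROM OBJECT o
#   JOIN association a ON a.primary_id_column = o.ID
#   JOIN DETECTION d  ON a.join_id_column = d.DETECTION_ID
result = (
    objects.merge(association, left_on="ID", right_on="primary_id_column")
    .merge(detections, left_on="join_id_column", right_on="DETECTION_ID")
)
print(result[["ID", "DETECTION_ID"]])
```

The two equijoins through the association table recover the original object-to-detection pairing, which is exactly what the column name arguments make possible.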
- Parameters:
catalog (HealpixDataset) – A catalog to export
base_catalog_path (str) – Location where catalog is saved to
catalog_name (str) – The name of the output catalog
overwrite (bool) – If True, an existing catalog at the path is overwritten
**kwargs – Arguments to pass to the parquet write operations
- write_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, base_catalog_dir_fp: str | pathlib.Path | upath.UPath, **kwargs) tuple[list[hats.pixel_math.HealpixPixel], list[int]] [source]#
Saves catalog partitions as parquet to disk and computes the sparse count histogram for each partition. The histogram is either of order 8 or the maximum pixel order in the catalog, whichever is greater.
- Parameters:
catalog (HealpixDataset) – A catalog to export
base_catalog_dir_fp (path-like) – Path to the base directory of the catalog
histogram_order – The order of the count histogram to generate
**kwargs – Arguments to pass to the parquet write operations
- Returns:
A tuple with the array of non-empty pixels, the array of total counts, and the array of sparse count histograms.
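As a sketch of what a per-partition sparse count histogram looks like, assuming each row's HEALPix index at the histogram order has already been computed (the real code derives these with HATS pixel math), the counting reduces to a bincount kept in sparse form:

```python
import numpy as np

HISTOGRAM_ORDER = 8  # the default order mentioned above
N_PIX = 12 * 4**HISTOGRAM_ORDER  # number of HEALPix pixels at that order

# Assumed precomputed: the HEALPix index of each row at HISTOGRAM_ORDER.
row_pixels = np.array([5, 5, 7, 1024, 1024, 1024])

# Dense count histogram over every pixel at the order...
dense = np.bincount(row_pixels, minlength=N_PIX)

# ...kept sparse as (pixel, count) pairs, since most pixels are empty.
nonzero = np.nonzero(dense)[0]
sparse_hist = list(zip(nonzero.tolist(), dense[nonzero].tolist()))
total = int(dense.sum())
```

Storing only the non-empty pixels matters here: at order 8 there are 786,432 pixels, and a typical partition touches only a handful of them.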
- _check_catalogs_and_columns(catalog_columns, primary_catalog_dir: str | pathlib.Path | upath.UPath | None = None, primary_column_association: str | None = None, primary_id_column: str | None = None, join_catalog_dir: str | pathlib.Path | upath.UPath | None = None, join_column_association: str | None = None, join_to_primary_id_column: str | None = None, join_id_column: str | None = None)[source]#
Helper function to validate user-provided catalog and column arguments.
- Returns:
A dictionary to be used in the creation of TableProperties