lsdb.io.to_association

Functions

perform_write(→ int)

Writes a pandas dataframe to a single parquet file and returns the total count

to_association(catalog, *, base_catalog_path[, ...])

Writes a crossmatching product to disk, in HATS association table format.

write_partitions(...)

Saves catalog partitions as parquet to disk and computes the sparse count histogram for each partition.

_check_catalogs_and_columns(catalog_columns[, ...])

Helper function that validates the user-supplied catalog and column arguments.

Module Contents

perform_write(df: nested_pandas.NestedFrame, hp_pixel: hats.pixel_math.HealpixPixel, base_catalog_dir: str | pathlib.Path | upath.UPath, **kwargs) → int

Writes a pandas dataframe to a single parquet file and returns the total count for the partition.

Parameters:
  • df (npd.NestedFrame) – dataframe to write to file

  • hp_pixel (HealpixPixel) – HEALPix pixel of file to be written

  • base_catalog_dir (path-like) – Location of the base catalog directory to write to

  • histogram_order (int) – Order of the count histogram

  • **kwargs – other kwargs to pass to pq.write_table method

Returns:

The total number of points on the partition.

to_association(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, *, base_catalog_path: str | pathlib.Path | upath.UPath, catalog_name: str | None = None, primary_catalog_dir: str | pathlib.Path | upath.UPath | None = None, primary_column_association: str | None = None, primary_id_column: str | None = None, join_catalog_dir: str | pathlib.Path | upath.UPath | None = None, join_column_association: str | None = None, join_to_primary_id_column: str | None = None, join_id_column: str | None = None, overwrite: bool = False, **kwargs)

Writes a crossmatching product to disk, in HATS association table format. The output catalog comprises partition parquet files and respective metadata.

The column name arguments should reflect the column names on the corresponding primary and join OBJECT catalogs, so that the association table can be used to perform equijoins on the two sides and recreate the crossmatch.

To configure the appropriate column names, consider two tables that do not share an identifier space (e.g. two surveys), and the way you could go about joining them together with an association table:

TABLE GAIA_SOURCE {
    DESIGNATION <primary key>
}

TABLE SDSS {
    SDSS_ID <primary key>
}

And a SQL query to join them with an association table would look like:

SELECT g.DESIGNATION as gaia_id, s.SDSS_ID as sdss_id
FROM GAIA_SOURCE g
JOIN association_table a
    ON a.primary_id_column = g.DESIGNATION
JOIN SDSS s
    ON a.join_id_column = s.SDSS_ID
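
The query above can be tried end to end with Python's built-in sqlite3 module. This is only a toy sketch: the rows and identifier values are made up, and the association table is assumed to have exactly the two columns named in the query (primary_id_column and join_id_column). It is not how LSDB stores or reads association tables; it only shows the two-hop equijoin.

```python
import sqlite3

# In-memory database with toy rows; every identifier value here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE GAIA_SOURCE (DESIGNATION TEXT PRIMARY KEY);
    CREATE TABLE SDSS (SDSS_ID INTEGER PRIMARY KEY);
    CREATE TABLE association_table (primary_id_column TEXT, join_id_column INTEGER);

    INSERT INTO GAIA_SOURCE VALUES ('Gaia DR3 1'), ('Gaia DR3 2'), ('Gaia DR3 3');
    INSERT INTO SDSS VALUES (101), (102), (103);
    INSERT INTO association_table VALUES ('Gaia DR3 1', 101), ('Gaia DR3 2', 103);
    """
)

# Same query as in the text, with ORDER BY added so the row order is stable.
matches = conn.execute(
    """
    SELECT g.DESIGNATION AS gaia_id, s.SDSS_ID AS sdss_id
    FROM GAIA_SOURCE g
    JOIN association_table a
        ON a.primary_id_column = g.DESIGNATION
    JOIN SDSS s
        ON a.join_id_column = s.SDSS_ID
    ORDER BY gaia_id
    """
).fetchall()
conn.close()

print(matches)
```

Only the two Gaia rows that appear in the association table come back; the unmatched row on each side is dropped by the inner joins.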

Consider instead an object table, joining to a detection table:

TABLE OBJECT {
    ID <primary key>
}

TABLE DETECTION {
    DETECTION_ID <primary key>
    OBJECT_ID <foreign key>
}

And a SQL query to join them would look like:

SELECT o.ID as object_id, d.DETECTION_ID as detection_id
FROM OBJECT o
JOIN DETECTION d
    ON o.ID = d.OBJECT_ID

This distinction matters because there are three different column names but only two meaningful identifiers: OBJECT_ID on the detection table refers to the same identifier as ID on the object table. For this example, the arguments for this method would be as follows:

primary_id_column = "ID",
join_to_primary_id_column = "OBJECT_ID",
join_id_column = "DETECTION_ID",
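
Put together, a call for the OBJECT / DETECTION case might look like the sketch below. The wrapper function, its parameter names and the catalog name are hypothetical. The remaining signature arguments (primary_catalog_dir, primary_column_association, join_catalog_dir, join_column_association) are not described in this section, so they are simply forwarded from the caller; they may well be required in practice.

```python
# Column-name arguments taken from the OBJECT / DETECTION example above.
association_columns = {
    "primary_id_column": "ID",
    "join_to_primary_id_column": "OBJECT_ID",
    "join_id_column": "DETECTION_ID",
}


def export_object_detection_association(xmatch_catalog, output_path, **other_arguments):
    """Hypothetical wrapper: write `xmatch_catalog` as an association table.

    `other_arguments` is forwarded unchanged, e.g. primary_catalog_dir or
    join_column_association, which this sketch does not try to guess.
    """
    # Imported lazily so the sketch can be loaded without lsdb installed.
    from lsdb.io.to_association import to_association

    to_association(
        xmatch_catalog,
        base_catalog_path=output_path,
        catalog_name="object_to_detection",  # hypothetical catalog name
        overwrite=False,
        **association_columns,
        **other_arguments,
    )
```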

Parameters:
  • catalog (HealpixDataset) – A catalog to export

  • base_catalog_path (path-like) – Location where the catalog is saved

  • catalog_name (str) – The name of the output catalog

  • overwrite (bool) – If True, any existing catalog is overwritten

  • **kwargs – Arguments to pass to the parquet write operations

write_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, base_catalog_dir_fp: str | pathlib.Path | upath.UPath, **kwargs) → tuple[list[hats.pixel_math.HealpixPixel], list[int]]

Saves catalog partitions as parquet to disk and computes the sparse count histogram for each partition. The histogram is either of order 8 or the maximum pixel order in the catalog, whichever is greater.
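
The order rule stated above can be sketched in a few lines. The Pixel namedtuple is a stand-in for hats.pixel_math.HealpixPixel and both helper names are hypothetical; this only illustrates the "whichever is greater" selection and the standard HEALPix pixel count of 12 * 4**order, not the library's actual implementation.

```python
from collections import namedtuple

# Stand-in for hats.pixel_math.HealpixPixel: a HEALPix order plus a pixel index.
Pixel = namedtuple("Pixel", ["order", "pixel"])


def histogram_order_for(pixels, minimum_order=8):
    """Return the larger of `minimum_order` and the deepest pixel order present."""
    deepest = max((p.order for p in pixels), default=minimum_order)
    return max(minimum_order, deepest)


def histogram_length(order):
    """Number of HEALPix pixels at `order`, i.e. the length of a dense histogram."""
    return 12 * 4**order


shallow = [Pixel(3, 10), Pixel(5, 2)]   # deepest order 5, so order 8 is used
deep = [Pixel(3, 10), Pixel(10, 0)]     # deepest order 10 exceeds 8
print(histogram_order_for(shallow), histogram_order_for(deep))
```

A dense histogram at order 8 already has 786432 bins, which is one reason to keep only the non-empty bins in sparse form.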

Parameters:
  • catalog (HealpixDataset) – A catalog to export

  • base_catalog_dir_fp (path-like) – Path to the base directory of the catalog

  • histogram_order – The order of the count histogram to generate

  • **kwargs – Arguments to pass to the parquet write operations

Returns:

A tuple with the list of non-empty pixels and the list with the total counts for those pixels.

_check_catalogs_and_columns(catalog_columns, primary_catalog_dir: str | pathlib.Path | upath.UPath | None = None, primary_column_association: str | None = None, primary_id_column: str | None = None, join_catalog_dir: str | pathlib.Path | upath.UPath | None = None, join_column_association: str | None = None, join_to_primary_id_column: str | None = None, join_id_column: str | None = None)

Helper function that validates the user-supplied catalog and column arguments.

Returns:

A dictionary to be used in the creation of TableProperties.