lsdb.io.to_hats

Functions

perform_write(...) → tuple[int, ...]

Writes a pandas dataframe to a single parquet file and returns the total count for the partition as well as a count histogram at the specified order.

calculate_histogram(...)

Splits a partition into pixels at a specified order and computes the sparse histogram with the respective counts.

to_hats(catalog, *, base_catalog_path[, catalog_name, ...])

Writes a catalog to disk, in HATS format.

write_partitions(...)

Saves catalog partitions as parquet to disk and computes the sparse count histogram for each partition.

create_modified_catalog_structure(...)

Creates a modified version of the HATS catalog structure.

Module Contents

perform_write(df: nested_pandas.NestedFrame, hp_pixel: hats.pixel_math.HealpixPixel, base_catalog_dir: str | pathlib.Path | upath.UPath, histogram_order: int, **kwargs) → tuple[int, hats.pixel_math.sparse_histogram.SparseHistogram]

Writes a pandas dataframe to a single parquet file and returns the total count for the partition as well as a count histogram at the specified order.

Parameters:
  • df (npd.NestedFrame) – dataframe to write to file

  • hp_pixel (HealpixPixel) – HEALPix pixel of file to be written

  • base_catalog_dir (path-like) – Location of the base catalog directory to write to

  • histogram_order (int) – Order of the count histogram

  • **kwargs – Other keyword arguments to pass to the pq.write_table method

Returns:

The total number of points in the partition and the sparse count histogram at the specified order.
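As an illustration of where a single partition file might land under base_catalog_dir, here is a minimal sketch. It assumes the commonly used HATS directory layout (dataset/Norder=.../Dir=.../Npix=....parquet, with pixels grouped in directories of 10,000); the helper name partition_file_path is hypothetical and is not part of lsdb or hats.

```python
from pathlib import Path


def partition_file_path(base_catalog_dir, order, pixel):
    """Hypothetical helper: assumed location of one partition's parquet file."""
    # Assumption: partitions are grouped into directories of 10,000 pixels.
    directory = (pixel // 10_000) * 10_000
    return (
        Path(base_catalog_dir)
        / "dataset"
        / f"Norder={order}"
        / f"Dir={directory}"
        / f"Npix={pixel}.parquet"
    )


# Example: the pixel with order 6 and number 12345.
path = partition_file_path("my_catalog", order=6, pixel=12345)
print(path.as_posix())
```

The actual path is resolved by the library's own path utilities, so treat this only as a guide for reading the output directory.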

calculate_histogram(df: nested_pandas.NestedFrame, histogram_order: int) → hats.pixel_math.sparse_histogram.SparseHistogram

Splits a partition into pixels at a specified order and computes the sparse histogram with the respective counts.

Parameters:
  • df (npd.NestedFrame) – Partition data frame

  • histogram_order (int) – Order of the count histogram

Returns:

The sparse count histogram for the partition, at the specified order.
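The idea behind the sparse count histogram can be sketched in plain Python. This is not the library's implementation: it assumes each row carries a HEALPix index in the NESTED scheme at order 29, and it uses a plain dict in place of SparseHistogram. In the NESTED scheme, the parent pixel at a lower order is obtained by dropping two bits per order of difference. The names sparse_count_histogram and SPATIAL_INDEX_ORDER are hypothetical.

```python
from collections import Counter

# Assumption: rows are indexed by a NESTED HEALPix pixel number at order 29.
SPATIAL_INDEX_ORDER = 29


def sparse_count_histogram(healpix_29_values, histogram_order):
    """Hypothetical stand-in: map {pixel at histogram_order: row count}."""
    if not 0 <= histogram_order <= SPATIAL_INDEX_ORDER:
        raise ValueError("histogram_order must be between 0 and 29")
    # Each order step splits a pixel in four, i.e. adds two bits to the index.
    shift = 2 * (SPATIAL_INDEX_ORDER - histogram_order)
    counts = Counter(value >> shift for value in healpix_29_values)
    # Only non-empty pixels are stored, which is what makes it "sparse".
    return dict(counts)


# Seven rows binned at order 28 (four order-29 pixels per bin).
histogram = sparse_count_histogram([0, 1, 2, 3, 4, 5, 8], histogram_order=28)
print(histogram)
```

The counts in the histogram sum to the number of rows in the partition, which matches the total count that perform_write returns alongside it.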

to_hats(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, *, base_catalog_path: str | pathlib.Path | upath.UPath, catalog_name: str | None = None, default_columns: list[str] | None = None, histogram_order: int = 8, overwrite: bool = False, **kwargs)

Writes a catalog to disk, in HATS format. The output catalog comprises partition parquet files and respective metadata, as well as JSON files detailing partition, catalog and provenance info.

Parameters:
  • catalog (HealpixDataset) – A catalog to export

  • base_catalog_path (path-like) – Location where the catalog is saved

  • catalog_name (str) – The name of the output catalog

  • default_columns (list[str]) – A metadata property listing the catalog columns to load by default. If not provided, the default columns of the original HATS catalog are used, if they exist.

  • histogram_order (int) – The order of the count histogram. Defaults to 8.

  • overwrite (bool) – If True, an existing catalog at the target location is overwritten. Defaults to False.

  • **kwargs – Arguments to pass to the parquet write operations
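A call following the signature documented above might look like the sketch below. The wrapper export_catalog and its parameter names are hypothetical; only the to_hats arguments come from this page. Note that every argument after catalog is keyword-only. The import is done lazily so the sketch can be defined even where lsdb is not installed.

```python
def export_catalog(catalog, output_dir, name):
    """Hypothetical wrapper that writes `catalog` to `output_dir` in HATS format."""
    # Lazy import: the module path is taken from the title of this page.
    from lsdb.io.to_hats import to_hats

    to_hats(
        catalog,
        base_catalog_path=output_dir,  # str, pathlib.Path or upath.UPath
        catalog_name=name,
        histogram_order=8,  # the documented default, shown for clarity
        overwrite=False,  # leave an existing catalog untouched
    )
```

Here catalog is expected to be a HealpixDataset obtained elsewhere; how it is loaded or built is outside the scope of this sketch.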

write_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, base_catalog_dir_fp: str | pathlib.Path | upath.UPath, histogram_order: int, **kwargs) → tuple[list[hats.pixel_math.HealpixPixel], list[int], list[hats.pixel_math.sparse_histogram.SparseHistogram]]

Saves catalog partitions as parquet to disk and computes the sparse count histogram for each partition. The histogram is either of order 8 or the maximum pixel order in the catalog, whichever is greater.

Parameters:
  • catalog (HealpixDataset) – A catalog to export

  • base_catalog_dir_fp (path-like) – Path to the base directory of the catalog

  • histogram_order (int) – The order of the count histogram to generate

  • **kwargs – Arguments to pass to the parquet write operations

Returns:

A tuple with three lists: the non-empty pixels, the total counts, and the sparse count histograms.
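The order rule described above, and one way the per-partition results could be combined afterwards, can be sketched as follows. This is an illustration under stated assumptions, not the library's code: histograms are represented as plain dicts rather than SparseHistogram objects, and the function names are hypothetical.

```python
from collections import Counter

DEFAULT_HISTOGRAM_ORDER = 8


def choose_histogram_order(partition_orders, default_order=DEFAULT_HISTOGRAM_ORDER):
    """Return the default order or the largest partition order, whichever is greater."""
    return max(default_order, max(partition_orders, default=default_order))


def combine_partition_results(counts, histograms):
    """Sum per-partition totals and merge per-partition {pixel: count} dicts."""
    merged = Counter()
    for histogram in histograms:
        merged.update(histogram)  # adds counts for pixels seen more than once
    return sum(counts), dict(merged)


order_deep = choose_histogram_order([3, 5, 10])
order_shallow = choose_histogram_order([2, 4])
total, merged = combine_partition_results([3, 2], [{10: 2, 11: 1}, {11: 2}])
print(order_deep, order_shallow, total, merged)
```

Because every partition histogram is computed at the same order, merging them reduces to adding the counts pixel by pixel.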

create_modified_catalog_structure(catalog_structure: hats.catalog.healpix_dataset.healpix_dataset.HealpixDataset, catalog_base_dir: str | pathlib.Path | upath.UPath, catalog_name: str, **kwargs) → hats.catalog.healpix_dataset.healpix_dataset.HealpixDataset

Creates a modified version of the HATS catalog structure.

Parameters:
  • catalog_structure (HealpixDataset) – HATS catalog structure

  • catalog_base_dir (path-like) – Base location for the catalog

  • catalog_name (str) – The name of the catalog to be saved

  • **kwargs – The remaining parameters to be updated in the catalog info object

Returns:

A HATS structure, modified with the parameters provided.