lsdb.loaders.dataframe.dataframe_catalog_loader

Classes

DataframeCatalogLoader

Creates a HATS-formatted Catalog from a pandas DataFrame.

Module Contents

class DataframeCatalogLoader(dataframe: pandas.DataFrame, *, ra_column: str = 'ra', dec_column: str = 'dec', lowest_order: int = 0, highest_order: int = 7, drop_empty_siblings: bool = False, partition_size: int | None = None, threshold: int | None = None, should_generate_moc: bool = True, moc_max_order: int = 10, use_pyarrow_types: bool = True, schema: pyarrow.Schema | None = None, **kwargs)

Creates a HATS-formatted Catalog from a pandas DataFrame.

dataframe
lowest_order = 0
highest_order = 7
drop_empty_siblings = False
threshold = None
catalog_info
should_generate_moc = True
moc_max_order = 10
use_pyarrow_types = True
schema = None
_calculate_threshold(partition_size: int | None = None, threshold: int | None = None) → int

Calculates the maximum number of rows per HEALPix pixel (the threshold) for the desired partition size.

Parameters:
  • partition_size (int) – The desired partition size, in number of rows

  • threshold (int) – The maximum number of data points (rows) per pixel

Returns:

The HEALPix pixel threshold, i.e. the maximum number of rows per pixel
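A minimal sketch of how a row-count threshold could be derived, under stated assumptions: calculate_threshold_sketch, num_rows, bytes_per_row and target_partition_bytes are hypothetical names, and the memory-based fallback used when neither argument is given is an assumption, not documented behavior of this loader. Passing both arguments is rejected, an explicit threshold is used as-is, and a partition_size is turned into a balanced per-pixel limit.

```python
from __future__ import annotations

import math


def calculate_threshold_sketch(
    num_rows: int,
    partition_size: int | None = None,
    threshold: int | None = None,
    bytes_per_row: float | None = None,
    target_partition_bytes: int = 1 << 30,
) -> int:
    """Hypothetical helper: maximum number of rows per HEALPix pixel."""
    if partition_size is not None and threshold is not None:
        raise ValueError("Specify only one of partition_size or threshold")
    if threshold is not None:
        # An explicit threshold is used unchanged
        return threshold
    if partition_size is not None:
        # Round the partition count up so the threshold never exceeds
        # partition_size, then spread the rows evenly over the partitions
        num_partitions = max(1, math.ceil(num_rows / partition_size))
        return max(1, num_rows // num_partitions)
    if bytes_per_row is None:
        raise ValueError("Need partition_size, threshold, or bytes_per_row")
    # Assumed fallback: aim for partitions of roughly target_partition_bytes in memory
    return max(1, math.ceil(target_partition_bytes / bytes_per_row))
```

For example, 1,000 rows with partition_size=300 gives four partitions and a threshold of 250 rows, which keeps partitions balanced instead of producing three full ones and a small remainder.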

_create_catalog_info(catalog_name: str = 'from_lsdb_dataframe', ra_column: str = 'ra', dec_column: str = 'dec', catalog_type: hats.catalog.CatalogType = CatalogType.OBJECT, **kwargs) → hats.catalog.TableProperties

Creates the catalog info object (a hats.catalog.TableProperties instance).

Parameters:
  • catalog_name – The name of the catalog. It is recommended to provide a new name for your catalog.

  • ra_column – Name of the column containing the right ascension coordinate

  • dec_column – Name of the column containing the declination coordinate

  • catalog_type – Type of table being created (e.g. OBJECT, SOURCE, MAP)

  • **kwargs – Additional arguments passed on when creating the catalog info

Returns:

The catalog info object
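To illustrate how the named arguments and **kwargs could combine, here is a sketch in which CatalogTypeSketch, CatalogInfoSketch and create_catalog_info_sketch are hypothetical stand-ins for hats.catalog.CatalogType, hats.catalog.TableProperties and this method; the real classes have more fields and may validate their input.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class CatalogTypeSketch(str, Enum):
    """Hypothetical stand-in for hats.catalog.CatalogType."""
    OBJECT = "object"
    SOURCE = "source"
    MAP = "map"


@dataclass
class CatalogInfoSketch:
    """Hypothetical stand-in for hats.catalog.TableProperties (far fewer fields)."""
    catalog_name: str
    catalog_type: CatalogTypeSketch
    ra_column: str
    dec_column: str
    extra: dict[str, Any] = field(default_factory=dict)


def create_catalog_info_sketch(
    catalog_name: str = "from_lsdb_dataframe",
    ra_column: str = "ra",
    dec_column: str = "dec",
    catalog_type: CatalogTypeSketch = CatalogTypeSketch.OBJECT,
    **kwargs: Any,
) -> CatalogInfoSketch:
    # Named arguments fill the core fields; any remaining keyword
    # arguments are carried along unchanged
    return CatalogInfoSketch(
        catalog_name=catalog_name,
        catalog_type=catalog_type,
        ra_column=ra_column,
        dec_column=dec_column,
        extra=dict(kwargs),
    )
```

Calling create_catalog_info_sketch() with no arguments yields an OBJECT catalog named from_lsdb_dataframe, which is why a new name is recommended.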

load_catalog() → lsdb.catalog.catalog.Catalog

Loads a catalog from a pandas DataFrame.

Returns:

Catalog object containing the data from the DataFrame given at loader initialization

_set_spatial_index()

Generates the spatial index for each data point and sets the spatial index column as the DataFrame index.
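Nested HEALPix numbering makes a fine-grained spatial index easy to coarsen: every pixel splits into four children at the next order, so moving up one order drops two bits. The sketch below assumes the spatial index is a nested HEALPix index at order 29; SPATIAL_INDEX_ORDER and pixel_at_order are hypothetical names.

```python
SPATIAL_INDEX_ORDER = 29  # assumption: order of the nested spatial index


def pixel_at_order(spatial_index: int, order: int) -> int:
    """Hypothetical helper: nested HEALPix pixel at `order` containing `spatial_index`."""
    if not 0 <= order <= SPATIAL_INDEX_ORDER:
        raise ValueError(f"order must be in [0, {SPATIAL_INDEX_ORDER}]")
    # Each order step corresponds to 2 bits (4 children) in nested numbering
    return spatial_index >> (2 * (SPATIAL_INDEX_ORDER - order))
```

Because coarsening is a bit shift, rows sorted by spatial index are also sorted by their pixel at any coarser order, which is what makes the spatial index a useful DataFrame index.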

_compute_pixel_list() → list[hats.pixel_math.HealpixPixel]

Computes the object histogram and generates the list of HEALPix pixels, sorted by ascending spatial index.

Returns:

List of HEALPix pixels for the final partitioning.
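One plausible way to turn the histogram into a partitioning, sketched under assumptions and not the hats implementation: row counts are aggregated from highest_order up to lowest_order, then pixels are chosen top-down, keeping a pixel once its count fits under the threshold and descending into its four children otherwise. compute_pixel_list_sketch is hypothetical, returns (order, pixel) tuples instead of HealpixPixel objects, simply skips empty pixels, and again assumes an order-29 spatial index.

```python
from __future__ import annotations

from collections import Counter, defaultdict

SPATIAL_INDEX_ORDER = 29  # assumption: order of the nested spatial index


def compute_pixel_list_sketch(
    spatial_indices: list[int],
    lowest_order: int,
    highest_order: int,
    threshold: int,
) -> list[tuple[int, int]]:
    """Hypothetical sketch: (order, pixel) pairs sorted by ascending spatial index."""
    shift = 2 * (SPATIAL_INDEX_ORDER - highest_order)
    # Histogram of row counts at highest_order
    histograms: dict[int, dict[int, int]] = {
        highest_order: dict(Counter(idx >> shift for idx in spatial_indices))
    }
    # Aggregate upward: a parent's count is the sum of its 4 children
    for order in range(highest_order - 1, lowest_order - 1, -1):
        parent_counts: dict[int, int] = defaultdict(int)
        for pixel, count in histograms[order + 1].items():
            parent_counts[pixel >> 2] += count
        histograms[order] = dict(parent_counts)

    result: list[tuple[int, int]] = []

    def visit(order: int, pixel: int) -> None:
        count = histograms[order].get(pixel, 0)
        if count == 0:
            return  # this sketch skips empty pixels
        if count <= threshold:
            result.append((order, pixel))
        elif order == highest_order:
            raise ValueError(
                f"pixel {pixel} at order {order} has {count} rows, above threshold {threshold}"
            )
        else:
            for child in range(4 * pixel, 4 * pixel + 4):
                visit(order + 1, child)

    # Depth-first with children in ascending order keeps the output
    # sorted by ascending spatial index
    for pixel in sorted(histograms[lowest_order]):
        visit(lowest_order, pixel)
    return result
```

With threshold=3, rows falling in order-1 pixels 0, 0, 1, 1 and 4 produce [(1, 0), (1, 1), (0, 1)]: order-0 pixel 0 holds four rows and is split, while order-0 pixel 1 holds one row and is kept whole.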

_generate_dask_df_and_map(pixel_list: list[hats.pixel_math.HealpixPixel]) → tuple[lsdb.nested.NestedFrame, lsdb.types.DaskDFPixelMap, int]

Loads a Dask DataFrame from the per-pixel DataFrames and generates a mapping of HEALPix pixels to their respective DataFrames.

Parameters:

pixel_list (list[HealpixPixel]) – The final partitioning of the data

Returns:

Tuple containing the Dask DataFrame, the mapping of HEALPix pixels to their respective pandas DataFrames, and the total number of rows.
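Because the pixel list is sorted by spatial index, each row can be routed to its partition with a binary search over the pixels' starting indices. In this hypothetical sketch (split_rows_by_pixel_sketch is not part of lsdb), plain lists stand in for the per-pixel pandas DataFrames, the map records each pixel's partition position as one way to link a pixel to its DataFrame, and SPATIAL_INDEX_ORDER = 29 is again an assumption.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Any

SPATIAL_INDEX_ORDER = 29  # assumption: order of the nested spatial index


def split_rows_by_pixel_sketch(
    rows: list[tuple[int, Any]],
    pixel_list: list[tuple[int, int]],
) -> tuple[list[list[Any]], dict[tuple[int, int], int], int]:
    """Hypothetical sketch: group (spatial_index, payload) rows by (order, pixel)."""
    # First order-29 index covered by each pixel; ascending because
    # pixel_list is sorted by spatial index
    starts = [pixel << (2 * (SPATIAL_INDEX_ORDER - order)) for order, pixel in pixel_list]
    partitions: list[list[Any]] = [[] for _ in pixel_list]
    for spatial_index, payload in rows:
        # Last pixel whose range starts at or before this index
        i = bisect_right(starts, spatial_index) - 1
        order, pixel = pixel_list[i] if i >= 0 else (0, -1)
        # Confirm the row really lies inside that pixel's range
        if i < 0 or spatial_index >> (2 * (SPATIAL_INDEX_ORDER - order)) != pixel:
            raise ValueError(f"spatial index {spatial_index} is not covered by pixel_list")
        partitions[i].append(payload)
    pixel_map = {pix: position for position, pix in enumerate(pixel_list)}
    total_rows = sum(len(part) for part in partitions)
    return partitions, pixel_map, total_rows
```

The return value mirrors the documented tuple shape (data, pixel map, total row count); a real implementation would build pandas DataFrames per partition and combine them into a Dask collection.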

_generate_moc()