lsdb.loaders.hats.read_hats#
Functions#
- read_hats — Load catalog from a HATS path. See open_catalog().
- open_catalog — Load catalog from a HATS path.
- _get_collection_margin — The path to the collection margin.
- _load_catalog
- _update_hc_structure — Create the modified schema of the catalog after all the processing on the read_hats call.
- _load_association_catalog — Load a catalog from the configuration specified when the loader was created.
- _load_margin_catalog — Load a catalog from the configuration specified when the loader was created.
- _load_object_catalog — Load a catalog from the configuration specified when the loader was created.
- _load_map_catalog — Load a catalog from the configuration specified when the loader was created.
- _load_dask_meta_schema — Loads the Dask meta DataFrame from the parquet _metadata file.
- _load_dask_df_and_map — Load Dask DF from parquet files and make dict of HEALPix pixel to partition index.
- read_pixel — Utility method to read a single pixel's parquet file from disk.
Module Contents#
- read_hats(path: str | pathlib.Path | upath.UPath, search_filter: lsdb.core.search.abstract_search.AbstractSearch | None = None, columns: list[str] | str | None = None, margin_cache: str | pathlib.Path | upath.UPath | None = None, **kwargs) lsdb.catalog.dataset.dataset.Dataset [source]#
Load catalog from a HATS path. See open_catalog().
- open_catalog(path: str | pathlib.Path | upath.UPath, search_filter: lsdb.core.search.abstract_search.AbstractSearch | None = None, columns: list[str] | str | None = None, margin_cache: str | pathlib.Path | upath.UPath | None = None, **kwargs) lsdb.catalog.dataset.dataset.Dataset [source]#
Load catalog from a HATS path.
Catalogs exist in collections or stand-alone.
Catalogs in a HATS collection are composed of a main catalog, and margin and index catalogs. LSDB will load exactly ONE main object catalog, at most ONE margin catalog, and at most ONE index catalog. The collection.properties file specifies which margins and indexes are available, and which are the default:
```
my_collection_dir/
├── main_catalog/
├── margin_catalog/
├── margin_catalog_2/
├── index_catalog/
└── collection.properties
```
All arguments passed to the read_hats call are applied to the reading calls of the main and margin catalogs.
Typical usage example, where we load a collection with a subset of columns:
```python
lsdb.read_hats(path='./my_collection_dir', columns=['ra','dec'])
```
Typical usage example, where we load a collection from a cone search:
```python
lsdb.read_hats(
    path='./my_collection_dir',
    columns=['ra','dec'],
    search_filter=lsdb.core.search.ConeSearch(ra, dec, radius_arcsec),
)
```
Typical usage example, where we load a collection with a non-default margin:
```python
lsdb.read_hats(path='./my_collection_dir', margin_cache='margin_catalog_2')
```
Note that this margin still needs to be specified in the all_margins attribute of the collection.properties file.
We can also load each catalog separately, if needed:
```python
lsdb.read_hats(path='./my_collection_dir/main_catalog')
```
- Parameters:
path (UPath | Path) – The path that locates the root of the HATS collection or stand-alone catalog.
search_filter (Type[AbstractSearch]) – Default None. The filter method to be applied.
columns (list[str] | str) – Default None. The set of columns to filter the catalog on. If None, the catalog's default columns will be loaded. To load all catalog columns, use columns="all".
margin_cache (path-like) – Default None. The margin for the main catalog, provided as a path.
dtype_backend (str) – Backend data type to apply to the catalog. Defaults to “pyarrow”. If None, no type conversion is performed.
**kwargs – Arguments to pass to the pandas parquet file reader
- Returns:
A CatalogCollection object if the provided path is for a HATS collection, a Catalog object if the path is for a stand-alone HATS catalog. Both are loaded from the given parameters.
Examples
To read a collection from a public S3 bucket, call it as follows:
```python
from upath import UPath

collection = lsdb.read_hats(UPath(..., anon=True))
```
- _get_collection_margin(collection: hats.catalog.catalog_collection.CatalogCollection, margin_cache: str | pathlib.Path | upath.UPath | None) upath.UPath | None [source]#
The path to the collection margin.
- The margin_cache should be provided as:
  - An identifier of the margin catalog name (it needs to be a string and be specified in the all_margins attribute of the collection.properties).
  - The absolute path to a margin, hosted locally or remotely.
By default, if no margin_cache is provided, the absolute path to the default collection margin is returned.
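The resolution rules above can be sketched in plain Python. This is only an illustrative sketch, not LSDB's implementation: `resolve_margin`, `collection_dir`, `all_margins`, and `default_margin` are hypothetical names standing in for values read from the collection.properties file.

```python
from pathlib import Path


def resolve_margin(collection_dir, all_margins, default_margin, margin_cache=None):
    """Sketch of the margin_cache resolution rules (all names are illustrative)."""
    if margin_cache is None:
        # No margin requested: fall back to the collection's default margin, if any.
        return None if default_margin is None else Path(collection_dir) / default_margin
    if isinstance(margin_cache, str) and margin_cache in all_margins:
        # A margin identifier: it must be listed in the all_margins attribute.
        return Path(collection_dir) / margin_cache
    # Otherwise, treat it as a path to a margin catalog hosted locally or remotely.
    return Path(margin_cache)
```

Note the identifier case only applies when the string matches an entry in all_margins; any other value is interpreted as a path.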
- _load_catalog(hc_catalog: hats.catalog.Dataset, search_filter: lsdb.core.search.abstract_search.AbstractSearch | None = None, columns: list[str] | str | None = None, margin_cache: str | pathlib.Path | upath.UPath | None = None, **kwargs) lsdb.catalog.dataset.dataset.Dataset [source]#
- _update_hc_structure(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset)[source]#
Create the modified schema of the catalog after all processing in the read_hats call.
- _load_association_catalog(hc_catalog, config)[source]#
Load a catalog from the configuration specified when the loader was created
- Returns:
Catalog object with data from the source given at loader initialization
- _load_margin_catalog(hc_catalog, config)[source]#
Load a catalog from the configuration specified when the loader was created
- Returns:
Catalog object with data from the source given at loader initialization
- _load_object_catalog(hc_catalog, config)[source]#
Load a catalog from the configuration specified when the loader was created
- Returns:
Catalog object with data from the source given at loader initialization
- _load_map_catalog(hc_catalog, config)[source]#
Load a catalog from the configuration specified when the loader was created
- Returns:
Catalog object with data from the source given at loader initialization
- _load_dask_meta_schema(hc_catalog, config) nested_pandas.NestedFrame [source]#
Loads the Dask meta DataFrame from the parquet _metadata file.
- _load_dask_df_and_map(catalog: hats.catalog.healpix_dataset.healpix_dataset.HealpixDataset, config) tuple[lsdb.nested.NestedFrame, lsdb.catalog.catalog.DaskDFPixelMap] [source]#
Load Dask DF from parquet files and make dict of HEALPix pixel to partition index
- read_pixel(pixel: hats.pixel_math.HealpixPixel, catalog: hats.catalog.healpix_dataset.healpix_dataset.HealpixDataset, *, query_url_params: dict | None = None, columns=None, schema=None, **kwargs)[source]#
Utility method to read a single pixel’s parquet file from disk.
NB: columns is necessary as an argument, even if None, so that dask-expr optimizes the execution plan.