Exporting results
You can save the catalogs that result from running your workflow to disk, in Parquet format, using the `to_hats` call. You must provide a `base_catalog_path`, which is the output path for your catalog directory, and, optionally, a name for your catalog, `catalog_name`. The `catalog_name` is the catalog's internal name and may therefore differ from the name of the catalog's base directory. If the directory already exists and you want to overwrite its contents, set the `overwrite` flag to `True`. When exporting a catalog to protected remote storage, remember to provide the necessary credentials as `storage_options` to the `UPath` construction.
For example, to save a catalog that contains the results of crossmatching Gaia with ZTF to `"./my_catalogs/gaia_x_ztf"`, one could run:

```python
gaia_x_ztf_catalog.to_hats(base_catalog_path="./my_catalogs/gaia_x_ztf", catalog_name="gaia_x_ztf")
```
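
When the destination is protected remote storage, the credentials travel with the path itself. The snippet below is a minimal sketch assuming an S3-compatible store; the bucket name and credential placeholders are hypothetical, and the storage options are passed as keyword arguments when constructing the `UPath`:

```python
from upath import UPath

# Hypothetical S3 destination; fsspec storage options (here, the
# credentials) are given as keyword arguments to the UPath.
remote_path = UPath(
    "s3://my-bucket/catalogs/gaia_x_ztf",
    key="<access-key-id>",         # placeholder credential
    secret="<secret-access-key>",  # placeholder credential
)

# overwrite=True replaces any existing content at the destination.
gaia_x_ztf_catalog.to_hats(
    base_catalog_path=remote_path,
    catalog_name="gaia_x_ztf",
    overwrite=True,
)
```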
The HATS catalogs on disk follow a well-defined directory structure:
```
gaia_x_ztf/
├── Norder={}/
│   ├── Dir={}/
│   │   ├── Npix={}.parquet
│   │   └── ...
│   └── ...
├── Norder={}/
│   ├── Dir={}/
│   │   ├── Npix={}.parquet
│   │   └── ...
│   └── ...
├── _metadata
├── _common_metadata
├── catalog_info.json
├── partition_info.csv
└── provenance_info.json
```
The data is partitioned spatially and stored, in Parquet format, according to the area each partition covers on the sky. Each Parquet file holds one partition. The higher a partition's `Norder`, the smaller its area; and because partitions contain approximately the same number of points, those in a directory with a larger `Norder` hold data for denser regions of the sky. All of this information is encoded in the metadata files at the root of the catalog.
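
Because every partition is tracked in these root metadata files, the partitioning can be inspected without opening any of the data files. As a quick sketch, the snippet below lists the `Norder`/`Npix` of each partition from `partition_info.csv` using pandas, and then loads the exported catalog back; `lsdb.read_hats` is assumed here as the reading counterpart of `to_hats`:

```python
import pandas as pd
import lsdb

# Each row of partition_info.csv describes one spatial partition,
# including its Norder and Npix.
partitions = pd.read_csv("./my_catalogs/gaia_x_ztf/partition_info.csv")
print(partitions.head())

# Load the exported catalog back into a workflow (assumed reader).
gaia_x_ztf = lsdb.read_hats("./my_catalogs/gaia_x_ztf")
```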