Column filtering (e.g. columns=)#
In this tutorial, we will demonstrate how to:
Load a HATS catalog with the default set of columns
Load a HATS catalog with all available columns
Load a HATS catalog with a specified subset of columns
Introduction#
To improve performance and readability when working with large catalogs, users can specify which columns to load through the columns argument of lsdb.open_catalog, instead of loading every column. This reduces memory usage and speeds up data processing by avoiding unnecessary data retrieval.
[1]:
# Import LSDB.
import lsdb
[2]:
# Start a Dask client.
from dask.distributed import Client
client = Client(n_workers=4, memory_limit="auto")
[3]:
# Specify the path to the HATS catalog you want to use.
# No trailing slash on the base, so the f-string below does not produce "//".
surveys_path = "https://data.lsdb.io/hats"
ztf_object_path = f"{surveys_path}/ztf_dr14/ztf_object"
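Because the catalog path is built by string interpolation, it is easy to end up with a doubled or missing slash between segments. A minimal sketch of a joining helper follows; `join_url` is a hypothetical name introduced here for illustration and is not part of lsdb.

```python
def join_url(base: str, *parts: str) -> str:
    """Join URL segments with exactly one slash between them.

    Hypothetical helper for illustration; not an lsdb function.
    """
    # Strip only the trailing slash of the base so "https://" stays intact,
    # and strip slashes on both sides of every following segment.
    segments = [base.rstrip("/")] + [part.strip("/") for part in parts]
    # Drop empty segments so stray "" or "/" arguments add nothing.
    return "/".join(segment for segment in segments if segment)


surveys_path = "https://data.lsdb.io/hats/"
ztf_object_path = join_url(surveys_path, "ztf_dr14", "ztf_object")
print(ztf_object_path)
```

The result is the same whether or not the base path ends with a slash.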
1. Load the catalog with default columns#
[4]:
ztf_object = lsdb.open_catalog(ztf_object_path)
ztf_object
[4]:
lsdb Catalog ztf_dr14:
| | ps1_objid | ra | dec | ps1_gMeanPSFMag | ps1_rMeanPSFMag | ps1_iMeanPSFMag | nobs_g | nobs_r | nobs_i | mean_mag_g | mean_mag_r | mean_mag_i |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| npartitions=2352 | | | | | | | | | | | | |
| Order: 3, Pixel: 0 | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int32[pyarrow] | int32[pyarrow] | int32[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] |
| Order: 3, Pixel: 1 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 4, Pixel: 3070 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 4, Pixel: 3071 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12 out of 15 columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema
2. Load the catalog with all columns#
[5]:
ztf_object = lsdb.open_catalog(ztf_object_path, columns="all")
ztf_object
[5]:
lsdb Catalog ztf_dr14:
| | ps1_objid | ra | dec | ps1_gMeanPSFMag | ps1_rMeanPSFMag | ps1_iMeanPSFMag | nobs_g | nobs_r | nobs_i | mean_mag_g | mean_mag_r | mean_mag_i | Norder | Dir | Npix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| npartitions=2352 | | | | | | | | | | | | | | | |
| Order: 3, Pixel: 0 | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int32[pyarrow] | int32[pyarrow] | int32[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int8[pyarrow] | int64[pyarrow] | int64[pyarrow] |
| Order: 3, Pixel: 1 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 4, Pixel: 3070 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 4, Pixel: 3071 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15 out of 15 columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema
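Comparing the two outputs shows which columns the default selection leaves out. The sketch below does that comparison on plain Python lists; the column names are copied by hand from the outputs above rather than queried from lsdb, and the variable names are illustrative only.

```python
# Column names copied from the default-columns output (section 1).
default_columns = [
    "ps1_objid", "ra", "dec",
    "ps1_gMeanPSFMag", "ps1_rMeanPSFMag", "ps1_iMeanPSFMag",
    "nobs_g", "nobs_r", "nobs_i",
    "mean_mag_g", "mean_mag_r", "mean_mag_i",
]

# Column names copied from the columns="all" output (section 2).
all_columns = [
    "ps1_objid", "ra", "dec",
    "ps1_gMeanPSFMag", "ps1_rMeanPSFMag", "ps1_iMeanPSFMag",
    "nobs_g", "nobs_r", "nobs_i",
    "mean_mag_g", "mean_mag_r", "mean_mag_i",
    "Norder", "Dir", "Npix",
]

# Keep the catalog's column order while filtering out the default columns.
default_set = set(default_columns)
only_with_all = [name for name in all_columns if name not in default_set]

print(f"{len(default_columns)} default columns, {len(all_columns)} in total")
print("Loaded only with columns='all':", only_with_all)
```

The three extra columns are Norder, Dir, and Npix. They look like partitioning metadata rather than per-object measurements, although that reading is an assumption and is not stated in this tutorial.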
3. Load the catalog with a specific subset of columns#
[6]:
ztf_object = lsdb.open_catalog(ztf_object_path, columns=["ps1_objid", "ra", "dec", "mean_mag_r"])
ztf_object
[6]:
lsdb Catalog ztf_dr14:
| | ps1_objid | ra | dec | mean_mag_r |
|---|---|---|---|---|
| npartitions=2352 | | | | |
| Order: 3, Pixel: 0 | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] |
| Order: 3, Pixel: 1 | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
| Order: 4, Pixel: 3070 | ... | ... | ... | ... |
| Order: 4, Pixel: 3071 | ... | ... | ... | ... |
4 out of 15 columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema
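To make the memory argument from the introduction concrete, the following back-of-the-envelope sketch estimates bytes per row for each of the three selections. It assumes the column types shown in the outputs above and fixed-width values (8 bytes for int64 and double, 4 for int32, 1 for int8). It ignores null bitmaps, the index, and any in-memory overhead, so treat the numbers as rough lower bounds. The names `BYTES_PER_VALUE`, `ZTF_OBJECT_SCHEMA`, and `bytes_per_row` are hypothetical helpers, not lsdb API.

```python
# Assumed fixed-width sizes, in bytes per value, for the types in the schema.
BYTES_PER_VALUE = {"int8": 1, "int32": 4, "int64": 8, "double": 8}

# Column -> type, copied by hand from the columns="all" output above.
ZTF_OBJECT_SCHEMA = {
    "ps1_objid": "int64",
    "ra": "double",
    "dec": "double",
    "ps1_gMeanPSFMag": "double",
    "ps1_rMeanPSFMag": "double",
    "ps1_iMeanPSFMag": "double",
    "nobs_g": "int32",
    "nobs_r": "int32",
    "nobs_i": "int32",
    "mean_mag_g": "double",
    "mean_mag_r": "double",
    "mean_mag_i": "double",
    "Norder": "int8",
    "Dir": "int64",
    "Npix": "int64",
}


def bytes_per_row(columns):
    """Sum the assumed per-value sizes of the requested columns."""
    return sum(BYTES_PER_VALUE[ZTF_OBJECT_SCHEMA[name]] for name in columns)


all_columns = list(ZTF_OBJECT_SCHEMA)
default_columns = all_columns[:12]  # the 12 columns loaded by default
subset_columns = ["ps1_objid", "ra", "dec", "mean_mag_r"]

for label, columns in [
    ("default", default_columns),
    ("all", all_columns),
    ("subset", subset_columns),
]:
    print(f"{label:>7}: {len(columns):2d} columns, ~{bytes_per_row(columns)} bytes per row")
```

Under these assumptions, the four-column subset needs about 32 bytes per row versus 84 for the default selection and 101 for all columns, so the subset is a bit under 40 percent of the default footprint.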
Closing the Dask client#
[7]:
client.close()
About#
Authors: Olivia Lynn
Last updated on: May 20, 2025
If you use lsdb for published research, please cite it following the project's citation instructions.