Column filtering (e.g. columns=)

In this tutorial, we will:

- Load a HATS catalog with the default set of columns
- Load a HATS catalog with all available columns
- Load a HATS catalog with a specified subset of columns

Introduction

To improve performance and readability when working with large catalogs, users can specify the column names they want to load instead of loading the entire dataset. This approach reduces memory usage and speeds up data processing by avoiding unnecessary data retrieval.

[1]:

# Import LSDB.

import lsdb

[2]:

# Start a Dask client.

from dask.distributed import Client

client = Client(n_workers=4, memory_limit="auto")

[3]:

# Specify the path to the HATS catalog you want to use.
# The base URL has no trailing slash, so the joined path contains no "//".

surveys_path = "https://data.lsdb.io/hats"
ztf_object_path = f"{surveys_path}/ztf_dr14/ztf_object"

1. Load the catalog with default columns

[4]:

ztf_object = lsdb.open_catalog(ztf_object_path)
ztf_object

[4]:

lsdb Catalog ztf_dr14:

	ps1_objid	ra	dec	ps1_gMeanPSFMag	ps1_rMeanPSFMag	ps1_iMeanPSFMag	nobs_g	nobs_r	nobs_i	mean_mag_g	mean_mag_r	mean_mag_i
npartitions=2352
Order: 3, Pixel: 0	int64[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	int32[pyarrow]	int32[pyarrow]	int32[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]
Order: 3, Pixel: 1	...	...	...	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 4, Pixel: 3070	...	...	...	...	...	...	...	...	...	...	...	...
Order: 4, Pixel: 3071	...	...	...	...	...	...	...	...	...	...	...	...

12 out of 15 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

This catalog has an estimated size of 56.0 GB

2. Load the catalog with all columns

[5]:

ztf_object = lsdb.open_catalog(ztf_object_path, columns="all")
ztf_object

[5]:

lsdb Catalog ztf_dr14:

	ps1_objid	ra	dec	ps1_gMeanPSFMag	ps1_rMeanPSFMag	ps1_iMeanPSFMag	nobs_g	nobs_r	nobs_i	mean_mag_g	mean_mag_r	mean_mag_i	Norder	Dir	Npix
npartitions=2352
Order: 3, Pixel: 0	int64[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	int32[pyarrow]	int32[pyarrow]	int32[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	int8[pyarrow]	int64[pyarrow]	int64[pyarrow]
Order: 3, Pixel: 1	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 4, Pixel: 3070	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 4, Pixel: 3071	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...

15 out of 15 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

This catalog has an estimated size of 66.3 GB
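
As a rough cross-check of the two outputs above, the sketch below lists the columns that appear only with columns="all" and sums a fixed per-row width from the printed dtypes. The schema dictionary is copied by hand from the output, and `row_bytes` is a hypothetical helper, not part of lsdb. It assumes fixed-width values and ignores the index, null handling, and compression, so it can only approximate the reported sizes.

```python
# Fixed widths, in bytes, of the pyarrow-backed dtypes printed above.
DTYPE_BYTES = {"int8": 1, "int32": 4, "int64": 8, "double": 8}

# Column name -> dtype, copied by hand from the columns="all" output.
ALL_COLUMNS = {
    "ps1_objid": "int64",
    "ra": "double",
    "dec": "double",
    "ps1_gMeanPSFMag": "double",
    "ps1_rMeanPSFMag": "double",
    "ps1_iMeanPSFMag": "double",
    "nobs_g": "int32",
    "nobs_r": "int32",
    "nobs_i": "int32",
    "mean_mag_g": "double",
    "mean_mag_r": "double",
    "mean_mag_i": "double",
    "Norder": "int8",
    "Dir": "int64",
    "Npix": "int64",
}

# The first 12 entries are the columns listed in the default output.
DEFAULT_COLUMNS = list(ALL_COLUMNS)[:12]


def row_bytes(columns):
    """Sum the fixed per-row width of the given columns (hypothetical helper)."""
    return sum(DTYPE_BYTES[ALL_COLUMNS[name]] for name in columns)


# Columns that are present only when columns="all" is requested.
extra = [name for name in ALL_COLUMNS if name not in DEFAULT_COLUMNS]
print(extra)  # ['Norder', 'Dir', 'Npix']

print(row_bytes(DEFAULT_COLUMNS), row_bytes(ALL_COLUMNS))  # 84 101

# Ratio of per-row widths versus ratio of the reported size estimates.
print(round(row_bytes(DEFAULT_COLUMNS) / row_bytes(ALL_COLUMNS), 2))  # 0.83
print(round(56.0 / 66.3, 2))  # 0.84
```

Under these assumptions the three extra columns add 17 bytes per row, and the width ratio is close to the ratio of the two reported estimates.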

3. Load the catalog with a specific subset of columns

[6]:

ztf_object = lsdb.open_catalog(ztf_object_path, columns=["ps1_objid", "ra", "dec", "mean_mag_r"])
ztf_object

[6]:

lsdb Catalog ztf_dr14:

	ps1_objid	ra	dec	mean_mag_r
npartitions=2352
Order: 3, Pixel: 0	int64[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]
Order: 3, Pixel: 1	...	...	...	...
...	...	...	...	...
Order: 4, Pixel: 3070	...	...	...	...
Order: 4, Pixel: 3071	...	...	...	...

4 out of 15 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

This catalog has an estimated size of 24.3 GB
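
When a column list is typed by hand, a misspelled name is easiest to catch before the catalog is opened. The sketch below checks a requested selection against the 15 column names printed in section 2. `check_columns` and `AVAILABLE` are hypothetical names introduced for illustration and are not part of lsdb; the list is copied by hand from the output rather than read from the catalog.

```python
# Column names copied by hand from the columns="all" output in section 2.
AVAILABLE = [
    "ps1_objid", "ra", "dec",
    "ps1_gMeanPSFMag", "ps1_rMeanPSFMag", "ps1_iMeanPSFMag",
    "nobs_g", "nobs_r", "nobs_i",
    "mean_mag_g", "mean_mag_r", "mean_mag_i",
    "Norder", "Dir", "Npix",
]


def check_columns(requested, available=AVAILABLE):
    """Return a list copy of `requested`, or raise ValueError on unknown names.

    Hypothetical helper for illustration; not part of lsdb.
    """
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"Unknown column(s): {', '.join(unknown)}")
    return list(requested)


# The selection used in cell [6] passes the check.
columns = check_columns(["ps1_objid", "ra", "dec", "mean_mag_r"])
print(columns)  # ['ps1_objid', 'ra', 'dec', 'mean_mag_r']

# A name that is not in the catalog is reported before any catalog is opened.
try:
    check_columns(["ra", "dec", "mean_mag_z"])
except ValueError as err:
    print(err)  # Unknown column(s): mean_mag_z
```

The validated list can then be passed as the columns argument of lsdb.open_catalog, as in cell [6].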

Closing the Dask client

[7]:

client.close()

About

Authors: Olivia Lynn

Last updated on: May 20, 2025

If you use lsdb for published research, please cite it by following the project's citation instructions.
