Region Selection#

TODO - Help wanted

astronomy-commons/lsdb#664

In this tutorial, we will demonstrate how to:

  • Set up a Dask client and load an object catalog

  • Select data from regions in the sky

    • cone

    • radec box

    • polygon

Introduction#

Large astronomical surveys produce massive volumes of data. Catalogs with billions of objects and multiple terabytes of data are challenging to store and manipulate because they demand state-of-the-art hardware. Processing them is expensive in both runtime and memory, and doing so on a single machine has become impractical. LSDB addresses this by enabling scalable algorithm execution: it handles loading, querying, filtering, and crossmatching astronomical data in the HATS format in a distributed environment.

[1]:
import lsdb

1. Load a catalog#

We create a basic Dask client and load an existing HATS catalog, the ZTF DR22 catalog.

Additional Help

For additional information on Dask client creation, please refer to the official Dask documentation and our Dask cluster configuration page for LSDB-specific tips. Dask also provides its own best practices, which may be useful to consult.

For tips on accessing remote data, see our Accessing remote data tutorial.

[2]:
from dask.distributed import Client

client = Client(n_workers=4, memory_limit="auto")
client
[2]:

Client

Client-228f3ff8-2b78-11f0-8cd8-42cb0b321d21

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

[3]:
ztf_object_path = "https://data.lsdb.io/hats/ztf_dr22/ztf_lc"
ztf_object = lsdb.read_hats(ztf_object_path)
ztf_object
[3]:
lsdb Catalog ztf_lc:
npartitions=10839
Partitions: Order: 4, Pixel: 0; Order: 4, Pixel: 1; ...; Order: 5, Pixel: 12286; Order: 5, Pixel: 12287

Column      Dtype
objectid    int64[pyarrow]
filterid    int8[pyarrow]
fieldid     int16[pyarrow]
rcid        int8[pyarrow]
objra       float[pyarrow]
objdec      float[pyarrow]
nepochs     int64[pyarrow]
hmjd        list<element: double>[pyarrow]
mag         list<element: float>[pyarrow]
magerr      list<element: float>[pyarrow]
clrcoeff    list<element: float>[pyarrow]
catflags    list<element: int32>[pyarrow]
Norder      uint8[pyarrow]
Dir         uint64[pyarrow]
Npix        uint64[pyarrow]

The catalog has been loaded lazily: only its schema has been read, not the data itself.

2. Selecting a region of the sky#

There are three common types of spatial filters for selecting a portion of the sky: cone, polygon, and box.

Filtering consists of two main steps:

  • A coarse stage, in which we find which pixels cover our desired region of the sky. These pixels may only partially overlap the region, so some data points inside them may fall outside the region boundaries.

  • A fine stage, where we filter the data points from each pixel to make sure they fall within the specified region.

The fine parameter controls, for each search, whether the fine stage runs. The fine stage adds some overhead, so if you only need a rough estimate of the data points in a region, you may disable it. It is executed by default.

catalog.box_search(..., fine=False)
catalog.cone_search(..., fine=False)
catalog.polygon_search(..., fine=False)
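
The difference between the coarse and fine stages can be illustrated with a toy, pure-Python sketch. This is not LSDB's implementation: square 10-degree cells stand in for HEALPix pixels, and the names toy_cone_search, cell_of, and CELL_DEG are hypothetical. The sketch also assumes a cone that stays away from the poles and from the RA = 0 wrap-around.

```python
import math

CELL_DEG = 10.0  # toy cell size in degrees; a stand-in for a HEALPix pixel


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cell_of(ra, dec):
    """Index of the toy cell that contains the point (ra, dec)."""
    return (int(ra // CELL_DEG), int((dec + 90.0) // CELL_DEG))


def toy_cone_search(points, ra, dec, radius_deg, fine=True):
    """Two-stage cone filter over a list of (ra, dec) tuples."""
    # Coarse stage: keep every point whose cell overlaps the RA/Dec
    # bounding box of the cone (valid while |dec| + radius < 90 degrees).
    half_ra = math.degrees(
        math.asin(math.sin(math.radians(radius_deg)) / math.cos(math.radians(dec)))
    )
    lo = cell_of(ra - half_ra, dec - radius_deg)
    hi = cell_of(ra + half_ra, dec + radius_deg)
    coarse = [
        p for p in points
        if lo[0] <= cell_of(*p)[0] <= hi[0] and lo[1] <= cell_of(*p)[1] <= hi[1]
    ]
    if not fine:
        return coarse
    # Fine stage: keep only points that are truly inside the cone.
    return [
        p for p in coarse
        if angular_separation_deg(p[0], p[1], ra, dec) <= radius_deg
    ]


points = [(45.0, 20.0), (47.0, 21.0), (49.5, 29.0), (70.0, 20.0)]
coarse_only = toy_cone_search(points, ra=45.0, dec=20.0, radius_deg=5.0, fine=False)
exact = toy_cone_search(points, ra=45.0, dec=20.0, radius_deg=5.0)
print(len(coarse_only), len(exact))
```

In this toy setup the coarse-only result keeps a point that shares a cell with the cone but lies outside it, which is the kind of over-selection to expect when the fine stage is disabled.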

Throughout this notebook, we will use the Catalog’s plot_pixels method to display the HEALPix pixels of each resulting catalog as filters are applied.

[4]:
ztf_object.plot_pixels(plot_title="ZTF_DR22 - pixel map")
[4]:
(<Figure size 1000x500 with 2 Axes>,
 <WCSAxes: title={'center': 'ZTF_DR22 - pixel map'}>)
../_images/tutorials_region_selection_9_1.png

4. The Search object#

There are two ways to perform a search on a catalog: a shape-specific call, or passing a search object to the search() method. A call such as catalog.cone_search(...) is an example of the shape-specific mode.

Using a search object can be useful if you intend to reuse the shape to filter multiple catalogs. We also provide some basic plotting for cone and box searches. The 5-degree cone search is outlined in red in the plot below.

[7]:
from lsdb.core.search import ConeSearch

cone_search = ConeSearch(ra=-60.3, dec=20.5, radius_arcsec=5 * 3600)
[8]:
ztf_object.plot_pixels(plot_title="ZTF_DR22 - pixel map")
cone_search.plot(fc="#00000000", ec="red")
[8]:
(<Figure size 1000x500 with 2 Axes>, <WCSAxes: >)
../_images/tutorials_region_selection_15_1.png
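
The idea of defining a shape once and reusing it across catalogs can be sketched in plain Python. ToyCone below is a hypothetical stand-in, not LSDB's ConeSearch, and the two "catalogs" are short lists of made-up (ra, dec) points. The sketch reuses the same center and radius as the cell above and shows that a radius of 5 * 3600 arcseconds is 5 degrees.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCone:
    """Hypothetical reusable cone shape; not LSDB's ConeSearch."""

    ra: float             # degrees
    dec: float            # degrees
    radius_arcsec: float  # cone radius in arcseconds

    def contains(self, ra, dec):
        """True if (ra, dec) lies within the cone (haversine separation)."""
        ra1, dec1, ra2, dec2 = map(math.radians, (self.ra, self.dec, ra, dec))
        a = (math.sin((dec2 - dec1) / 2) ** 2
             + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
        sep_arcsec = math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600
        return sep_arcsec <= self.radius_arcsec

    def filter(self, points):
        """Apply the same shape to any list of (ra, dec) tuples."""
        return [p for p in points if self.contains(*p)]


# The haversine formula is periodic in RA, so ra=-60.3 and ra=299.7
# describe the same direction on the sky.
cone = ToyCone(ra=-60.3, dec=20.5, radius_arcsec=5 * 3600)

catalog_a = [(299.7, 20.5), (310.0, 20.5)]   # made-up points
catalog_b = [(301.0, 22.0), (299.7, -20.5)]  # made-up points

selected_a = cone.filter(catalog_a)
selected_b = cone.filter(catalog_b)
print(selected_a, selected_b)
```

Because the shape is an immutable object, the same instance can be applied to any number of point collections without repeating the center and radius.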

Closing the Dask client#

[13]:
client.close()

About#

Authors: Sandro Campos and Melissa DeLucchi

Last updated on: April 14, 2025

If you use lsdb for published research, please cite it by following the citation instructions.