Accessing Rubin Data Preview 1 (DP1)


In this tutorial, we will access Rubin’s Data Preview 1 with LSDB:
- at RSP (Rubin Science Platform), hosted in the USA and the UK
- at NERSC (National Energy Research Scientific Computing Center), for LSST DESC members
- at CANFAR (Canadian Advanced Network for Astronomical Research), the Canadian Independent Data Access Center
- at LIneA (Laboratório Interinstitucional de e-Astronomia), the Brazilian Independent Data Access Center

Introduction#

Prerequisites#

In order to access Rubin data, you must be a Rubin data rights holder.

Available data#

This page focuses on Rubin Data Preview 1. For information about Rubin Data Preview 2, please visit Accessing Rubin Data Preview 2 (DP2).

1. Accessing the data on Rubin Science Platform (RSP)#

1.1 Prepare your RSP container#

Visit https://data.lsst.cloud. If you are accessing RSP through the UK IDAC participation program, visit https://rsp.lsst.ac.uk instead.
Log in using your identity provider.
Once logged in, you will see Portal, Notebooks, and APIs. Choose Notebooks.
When asked which container to start, choose either “Recommended” or one of the latest weekly releases on the left, and “Large” on the right.
- At the time of writing, “Recommended” comes with an older version of lsdb. Run the cell below to check whether you need to select a weekly release instead.
Once the container has started, create a new notebook.

1.1.1 Ensure your notebook kernel has the right version of lsdb#

Make sure you have lsdb version 0.9.0 or later. Check with the following:

[1]:

%pip list | grep -E 'lsdb|hats'

hats                                0.9.0
lsdb                                0.9.0
Note: you may need to restart the kernel to use updated packages.

[2]:

# Or for even more detail:

import lsdb

lsdb.show_versions()


--------      SYSTEM INFO      --------
python        : 3.13.9
python-bits   : 64
OS            : Linux
OS-release    : 6.12.68+
Version       : #1 SMP Wed Apr  1 02:12:36 UTC 2026
machine       : x86_64
processor     :
byteorder     : little
LC_ALL        :
LANG          :
--------   INSTALLED VERSIONS   --------
lsdb          : 0.9.0
hats          : 0.9.0
nested-pandas : 0.6.9
pandas        : 2.3.3
numpy         : 2.3.5
dask          : 2026.3.0
pyarrow       : 21.0.0
fsspec        : 2026.4.0

If the version shown above is older than 0.9.0, we suggest restarting RSP with one of the latest weekly releases rather than the “Recommended” build.
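If you prefer a check that reports clearly instead of reading the version by eye, the minimal sketch below compares the installed lsdb version against the 0.9.0 minimum using only the standard library. The helper names `release_tuple` and `lsdb_is_recent_enough` are hypothetical, and the parser deliberately ignores pre-release and local-version suffixes; the `packaging` library's `Version` class is the more robust choice when it is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM_LSDB = (0, 9, 0)  # minimum lsdb version recommended in this tutorial


def release_tuple(version_string: str) -> tuple[int, int, int]:
    """Return the (major, minor, patch) numbers at the start of a version string.

    Hypothetical helper: suffixes such as '.dev3' or '+g1234' are ignored,
    and missing components are padded with zeros, so '1.0' -> (1, 0, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    parts = [int(part) for part in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def lsdb_is_recent_enough(installed: str, minimum: tuple[int, int, int] = MINIMUM_LSDB) -> bool:
    """Return True if the installed version is at least the minimum."""
    return release_tuple(installed) >= minimum


try:
    installed = version("lsdb")
except PackageNotFoundError:
    print("lsdb is not installed in this kernel.")
else:
    if lsdb_is_recent_enough(installed):
        print(f"lsdb {installed} meets the 0.9.0 minimum.")
    else:
        print(f"lsdb {installed} is older than 0.9.0; consider a newer weekly release.")
```

Tuple comparison is lexicographic, so (0, 10, 0) correctly counts as newer than (0, 9, 0), which a plain string comparison would get wrong.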

1.2 Create a Dask Client#

When working on RSP with a Large container, you have access to 16 GB of RAM. To ensure each worker has enough memory to work with the data, we recommend using 4 workers with 1 thread each and a memory limit of “auto”, which divides the available memory among the workers. Dask sometimes spills data to disk when memory runs low, so we also recommend setting the local directory to /deleted-sundays, a large scratch space on RSP that is cleared every Sunday. You can set up your Dask client with the following code:

[2]:

# Dask puts out more advisory logging than we care for in this tutorial.
# It takes some doing to quiet all of it, but this recipe works.

import dask
import os

dask.config.set({"logging.distributed": "critical"})

import logging

# This also has to be done, for the above to be effective
logger = logging.getLogger("distributed")
logger.setLevel(logging.CRITICAL)

import warnings

# Finally, suppress the specific warning about Dask dashboard port usage
warnings.filterwarnings("ignore", message="Port 8787 is already in use.")

[3]:

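import os

from dask.distributed import Client

client = Client(
    n_workers=4,  # four worker processes
    threads_per_worker=1,  # one thread per worker
    memory_limit="auto",  # divide the available memory among the workers
    # Spill to the RSP scratch space, in a per-user subdirectory
    local_directory=f"/deleted-sundays/{os.environ.get('USER', 'dask_scratch')}",
)
client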

[3]:

Client

Client-89d90a8e-4ef3-11f1-8041-c9f0f749e5ba

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: https://olynn.nb.data.lsst.cloud/nb/user/olynn/proxy/8787/status

Cluster Info

LocalCluster

72e090b6

Dashboard: https://olynn.nb.data.lsst.cloud/nb/user/olynn/proxy/8787/status	Workers: 4
Total threads: 4	Total memory: 32.00 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-cf553a1c-6b6e-44fe-9624-741b0e19fdb5

Comm: tcp://127.0.0.1:33035	Workers: 0
Dashboard: https://olynn.nb.data.lsst.cloud/nb/user/olynn/proxy/8787/status	Total threads: 0
Started: Just now	Total memory: 0 B

Workers

Worker: 0

Comm: tcp://127.0.0.1:46357	Total threads: 1
Dashboard: https://olynn.nb.data.lsst.cloud/nb/user/olynn/proxy/41785/status	Memory: 8.00 GiB
Nanny: tcp://127.0.0.1:39687
Local directory: /tmp/dask-scratch-space/worker-lnninpvq

Worker: 1

Comm: tcp://127.0.0.1:34729	Total threads: 1
Dashboard: https://olynn.nb.data.lsst.cloud/nb/user/olynn/proxy/39647/status	Memory: 8.00 GiB
Nanny: tcp://127.0.0.1:41001
Local directory: /tmp/dask-scratch-space/worker-h8pa0fr3

Worker: 2

Comm: tcp://127.0.0.1:37247	Total threads: 1
Dashboard: https://olynn.nb.data.lsst.cloud/nb/user/olynn/proxy/38497/status	Memory: 8.00 GiB
Nanny: tcp://127.0.0.1:38305
Local directory: /tmp/dask-scratch-space/worker-hsdfofe8

Worker: 3

Comm: tcp://127.0.0.1:36605	Total threads: 1
Dashboard: https://olynn.nb.data.lsst.cloud/nb/user/olynn/proxy/41697/status	Memory: 8.00 GiB
Nanny: tcp://127.0.0.1:37231
Local directory: /tmp/dask-scratch-space/worker-eg_cpzjy

Your Dask dashboard will be accessible at https://{username}.nb.data.lsst.cloud/nb/user/{username}/proxy/{port}/status.
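As a small illustration of the pattern above, the hypothetical helper below fills in the dashboard template for a given username and port. It assumes the data.lsst.cloud address format shown above; inside a running notebook, the `client.dashboard_link` attribute of the Dask client reports the actual address and is the more reliable source.

```python
def rsp_dashboard_url(username: str, port: int = 8787) -> str:
    """Build the RSP Dask dashboard URL from the documented pattern.

    Hypothetical helper. 8787 is Dask's default dashboard port; if that port
    is already in use, Dask serves the dashboard on a different port instead.
    """
    return f"https://{username}.nb.data.lsst.cloud/nb/user/{username}/proxy/{port}/status"


print(rsp_dashboard_url("myself"))
```

If Dask warns that port 8787 is already in use, pass the port it reports instead of relying on the default.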

1.3 Opening a Catalog#

The data is divided into two catalogs: objects, detected on coadded images, and dia_objects, detected through difference image analysis (DIA). Let’s open both catalogs:

[4]:

from upath import UPath

base_path = UPath("/rubin/lsdb_data/dp1")

object_cat = lsdb.open_catalog(base_path / "object_collection")
dia_object_cat = lsdb.open_catalog(base_path / "dia_object_collection")

[5]:

object_cat

[5]:

lsdb Catalog object_lc:

	coord_dec	coord_decErr	coord_ra	coord_raErr	g_psfFlux	g_psfFluxErr	g_psfMag	g_psfMagErr	i_psfFlux	i_psfFluxErr	i_psfMag	i_psfMagErr	objectId	patch	r_psfFlux	r_psfFluxErr	r_psfMag	r_psfMagErr	refBand	refFwhm	shape_flag	shape_xx	shape_xy	shape_yy	tract	u_psfFlux	u_psfFluxErr	u_psfMag	u_psfMagErr	x	xErr	y	y_psfFlux	y_psfFluxErr	y_psfMag	y_psfMagErr	yErr	z_psfFlux	z_psfFluxErr	z_psfMag	z_psfMagErr	objectForcedSource
npartitions=389
Order: 6, Pixel: 130	double[pyarrow]	float[pyarrow]	double[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	int64[pyarrow]	int64[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	string[pyarrow]	float[pyarrow]	bool[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	int64[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	double[pyarrow]	float[pyarrow]	double[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	nested<coord_ra: [double], coord_dec: [double]...
Order: 8, Pixel: 2176	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 9, Pixel: 2302101	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 7, Pixel: 143884	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...

42 out of 1304 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

[6]:

dia_object_cat

[6]:

lsdb Catalog dia_object_lc:

	dec	diaObjectId	nDiaSources	ra	radecMjdTai	tract	diaObjectForcedSource	diaSource
npartitions=208
Order: 6, Pixel: 130	double[pyarrow]	int64[pyarrow]	int64[pyarrow]	double[pyarrow]	double[pyarrow]	int64[pyarrow]	nested<band: [string], coord_dec: [double], co...	nested<band: [string], centroid_flag: [bool], ...
Order: 6, Pixel: 136	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...
Order: 11, Pixel: 36833621	...	...	...	...	...	...	...	...
Order: 7, Pixel: 143884	...	...	...	...	...	...	...	...

8 out of 140 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

You’ve accessed the data! You can see the schema of both catalogs.

To learn how to use this data, see Using Rubin Data and follow the lsdb quickstart at Getting Started.

If you have questions about data from the Rubin Observatory, the best place to ask is https://community.lsst.org in the LSST Data Products category.

1.4 Downloading data to your machine#

If you need to work with the data on your own machine, you can use scp to copy it from the container. Suppose you have an account named myself on a machine named big-box.astro.somewhere.edu, and the data is in a directory called ./some_data. The command below copies that directory to a directory of the same name in your home directory on that machine:

scp -r ./some_data myself@big-box.astro.somewhere.edu:some_data
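If the directory holds many files (for example, a HATS catalog, which stores its partitions as separate Parquet files), copying them one by one can be slow. One option, sketched below under the assumption that your data is in ./some_data as in the example above, is to bundle the directory into a single archive with Python's standard `shutil.make_archive` and copy that one file instead. The helper name `bundle_directory` is hypothetical.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def bundle_directory(source_dir: str | Path, archive_base: str | Path | None = None) -> Path:
    """Pack source_dir into a gzip-compressed tarball and return the archive path.

    By default the archive is written next to the directory,
    e.g. ./some_data -> ./some_data.tar.gz.
    """
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(source)
    base = Path(archive_base).resolve() if archive_base is not None else source
    archive = shutil.make_archive(
        base_name=str(base),  # output path without the extension
        format="gztar",  # produces a .tar.gz file
        root_dir=str(source.parent),  # archive member paths are relative to this directory
        base_dir=source.name,  # so the tarball unpacks to ./some_data
    )
    return Path(archive)
```

For example, `bundle_directory("./some_data")` produces ./some_data.tar.gz, which you can copy with `scp ./some_data.tar.gz myself@big-box.astro.somewhere.edu:` and unpack on your machine with `tar -xzf some_data.tar.gz`.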

2. Accessing the data at NERSC (Perlmutter)#

If you are part of the LSST DESC collaboration and have a NERSC account, you can access Rubin DP1 via the Perlmutter cluster. You can use either batch jobs or jupyter.nersc.gov; below, we assume that you use NERSC’s JupyterHub.

2.1 Launch Jupyter#

Log in to NERSC at https://jupyter.nersc.gov. Select “Login Node” for data exploration, configuration, and code development. Use “Exclusive CPU Node” for larger tasks, such as full-catalog analysis.

Please also see NERSC documentation for Dask configuration.

2.2a Start kernel with LSDB#

LSDB is available in the desc-td_env-dev Jupyter kernel.

If you haven’t already set up the DESC Jupyter kernels at NERSC, run the one-time setup step on the Perlmutter command line:

source /global/common/software/lsst/common/miniconda/kernels/setup.sh

The next time you start jupyter.nersc.gov, you will have access to several desc-* Jupyter kernels. More information can be found here.

2.2b Alternative: Install LSDB#

To install with conda, run conda install -c conda-forge "lsdb>=0.9.0" in a terminal. To install with pip, run python -m pip install "lsdb>=0.9.0" in a terminal, or run the following cell in a Jupyter notebook:

[ ]:

%pip install "lsdb>=0.9.0"

Restart the kernel and check that the lsdb version is 0.9.0 or later (the output below is from an earlier run of this notebook):

[3]:

import lsdb

lsdb.__version__

[3]:

'0.6.7'

2.3 Open a catalog#

The data is divided into objects and dia_objects. Let’s open both catalogs:

[6]:

from upath import UPath

base_path = UPath("/global/cfs/cdirs/lsst/shared/rubin/DP1/HATS/dp1_full/hats/v29_0_0")

object_cat = lsdb.open_catalog(base_path / "object_collection")
dia_object_cat = lsdb.open_catalog(base_path / "dia_object_collection")

[7]:

object_cat

[7]:

lsdb Catalog object_lc:

	coord_dec	coord_decErr	coord_ra	coord_raErr	g_psfFlux	g_psfFluxErr	g_psfMag	g_psfMagErr	i_psfFlux	i_psfFluxErr	i_psfMag	i_psfMagErr	objectId	patch	r_psfFlux	r_psfFluxErr	r_psfMag	r_psfMagErr	refBand	refFwhm	shape_flag	shape_xx	shape_xy	shape_yy	tract	u_psfFlux	u_psfFluxErr	u_psfMag	u_psfMagErr	x	xErr	y	y_psfFlux	y_psfFluxErr	y_psfMag	y_psfMagErr	yErr	z_psfFlux	z_psfFluxErr	z_psfMag	z_psfMagErr	objectForcedSource
npartitions=389
Order: 6, Pixel: 130	double[pyarrow]	float[pyarrow]	double[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	int64[pyarrow]	int64[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	string[pyarrow]	float[pyarrow]	bool[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	int64[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	double[pyarrow]	float[pyarrow]	double[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	nested<band: [string], coord_dec: [double], co...
Order: 8, Pixel: 2176	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 9, Pixel: 2302101	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 7, Pixel: 143884	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...

42 out of 1304 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

[8]:

dia_object_cat

[8]:

lsdb Catalog dia_object_lc:

	dec	diaObjectId	nDiaSources	ra	radecMjdTai	tract	diaObjectForcedSource	diaSource
npartitions=208
Order: 6, Pixel: 130	double[pyarrow]	int64[pyarrow]	int64[pyarrow]	double[pyarrow]	double[pyarrow]	int64[pyarrow]	nested<band: [string], coord_dec: [double], co...	nested<band: [string], centroid_flag: [bool], ...
Order: 6, Pixel: 136	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...
Order: 11, Pixel: 36833621	...	...	...	...	...	...	...	...
Order: 7, Pixel: 143884	...	...	...	...	...	...	...	...

8 out of 140 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

3. Accessing the data at CANFAR (Canadian Independent Data Access Center)#

3.1 Launch science platform#

If you are a member of the Canadian astronomical community, you can access Rubin DP1 via the CANFAR science portal. Open https://www.canfar.net/science-portal. We recommend the defaults, i.e., project skaha and container image astroml:latest.

3.2 Opening a Catalog#

The data is divided into objects and dia_objects. Let’s open both catalogs:

[ ]:

import lsdb
from upath import UPath

base_path = UPath("/arc/projects/hats/lsst/dp1")

object_cat = lsdb.open_catalog(base_path / "object_collection")
dia_object_cat = lsdb.open_catalog(base_path / "dia_object_collection")

Authenticated users can also access DP1 through an sshfs session or with the VOSpace client. More information can be found at https://www.opencadc.org/canfar/latest/platform/storage. If you get a permission error but you are a Canadian Rubin data rights holder, contact CADC for assistance.
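The platforms above differ mainly in the base path of the DP1 collections. If you move the same notebook between sites, a small lookup like the sketch below can pick whichever known path is readable in the current environment. The dictionary only collects the paths given in this tutorial, which may change between releases; the names `DP1_BASE_PATHS` and `find_dp1_base_path` are hypothetical.

```python
import os

# Base paths of the DP1 HATS collections, as listed in the sections above.
DP1_BASE_PATHS = {
    "RSP": "/rubin/lsdb_data/dp1",
    "NERSC": "/global/cfs/cdirs/lsst/shared/rubin/DP1/HATS/dp1_full/hats/v29_0_0",
    "CANFAR": "/arc/projects/hats/lsst/dp1",
}


def find_dp1_base_path(candidates: dict[str, str] = DP1_BASE_PATHS) -> tuple[str, str]:
    """Return (site, path) for the first base path that exists and is readable.

    Raises FileNotFoundError if none is reachable. A directory that exists
    but fails the access check usually points to a permissions problem.
    """
    for site, path in candidates.items():
        if os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK):
            return site, path
    raise FileNotFoundError("No readable DP1 base path found among: " + ", ".join(candidates))
```

On a supported platform, `site, path = find_dp1_base_path()` followed by `lsdb.open_catalog(UPath(path) / "object_collection")` opens the object catalog with the same calls used above.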

4. Accessing the data at LIneA (Laboratório Interinstitucional de e-Astronomia)#

Detailed instructions for access via Brazilian IDAC are available at https://data.linea.org.br/en/lsdb/how_to_access_rubin_dp1.html.

About#

Authors: Neven Caplar, Derek Jones, Konstantin Malanchev, Olivia Lynn

Last updated on: May 13, 2026

Last run: May 13, 2026 (RSP section only)

If you use lsdb for published research, please cite it by following the citation instructions.
