Search for SN-like light curves in ZTF alerts

Search for SN-like light curves in ZTF alerts#

We will use the LSDB package to load a HATS catalog with ZTF alerts. The dataset contains all alerts sent from the beginning of the survey from 2018-05-04 to 2023-09-13 corresponding to objects having at least 20 detections.

The dataset is provided by the ALeRCE broker team.

The goal is to find supernova (SN) candidates in this dataset using the goodness of the Bazin (Bazin+2009) function fit, a simple parametric model for SN light curves.

The pipeline will be as follows:

Load the dataset with LSDB
Convert it to a nested format with nested-dask package
Fit Bazin function and extract some other light-curve features with light-curve package
Filter data and plot few light curves

1. Install and import required packages#

We need LSDB for data loading and analysis (includes dask, nested-pandas and nested-dask) and light-curve package for feature extraction.

[1]:

%pip install -q lsdb
# This --only-binary flag is required to avoid installation errors on some systems
%pip install -q --only-binary=light-curve light-curve

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.

[2]:

import light_curve as licu  # light-curve feature extraction library
import matplotlib.pyplot as plt  # Plotting
import numpy as np  # Array operations
from dask.distributed import Client  # Create multi-processing cluster and connect to it
from lsdb import open_catalog  # LSDB entry-point, load HATS catalog
from matplotlib.ticker import MaxNLocator  # Fix ticks on plots

2. Helper function for light-curve plotting#

The function accepts a pandas data frame and plots a light curve.

[3]:

def plot_lc(lc, nondet, title=None, rband_model=None):
    """Plot light curve with non-detections."""
    for fid, band in zip([1, 2], "gr"):
        idx = lc["lc_fid"] == fid
        plt.scatter(
            lc["lc_mjd"][idx], lc["lc_magpsf"][idx], label=f"{band} detection", color=band, marker="x", s=70
        )
        plt.errorbar(
            lc["lc_mjd"][idx], lc["lc_magpsf"][idx], lc["lc_sigmapsf"][idx], color=band, ls="", capsize=5
        )

        idx = nondet["nondet_fid"] == fid
        plt.scatter(
            nondet["nondet_mjd"][idx],
            nondet["nondet_diffmaglim"][idx],
            marker="v",
            color=band,
            alpha=0.25,
            s=70,
            label=f"{band} non-detection",
        )

    if rband_model is not None:
        xmin, xmax = plt.xlim()
        ymin, ymax = plt.ylim()

        r_times = np.linspace(xmin, xmax, 1000)
        r_mags = rband_model(r_times)
        plt.plot(r_times, r_mags, color="k", linestyle="-", label="r-band Bazin fit")

        plt.xlim(xmin, xmax)
        plt.ylim(ymin, ymax)

    plt.legend(fontsize=14)
    plt.title(title, fontsize=18)
    plt.xlabel("MJD", fontsize=18)
    plt.ylabel("Magnitude", fontsize=18)
    plt.xticks(fontsize=14)
    plt.yticks(fontsize=14)
    plt.gca().xaxis.set_major_locator(MaxNLocator(nbins=6))
    plt.gca().invert_yaxis()
    plt.grid(True)

3. Load catalog structure and “nest” columns#

Here we load the catalog structure and transform list-columns to a compact representation. These transformed columns are “nested data frames”, so each item could be represented by a small pandas dataframe. We are going to have three nested columns:

“lc”, for light curves, each point corresponding to some alert (detection)
“nondet”, for non-detections (upper limits)
“ref”, for ZTF reference objects associated with alerts

Here we have not downloaded any data yet, all data access and analysis happens only after .compute() is called.

Here we display two versions of the catalog: the first one is the raw catalog with nested lists, and the second one is the catalog with nested columns.

Additional Help

For tips on accessing remote data, see our Accessing remote data guide

[4]:

ZTF_ALERTS = "https://data.lsdb.io/hats/alerce/"

# Load catalog with nested lists
raw_catalog = open_catalog(
    ZTF_ALERTS,
)
display(raw_catalog)

# Pack all list-columns into single column
catalog_with_lc = raw_catalog.nest_lists(
    base_columns=[col for col in raw_catalog.columns if not col.startswith("lc_")],
    name="lc",
)

# Pack non-detections
catalog_with_nondet = catalog_with_lc.nest_lists(
    base_columns=[col for col in catalog_with_lc.columns if not col.startswith("nondet_")],
    name="nondet",
)

# Pack ZTF references
catalog = catalog_with_nondet.nest_lists(
    base_columns=[col for col in catalog_with_nondet.columns if not col.startswith("ref_")],
    name="ref",
)

catalog

lsdb Catalog alerce_nested:

	oid	mean_ra	mean_dec	lc_ra	lc_dec	lc_candid	lc_mjd	lc_fid	lc_pid	lc_diffmaglim	lc_isdiffpos	lc_nid	lc_magpsf	lc_sigmapsf	lc_magap	lc_sigmagap	lc_distnr	lc_rb	lc_rbversion	lc_drb	lc_drbversion	lc_magapbig	lc_sigmagapbig	lc_rfid	lc_magpsf_corr	lc_sigmapsf_corr	lc_sigmapsf_corr_ext	lc_corrected	lc_dubious	lc_parent_candid	lc_has_stamp	lc_step_id_corr	nondet_mjd	nondet_fid	nondet_diffmaglim	ref_rfid	ref_candid	ref_fid	ref_rcid	ref_field	ref_magnr	ref_sigmagnr	ref_chinr	ref_sharpnr	ref_ranr	ref_decnr	ref_mjdstartref	ref_mjdendref	ref_nframesref	Norder	Dir	Npix
npartitions=113
Order: 1, Pixel: 0	string[pyarrow]	double[pyarrow]	double[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: int64>[pyarrow]	list<element: double>[pyarrow]	list<element: int64>[pyarrow]	list<element: int64>[pyarrow]	list<element: double>[pyarrow]	list<element: int64>[pyarrow]	list<element: int64>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: string>[pyarrow]	list<element: double>[pyarrow]	list<element: string>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: bool>[pyarrow]	list<element: bool>[pyarrow]	list<element: double>[pyarrow]	list<element: bool>[pyarrow]	list<element: string>[pyarrow]	list<element: double>[pyarrow]	list<element: int64>[pyarrow]	list<element: double>[pyarrow]	list<element: int64>[pyarrow]	list<element: int64>[pyarrow]	list<element: int64>[pyarrow]	list<element: int64>[pyarrow]	list<element: int64>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: double>[pyarrow]	list<element: int64>[pyarrow]	int8[pyarrow]	int64[pyarrow]	int64[pyarrow]
Order: 2, Pixel: 4	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 0, Pixel: 10	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 0, Pixel: 11	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...

The catalog has been loaded lazily, meaning no data has been read, only the catalog schema

[4]:

lsdb Catalog alerce_nested:

	oid	mean_ra	mean_dec	Norder	Dir	Npix	lc	nondet	ref
npartitions=113
Order: 1, Pixel: 0	string[pyarrow]	double[pyarrow]	double[pyarrow]	int8[pyarrow]	int64[pyarrow]	int64[pyarrow]	nested<lc_ra: [double], lc_dec: [double], lc_c...	nested<nondet_mjd: [double], nondet_fid: [int6...	nested<ref_rfid: [int64], ref_candid: [int64],...
Order: 2, Pixel: 4	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...
Order: 0, Pixel: 10	...	...	...	...	...	...	...	...	...
Order: 0, Pixel: 11	...	...	...	...	...	...	...	...	...

The catalog has been loaded lazily, meaning no data has been read, only the catalog schema

4. Execute and plot example light curves#

It is time to run our first .compute()! Here we select the first catalog partition (e.g. a single catalog file), load it and plot first five light curves from this file.

Note how we use nested “lc” column here: each value is represented by a pandas light curve, which we use for plotting.

Additional Help

For additional information on dask client creation, please refer to the official Dask documentation and our Dask cluster configuration page for LSDB-specific tips. Note that dask also provides its own best practices, which may also be useful to consult.

[5]:

# "Compute" the first HATS partition and plot few light curves

# Single partition requires a single worker
with Client(n_workers=1) as client:
    display(client)
    ndf = catalog.partitions[0].compute()

Client

Client-2614eba0-08b7-11f0-a83c-1ecfd5b80ac0

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

LocalCluster

b6c8c35c

Dashboard: http://127.0.0.1:8787/status	Workers: 1
Total threads: 12	Total memory: 32.00 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-7203fe32-488e-4744-bd6a-ddbc20364e54

Comm: tcp://127.0.0.1:49903	Workers: 1
Dashboard: http://127.0.0.1:8787/status	Total threads: 12
Started: Just now	Total memory: 32.00 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:49908	Total threads: 12
Dashboard: http://127.0.0.1:49909/status	Memory: 32.00 GiB
Nanny: tcp://127.0.0.1:49906
Local directory: /var/folders/w1/lh3h4s7d5g10rdlfj4h0mshw0000gn/T/dask-scratch-space/worker-c6dmhp1q

[6]:

for i in range(5):
    lc = ndf["lc"].iloc[i]
    oid = ndf["oid"].iloc[i]
    nondet = ndf["nondet"].iloc[i]
    plt.figure()
    plot_lc(lc, nondet, title=str(oid))

../../_images/tutorials_pre_executed_ztf-alerts-sne_12_0.png

../../_images/tutorials_pre_executed_ztf-alerts-sne_12_1.png

../../_images/tutorials_pre_executed_ztf-alerts-sne_12_2.png

../../_images/tutorials_pre_executed_ztf-alerts-sne_12_3.png

../../_images/tutorials_pre_executed_ztf-alerts-sne_12_4.png

5. Extract features and filter#

Here we use the light-curve package to fit each r-band light-curve with Bazin function (Bazin+2009).

We also extract number of observations and reduced χ² of the fit with constant function to filter objects with low variability.

Here we use the .map_rows() method of the Catalog object to apply the feature extraction function to each row of the catalog. It is very similar to the pandas.DataFrame.apply() method but it works more efficiently and conveniently with nested data. See nested-pandas documentation for more details.

[ ]:

# Initialize feature extractor to be used in extract_features() function
bazin_fit = licu.BazinFit(algorithm="ceres", ceres_niter=20, ceres_loss_reg=3)
feature_extractor = licu.Extractor(
    bazin_fit,
    licu.ObservationCount(),
    licu.ReducedChi2(),
)


def extract_features(band, t, m, sigma, oid):
    # Select r-band only and remove observations with NaN errors
    valid_r = (band == 2) & np.isfinite(sigma)
    t, m, sigma = t[valid_r], m[valid_r], sigma[valid_r]

    # Sort time and remove duplicated, required by light-curve package
    _, idx = np.unique(t, return_index=True)
    t, m, sigma = t[idx], m[idx], sigma[idx]

    # Convert magnitude to fluxes
    flux = 10 ** (-0.4 * (m - 8.9))
    flux_err = 0.4 * np.log(10) * sigma * flux

    # Output NaN feature value if we cannot compute it,
    # e.g. we don't have enough r-band observations
    values = feature_extractor(t, flux, flux_err, fill_value=np.nan)
    return dict(zip(feature_extractor.names, values))


# Extract features and filter objects by them
catalog_with_features = catalog.map_rows(
    extract_features,  # function
    columns=[
        "lc.lc_fid",
        "lc.lc_mjd",
        "lc.lc_magpsf",
        "lc.lc_sigmapsf",
        "oid",
    ],  # columns to use
    row_container="args",  # Pass columns as individual arguments
    meta=dict.fromkeys(feature_extractor.names, float),  # Dask meta
    append_columns=True,  # Add the result feature columns to the catalog
)

# Filter features to get nice light-curves
sn_candidates = catalog_with_features.query(
    "bazin_fit_reduced_chi2 > 0.8 and bazin_fit_reduced_chi2 < 3.0"
    " and bazin_fit_rise_time > 3 and bazin_fit_rise_time < 10"
    " and bazin_fit_fall_time < 30 and bazin_fit_fall_time > 10"
    " and observation_count >= 10"
    " and chi2 > 5.0"
)

sn_candidates

lsdb Catalog alerce_nested:

	oid	mean_ra	mean_dec	Norder	Dir	Npix	lc	nondet	ref	bazin_fit_amplitude	bazin_fit_baseline	bazin_fit_reference_time	bazin_fit_rise_time	bazin_fit_fall_time	bazin_fit_reduced_chi2	observation_count	chi2
npartitions=113
Order: 1, Pixel: 0	string[pyarrow]	double[pyarrow]	double[pyarrow]	int8[pyarrow]	int64[pyarrow]	int64[pyarrow]	nested<lc_ra: [double], lc_dec: [double], lc_c...	nested<nondet_mjd: [double], nondet_fid: [int6...	nested<ref_rfid: [int64], ref_candid: [int64],...	float64	float64	float64	float64	float64	float64	float64	float64
Order: 2, Pixel: 4	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 0, Pixel: 10	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 0, Pixel: 11	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...

The catalog has been loaded lazily, meaning no data has been read, only the catalog schema

6. Run analysis#

Let’s run our analysis over the whole catalog and output the result to a nested-pandas data-frame object. We run the analysis in parallel using Dask. See the Dask cluster configuration documentation page for tips on Dask Client initialization.

The code will download ~21GB of data. Check your Internet connection speed and give it some time. You can track the progress with the Dask dashboard. (You can also select a single partition as we did before, just replace .compute() with .partitions[0].compute().)

[8]:

%%time

with Client(n_workers=6, memory_limit="10GB", threads_per_worker=2) as client:
    display(client)
    ndf = sn_candidates.compute()

ndf

Client

Client-5ac1e9c0-08b7-11f0-a83c-1ecfd5b80ac0

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

LocalCluster

fb4eb787

Dashboard: http://127.0.0.1:8787/status	Workers: 6
Total threads: 12	Total memory: 55.88 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-312f204d-7534-49b0-9721-be5ea9b92d28

Comm: tcp://127.0.0.1:49939	Workers: 6
Dashboard: http://127.0.0.1:8787/status	Total threads: 12
Started: Just now	Total memory: 55.88 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:49956	Total threads: 2
Dashboard: http://127.0.0.1:49965/status	Memory: 9.31 GiB
Nanny: tcp://127.0.0.1:49942
Local directory: /var/folders/w1/lh3h4s7d5g10rdlfj4h0mshw0000gn/T/dask-scratch-space/worker-2196cug_

Worker: 1

Comm: tcp://127.0.0.1:49954	Total threads: 2
Dashboard: http://127.0.0.1:49962/status	Memory: 9.31 GiB
Nanny: tcp://127.0.0.1:49944
Local directory: /var/folders/w1/lh3h4s7d5g10rdlfj4h0mshw0000gn/T/dask-scratch-space/worker-axkphbol

Worker: 2

Comm: tcp://127.0.0.1:49955	Total threads: 2
Dashboard: http://127.0.0.1:49964/status	Memory: 9.31 GiB
Nanny: tcp://127.0.0.1:49946
Local directory: /var/folders/w1/lh3h4s7d5g10rdlfj4h0mshw0000gn/T/dask-scratch-space/worker-oa7p2xgd

Worker: 3

Comm: tcp://127.0.0.1:49959	Total threads: 2
Dashboard: http://127.0.0.1:49963/status	Memory: 9.31 GiB
Nanny: tcp://127.0.0.1:49948
Local directory: /var/folders/w1/lh3h4s7d5g10rdlfj4h0mshw0000gn/T/dask-scratch-space/worker-vnj7r004

Worker: 4

Comm: tcp://127.0.0.1:49957	Total threads: 2
Dashboard: http://127.0.0.1:49960/status	Memory: 9.31 GiB
Nanny: tcp://127.0.0.1:49950
Local directory: /var/folders/w1/lh3h4s7d5g10rdlfj4h0mshw0000gn/T/dask-scratch-space/worker-vsdk5zjx

Worker: 5

Comm: tcp://127.0.0.1:49958	Total threads: 2
Dashboard: http://127.0.0.1:49961/status	Memory: 9.31 GiB
Nanny: tcp://127.0.0.1:49952
Local directory: /var/folders/w1/lh3h4s7d5g10rdlfj4h0mshw0000gn/T/dask-scratch-space/worker-rp62ar8b

CPU times: user 1min 7s, sys: 20.8 s, total: 1min 28s
Wall time: 20min 37s

[8]:

oid

mean_ra

mean_dec

Norder

Dir

Npix

lc

nondet

ref

bazin_fit_amplitude

bazin_fit_baseline

bazin_fit_reference_time

bazin_fit_rise_time

bazin_fit_fall_time

bazin_fit_reduced_chi2

observation_count

chi2

2405857242818793

ZTF20abwzqzo

41.513817

3.329906

1

0

lc_ra	lc_dec	...	lc_has_stamp	lc_step_id_corr

41.513804

3.329934

...

False

bulk_1.0.0

38 rows × 29 columns

nondet_mjd	nondet_fid	nondet_diffmaglim

59062.398843

1

18.9327

54 rows × 3 columns

ref_rfid	ref_candid	...	ref_mjdendref	ref_nframesref

452120223

1333447212315015004

...

58379.420868

15

2 rows × 14 columns

0.000408

0.000030

59100.573583

3.111346

11.490058

0.879535

28.000000

56.120445

3203066719741431

ZTF22abmxzxm

42.606750

5.793783

1

0

42.606741

5.793787

...

True

1.1.6

23 rows × 29 columns

59840.406748

1

20.4905

22 rows × 3 columns

452120233

2111382493315015009

...

58372.5

15

2 rows × 14 columns

0.000363

0.000032

59870.560379

3.061167

20.848320

1.679472

13.000000

36.492215

10935697778842557

ZTF21abvdazf

41.059097

10.256258

1

0

41.059126

10.256229

...

True

correction_1.0.6

24 rows × 29 columns

59424.455359

1

19.567

19 rows × 3 columns

505120231

1697473413115015011

...

58447.35191

35

2 rows × 14 columns

0.000111

0.000005

59455.898052

5.144836

24.962658

1.132158

14.000000

11.381434

15846987093732276

ZTF20acqlfwb

42.891443

13.021577

1

0

42.89143

13.021509

...

False

bulk_1.0.0

30 rows × 29 columns

59140.410324

1

20.5699

37 rows × 3 columns

505120240

1413317324015015032

...

58447.35191

34

2 rows × 14 columns

0.000187

0.000003

59174.930188

3.182938

21.147838

0.898472

20.000000

26.000655

29759010088767360

ZTF22abfyzir

48.358801

19.789209

1

0

48.358792

19.789219

...

True

dev

82 rows × 29 columns

59811.468692

1

20.1277

52 rows × 3 columns

557120146

2085339474615015001

...

58426.25

15

4 rows × 14 columns

0.000369

0.000006

59844.007861

3.360073

23.618371

0.888075

44.000000

72.378785

...

796 rows x 17 columns

7. Plot SN candidate light curves#

Let’s plot light curves of our objects and compare them to what we had before. They look like really nice SN candidates, go check them with the ALeRCE online viewer!

[9]:

head = ndf.head(5)

for _healpix29, row in head.iterrows():
    plt.figure(figsize=(7, 5))

    def model(time):
        time = np.asarray(time)
        params = np.asarray(row[bazin_fit.names], dtype=time.dtype)
        flux = bazin_fit.model(time, params)
        mag = 8.9 - 2.5 * np.log10(flux)
        return mag

    plot_lc(row["lc"], row["nondet"], title=row["oid"], rband_model=model)
    print(f"https://alerce.online/object/{oid}")

https://alerce.online/object/ZTF19aaajdww
https://alerce.online/object/ZTF19aaajdww
https://alerce.online/object/ZTF19aaajdww
https://alerce.online/object/ZTF19aaajdww
https://alerce.online/object/ZTF19aaajdww

../../_images/tutorials_pre_executed_ztf-alerts-sne_18_1.png

../../_images/tutorials_pre_executed_ztf-alerts-sne_18_2.png

../../_images/tutorials_pre_executed_ztf-alerts-sne_18_3.png

../../_images/tutorials_pre_executed_ztf-alerts-sne_18_4.png

../../_images/tutorials_pre_executed_ztf-alerts-sne_18_5.png

About#

Authors: Konstantin Malanchev and Mi Dai

Last updated on: April 17, 2025

If you use lsdb for published research, please cite following instructions.

Search for SN-like light curves in ZTF alerts

Contents

Search for SN-like light curves in ZTF alerts#

1. Install and import required packages#

2. Helper function for light-curve plotting#

3. Load catalog structure and “nest” columns#

4. Execute and plot example light curves#

Client

Cluster Info

LocalCluster

Scheduler Info

Scheduler

Workers

Worker: 0

5. Extract features and filter#

6. Run analysis#

Client

Cluster Info

LocalCluster

Scheduler Info

Scheduler

Workers

Worker: 0

Worker: 1

Worker: 2

Worker: 3

Worker: 4

Worker: 5

7. Plot SN candidate light curves#

About#