Search for SN-like light curves in ZTF alerts#
We will use the lsdb package to load a HATS catalog of ZTF alerts. The dataset contains all alerts sent between the start of the survey on 2018-05-04 and 2023-09-13, for objects having at least 20 detections.
The dataset is provided by the ALeRCE broker team.
The goal is to find supernova (SN) candidates in this dataset using the goodness of fit of the Bazin function (Bazin+2009), a simple parametric model for SN light curves.
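For reference, the Bazin et al. (2009) function models the transient flux as an exponential decline modulated by a sigmoid rise:

$$f(t) = A \, \frac{e^{-(t - t_0)/\tau_\mathrm{fall}}}{1 + e^{-(t - t_0)/\tau_\mathrm{rise}}} + B,$$

with amplitude $A$, baseline $B$, reference time $t_0$, and rise and fall timescales $\tau_\mathrm{rise}$ and $\tau_\mathrm{fall}$; these correspond to the bazin_fit_* features extracted below.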
The pipeline will be as follows:

1. Load the dataset with LSDB
2. Convert it to a nested format with the nested-dask package
3. Fit the Bazin function and extract some other light-curve features with the light-curve package
4. Filter the data and plot a few light curves
Install and import required packages#
We need LSDB for data loading and analysis (it includes dask, nested-pandas, and nested-dask), and the light-curve package for feature extraction.
[1]:
%pip install -q lsdb
# This --only-binary flag is required to avoid installation errors on some systems
%pip install -q --only-binary=light-curve light-curve
Note: you may need to restart the kernel to use updated packages.
[2]:
import light_curve as licu # light-curve feature extraction library
import matplotlib.pyplot as plt # Plotting
import numpy as np # Array operations
from dask.distributed import Client # Create multi-processing cluster and connect to it
from lsdb import read_hats # LSDB entry-point, load HATS catalog
from matplotlib.ticker import MaxNLocator # Fix ticks on plots
Helper function for light-curve plotting#
The function accepts pandas data frames of detections and non-detections and plots the light curve, optionally with an r-band model overlaid.
[3]:
def plot_lc(lc, nondet, title=None, rband_model=None):
    """Plot a light curve with its non-detections."""
    for fid, band in zip([1, 2], "gr"):
        # Detections in this band
        idx = lc["lc_fid"] == fid
        plt.scatter(
            lc["lc_mjd"][idx], lc["lc_magpsf"][idx], label=f"{band} detection", color=band, marker="x", s=70
        )
        plt.errorbar(
            lc["lc_mjd"][idx], lc["lc_magpsf"][idx], lc["lc_sigmapsf"][idx], color=band, ls="", capsize=5
        )
        # Non-detections (upper limits) in this band
        idx = nondet["nondet_fid"] == fid
        plt.scatter(
            nondet["nondet_mjd"][idx],
            nondet["nondet_diffmaglim"][idx],
            marker="v",
            color=band,
            alpha=0.25,
            s=70,
            label=f"{band} non-detection",
        )
    if rband_model is not None:
        # Evaluate the model over the current x-range without changing the axis limits
        xmin, xmax = plt.xlim()
        ymin, ymax = plt.ylim()
        r_times = np.linspace(xmin, xmax, 1000)
        r_mags = rband_model(r_times)
        plt.plot(r_times, r_mags, color="k", linestyle="-", label="r-band Bazin fit")
        plt.xlim(xmin, xmax)
        plt.ylim(ymin, ymax)
    plt.legend(fontsize=14)
    plt.title(title, fontsize=18)
    plt.xlabel("MJD", fontsize=18)
    plt.ylabel("Magnitude", fontsize=18)
    plt.xticks(fontsize=14)
    plt.yticks(fontsize=14)
    plt.gca().xaxis.set_major_locator(MaxNLocator(nbins=6))
    plt.gca().invert_yaxis()
    plt.grid(True)
Load catalog structure and “nest” columns#
Here we load the catalog structure and transform the list-columns into a compact representation. The transformed columns are “nested data frames”: each item can be represented by a small pandas data frame (see the standalone sketch below). We are going to have three nested columns:

- “lc”, for light curves, where each point corresponds to an alert (detection)
- “nondet”, for non-detections (upper limits)
- “ref”, for ZTF reference objects associated with the alerts
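To make the nesting idea concrete, here is a minimal standalone sketch with toy data, assuming nested-pandas exposes NestedFrame.from_lists (the catalog-level nest_lists used below does the same thing lazily, partition by partition):

```python
import pandas as pd
from nested_pandas import NestedFrame

# Toy flat frame: two objects, each carrying list-valued light-curve columns
flat = pd.DataFrame(
    {
        "oid": ["a", "b"],
        "lc_mjd": [[1.0, 2.0, 3.0], [4.0, 5.0]],
        "lc_mag": [[20.1, 19.8, 20.0], [18.9, 19.0]],
    }
)
# Pack the list columns into a single nested column named "lc"
nf = NestedFrame.from_lists(flat, base_columns=["oid"], name="lc")
print(nf["lc"].iloc[0])  # a small data frame with columns lc_mjd and lc_mag
```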
Note that we have not downloaded any data yet: all data access and analysis happen only after .compute() is called.
Below we display two versions of the catalog: the first is the raw catalog with list-columns, and the second is the catalog with nested columns.
[4]:
ZTF_ALERTS = "https://data.lsdb.io/hats/alerce/"

# Load catalog with nested lists
raw_catalog = read_hats(
    ZTF_ALERTS,
)
display(raw_catalog)

# Pack all lc_* list-columns into a single nested column
catalog_with_lc = raw_catalog.nest_lists(
    base_columns=[col for col in raw_catalog.columns if not col.startswith("lc_")],
    name="lc",
)
# Pack non-detections
catalog_with_nondet = catalog_with_lc.nest_lists(
    base_columns=[col for col in catalog_with_lc.columns if not col.startswith("nondet_")],
    name="nondet",
)
# Pack ZTF references
catalog = catalog_with_nondet.nest_lists(
    base_columns=[col for col in catalog_with_nondet.columns if not col.startswith("ref_")],
    name="ref",
)
catalog
[Output: lazy catalog schema, npartitions=113. Base columns: oid (string), mean_ra, mean_dec (double), plus HATS columns Norder, Dir, Npix. List-columns (pyarrow lists): 29 lc_* columns (lc_ra, lc_dec, lc_candid, lc_mjd, lc_fid, lc_pid, lc_diffmaglim, lc_isdiffpos, lc_nid, lc_magpsf, lc_sigmapsf, lc_magap, lc_sigmagap, lc_distnr, lc_rb, lc_rbversion, lc_drb, lc_drbversion, lc_magapbig, lc_sigmagapbig, lc_rfid, lc_magpsf_corr, lc_sigmapsf_corr, lc_sigmapsf_corr_ext, lc_corrected, lc_dubious, lc_parent_candid, lc_has_stamp, lc_step_id_corr), 3 nondet_* columns (nondet_mjd, nondet_fid, nondet_diffmaglim), and 14 ref_* columns (ref_rfid, ref_candid, ref_fid, ref_rcid, ref_field, ref_magnr, ref_sigmagnr, ref_chinr, ref_sharpnr, ref_ranr, ref_decnr, ref_mjdstartref, ref_mjdendref, ref_nframesref).]
[4]:
|  | oid | mean_ra | mean_dec | Norder | Dir | Npix | lc | nondet | ref |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| npartitions=113 |  |  |  |  |  |  |  |  |  |
| Order: 1, Pixel: 0 | string[pyarrow] | double[pyarrow] | double[pyarrow] | int8[pyarrow] | int64[pyarrow] | int64[pyarrow] | nested<lc_ra: [double], lc_dec: [double], lc_c... | nested<nondet_mjd: [double], nondet_fid: [int6... | nested<ref_rfid: [int64], ref_candid: [int64],... |
| Order: 2, Pixel: 4 | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 0, Pixel: 10 | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 0, Pixel: 11 | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Plot example light curves#
It is time to run our first .compute()! Here we select the first catalog partition (i.e. a single catalog file), load it, and plot the first five light curves from it.
Note how we use the nested “lc” column here: each value is a small pandas data frame holding one light curve, which we use for plotting.
[5]:
# "Compute" the first HATS partition and plot few light curves
# Single partition requires a single worker
with Client(n_workers=1) as client:
display(client)
ndf = catalog.partitions[0].compute()
[Output: Dask client summary. LocalCluster with 1 worker, 12 threads, 32.00 GiB memory; dashboard at http://127.0.0.1:8787/status]
[6]:
for i in range(5):
    lc = ndf["lc"].iloc[i]
    oid = ndf["oid"].iloc[i]
    nondet = ndf["nondet"].iloc[i]
    plt.figure()
    plot_lc(lc, nondet, title=str(oid))





Extract features and filter#
Here we use the light-curve package to fit each r-band light curve with the Bazin function (Bazin+2009).
We also extract the number of observations and the reduced χ² of a constant-function fit, which we will use to filter out objects with low variability.
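For reference, this is the standard weighted reduced χ² against a constant model,

$$\chi^2_\mathrm{red} = \frac{1}{N - 1} \sum_{i=1}^{N} \left(\frac{m_i - \bar{m}}{\sigma_i}\right)^2, \qquad \bar{m} = \frac{\sum_i m_i/\sigma_i^2}{\sum_i 1/\sigma_i^2},$$

so values well above unity indicate variability in excess of the photometric errors (see the light-curve package documentation for the exact convention it implements).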
Here we use the .reduce() method of the Catalog object to apply the feature extraction function to each row of the catalog. It is very similar to the pandas.DataFrame.apply() method, but it works more efficiently and conveniently with nested data. See the [nested-pandas documentation](https://nested-pandas.readthedocs.io/en/latest/gettingstarted/quickstart.html#Reduce-Function) for more details.
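As a minimal sketch of the call pattern (with a hypothetical toy feature, peak_mag): nested sub-columns addressed as "lc.lc_mjd" arrive in the function as flat arrays, base columns such as "oid" arrive as scalars, and a returned dict becomes new columns:

```python
import numpy as np


def peak_mag(mjd, mag):
    """Toy reduce function: epoch and magnitude of the brightest point."""
    i = np.argmin(mag)  # brightest means the smallest magnitude
    return {"peak_mjd": mjd[i], "peak_mag": mag[i]}


# Hypothetical usage; `meta` tells Dask the output schema:
# catalog.reduce(peak_mag, "lc.lc_mjd", "lc.lc_magpsf",
#                meta={"peak_mjd": float, "peak_mag": float})
```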
[7]:
# Initialize the feature extractor used in extract_features() below
bazin_fit = licu.BazinFit(algorithm="ceres", ceres_niter=20, ceres_loss_reg=3)
feature_extractor = licu.Extractor(
    bazin_fit,
    licu.ObservationCount(),
    licu.ReducedChi2(),
)


def extract_features(band, t, m, sigma, oid):
    # oid is unused, but the function must accept all columns passed to .reduce()
    # Select the r band only and remove observations with NaN errors
    valid_r = (band == 2) & np.isfinite(sigma)
    t, m, sigma = t[valid_r], m[valid_r], sigma[valid_r]
    # Sort by time and remove duplicates, as required by the light-curve package
    _, idx = np.unique(t, return_index=True)
    t, m, sigma = t[idx], m[idx], sigma[idx]
    # Convert magnitudes to fluxes (AB zero-point: 8.9 mag corresponds to 1 Jy)
    flux = 10 ** (-0.4 * (m - 8.9))
    # Error propagation: dF = 0.4 * ln(10) * F * dm
    flux_err = 0.4 * np.log(10) * sigma * flux
    # Output NaN feature values if we cannot compute them,
    # e.g. when there are not enough r-band observations
    values = feature_extractor(t, flux, flux_err, fill_value=np.nan)
    return dict(zip(feature_extractor.names, values))


# Extract features and append them as new columns
catalog_with_features = catalog.reduce(
    extract_features,  # function
    "lc.lc_fid",
    "lc.lc_mjd",
    "lc.lc_magpsf",
    "lc.lc_sigmapsf",
    "oid",  # columns to use
    meta=dict.fromkeys(feature_extractor.names, float),  # Dask meta
    append_columns=True,  # Add the resulting feature columns to the catalog
)

# Filter by the features to get nice light curves
sn_candidates = catalog_with_features.query(
    "bazin_fit_reduced_chi2 > 0.8 and bazin_fit_reduced_chi2 < 3.0"
    " and bazin_fit_rise_time > 3 and bazin_fit_rise_time < 10"
    " and bazin_fit_fall_time < 30 and bazin_fit_fall_time > 10"
    " and observation_count >= 10"
    " and chi2 > 5.0"
)
sn_candidates
[7]:
|  | oid | mean_ra | mean_dec | Norder | Dir | Npix | lc | nondet | ref | bazin_fit_amplitude | bazin_fit_baseline | bazin_fit_reference_time | bazin_fit_rise_time | bazin_fit_fall_time | bazin_fit_reduced_chi2 | observation_count | chi2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| npartitions=113 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Order: 1, Pixel: 0 | string[pyarrow] | double[pyarrow] | double[pyarrow] | int8[pyarrow] | int64[pyarrow] | int64[pyarrow] | nested<lc_ra: [double], lc_dec: [double], lc_c... | nested<nondet_mjd: [double], nondet_fid: [int6... | nested<ref_rfid: [int64], ref_candid: [int64],... | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 |
| Order: 2, Pixel: 4 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 0, Pixel: 10 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 0, Pixel: 11 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Run analysis#
Let’s run our analysis over the whole catalog and output the result to a nested-pandas data frame. We run the analysis in parallel using Dask; see the Dask cluster configuration documentation page for tips on Dask Client initialization.
The code will download ~21 GB of data, so check your Internet connection speed and give it some time. You can track the progress with the Dask dashboard. (You can also select a single partition as we did before: just replace .compute() with .partitions[0].compute().)
[8]:
%%time
with Client(n_workers=6, memory_limit="10GB", threads_per_worker=2) as client:
    display(client)
    ndf = sn_candidates.compute()
ndf
[Output: Dask client summary. LocalCluster with 6 workers, 2 threads and 9.31 GiB memory each (55.88 GiB total); dashboard at http://127.0.0.1:8787/status]
CPU times: user 1min 7s, sys: 20.8 s, total: 1min 28s
Wall time: 20min 37s
[8]:
| _healpix_29 | oid | mean_ra | mean_dec | Norder | Dir | Npix | lc | nondet | ref | bazin_fit_amplitude | bazin_fit_baseline | bazin_fit_reference_time | bazin_fit_rise_time | bazin_fit_fall_time | bazin_fit_reduced_chi2 | observation_count | chi2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2405857242818793 | ZTF20abwzqzo | 41.513817 | 3.329906 | 1 | 0 | 0 | 38 rows × 29 columns | 54 rows × 3 columns | 2 rows × 14 columns | 0.000408 | 0.000030 | 59100.573583 | 3.111346 | 11.490058 | 0.879535 | 28.000000 | 56.120445 |
| 3203066719741431 | ZTF22abmxzxm | 42.606750 | 5.793783 | 1 | 0 | 0 | 23 rows × 29 columns | 22 rows × 3 columns | 2 rows × 14 columns | 0.000363 | 0.000032 | 59870.560379 | 3.061167 | 20.848320 | 1.679472 | 13.000000 | 36.492215 |
| 10935697778842557 | ZTF21abvdazf | 41.059097 | 10.256258 | 1 | 0 | 0 | 24 rows × 29 columns | 19 rows × 3 columns | 2 rows × 14 columns | 0.000111 | 0.000005 | 59455.898052 | 5.144836 | 24.962658 | 1.132158 | 14.000000 | 11.381434 |
| 15846987093732276 | ZTF20acqlfwb | 42.891443 | 13.021577 | 1 | 0 | 0 | 30 rows × 29 columns | 37 rows × 3 columns | 2 rows × 14 columns | 0.000187 | 0.000003 | 59174.930188 | 3.182938 | 21.147838 | 0.898472 | 20.000000 | 26.000655 |
| 29759010088767360 | ZTF22abfyzir | 48.358801 | 19.789209 | 1 | 0 | 0 | 82 rows × 29 columns | 52 rows × 3 columns | 4 rows × 14 columns | 0.000369 | 0.000006 | 59844.007861 | 3.360073 | 23.618371 | 0.888075 | 44.000000 | 72.378785 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Plot SN candidate light curves#
Let’s plot the light curves of our objects and compare them to what we had before. They look like really nice SN candidates; go check them with the ALeRCE online viewer!
[9]:
head = ndf.head(5)

for _healpix29, row in head.iterrows():
    plt.figure(figsize=(7, 5))

    def model(time):
        """Evaluate the fitted Bazin model and convert flux back to AB magnitude."""
        time = np.asarray(time)
        params = np.asarray(row[bazin_fit.names], dtype=time.dtype)
        flux = bazin_fit.model(time, params)
        mag = 8.9 - 2.5 * np.log10(flux)
        return mag

    plot_lc(row["lc"], row["nondet"], title=row["oid"], rband_model=model)
    # Link to the ALeRCE viewer page for this object
    print(f"https://alerce.online/object/{row['oid']}")
https://alerce.online/object/ZTF20abwzqzo
https://alerce.online/object/ZTF22abmxzxm
https://alerce.online/object/ZTF21abvdazf
https://alerce.online/object/ZTF20acqlfwb
https://alerce.online/object/ZTF22abfyzir




