query

Catalog.query(expr: str) → Catalog

Filters the catalog, and its margin if one exists, using a pandas-style query expression.

Parameters:
expr : str

Query expression to evaluate. Column names that are not valid Python variable names should be wrapped in backticks, and variable values can be injected using f-strings. Using ‘@’ to reference variables is not supported. More information about query strings is available in the pandas documentation for DataFrame.query.

Returns:
Catalog

A catalog containing the rows of the original catalog that satisfy the query expression. If a margin exists, it is filtered with the same query expression.

Examples

Filter a small synthetic catalog using a pandas-style query string:

>>> import lsdb
>>> from lsdb.nested.datasets import generate_data
>>> nf = generate_data(1000, 10, seed=0, ra_range=(0.0, 300.0), dec_range=(-50.0, 50.0))
>>> catalog = lsdb.from_dataframe(nf.compute())
>>> filtered = catalog.query("ra < 100 and dec > 0")
>>> filtered.head()[["ra", "dec", "id"]]
                           ra        dec    id
_healpix_29
118362963675428450  52.696686  39.675892  8154
98504457942331510   89.913567  46.147079  3437
70433374600953220   40.528952  35.350965  8214
154968715224527848   17.57041    29.8936  9853
67780378363846894    45.08384   31.95611  8297

Filter nested values:

>>> filtered = filtered.query("nested.flux > 50.0")
>>> filtered.head()[["nested", "id"]]
                                                               nested    id
_healpix_29
118362963675428450  [{t: 5.431006, flux: 88.466194, flux_error: 1....  8154
98504457942331510   [{t: 12.235667, flux: 67.145637, flux_error: 1...  3437
70433374600953220   [{t: 1.395766, flux: 61.888264, flux_error: 1....  8214
154968715224527848  [{t: 5.057078, flux: 60.744756, flux_error: 1....  9853
67780378363846894   [{t: 0.001474, flux: 76.631059, flux_error: 1....  8297

Most Series and NestedSeries attributes and methods are available inside the expression. For example, this filters by light curve length:

>>> filtered = filtered.query("nested.list_lengths >= 5")
>>> filtered.head()[["nested", "id"]]
                                                               nested    id
_healpix_29
98504457942331510   [{t: 12.235667, flux: 67.145637, flux_error: 1...  3437
70433374600953220   [{t: 1.395766, flux: 61.888264, flux_error: 1....  8214
67780378363846894   [{t: 0.001474, flux: 76.631059, flux_error: 1....  8297
153045793800159522  [{t: 11.363245, flux: 72.987868, flux_error: 1...  5758
81822373343408413   [{t: 15.681661, flux: 77.580224, flux_error: 1...  9413
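
Build a query expression from Python variables:

Because ‘@’ references are not supported, variable values have to be formatted into the expression string before it is passed to query. The following is a minimal sketch of one way to do that with f-strings. The helper quote_column, the thresholds, and the column name "mean flux" are hypothetical and used only for illustration; the check with str.isidentifier is a rough approximation of "valid Python variable name" and is not part of the lsdb API.

```python
def quote_column(name: str) -> str:
    """Wrap a column name in backticks unless it looks like a valid Python name.

    Hypothetical helper for illustration; not part of lsdb.
    """
    if name.isidentifier():
        return name
    return f"`{name}`"


# Hypothetical values that would normally come from the surrounding program.
max_ra = 100
min_flux = 50.0
column = "mean flux"  # assumed column name containing a space

# Inject the values with an f-string instead of '@max_ra' / '@min_flux'.
expr = f"ra < {max_ra} and {quote_column(column)} > {min_flux}"
print(expr)  # ra < 100 and `mean flux` > 50.0

# The resulting string would then be passed to the catalog, for example:
# filtered = catalog.query(expr)
```

String values need quotes inside the expression; formatting them with the !r conversion, as in f"band == {band!r}", is one way to add them.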