lsdb.catalog.dataset.dataset

Classes

Dataset

Base HATS Dataset

Module Contents

class Dataset(ddf: lsdb.nested.NestedFrame, hc_structure: hats.catalog.Dataset)

Base HATS Dataset

_ddf
hc_structure
__repr__()
_repr_html_()
_repr_data()
compute() → nested_pandas.NestedFrame

Computes the lazy Dask DataFrame and returns the result as an in-memory nested_pandas.NestedFrame.

to_delayed(optimize_graph: bool = True) → list[dask.delayed.Delayed]

Get a list of Dask Delayed objects for each partition in the dataset

Intended for more advanced custom operations. To use the results with LSDB again, the delayed objects must be converted back to a Dask DataFrame and combined with extra metadata to construct an LSDB Dataset.

Parameters:

optimize_graph (bool) – If True (the default), the graph is optimized before the partitions are converted into dask.delayed objects.

property name

The name of the catalog

property dtypes

Returns the datatypes of the columns in the Dataset

property columns

Returns the columns in the Dataset

property all_columns

Returns all columns in the original Dataset

property original_schema

Returns the schema of the original Dataset

_check_unloaded_columns(column_names: collections.abc.Sequence[str | None] | None)

Check the list of given column names for any that are valid but unavailable because they were not loaded.