Accessing remote data

If you’re accessing HATS catalogs on a local file system, a typical path string like "/path/to/catalogs" will be sufficient. This tutorial will help you get started if you need to access data over HTTP/S or from cloud storage, or if you need to pass additional parameters when connecting to your data.
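As a rough illustration of the local-versus-remote distinction, the sketch below classifies a path string by its URL scheme using only the standard library. The helper names `path_protocol` and `is_remote` are hypothetical, the `s3://some-bucket/catalogs` path is a made-up example, and this is not necessarily how lsdb, fsspec, or universal_pathlib decide internally.

```python
from urllib.parse import urlsplit


def path_protocol(path: str) -> str:
    """Return the URL scheme of `path`, or "file" for plain local paths."""
    scheme = urlsplit(path).scheme.lower()
    # An empty scheme is a plain local path; a one-letter "scheme" is
    # almost certainly a Windows drive letter such as C:\data.
    if len(scheme) <= 1:
        return "file"
    return scheme


def is_remote(path: str) -> bool:
    """True when the path names something other than the local file system."""
    return path_protocol(path) != "file"


examples = [
    "/path/to/catalogs",                         # local path from the text above
    "https://data.lsdb.io/hats/gaia_dr3/gaia/",  # HTTPS catalog used in this tutorial
    "s3://some-bucket/catalogs",                 # hypothetical cloud storage path
]
for example in examples:
    print(f"{example}: protocol={path_protocol(example)}, remote={is_remote(example)}")
```

A check like this can be handy for deciding early whether extra packages or connection parameters are likely to be needed for a given path.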

We use fsspec and universal_pathlib to create connections to remote data sources. Please refer to their documentation for a list of supported filesystems and any filesystem-specific parameters.

If you’re using PyPI/pip for package management, you can install ALL of the fsspec implementations, as well as some other nice-to-have dependencies with pip install 'lsdb[full]'.

Below, we provide a basic workflow for accessing remote data, as well as filesystem-specific hints.

HTTP / HTTPS

First, make sure that aiohttp is installed, since fsspec's HTTP filesystem depends on it:

pip install aiohttp

OR

conda install aiohttp
[1]:
from upath import UPath

test_path = UPath("https://data.lsdb.io/hats/gaia_dr3/gaia/")
test_path.exists()
[1]:
True
[2]:
import lsdb

cat = lsdb.read_hats("https://data.lsdb.io/hats/gaia_dr3/gaia/")
cat
[2]:
lsdb Catalog gaia:
solution_id designation source_id ref_epoch ra ra_error dec dec_error parallax parallax_error pm pmra pmra_error pmdec pmdec_error phot_g_n_obs phot_g_mean_flux phot_g_mean_flux_error phot_g_mean_mag phot_bp_n_obs phot_bp_mean_flux phot_bp_mean_flux_error phot_bp_mean_mag phot_rp_n_obs phot_rp_mean_flux phot_rp_mean_flux_error phot_rp_mean_mag
npartitions=3933
Order: 2, Pixel: 0 int64[pyarrow] string[pyarrow] int64[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] int64[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] int64[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] int64[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow]
Order: 3, Pixel: 4 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Order: 4, Pixel: 3067 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Order: 3, Pixel: 767 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
The catalog has been loaded lazily: no data has been read yet, only the catalog schema.

Occasionally, access to HTTPS data fails because of missing certificates. If you encounter a FileNotFoundError for a file that you are fairly sure exists:

  1. Check your network and server availability

  2. On Linux, be sure that OpenSSL and ca-certificates are installed

  3. On Mac, run /Applications/Python\ 3.*/Install\ Certificates.command
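
To support the certificate checks above, here is a minimal diagnostic sketch that uses only Python's standard ssl module. The function name `describe_ca_setup` is hypothetical, and it reports only what the ssl module finds by default. Whether a particular HTTP client uses the same defaults is an assumption.

```python
import ssl


def describe_ca_setup() -> dict:
    """Collect basic facts about the CA certificates Python's ssl module uses by default."""
    paths = ssl.get_default_verify_paths()
    # create_default_context() loads the system's default CA certificates.
    context = ssl.create_default_context()
    stats = context.cert_store_stats()
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        # None means the default bundle file or directory was not found.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Certificates in a capath directory are loaded on demand, so this
        # count can be 0 on a working system that relies on capath only.
        "loaded_ca_certs": stats["x509_ca"],
    }


report = describe_ca_setup()
for key, value in report.items():
    print(f"{key}: {value}")
```

If both `cafile` and `capath` come back as `None`, that suggests Python cannot locate a certificate bundle, which is consistent with the missing-certificate symptoms described above.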