Accessing remote data
If you’re accessing HATS catalogs on a local file system, a typical path string like "/path/to/catalogs" will be sufficient. This tutorial will help you get started if you need to access data over HTTP/S or cloud storage, or if you need to pass additional parameters when connecting to your data.
We use fsspec and universal_pathlib to create connections to remote data sources. Please refer to their documentation for a list of supported filesystems and any filesystem-specific parameters.
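Here is a minimal sketch of what fsspec does with a path string, written so that it needs no network. fsspec.core.split_protocol shows how the URL prefix (such as "https") selects a filesystem, while a plain local path has no prefix. fsspec's built-in in-memory filesystem (protocol "memory") stands in for a remote backend because it offers the same AbstractFileSystem methods. The file path and contents below are hypothetical placeholders.

```python
import fsspec
from fsspec.core import split_protocol

# The prefix before "://" selects the fsspec filesystem implementation;
# a plain local path has no prefix, so the protocol comes back as None.
print(split_protocol("/path/to/catalogs"))
print(split_protocol("https://data.lsdb.io/hats/gaia_dr3/gaia/"))

# The in-memory filesystem needs no network, but exposes the same
# AbstractFileSystem methods (pipe, exists, cat, ...) as remote backends.
fs = fsspec.filesystem("memory")
fs.pipe("/catalogs/demo/example.txt", b"hello")  # hypothetical file
print(fs.exists("/catalogs/demo/example.txt"))
print(fs.cat("/catalogs/demo/example.txt"))
```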
If you’re using PyPI/pip for package management, you can install all of the fsspec implementations, as well as some other nice-to-have dependencies, with pip install 'lsdb[full]'.
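If you installed only some dependencies, one quick way to see which remote backends are usable is to check whether their packages can be imported. This is a hypothetical sketch: missing_backends and the protocol-to-package mapping are illustrative assumptions covering a few common fsspec implementations (aiohttp for HTTP/S, s3fs, gcsfs, adlfs), not a description of what lsdb[full] installs.

```python
import importlib.util

# Hypothetical mapping from URL protocol to the package that provides its
# fsspec implementation (a few common cases, not an exhaustive list).
BACKEND_PACKAGES = {
    "http": "aiohttp",
    "https": "aiohttp",
    "s3": "s3fs",
    "gs": "gcsfs",
    "abfs": "adlfs",
}


def missing_backends(protocols):
    """Return the protocols whose backing package cannot be imported.

    Protocols not listed in BACKEND_PACKAGES are skipped.
    """
    missing = []
    for protocol in protocols:
        package = BACKEND_PACKAGES.get(protocol)
        if package is not None and importlib.util.find_spec(package) is None:
            missing.append(protocol)
    return missing


print(missing_backends(["https", "s3", "gs"]))
```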
Below, we provide a basic workflow for accessing remote data, as well as filesystem-specific hints.
HTTP / HTTPS
First, make sure to install aiohttp, which fsspec's HTTP filesystem depends on:
pip install aiohttp
OR
conda install aiohttp
[1]:
from upath import UPath
test_path = UPath("https://data.lsdb.io/hats/gaia_dr3/gaia/")
test_path.exists()
[1]:
True
[2]:
import lsdb
cat = lsdb.read_hats("https://data.lsdb.io/hats/gaia_dr3/gaia/")
cat
[2]:
npartitions=3933 | solution_id | designation | source_id | ref_epoch | ra | ra_error | dec | dec_error | parallax | parallax_error | pm | pmra | pmra_error | pmdec | pmdec_error | phot_g_n_obs | phot_g_mean_flux | phot_g_mean_flux_error | phot_g_mean_mag | phot_bp_n_obs | phot_bp_mean_flux | phot_bp_mean_flux_error | phot_bp_mean_mag | phot_rp_n_obs | phot_rp_mean_flux | phot_rp_mean_flux_error | phot_rp_mean_mag
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Order: 2, Pixel: 0 | int64[pyarrow] | string[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow]
Order: 3, Pixel: 4 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
Order: 4, Pixel: 3067 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
Order: 3, Pixel: 767 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
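Each partition label in the output above, such as "Order: 3, Pixel: 767", names a HEALPix pixel, the equal-area sky tiling HATS uses to partition catalogs. As a rough sketch of plain HEALPix arithmetic (not an lsdb API), the number of pixels at order k is 12 × 4^k. That count lets you sanity-check a label and estimate how much sky one partition covers.

```python
import math

# Full-sky solid angle in square degrees: 4*pi sr * (180/pi)^2 deg^2 per sr
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2


def npix(order):
    """Number of HEALPix pixels at a given order (nside = 2**order)."""
    return 12 * 4**order


def pixel_area_deg2(order):
    """Area of one pixel at `order`; HEALPix pixels are equal-area."""
    return FULL_SKY_DEG2 / npix(order)


# Partition labels copied from the catalog output above
partitions = [(2, 0), (3, 4), (4, 3067), (3, 767)]
for order, pixel in partitions:
    if not 0 <= pixel < npix(order):
        raise ValueError(f"pixel {pixel} is out of range for order {order}")
    print(f"Order {order}, Pixel {pixel}: about {pixel_area_deg2(order):.2f} deg^2")
```

Because each order quadruples the pixel count, Pixel 767 is the last pixel at order 3 (12 × 4³ = 768 pixels), and a partition at order 4 covers a quarter of the area of a partition at order 3.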
Occasionally, with HTTPS data, you may see issues caused by missing certificates. If you encounter a FileNotFoundError but are confident the file exists:
- Check your network connection and that the server is available.
- On Linux, make sure OpenSSL and the ca-certificates package are installed.
- On macOS, if you installed Python from python.org, run /Applications/Python\ 3.*/Install\ Certificates.command
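To narrow down a certificate problem, it can help to see what Python's own ssl module finds. The following is a minimal sketch using only the standard library. describe_ca_setup is a hypothetical helper, and the libraries fsspec uses for HTTPS may configure TLS differently, so treat the result as a hint rather than a diagnosis.

```python
import ssl


def describe_ca_setup():
    """Summarize where Python's ssl module looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    stats = context.cert_store_stats()
    return {
        # Resolved default CA bundle file, or None if that file does not exist
        "cafile": paths.cafile,
        # Resolved default CA directory, or None if it does not exist
        "capath": paths.capath,
        # Environment variable OpenSSL reads to override the CA bundle file
        "cafile_env_var": paths.openssl_cafile_env,
        # CA certificates loaded so far; certificates in capath are loaded
        # on demand, so 0 here does not prove that verification will fail
        "loaded_ca_certs": stats["x509_ca"],
    }


for key, value in describe_ca_setup().items():
    print(f"{key}: {value}")
```

If cafile and capath are both None and no CA certificates are loaded, the certificate steps in the list above are a likely fix.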