labretriever
A Python package for querying and managing genomic and transcriptomic datasets hosted on HuggingFace Hub. It provides a unified SQL interface (via DuckDB) across heterogeneous datasets, with local caching and structured metadata exploration.
See the documentation for full usage guides and API reference. The BrentLab yeast resources collection is an example of datasets designed to work with this package.
Installation
Install the latest release from PyPI:
To get the most recent changes ahead of a PyPI release, install directly from the main branch on GitHub:
Set your HuggingFace token if accessing private datasets:
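From Python, one way to provide the token is through the process environment. The sketch below assumes the package picks up credentials from the `HF_TOKEN` environment variable, which is the variable the `huggingface_hub` client library reads; whether labretriever relies on that mechanism is not stated here. The helper name `set_hf_token` is hypothetical and not part of the package.

```python
import os


def set_hf_token(token: str) -> None:
    """Expose a HuggingFace access token to the current process.

    Hypothetical helper: it only sets HF_TOKEN, the environment variable
    read by huggingface_hub. It is an assumption that labretriever uses it.
    """
    cleaned = token.strip()
    if not cleaned:
        raise ValueError("token must be a non-empty string")
    os.environ["HF_TOKEN"] = cleaned


# Placeholder value for illustration only; a real token should come from
# a secrets store or an interactive prompt rather than from source code.
set_hf_token("hf_example_token")
```

The variable has to be set before any dataset is requested, so a call like this would go ahead of constructing `VirtualDB`.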
Usage¶
from labretriever import VirtualDB
vdb = VirtualDB("config.yaml")
# Discover available views
vdb.tables()
vdb.describe("harbison")
# Query with SQL
df = vdb.query("SELECT * FROM harbison_meta WHERE carbon_source = $cs", cs="glucose")
VirtualDB loads datasets from HuggingFace (caching locally), constructs DuckDB
views over Parquet files, and exposes metadata and full-data views for SQL
querying. See the docs for how to write a config.yaml and structure your
HuggingFace dataset cards.
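To make the idea of paired metadata and full-data views concrete, here is a minimal sketch that does not use labretriever at all. The standard-library `sqlite3` module stands in for DuckDB, and an in-memory table stands in for the Parquet files. The table `harbison_data`, its columns and values, and the way the metadata view is derived (distinct sample-level columns) are assumptions made for illustration, not the package's actual schema or implementation.

```python
import sqlite3

# Stand-in for a downloaded dataset: one row per (sample, target) measurement.
# All names and values below are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE harbison_data "
    "(sample_id INTEGER, carbon_source TEXT, target TEXT, score REAL)"
)
con.executemany(
    "INSERT INTO harbison_data VALUES (?, ?, ?, ?)",
    [
        (1, "glucose", "YAL001C", 0.5),
        (1, "glucose", "YAL002W", 1.5),
        (2, "galactose", "YAL001C", 2.0),
    ],
)

# Full-data view: every measurement row.
con.execute("CREATE VIEW harbison AS SELECT * FROM harbison_data")

# Metadata view: one row per sample, keeping only sample-level columns.
con.execute(
    "CREATE VIEW harbison_meta AS "
    "SELECT DISTINCT sample_id, carbon_source FROM harbison_data"
)


def query(sql, **params):
    """Run a query with named parameters and return all rows."""
    return con.execute(sql, params).fetchall()


glucose_samples = query(
    "SELECT * FROM harbison_meta WHERE carbon_source = :cs", cs="glucose"
)
print(glucose_samples)
```

The sketch writes the placeholder as `:cs`, the form documented for Python's `sqlite3` module, whereas the usage example above uses `$cs`. The pattern is the same in both cases: the value is bound by name rather than interpolated into the SQL string.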