MCP Server Setup

labretriever ships an MCP server that exposes a VirtualDB instance as a set of tools callable by Claude Code (or any other MCP client). Once configured, Claude can discover datasets, inspect schema, and execute DuckDB SQL queries against your collection without any manual Python.

Quick Install (Claude Code Plugin)

First, install labretriever so that labretriever-mcp is available on your PATH.

Then add the marketplace and install the plugin:

/plugin marketplace add cmatKhan/labretriever
/plugin install labretriever@labretriever

When you enable the plugin, it prompts you for a VirtualDB config file path and an optional HuggingFace token. If labretriever-mcp is not found on PATH when a session starts, Claude will display installation instructions.

For the BrentLab yeast resources collection, download the ready-to-use config from:

https://github.com/BrentLab/tfbpshiny/blob/main/tfbpshiny/brentlab_yeast_collection.yaml

Save it to a stable path and provide that path when the plugin prompts you.
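
Before enabling the plugin, it can help to confirm the two prerequisites described above: that labretriever-mcp resolves on PATH and that the config file exists where you saved it. The following is a minimal sketch using only the Python standard library. The preflight function and its messages are hypothetical helpers for illustration and are not part of labretriever.

```python
import os
import shutil


def preflight(config_path, which=shutil.which):
    """Return a list of setup problems; an empty list means both checks passed.

    Hypothetical helper, not part of labretriever. `which` is injectable so
    the PATH lookup can be replaced in tests.
    """
    problems = []
    # The MCP client launches the server by name, so it must resolve on PATH.
    if which("labretriever-mcp") is None:
        problems.append("labretriever-mcp not found on PATH")
    # The VirtualDB config must exist at the path you give the plugin.
    if not os.path.isfile(config_path):
        problems.append(f"config file not found: {config_path}")
    return problems


# Placeholder path; substitute the location where you saved the YAML file.
for problem in preflight("/absolute/path/to/brentlab_yeast_collection.yaml"):
    print(problem)
```

If nothing is printed, both checks passed for the path you supplied.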

Manual Configuration (without the plugin)

Install the package first (see Installation).

LABRETRIEVER_CONFIG must point to a VirtualDB YAML file that you create or download. This file tells the server which HuggingFace datasets to expose and how to map their fields. See the VirtualDB Configuration docs for the full format.

Add the following to .claude/settings.json (or ~/.claude/settings.json for user-level):

{
  "mcpServers": {
    "labretriever": {
      "command": "labretriever-mcp",
      "type": "stdio",
      "env": {
        "LABRETRIEVER_CONFIG": "/absolute/path/to/brentlab_yeast_collection.yaml",
        "HF_TOKEN": "${HF_TOKEN}"
      }
    }
  }
}

HF_TOKEN is only required for private or gated HuggingFace repositories. If it is not set and a query touches such a repository, the server returns an error that names the repository.
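
If you manage settings files with a script, the entry above can be generated rather than hand-edited. The sketch below builds the same labretriever entry and merges it into an existing settings dictionary without disturbing other servers. The function names are hypothetical, and the absolute-path check is an assumption drawn from the /absolute/path/to placeholder in the example config, not a documented requirement.

```python
import json
import os


def labretriever_server_entry(config_path, include_hf_token=False):
    """Build the "labretriever" entry shown in the settings example.

    Hypothetical helper. Rejecting relative paths is an assumption based on
    the absolute placeholder path used in the documentation.
    """
    if not os.path.isabs(config_path):
        raise ValueError(f"expected an absolute path, got: {config_path}")
    env = {"LABRETRIEVER_CONFIG": config_path}
    if include_hf_token:
        # Forward the token from the launching environment, as in the example.
        env["HF_TOKEN"] = "${HF_TOKEN}"
    return {"command": "labretriever-mcp", "type": "stdio", "env": env}


def merge_into_settings(settings, entry):
    """Return a copy of `settings` with the labretriever server added."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers["labretriever"] = entry
    merged["mcpServers"] = servers
    return merged


# "other" stands in for a server you may already have configured.
existing = {"mcpServers": {"other": {"command": "other-mcp"}}}
entry = labretriever_server_entry("/absolute/path/to/brentlab_yeast_collection.yaml")
settings = merge_into_settings(existing, entry)
print(json.dumps(settings, indent=2))
```

Leaving include_hf_token at its default omits the HF_TOKEN key, which matches the case where every dataset in the collection is public.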

Available Tools

Once the server is running, Claude has access to these tools:

Tool                 Description
-------------------  -----------------------------------------------------------------------
list_datasets        List all registered dataset names (call this first).
describe_dataset     Return column names and types for a {name} or {name}_meta view.
get_column_metadata  Return semantic roles and condition-level definitions for each column.
get_tags             Return provenance tags (assay type, publication, etc.) for a dataset.
get_common_fields    Return column names shared across all _meta views.
query                Execute DuckDB SQL; returns shape by default, rows when return_data=True.

Example Session

After connecting, a typical workflow in Claude Code looks like:

  1. list_datasets - discover available views (harbison, callingcards, etc.)
  2. describe_dataset("harbison_meta") - inspect sample-level columns
  3. get_column_metadata("harbison") - understand condition values and measurement roles
  4. query("SELECT * FROM harbison_meta WHERE condition = 'GAL'", return_data=True) - explore
  5. query("SELECT regulator_symbol, COUNT(*) FROM harbison WHERE condition = 'GAL' AND pvalue < 0.001 GROUP BY 1 ORDER BY 2 DESC") - full analysis
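
To see what the aggregate in step 5 computes, and how the shape-versus-rows behaviour of the query tool plays out, here is a small self-contained sketch. It uses the standard-library sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies. The run_query function is a hypothetical imitation of the tool's return_data switch, not labretriever's implementation, and the rows in the toy harbison table are invented for illustration.

```python
import sqlite3


def run_query(conn, sql, return_data=False):
    """Hypothetical imitation of the `query` tool's return_data switch."""
    cur = conn.execute(sql)
    rows = cur.fetchall()
    if return_data:
        return rows
    # Default: report only the result shape as (row count, column count).
    return (len(rows), len(cur.description))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE harbison (regulator_symbol TEXT, condition TEXT, pvalue REAL)"
)
# Invented toy rows; real data comes from the configured datasets.
conn.executemany(
    "INSERT INTO harbison VALUES (?, ?, ?)",
    [
        ("GAL4", "GAL", 0.0001),
        ("GAL4", "GAL", 0.0005),
        ("GAL80", "GAL", 0.0002),
        ("GAL4", "YPD", 0.0001),  # excluded: different condition
        ("MIG1", "GAL", 0.2),     # excluded: pvalue above threshold
    ],
)

sql = (
    "SELECT regulator_symbol, COUNT(*) FROM harbison "
    "WHERE condition = 'GAL' AND pvalue < 0.001 "
    "GROUP BY 1 ORDER BY 2 DESC"
)

shape = run_query(conn, sql)                   # (2, 2)
rows = run_query(conn, sql, return_data=True)  # [('GAL4', 2), ('GAL80', 1)]
print(shape)
print(rows)
```

Checking the shape first is a cheap way to confirm a query is reasonably sized before asking for the rows themselves.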

See the VirtualDB tutorial for more query patterns.