bin_by_binding_rank

Assigns a rank bin to each row in a DataFrame based on binding signal.

This function sorts the DataFrame by 'binding_pvalue' and the absolute value of 'binding_effect', then divides the sorted rows into partitions based on the specified bin size. Rows within each partition get the same 'rank_bin' value, which is the partition label returned by create_partitions multiplied by the bin size. If the DataFrame has fewer rows than bin_size, the number of rows is used as the bin size instead.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | The DataFrame to be ranked and sorted. It must contain 'binding_effect' and 'binding_pvalue' columns. | required |
| bin_size | int | The size of each bin for partitioning the DataFrame for ranking. | required |
| rank_by_binding_effect | bool | If True, the DataFrame is sorted by abs('binding_effect') in descending order first, with ties broken by ascending 'binding_pvalue'. If False, it is sorted by 'binding_pvalue' in ascending order first, with ties broken by descending abs('binding_effect'). | False |
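
The two sort orders selected by rank_by_binding_effect can be illustrated with plain pandas. This is a minimal sketch rather than the library code itself: the toy values and the `gene` column are assumptions added for illustration, and only the `sort_values` arguments mirror the source listing.

```python
import pandas as pd

# Toy data; the values and the 'gene' column are illustrative assumptions.
df = pd.DataFrame(
    {
        "gene": ["a", "b", "c", "d"],
        "binding_effect": [-2.0, 0.5, 2.0, 1.0],
        "binding_pvalue": [0.01, 0.01, 0.001, 0.2],
    }
)
# Rank on the magnitude of the effect, ignoring its sign.
abs_df = df.assign(abs_binding_effect=df["binding_effect"].abs())

# rank_by_binding_effect=False: smallest p-value first,
# ties broken by the larger absolute effect.
by_pvalue = abs_df.sort_values(
    by=["binding_pvalue", "abs_binding_effect"], ascending=[True, False]
)

# rank_by_binding_effect=True: largest absolute effect first,
# ties broken by the smaller p-value.
by_effect = abs_df.sort_values(
    by=["abs_binding_effect", "binding_pvalue"], ascending=[False, True]
)

pvalue_order = by_pvalue["gene"].tolist()
effect_order = by_effect["gene"].tolist()
print(pvalue_order)  # ['c', 'a', 'b', 'd']
print(effect_order)  # ['c', 'a', 'd', 'b']
```

Genes a and b share a p-value, so the default ordering places a first because its absolute effect is larger. Genes a and c share an absolute effect, so the effect-first ordering places c first because its p-value is smaller.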

Returns:

| Type | Description |
| --- | --- |
| DataFrame | The input DataFrame with an added 'rank_bin' column and a reset index, sorted by abs('binding_effect') in descending order or by 'binding_pvalue' in ascending order, depending on rank_by_binding_effect. |

Example

>>> df = pd.DataFrame({'binding_effect': [1.2, 0.5, 0.8],
...                    'binding_pvalue': [5, 3, 4]})
>>> bin_by_binding_rank(df, 2)
# Returns a DataFrame with an added 'rank_bin' column, sorted as per
# the specified criteria.

Source code in callingcardstools/Analysis/yeast/rank_response.py
def bin_by_binding_rank(
    df: pd.DataFrame, bin_size: int, rank_by_binding_effect: bool = False
) -> pd.DataFrame:
    """
    Assigns a rank bin to each row in a DataFrame based on binding signal.

    This function sorts the DataFrame by 'binding_pvalue' and the absolute
    value of 'binding_effect', then divides the sorted rows into partitions
    based on the specified bin size. Rows within each partition get the same
    'rank_bin' value, which is the partition label multiplied by the bin size.

    Args:
        df (pd.DataFrame): The DataFrame to be ranked and sorted.
            It must contain 'binding_effect' and 'binding_pvalue' columns.
        bin_size (int): The size of each bin for partitioning the DataFrame
            for ranking. Capped at the number of rows in `df`.
        rank_by_binding_effect (bool, optional): If True, the DataFrame is
            sorted by abs('binding_effect') in descending order first, with
            ties broken by ascending 'binding_pvalue'. If False, it is sorted
            by 'binding_pvalue' first, with ties broken by descending
            abs('binding_effect'). Defaults to False.

    Returns:
        pd.DataFrame: The input DataFrame with an added 'rank_bin' column and
            a reset index, sorted by abs('binding_effect') in descending
            order or 'binding_pvalue' in ascending order depending on
            `rank_by_binding_effect`.

    Raises:
        KeyError: If 'binding_pvalue' or 'binding_effect' is not a column.

    Example:
        >>> df = pd.DataFrame({'binding_effect': [1.2, 0.5, 0.8],
        ...                    'binding_pvalue': [5, 3, 4]})
        >>> bin_by_binding_rank(df, 2)
        # Returns a DataFrame with an added 'rank_bin' column, sorted as per
        # the specified criteria.
    """
    # Fail early if either required column is missing.
    if "binding_pvalue" not in df.columns:
        raise KeyError("Column 'binding_pvalue' is not in the data")
    if "binding_effect" not in df.columns:
        raise KeyError("Column 'binding_effect' is not in the data")

    # A bin cannot hold more rows than the DataFrame has.
    parts = min(len(df), bin_size)
    # Rank on the magnitude of the effect, ignoring its sign.
    df_abs = df.assign(abs_binding_effect=df["binding_effect"].abs())

    df_sorted = df_abs.sort_values(
        by=(
            ["abs_binding_effect", "binding_pvalue"]
            if rank_by_binding_effect
            else ["binding_pvalue", "abs_binding_effect"]
        ),
        ascending=[False, True] if rank_by_binding_effect else [True, False],
    )

    # Drop the helper column, renumber the rows, and scale each partition
    # label by the bin size to produce the 'rank_bin' column.
    return (
        df_sorted.drop(columns=["abs_binding_effect"])
        .reset_index(drop=True)
        .assign(rank_bin=create_partitions(len(df_sorted), parts) * parts)
    )
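
The listing above depends on create_partitions, which is defined elsewhere in the module and is not shown here. The following self-contained sketch reproduces the sort-then-bin flow under one assumption: that create_partitions labels consecutive runs of rows 1, 2, 3, and so on. The helper `create_partitions_sketch` and the wrapper `bin_by_binding_rank_sketch` are hypothetical names introduced for illustration and may differ from the real implementation.

```python
import numpy as np
import pandas as pd


def create_partitions_sketch(vector_length: int, equal_parts: int) -> np.ndarray:
    """Hypothetical stand-in: label consecutive runs of `equal_parts` rows 1, 2, 3, ..."""
    return np.arange(vector_length) // equal_parts + 1


def bin_by_binding_rank_sketch(
    df: pd.DataFrame, bin_size: int, rank_by_binding_effect: bool = False
) -> pd.DataFrame:
    """Sort by binding signal, then give every row in a bin the same rank_bin."""
    parts = min(len(df), bin_size)
    by = ["abs_binding_effect", "binding_pvalue"]
    ascending = [False, True]
    if not rank_by_binding_effect:
        # Default: p-value first (ascending), then absolute effect (descending).
        by, ascending = by[::-1], ascending[::-1]
    sorted_df = (
        df.assign(abs_binding_effect=df["binding_effect"].abs())
        .sort_values(by=by, ascending=ascending)
        .drop(columns=["abs_binding_effect"])
        .reset_index(drop=True)
    )
    labels = create_partitions_sketch(len(sorted_df), parts)
    return sorted_df.assign(rank_bin=labels * parts)


# Toy data; the values are illustrative assumptions.
df = pd.DataFrame(
    {
        "binding_effect": [1.2, -0.5, 0.8, 2.0, 0.1],
        "binding_pvalue": [0.05, 0.03, 0.04, 0.001, 0.5],
    }
)
ranked = bin_by_binding_rank_sketch(df, bin_size=2)
print(ranked)
```

With five rows and a bin size of 2, this sketch assigns rank_bin values of 2, 2, 4, 4, 6 to the rows in order of increasing p-value, so each value reads as the highest rank position its bin can cover.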