compute_rank_response

Computes rank-based statistics and binomial test results for a DataFrame.

This function groups the DataFrame by ‘rank_bin’, counts the responsive items in each bin, and accumulates those counts across bins in ascending ‘rank_bin’ order. For each bin it then runs a binomial test on the cumulative number of successes, using the ‘rank_bin’ value as the number of trials and ‘random’ as the expected proportion, and reports the response ratio, p-value, and confidence interval bounds.
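The aggregation step described above can be sketched on its own. This is a minimal illustration that mirrors the groupby and cumulative sum in the source listing; the input values are the ones from the docstring example, and the variable name `per_bin` is chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "rank_bin": [1, 1, 2],
        "responsive": [True, False, True],
        "random": [0.5, 0.5, 0.5],
    }
)

# Count responsive rows per bin; summing booleans counts the True values.
# The first 'random' value in each bin is carried along unchanged.
per_bin = (
    df.groupby("rank_bin")
    .agg(
        n_responsive_in_rank=pd.NamedAgg(column="responsive", aggfunc="sum"),
        random=pd.NamedAgg(column="random", aggfunc="first"),
    )
    .reset_index()
)

# Running total of responsive rows across bins, in ascending rank_bin order.
per_bin["n_successes"] = per_bin["n_responsive_in_rank"].cumsum()

print(per_bin)
```

Each bin contributes one responsive row here, so the running total grows by one per bin.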

Parameters:

df (DataFrame, required): DataFrame containing the columns ‘rank_bin’, ‘responsive’, and ‘random’. ‘rank_bin’ is an integer representing the rank bin, ‘responsive’ is a boolean indicating responsiveness, and ‘random’ is a float representing the random expectation.

**kwargs (optional): Additional keyword arguments. ‘alternative’ is passed to the scipy binomtest function (default ‘two-sided’). All keyword arguments are also forwarded to parse_binomtest_results, for example as arguments to the proportion_ci method of the BinomTestResult object (see the scipy documentation for details).
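For orientation, the scipy calls that these keyword arguments ultimately reach look roughly like the sketch below. It calls `scipy.stats.binomtest` and `proportion_ci` directly rather than through `compute_rank_response`; how `parse_binomtest_results` maps keyword arguments onto `proportion_ci` is not shown on this page, so the argument values here are only an assumption about typical usage.

```python
from scipy.stats import binomtest

# 2 successes in 2 trials against a random expectation of 0.5.
result = binomtest(k=2, n=2, p=0.5, alternative="two-sided")

# proportion_ci has its own options, separate from those of binomtest.
ci = result.proportion_ci(confidence_level=0.95, method="exact")

print(result.k / result.n)  # observed proportion of successes
print(result.pvalue)
print(ci.low, ci.high)
```

The `alternative` argument also accepts 'greater' and 'less' for one-sided tests.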

Returns:

DataFrame: A DataFrame with one row per ‘rank_bin’. The index is reset, so ‘rank_bin’ is a regular column. The other columns are the number of responsive items in each bin (‘n_responsive_in_rank’), the random expectation (‘random’), the cumulative number of successes (‘n_successes’), the response ratio (‘response_ratio’), the p-value (‘pvalue’), and the confidence interval bounds (‘ci_lower’ and ‘ci_upper’).

Example

>>> df = pd.DataFrame({'rank_bin': [1, 1, 2],
...                    'responsive': [True, False, True],
...                    'random': [0.5, 0.5, 0.5]})
>>> compute_rank_response(df)

Returns a DataFrame with rank-based statistics and binomial test results.
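To make the arithmetic behind the example concrete, the following pure-Python sketch works through the two bins by hand. It assumes that the response ratio is the cumulative successes divided by the ‘rank_bin’ value and that the two-sided p-value sums the probabilities of all outcomes no more likely than the observed one; the helper `two_sided_binom_pvalue` is hypothetical and written only for this illustration, not part of callingcardstools or scipy.

```python
from math import comb


def two_sided_binom_pvalue(k: int, n: int, p: float) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so ties with pmf[k] are counted despite rounding.
    threshold = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= threshold))


# (rank_bin, cumulative successes) for the example input:
# bin 1 holds one responsive row, bin 2 adds one more.
rows = [(1, 1), (2, 2)]

summary = [
    (
        rank_bin,
        n_successes,
        n_successes / rank_bin,  # response ratio under the stated assumption
        two_sided_binom_pvalue(n_successes, rank_bin, 0.5),
    )
    for rank_bin, n_successes in rows
]

for row in summary:
    print(row)
```

With only one or two trials the test has almost no power, which is why neither bin produces a small p-value in this toy input.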

Source code in callingcardstools/Analysis/yeast/rank_response.py
def compute_rank_response(df: pd.DataFrame, **kwargs) -> pd.DataFrame:
    """
    Computes rank-based statistics and binomial test results for a DataFrame.

    This function groups the DataFrame by 'rank_bin' and aggregates it to
    calculate the number of responsive items in each rank bin, as well as
    various statistics related to a binomial test.  It calculates the
    cumulative number of successes, response ratio, p-value, and confidence
    intervals for each rank bin.

    Args:
        df (pd.DataFrame): DataFrame containing the columns 'rank_bin',
            'responsive', and 'random'. 'rank_bin' is an integer representing
            the rank bin, 'responsive' is a boolean indicating responsiveness,
            and 'random' is a float representing the random expectation.
        **kwargs: Optional keyword arguments. 'alternative' is passed to
            scipy.stats.binomtest (default 'two-sided'). All keyword
            arguments are also forwarded to parse_binomtest_results, for
            example as arguments to the proportion_ci method of the
            BinomTestResult object (see scipy documentation for details).

    Returns:
        pd.DataFrame: A DataFrame with one row per 'rank_bin' (kept as a
            regular column, since the index is reset) and columns for the
            number of responsive items in each bin ('n_responsive_in_rank'),
            the random expectation ('random'), cumulative number of
            successes ('n_successes'), response ratio ('response_ratio'),
            p-value ('pvalue'), and confidence interval bounds ('ci_lower'
            and 'ci_upper').

    Example:
        >>> df = pd.DataFrame({'rank_bin': [1, 1, 2],
        ...                    'responsive': [True, False, True],
        ...                    'random': [0.5, 0.5, 0.5]})
        >>> compute_rank_response(df)
        # Returns a DataFrame with rank-based statistics and binomial
        # test results.
    """
    rank_response_df = (
        df.groupby("rank_bin")
        .agg(
            n_responsive_in_rank=pd.NamedAgg(column="responsive", aggfunc="sum"),
            random=pd.NamedAgg(column="random", aggfunc="first"),
        )
        .reset_index()
    )

    rank_response_df["n_successes"] = rank_response_df["n_responsive_in_rank"].cumsum()

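    # Binomial test and confidence interval for each rank bin. The cumulative
    # success count is tested against the 'random' expectation, with the
    # 'rank_bin' value used as the number of trials.
    rank_response_df[["response_ratio", "pvalue", "ci_lower", "ci_upper"]] = (
        rank_response_df.apply(
            lambda row: parse_binomtest_results(
                binomtest(
                    int(row["n_successes"]),  # k: cumulative successes
                    int(row["rank_bin"]),  # n: number of trials
                    float(row["random"]),  # p: expected proportion
                    alternative=kwargs.get("alternative", "two-sided"),
                ),
                **kwargs,
            ),
            axis=1,
            result_type="expand",
        )
    )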

    return rank_response_df