
enrichment_vectorized

Compute the Calling Cards effect (enrichment) for the given hop counts.

:param total_background_hops: a pandas Series (column of a dataframe) of total number of hops in the background.
:type total_background_hops: Series
:param total_experiment_hops: a pandas Series (column of a dataframe) of total number of hops in the experiment.
:type total_experiment_hops: Series
:param background_hops: a pandas Series (column of a dataframe) of number of hops in the background by promoter region.
:type background_hops: Series
:param experiment_hops: a pandas Series (column of a dataframe) of number of hops in the experiment by promoter region.
:type experiment_hops: Series
:param pseudocount: added to the background hops to avoid division by zero. Defaults to 0.1.
:type pseudocount: float, optional
:param kwargs: additional keyword arguments. None are currently used.

:return: a pandas Series of the same length as the input Series, with the Calling Cards effect (enrichment) value for each row.
:rtype: Series
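Written out, the source code below computes the following per-row quantity. The symbols are shorthand introduced here only for readability: $E_i$ is `experiment_hops`, $E_{\mathrm{tot}}$ is `total_experiment_hops`, $B_i$ is `background_hops`, $B_{\mathrm{tot}}$ is `total_background_hops`, and $p$ is `pseudocount`. The worked line uses made-up hop counts to show how the pseudocount keeps a promoter with zero background hops from producing a division by zero.

```latex
\mathrm{enrichment}_i
  = \frac{E_i / E_{\mathrm{tot}}}{(B_i + p) / B_{\mathrm{tot}}}

% Illustrative values: E_i = 5, E_tot = 1000, B_i = 0, B_tot = 100000, p = 0.1
\mathrm{enrichment}_i
  = \frac{5 / 1000}{(0 + 0.1) / 100000}
  = \frac{0.005}{10^{-6}}
  = 5000
```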

Source code in callingcardstools/PeakCalling/yeast/enrichment_vectorized.py
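import logging

import numpy as np
from pandas import Series

# Assumed module-level setup: the module's own import block is not shown on this page
logger = logging.getLogger(__name__)


def enrichment_vectorized(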
    total_background_hops: Series,
    total_experiment_hops: Series,
    background_hops: Series,
    experiment_hops: Series,
    pseudocount: float = 0.1,
    **kwargs
) -> Series:
    """
    Compute the Calling Cards effect (enrichment) for the given hop counts.

    :param total_background_hops: a pandas Series (column of a dataframe)
        of total number of hops in the background.
    :type total_background_hops: Series
    :param total_experiment_hops: a pandas Series (column of a dataframe)
        of total number of hops in the experiment.
    :type total_experiment_hops: Series
    :param background_hops: a pandas Series (column of a dataframe)
        of number of hops in the background by promoter region.
    :type background_hops: Series
    :param experiment_hops: a pandas Series (column of a dataframe)
        of number of hops in the experiment by promoter region.
    :type experiment_hops: Series
    :param pseudocount: Added to the background hops to avoid division by
        zero. Defaults to 0.1.
    :type pseudocount: float, optional
    :param kwargs: Additional keyword arguments. None are currently used.

    :return: a pandas Series of length equal to the input Series with the
        Calling Cards effect (enrichment) value for each row.
    :rtype: Series
    """
    # raise an error if any one of the 4 input Series is not a Series
    if not all(
        isinstance(x, Series)
        for x in [
            total_background_hops,
            total_experiment_hops,
            background_hops,
            experiment_hops,
        ]
    ):
        # adjacent string literals concatenate into a single error message
        raise ValueError(
            "`total_background_hops`, `total_experiment_hops`, "
            "`background_hops` and `experiment_hops` must all "
            "be pandas Series. At least one is not."
        )
    # validate that all input Series are the same length
    if (
        not len(total_background_hops)
        == len(total_experiment_hops)
        == len(background_hops)
        == len(experiment_hops)
    ):
        raise ValueError("All input Series must be the same length.")

    # validate that pseudocount is numeric (int or float). Cast to float if int
    if not isinstance(pseudocount, (int, float)):
        raise ValueError("pseudocount must be a number.")
    if isinstance(pseudocount, int):
        logger.warning("pseudocount is an integer. It will be cast to a float.")
        pseudocount = float(pseudocount)

    # NOTE: input data verification requires total_experiment_hops and
    # total_background_hops to be > 0. See read_in_experiment_data()
    # and read_in_background_data() in read_in_data.py
    numerator = experiment_hops / total_experiment_hops

    # Add a small pseudocount to background_hops to avoid division by zero in the
    # enrichment calculation below
    # Consider a `min` where the minimum value is 0.1/total_background_hops
    denominator = (background_hops + pseudocount) / total_background_hops

    enrichment = numerator / denominator

    # Check for invalid values
    if (enrichment < 0).any():
        raise ValueError("Enrichment values must be non-negative.")
    if enrichment.isnull().any():
        raise ValueError("Enrichment values must not be NaN.")
    if np.isinf(enrichment).any():
        raise ValueError("Enrichment values must not be infinite.")

    return enrichment
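The sketch below re-states the numerator and denominator computation from the source code above as a standalone function, so it runs without callingcardstools installed. `enrichment_sketch`, the DataFrame column layout, and all hop counts are hypothetical illustrations, not the library's API or real data, and input validation is reduced to the final NaN/infinity check. It assumes a long-format table with one row per promoter region in which the total-hop columns repeat the same value on every row; passing totals as Series presumably lets a single table carry rows whose totals differ.

```python
import numpy as np
import pandas as pd


def enrichment_sketch(
    total_background_hops: pd.Series,
    total_experiment_hops: pd.Series,
    background_hops: pd.Series,
    experiment_hops: pd.Series,
    pseudocount: float = 0.1,
) -> pd.Series:
    """Hypothetical helper mirroring the enrichment arithmetic above."""
    # Fraction of all experiment hops that fall in each promoter region
    numerator = experiment_hops / total_experiment_hops
    # Pseudocount-adjusted fraction of all background hops in each region
    denominator = (background_hops + pseudocount) / total_background_hops
    enrichment = numerator / denominator
    # Reject NaN (e.g. 0/0) and infinite results, as the original does
    if enrichment.isnull().any() or np.isinf(enrichment).any():
        raise ValueError("Enrichment values must be finite and not NaN.")
    return enrichment


# Illustrative long-format table: one row per promoter region,
# with the per-experiment totals repeated on every row
df = pd.DataFrame(
    {
        "total_background_hops": [100_000, 100_000, 100_000],
        "total_experiment_hops": [1_000, 1_000, 1_000],
        "background_hops": [0, 10, 50],
        "experiment_hops": [5, 0, 2],
    }
)

df["enrichment"] = enrichment_sketch(
    df["total_background_hops"],
    df["total_experiment_hops"],
    df["background_hops"],
    df["experiment_hops"],
)
print(df["enrichment"].round(3).tolist())  # [5000.0, 0.0, 3.992]
```

The first row has no background hops, so the pseudocount alone sets the denominator and the enrichment is large but finite. The second row has no experiment hops and therefore an enrichment of 0.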