Compute the Calling Cards effect (enrichment) for the given hop counts.
:param total_background_hops: a pandas Series (column of a dataframe)
of total number of hops in the background.
:type total_background_hops: Series
:param total_experiment_hops: a pandas Series (column of a dataframe)
of total number of hops in the experiment.
:type total_experiment_hops: Series
:param background_hops: a pandas Series (column of a dataframe)
of number of hops in the background by promoter region.
:type background_hops: Series
:param experiment_hops: a pandas Series (column of a dataframe)
of number of hops in the experiment by promoter region.
:type experiment_hops: Series
:param pseudocount: Value added to the background hops to avoid division by zero. Defaults to 0.1.
:type pseudocount: float, optional
:param kwargs: Additional keyword arguments. None are currently used.
:return: a pandas Series of length equal to the input Series with the
Calling Cards effect (enrichment) value for each row.
:rtype: Series
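As read off the source listing, the value computed for each row can be written as a single ratio. The symbols are notation introduced here for illustration, not names from the library: $e_i$ is `experiment_hops`, $E_i$ is `total_experiment_hops`, $b_i$ is `background_hops`, $B_i$ is `total_background_hops`, and $p$ is `pseudocount`.

```latex
\[
\mathrm{enrichment}_i
  = \frac{e_i / E_i}{(b_i + p) / B_i}
  = \frac{e_i \, B_i}{E_i \, (b_i + p)}
\]
```

With $p > 0$ and non-negative counts, the denominator is strictly positive, so a promoter region with no background hops still yields a finite value.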
Source code in callingcardstools/PeakCalling/yeast/enrichment_vectorized.py
# Imports and logger assumed from the top of the module
# (lines 1-8 of the source file are not shown in this listing).
import logging

import numpy as np
from pandas import Series

logger = logging.getLogger(__name__)


def enrichment_vectorized(
    total_background_hops: Series,
    total_experiment_hops: Series,
    background_hops: Series,
    experiment_hops: Series,
    pseudocount: float = 0.1,
    **kwargs
) -> Series:
    """
    Compute the Calling Cards effect (enrichment) for the given hop counts.

    :param total_background_hops: a pandas Series (column of a dataframe)
        of total number of hops in the background.
    :type total_background_hops: Series
    :param total_experiment_hops: a pandas Series (column of a dataframe)
        of total number of hops in the experiment.
    :type total_experiment_hops: Series
    :param background_hops: a pandas Series (column of a dataframe)
        of number of hops in the background by promoter region.
    :type background_hops: Series
    :param experiment_hops: a pandas Series (column of a dataframe)
        of number of hops in the experiment by promoter region.
    :type experiment_hops: Series
    :param pseudocount: Value added to the background hops to avoid
        division by zero. Defaults to 0.1.
    :type pseudocount: float, optional
    :param kwargs: Additional keyword arguments. None are currently used.
    :return: a pandas Series of length equal to the input Series with the
        Calling Cards effect (enrichment) value for each row.
    :rtype: Series
    """
    # raise an error if any one of the 4 input Series is not a Series
    if not all(
        isinstance(x, Series)
        for x in [
            total_background_hops,
            total_experiment_hops,
            background_hops,
            experiment_hops,
        ]
    ):
        # Adjacent string literals are concatenated into one message.
        # (Separating them with commas would pass three arguments instead.)
        raise ValueError(
            "`total_background_hops`, `total_experiment_hops`, "
            "`background_hops` and `experiment_hops` must all "
            "be pandas Series. At least one is not."
        )
    # validate that all input Series are the same length
    if (
        not len(total_background_hops)
        == len(total_experiment_hops)
        == len(background_hops)
        == len(experiment_hops)
    ):
        raise ValueError("All input Series must be the same length.")
    # validate that pseudocount is numeric (int or float). Cast to float if int
    if not isinstance(pseudocount, (int, float)):
        raise ValueError("pseudocount must be a number.")
    if isinstance(pseudocount, int):
        logger.warning("pseudocount is an integer. It will be cast to a float.")
        pseudocount = float(pseudocount)
    # NOTE: the total_experiment_hops and total_background_hops must be > 0 based on
    # input data verification. See read_in_experiment_data()
    # and read_in_background_data() in read_in_data.py
    numerator = experiment_hops / total_experiment_hops
    # Add a small pseudocount to background_hops to avoid division by zero in the
    # enrichment calculation below
    # Consider a `min` where the minimum value is 0.1/total_background_hops
    denominator = (background_hops + pseudocount) / total_background_hops
    enrichment = numerator / denominator
    # Check for invalid values
    if (enrichment < 0).any():
        raise ValueError("Enrichment values must be non-negative.")
    if enrichment.isnull().any():
        raise ValueError("Enrichment values must not be NaN.")
    if np.isinf(enrichment).any():
        raise ValueError("Enrichment values must not be infinite.")
    return enrichment
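
To make the arithmetic in the listing concrete, here is a minimal, pandas-free sketch that applies the same numerator and denominator to one promoter region at a time. `enrichment_scalar` and the sample counts are hypothetical and exist only for this illustration; they are not part of callingcardstools.

```python
def enrichment_scalar(
    total_background_hops: int,
    total_experiment_hops: int,
    background_hops: int,
    experiment_hops: int,
    pseudocount: float = 0.1,
) -> float:
    """Hypothetical single-row version of the enrichment arithmetic."""
    # Fraction of all experiment hops that fall in this promoter region.
    numerator = experiment_hops / total_experiment_hops
    # Fraction of all background hops in this region, with the pseudocount
    # added so that a region with zero background hops does not divide by zero.
    denominator = (background_hops + pseudocount) / total_background_hops
    return numerator / denominator


# Made-up counts for three promoter regions:
# (total_background, total_experiment, background, experiment)
rows = [
    (1000, 500, 0, 5),   # no background hops: only the pseudocount remains
    (1000, 500, 10, 5),  # background and experiment at similar rates
    (1000, 500, 4, 0),   # no experiment hops: enrichment is zero
]
values = [enrichment_scalar(*row) for row in rows]
for row, value in zip(rows, values):
    print(row, round(value, 4))
```

The first row shows why the pseudocount matters: with zero background hops the result is large (about 100 with these counts) but finite, so it passes the infinity check at the end of the function.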