Compute the Poisson p-value for the given hop counts.
:param total_background_hops: a pandas Series (column of a dataframe)
of total number of hops in the background.
:type total_background_hops: Series[int64]
:param total_experiment_hops: a pandas Series (column of a dataframe)
of total number of hops in the experiment.
:type total_experiment_hops: Series[int64]
:param background_hops: a pandas Series (column of a dataframe)
of number of hops in the background by promoter region.
:type background_hops: Series[int64]
:param experiment_hops: a pandas Series (column of a dataframe)
of number of hops in the experiment by promoter region.
:type experiment_hops: Series[int64]
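:param pseudocount: small constant added to the total background hops,
the expected count, and the observed count to avoid division by zero,
defaults to 1e-10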
:type pseudocount: float, optional
:return: a pandas Series of length equal to the input Series with the
Poisson p-value for each row.
:rtype: Series[float]
.. note:: This function is vectorized, so it can be applied to
pandas Series (columns of dataframes) to compute the
Poisson p-value for each row.
:raises ValueError: If the input Series are not all the same length, or if
a total-hops Series contains negative values or is not of dtype int64.
:Example:
>>> import pandas as pd
>>> total_background_hops = pd.Series([100, 200, 300])
>>> total_experiment_hops = pd.Series([10, 20, 30])
>>> background_hops = pd.Series([5, 10, 15])
>>> experiment_hops = pd.Series([2, 4, 6])
>>> poisson_pval_vectorized(
...     total_background_hops,
...     total_experiment_hops,
...     background_hops,
...     experiment_hops)
array([0.01438768, 0.00365985, 0.00092599])
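As a rough cross-check of the example output, the p-values can be reproduced by hand: scale each region's background hops by the ratio of total experiment hops to total background hops to get the expected count mu, then take the Poisson upper tail P(X > experiment_hops). The sketch below uses only the standard library. The helper name `poisson_upper_tail` is hypothetical, and the pseudocount is omitted on the assumption that a value of 1e-10 shifts the result far less than the printed precision.

```python
import math


def poisson_upper_tail(x: int, mu: float) -> float:
    """Return P(X > x) for X ~ Poisson(mu), i.e. 1 - CDF(x)."""
    cdf = sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(x + 1))
    return 1.0 - cdf


# (total_background, total_experiment, background, experiment) per row,
# taken from the docstring example.
rows = [(100, 10, 5, 2), (200, 20, 10, 4), (300, 30, 15, 6)]

pvals = []
for total_bg, total_expr, bg, expr in rows:
    # Expected experiment hops in this region under the background model
    # (pseudocount left out; this is an assumption of the sketch).
    mu = bg * (total_expr / total_bg)
    pvals.append(poisson_upper_tail(expr, mu))

print(pvals)
```

The three values should agree with the array shown in the example to within about 1e-7.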
Source code in callingcardstools/PeakCalling/yeast/poisson_pval_vectorized.py
from pandas import Series
from scipy.stats import poisson


def poisson_pval_vectorized(total_background_hops: Series,
                            total_experiment_hops: Series,
                            background_hops: Series,
                            experiment_hops: Series,
                            pseudocount: float = 1e-10) -> Series:
    """
    Compute the Poisson p-value for the given hop counts.

    :param total_background_hops: a pandas Series (column of a dataframe)
        of total number of hops in the background.
    :type total_background_hops: Series[int64]
    :param total_experiment_hops: a pandas Series (column of a dataframe)
        of total number of hops in the experiment.
    :type total_experiment_hops: Series[int64]
    :param background_hops: a pandas Series (column of a dataframe)
        of number of hops in the background by promoter region.
    :type background_hops: Series[int64]
    :param experiment_hops: a pandas Series (column of a dataframe)
        of number of hops in the experiment by promoter region.
    :type experiment_hops: Series[int64]
    :param pseudocount: small constant added to the total background hops,
        the expected count, and the observed count to avoid division by
        zero, defaults to 1e-10
    :type pseudocount: float, optional
    :return: a pandas Series of length equal to the input Series with the
        Poisson p-value for each row.
    :rtype: Series[float]

    .. note:: This function is vectorized, so it can be applied to
        pandas Series (columns of dataframes) to compute the
        Poisson p-value for each row.

    :raises ValueError: If the input Series are not all the same length,
        or if a total-hops Series contains negative values or is not of
        dtype int64.

    :Example:

    >>> import pandas as pd
    >>> total_background_hops = pd.Series([100, 200, 300])
    >>> total_experiment_hops = pd.Series([10, 20, 30])
    >>> background_hops = pd.Series([5, 10, 15])
    >>> experiment_hops = pd.Series([2, 4, 6])
    >>> poisson_pval_vectorized(
    ...     total_background_hops,
    ...     total_experiment_hops,
    ...     background_hops,
    ...     experiment_hops)
    array([0.01438768, 0.00365985, 0.00092599])
    """
    # check input: all four Series must line up row by row
    if not len(total_background_hops) == len(total_experiment_hops) == \
            len(background_hops) == len(experiment_hops):
        raise ValueError('All input Series must be the same length.')

    if total_background_hops.min() < 0 \
            or total_background_hops.dtype != 'int64':
        raise ValueError(('total_background_hops must '
                          'be a non-negative integer.'))

    # the dtype check here previously inspected total_background_hops
    if total_experiment_hops.min() < 0 \
            or total_experiment_hops.dtype != 'int64':
        raise ValueError(('total_experiment_hops must '
                          'be a non-negative integer.'))

    # cast to `float` b/c of scipy
    # ratio of library sizes, used to scale background hops to the experiment
    hop_ratio = (total_experiment_hops
                 / (total_background_hops + pseudocount)).astype('float')
    # expected number of experiment hops per region under the background model
    mu = ((background_hops * hop_ratio)
          + pseudocount).astype('float')
    # observed number of experiment hops per region
    x = (experiment_hops + pseudocount).astype('float')

    # upper-tail probability P(X > x) for X ~ Poisson(mu)
    return 1 - poisson.cdf(x, mu)
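One property of the `1 - poisson.cdf(x, mu)` form is worth keeping in mind: when the upper tail is much smaller than machine epsilon (about 1e-16), the subtraction can lose all significant digits and return a value at or near zero. The standard-library sketch below illustrates this by comparing the complement form against a direct, truncated sum of the upper-tail probabilities. All helper names are hypothetical, and the truncation at 200 terms is an assumption that suits small `mu` only. In SciPy, `poisson.sf(x, mu)` is the survival-function counterpart of `1 - poisson.cdf(x, mu)` and may be worth considering if very small p-values matter downstream.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability mass, computed in log space to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def tail_via_complement(x: int, mu: float) -> float:
    """Upper tail P(X > x) computed as 1 - CDF(x)."""
    return 1.0 - sum(poisson_pmf(k, mu) for k in range(x + 1))


def tail_via_direct_sum(x: int, mu: float, n_terms: int = 200) -> float:
    """Upper tail P(X > x) as a truncated sum of P(X = k) for k > x."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1, x + 1 + n_terms))


# A deliberately extreme case: 30 observed hops where 1 is expected.
mu, x = 1.0, 30
naive = tail_via_complement(x, mu)
direct = tail_via_direct_sum(x, mu)
print(naive, direct)
```

For moderate p-values, such as those in the docstring example, the two forms agree closely, so the difference only matters when ranking or thresholding extremely significant regions.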