Skip to content

poisson_pval_vectorized

Compute the Poisson p-value for the given hops counts.

:param total_background_hops: a pandas Series (column of a dataframe) of total number of hops in the background. :type total_background_hops: Series[int64] :param total_experiment_hops: a pandas Series (column of a dataframe) of total number of hops in the experiment. :type total_experiment_hops: Series[int64] :param background_hops: a pandas Series (column of a dataframe) of number of hops in the background by promoter region. :type background_hops: Series[int64] :param experiment_hops: a pandas Series (column of a dataframe) of number of hops in the experiment by promoter region. :type experiment_hops: Series[int64] :param pseudocount: , defaults to 1.0 :type pseudocount: float, optional :return: a pandas Series of length equal to the input Series with the Poisson p-value for each row. :rtype: Series[float]

.. note:: This function is vectorized, so it can be applied to pandas Series (columns of dataframes) to compute the Poisson p-value for each row.

:raises ValueError: If any of the input Series contain negative values or the input Series are not all the same length.

:Example:

import pandas as pd total_background_hops = pd.Series([100, 200, 300]) total_experiment_hops = pd.Series([10, 20, 30]) background_hops = pd.Series([5, 10, 15]) experiment_hops = pd.Series([2, 4, 6]) vectorized_poisson_pval( … total_background_hops, … total_experiment_hops, … background_hops, … experiment_hops) array([0.01438768, 0.00365985, 0.00092599])

Source code in callingcardstools/PeakCalling/yeast/poisson_pval_vectorized.py
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
def poisson_pval_vectorized(
    total_background_hops: Series,
    total_experiment_hops: Series,
    background_hops: Series,
    experiment_hops: Series,
    pseudocount: float = 0.1,
    **kwargs
) -> Series:
    """
    Compute the Poisson p-value for the given hops counts.

    :param total_background_hops: a pandas Series (column of a dataframe)
        of total number of hops in the background.
    :type total_background_hops: Series[int64]
    :param total_experiment_hops: a pandas Series (column of a dataframe)
        of total number of hops in the experiment.
    :type total_experiment_hops: Series[int64]
    :param background_hops: a pandas Series (column of a dataframe)
        of number of hops in the background by promoter region.
    :type background_hops: Series[int64]
    :param experiment_hops: a pandas Series (column of a dataframe)
        of number of hops in the experiment by promoter region.
    :type experiment_hops: Series[int64]
    :param pseudocount: , defaults to 1.0
    :type pseudocount: float, optional
    :return: a pandas Series of length equal to the input Series with the
        Poisson p-value for each row.
    :rtype: Series[float]

    .. note:: This function is vectorized, so it can be applied to
        pandas Series (columns of dataframes) to compute the
        Poisson p-value for each row.

    :raises ValueError: If any of the input Series contain negative values or
        the input Series are not all the same length.

    :Example:

    >>> import pandas as pd
    >>> total_background_hops = pd.Series([100, 200, 300])
    >>> total_experiment_hops = pd.Series([10, 20, 30])
    >>> background_hops = pd.Series([5, 10, 15])
    >>> experiment_hops = pd.Series([2, 4, 6])
    >>> vectorized_poisson_pval(
    ...     total_background_hops,
    ...     total_experiment_hops,
    ...     background_hops,
    ...     experiment_hops)
    array([0.01438768, 0.00365985, 0.00092599])
    """
    # check input
    if (
        not len(total_background_hops)
        == len(total_experiment_hops)
        == len(background_hops)
        == len(experiment_hops)
    ):
        raise ValueError("All input Series must be the same length.")
    if total_background_hops.min() < 0 or total_background_hops.dtype != "int64":
        raise ValueError(("total_background_hops must " "be a non-negative integer."))
    if total_experiment_hops.min() < 0 or total_background_hops.dtype != "int64":
        raise ValueError(("total_experiment_hops must " "be a non-negative integer"))

    # cast to `float` b/c of scipy

    # note that read_in_background_data and read_in_experiment_data in
    # read_in_data.py require that there be at least 1 hop in both the background
    # and the experiment. Therefore the total_background_hops and total_experiment_hops
    # is always defined
    hop_ratio = (total_experiment_hops / total_background_hops).astype("float")

    # It is possible that there are promoters with no background hops. Add a small
    # pseudocount to require that mu > 0, per poisson definition
    mu = ((background_hops + pseudocount) * hop_ratio).astype("float")

    # there has been a pseudocount added to experiment hops. Not necessary, removed
    # 20240624
    # The way this is calculated, with pyranges and a sum, this value will always be
    # at minimum 0
    x = experiment_hops.astype("float")

    # return the p-value
    return Series(1 - poisson.cdf(x, mu))