
read_in_chipexo_data

Read in ChIP-exo "allevents" data from a comma-separated file. The data were parsed from yeastepigenome.org; see yeastepigenome.org and https://github.com/cmatKhan/parsing_yeast_database_data for details.

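Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chipexo_data_path | str | Path to the ChIP-exo allevents data (comma-separated, with a header row) | required |
| curr_chr_convention | str | Chromosome naming convention used in the ChIP-exo allevents file | required |
| new_chr_convention | str | Chromosome naming convention to convert to | required |
| chrmap_df | pandas.DataFrame | Table mapping chromosome names between conventions; passed to `relabel_chr_column` | required |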
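`relabel_chr_column` is not shown on this page, so the sketch below only illustrates the general idea under an assumption: that `chrmap_df` has one column per naming convention (the hypothetical `ucsc` and `numbered` columns here), with each row describing the same chromosome. `relabel_chr_column_sketch` is a hypothetical stand-in, not the package's implementation.

```python
import pandas as pd


def relabel_chr_column_sketch(df: pd.DataFrame,
                              chrmap_df: pd.DataFrame,
                              curr_chr_convention: str,
                              new_chr_convention: str) -> pd.DataFrame:
    """Hypothetical stand-in for relabel_chr_column.

    Assumes chrmap_df has one column per chromosome naming convention.
    """
    # Build a lookup from names in the current convention to the new one
    mapping = dict(zip(chrmap_df[curr_chr_convention],
                       chrmap_df[new_chr_convention]))
    # Fail loudly rather than silently producing NaN for unmapped names
    unmapped = set(df['chr']) - set(mapping)
    if unmapped:
        raise ValueError(
            f'chromosomes missing from chrmap_df: {sorted(unmapped)}')
    # Work on a copy so the caller's DataFrame is left unchanged
    out = df.copy()
    out['chr'] = out['chr'].map(mapping)
    return out


# Toy inputs; the convention column names are assumptions
chrmap_df = pd.DataFrame({'ucsc': ['chrI', 'chrII'],
                          'numbered': ['1', '2']})
events = pd.DataFrame({'chr': ['chrII', 'chrI'],
                       'chipexo_start': [100, 250]})
relabeled = relabel_chr_column_sketch(events, chrmap_df, 'ucsc', 'numbered')
print(relabeled)
```

Raising on unmapped chromosomes is a design choice made for this sketch; the real helper may handle missing names differently.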

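Returns:

| Type | Description |
|---|---|
| DataFrame | pandas.DataFrame: the ChIP-exo allevents data, with `start` and `end` renamed to `chipexo_start` and `chipexo_end` and the `chr` column relabeled to `new_chr_convention` |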

Raises:

| Type | Description |
|---|---|
| AttributeError | If the ChIP-exo table lacks any of the following columns: `chr`, `start`, `end`, `YPD_log2Fold`, `YPD_log2P`. The `start` column is the original `coord` column from the yeastepigenome.org data and `end` is simply `coord` + 1. This layout makes the file easier to pass to other tools that accept BED-like input. |
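The note on `start` and `end` implies a small preprocessing step before this function is called. A minimal sketch, assuming the raw yeastepigenome.org table has a `coord` column alongside `chr`, `YPD_log2Fold` and `YPD_log2P` (the values below are invented for illustration), that produces the layout this function checks for:

```python
import io

import pandas as pd

# Toy stand-in for a raw yeastepigenome.org allevents table (values invented)
raw = pd.DataFrame({'chr': ['chrI', 'chrI'],
                    'coord': [1500, 2750],
                    'YPD_log2Fold': [1.2, 0.4],
                    'YPD_log2P': [-5.1, -1.3]})

# `start` is the original coordinate; `end` is coord + 1, giving a
# one-base-wide, BED-like interval for each event
bedlike = raw.rename(columns={'coord': 'start'})
bedlike.insert(bedlike.columns.get_loc('start') + 1,
               'end',
               bedlike['start'] + 1)

# Write CSV text with a header row and no index column, matching what
# pd.read_csv(..., header=0, index_col=False) reads back
buf = io.StringIO()
bedlike.to_csv(buf, index=False)
print(buf.getvalue())
```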

Source code in callingcardstools/Analysis/yeast/chipexo_promoter_sig.py
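import pandas as pd

# relabel_chr_column is defined or imported elsewhere in the package
# (not shown in this excerpt)


def read_in_chipexo_data(
        chipexo_data_path: str,
        curr_chr_convention: str,
        new_chr_convention: str,
        chrmap_df: pd.DataFrame) -> pd.DataFrame: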
    """
    Read in the data from the ChIP-exo allevents file. This is data parsed
    from yeastepigenome.org. See yeastepigenome.org and
    https://github.com/cmatKhan/parsing_yeast_database_data

    Args:
        chipexo_data_path (str): path to the ChIP-exo allevents data
            (comma-separated, with a header row)
        curr_chr_convention (str): chromosome convention of the
            ChIP-exo allevents file
        new_chr_convention (str): chromosome convention to convert to
        chrmap_df (pandas.DataFrame): table mapping chromosome names
            between conventions; passed to `relabel_chr_column`

    Returns:
        pandas.DataFrame: the ChIP-exo allevents data, with `start` and
            `end` renamed to `chipexo_start` and `chipexo_end` and the
            `chr` column relabeled to `new_chr_convention`

    Raises:
        AttributeError: If the ChIP-exo table does not contain at least
            the following columns: `chr`, `start`, `end`, `YPD_log2Fold`,
            `YPD_log2P`. Note that the `start` column is the original
            `coord` column from the yeastepigenome.org data and `end` is
            simply `coord` + 1. This layout makes the file easier to pass
            to other tools that accept BED-like input.
    """
    df = pd.read_csv(chipexo_data_path,
                     header=0,
                     index_col=False)

    if not {'chr', 'start', 'end',
            'YPD_log2Fold', 'YPD_log2P'}.issubset(df.columns):
        raise AttributeError('The chipexo table must contain at least the '
                             'following columns: `chr`, `start`, `end`, '
                             '`YPD_log2Fold`, `YPD_log2P`. Note that the '
                             '`start` column is the original `coord` column '
                             'from the yeastepigenome.org data and `end` '
                             'is simply `coord` + 1. It is in this format '
                             'to make it somewhat easier to input to other '
                             'processes that accept bed-like files.')

    # Prefix the coordinate columns with the data source
    df = df.rename(columns={'start': 'chipexo_start',
                            'end': 'chipexo_end'})

    return relabel_chr_column(df,
                              chrmap_df,
                              curr_chr_convention,
                              new_chr_convention)
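To see the read, column-check and rename steps without the package installed, the sketch below replays them on a small in-memory CSV. It stops before relabeling because `relabel_chr_column` is not shown here, and it names the missing columns in the error, a small variation on the original check. The data and the `REQUIRED_COLUMNS` name are made up for illustration.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = {'chr', 'start', 'end', 'YPD_log2Fold', 'YPD_log2P'}

# Invented example rows in the expected BED-like layout
csv_text = (
    'chr,start,end,YPD_log2Fold,YPD_log2P\n'
    'chrI,1500,1501,1.2,-5.1\n'
    'chrII,300,301,0.7,-2.2\n'
)

# Same read options as read_in_chipexo_data: the first row is the header
# and no column is used as the index
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=False)

# Report exactly which required columns are absent, if any
missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
    raise AttributeError(f'missing required columns: {sorted(missing)}')

# Rename the coordinate columns the same way the function does
df = df.rename(columns={'start': 'chipexo_start', 'end': 'chipexo_end'})
print(df.columns.tolist())
# ['chr', 'chipexo_start', 'chipexo_end', 'YPD_log2Fold', 'YPD_log2P']
```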