read_in_chipexo_data

Read in the data from the chipexo file. The data was parsed from yeastepigenome.org. See yeastepigenome.org and https://github.com/cmatKhan/parsing_yeast_database_data

Parameters:

Name	Type	Description	Default
`chipexo_data_path`	`str`	path to the chipexo data	required
`curr_chr_convention`	`str`	chromosome convention of the chipexo allevents file	required
`new_chr_convention`	`str`	chromosome convention to convert to	required
`chrmap_df`	`DataFrame`	chromosome map passed to `relabel_chr_column` to translate between conventions	required

Returns:

Type	Description
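`DataFrame`	A pandas DataFrame containing the chipexo allevents data, with `start` and `end` renamed to `chipexo_start` and `chipexo_end` and the chromosome column relabeled to the target convention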

Raises:

Type	Description
`AttributeError`	If the chipexo table does not contain at least the following columns: `chr`, `start`, `end`, `YPD_log2Fold`, `YPD_log2P`. Note that the `start` column is the original `coord` column from the yeastepigenome.org data and `end` is simply `coord` + 1. It is in this format to make it somewhat easier to input to other processes that accept bed-like files.
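
The column requirement above can be illustrated with a minimal sketch. It assumes a raw table that still carries the original `coord` column, derives the bed-like `start` and `end` columns from it, and then applies the same column check and rename that the function performs. The row values and the names `raw`, `bed_like` and `REQUIRED` are invented for illustration and are not part of the library.

```python
import io

import pandas as pd

# Columns the reader expects to find in the chipexo table.
REQUIRED = {"chr", "start", "end", "YPD_log2Fold", "YPD_log2P"}

# Hypothetical raw rows in the original layout (values are made up).
raw = pd.DataFrame({
    "chr": ["chr1", "chr1", "chr2"],
    "coord": [100, 250, 75],
    "YPD_log2Fold": [1.5, 0.3, 2.1],
    "YPD_log2P": [-4.0, -0.5, -6.2],
})

# Bed-like layout: start is the original coord, end is coord + 1.
bed_like = raw.assign(start=raw["coord"],
                      end=raw["coord"] + 1).drop(columns="coord")

# Round-trip through CSV text, mimicking a file on disk.
buffer = io.StringIO()
bed_like.to_csv(buffer, index=False)
buffer.seek(0)
df = pd.read_csv(buffer, header=0, index_col=False)

# Same check the function performs before renaming.
missing = REQUIRED - set(df.columns)
if missing:
    raise AttributeError(f"missing required columns: {sorted(missing)}")

df = df.rename(columns={"start": "chipexo_start", "end": "chipexo_end"})
```

A table that still has `coord` instead of `start` and `end` would fail the check, which is why the conversion step comes first in this sketch.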

Source code in callingcardstools/Analysis/yeast/chipexo_promoter_sig.py

def read_in_chipexo_data(
        chipexo_data_path: str,
        curr_chr_convention: str,
        new_chr_convention: str,
        chrmap_df: pd.DataFrame) -> pd.DataFrame:
    """
    Read in the data from the chipexo file. This is data parsed from
    yeastepigenome.org. see yeastepigenome.org and
    https://github.com/cmatKhan/parsing_yeast_database_data 

    Args:
            chipexo_data_path (str): path to the chipexo data
            curr_chr_convention (str): chromosome convention of the
                chipexo allevents file
            new_chr_convention (str): chromosome convention to convert to
            chrmap_df (pd.DataFrame): chromosome map passed to
                `relabel_chr_column` to translate between conventions

    Returns:
            pandas.DataFrame: A pandas DataFrame containing the chipexo
                allevents data

    Raises:
            AttributeError: If the chipexo table does not contain at least the
                following columns: `chr`, `start`, `end`, `YPD_log2Fold`,
                `YPD_log2P`. Note that the `start` column is the original
                `coord` column from the yeastepigenome.org data and `end` is
                simply `coord` + 1. It is in this format to make it somewhat
                easier to input to other processes that accept bed-like files.
    """
    df = pd.read_csv(chipexo_data_path,
                     header=0,
                     index_col=False)

    if not {'chr', 'start', 'end',
            'YPD_log2Fold', 'YPD_log2P'}.issubset(df.columns):
        raise AttributeError('The chipexo table must contain at least the '
                             'following columns: `chr`, `start`, `end`, '
                             '`YPD_log2Fold`, `YPD_log2P`. Note that the '
                             '`start` column is the original `coord` column '
                             'from the yeastepigenome.org data and `end` '
                             'is simply `coord` + 1. It is in this format '
                             'to make it somewhat easier to input to other '
                             'processes that accept bed-like files.')

    df.rename(columns={'start': 'chipexo_start',
                       'end': 'chipexo_end'},
              inplace=True)

    return relabel_chr_column(df,
                              chrmap_df,
                              curr_chr_convention,
                              new_chr_convention)
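
The listing ends by delegating to `relabel_chr_column`, whose implementation is not shown here. As a rough sketch of what such a relabeling step might look like, the example below assumes that `chrmap_df` holds one column per naming convention, with each row describing one chromosome. That layout, the convention names `ucsc` and `numbered`, and the helper `relabel_chr_sketch` are all assumptions made for illustration, not the library's actual behavior.

```python
import pandas as pd


def relabel_chr_sketch(df: pd.DataFrame,
                       chrmap_df: pd.DataFrame,
                       curr_chr_convention: str,
                       new_chr_convention: str) -> pd.DataFrame:
    """Hypothetical stand-in for relabel_chr_column (assumed layout)."""
    # Both conventions must exist as columns in the map (assumption).
    if not {curr_chr_convention, new_chr_convention}.issubset(
            chrmap_df.columns):
        raise KeyError("both conventions must be columns of chrmap_df")

    # Build a lookup from current chromosome names to new ones.
    mapping = dict(zip(chrmap_df[curr_chr_convention],
                       chrmap_df[new_chr_convention]))

    # Refuse to silently produce NaN for unknown chromosome names.
    unmapped = set(df["chr"]) - set(mapping)
    if unmapped:
        raise ValueError(f"chromosomes not in map: {sorted(unmapped)}")

    # Return a copy with the chr column translated.
    return df.assign(chr=df["chr"].map(mapping))


# Made-up map and events for illustration only.
chrmap_df = pd.DataFrame({"ucsc": ["chrI", "chrII"],
                          "numbered": ["1", "2"]})
events = pd.DataFrame({"chr": ["chrI", "chrII", "chrI"],
                       "chipexo_start": [100, 75, 250],
                       "chipexo_end": [101, 76, 251]})

relabeled = relabel_chr_sketch(events, chrmap_df, "ucsc", "numbered")
```

Raising on unmapped names is a design choice of this sketch; the real helper may handle that case differently.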