Read in the data from the chipexo file. This is data parsed from
yeastepigenome.org. See yeastepigenome.org and
https://github.com/cmatKhan/parsing_yeast_database_data
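Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| chipexo_data_path | str | path to the chipexo data | required |
| curr_chr_convention | str | chromosome convention of the chipexo allevents file | required |
| new_chr_convention | str | chromosome convention to convert to | required |
| chrmap_df | pd.DataFrame | chromosome map passed to relabel_chr_column to translate between conventions | required |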
Returns:
| Type | Description |
| --- | --- |
| DataFrame | pandas.DataFrame: A pandas DataFrame containing the chipexo allevents data |
Raises:
| Type | Description |
| --- | --- |
| AttributeError | If the chipexo table does not contain at least the following columns: `chr`, `start`, `end`, `YPD_log2Fold`, `YPD_log2P`. Note that the `start` column is the original `coord` column from the yeastepigenome.org data and `end` is simply `coord` + 1. It is in this format to make it somewhat easier to input to other processes that accept bed-like files. |
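The column requirements above can be illustrated with a minimal sketch. It assumes a raw table with `chr`, `coord`, `YPD_log2Fold` and `YPD_log2P` columns (the sample values are made up), derives `start` and `end` as described in the note, and repeats the same kind of subset check the function performs. The helper name `missing_chipexo_columns` is hypothetical and is not part of callingcardstools.

```python
import io

import pandas as pd

# Columns the function checks for before doing anything else.
REQUIRED_COLUMNS = {"chr", "start", "end", "YPD_log2Fold", "YPD_log2P"}


def missing_chipexo_columns(df: pd.DataFrame) -> set:
    """Return the required columns absent from df (hypothetical helper)."""
    return REQUIRED_COLUMNS - set(df.columns)


# Made-up rows in the shape of the raw data: one `coord` per event.
raw_csv = io.StringIO(
    "chr,coord,YPD_log2Fold,YPD_log2P\n"
    "chrI,100,1.5,3.2\n"
    "chrII,250,0.7,1.1\n"
)
raw = pd.read_csv(raw_csv, header=0, index_col=False)

# The raw table lacks `start` and `end`, so it would be rejected as-is.
missing_before = missing_chipexo_columns(raw)

# Build the bed-like interval: start is coord, end is coord + 1.
bedlike = raw.rename(columns={"coord": "start"})
bedlike["end"] = bedlike["start"] + 1

missing_after = missing_chipexo_columns(bedlike)
```

Reporting which columns are missing, rather than only that the check failed, can make a malformed input file easier to diagnose.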
Source code in callingcardstools/Analysis/yeast/chipexo_promoter_sig.py
def read_in_chipexo_data(
        chipexo_data_path: str,
        curr_chr_convention: str,
        new_chr_convention: str,
        chrmap_df: pd.DataFrame) -> pd.DataFrame:
    """
    Read in the data from the chipexo file. This is data parsed from
    yeastepigenome.org. See yeastepigenome.org and
    https://github.com/cmatKhan/parsing_yeast_database_data

    Args:
        chipexo_data_path (str): path to the chipexo data
        curr_chr_convention (str): chromosome convention of the
            chipexo allevents file
        new_chr_convention (str): chromosome convention to convert to
        chrmap_df (pd.DataFrame): chromosome map passed to
            relabel_chr_column to translate between conventions

    Returns:
        pandas.DataFrame: A pandas DataFrame containing the chipexo
            allevents data

    Raises:
        AttributeError: If the chipexo table does not contain at least the
            following columns: `chr`, `start`, `end`, `YPD_log2Fold`,
            `YPD_log2P`. Note that the `start` column is the original
            `coord` column from the yeastepigenome.org data and `end` is
            simply `coord` + 1. It is in this format to make it somewhat
            easier to input to other processes that accept bed-like files.
    """
    # Read the table; the first row holds the column names.
    df = pd.read_csv(chipexo_data_path,
                     header=0,
                     index_col=False)
    # Fail early if any required column is missing.
    if not {'chr', 'start', 'end',
            'YPD_log2Fold', 'YPD_log2P'}.issubset(df.columns):
        raise AttributeError('The chipexo table must contain at least the '
                             'following columns: `chr`, `start`, `end`, '
                             '`YPD_log2Fold`, `YPD_log2P`. Note that the '
                             '`start` column is the original `coord` column '
                             'from the yeastepigenome.org data and `end` '
                             'is simply `coord` + 1. It is in this format '
                             'to make it somewhat easier to input to other '
                             'processes that accept bed-like files.')
    # Prefix the coordinate columns so they are identifiable as chipexo data.
    df.rename(columns={'start': 'chipexo_start',
                       'end': 'chipexo_end'},
              inplace=True)
    # Convert the chromosome names to the requested convention.
    return relabel_chr_column(df,
                              chrmap_df,
                              curr_chr_convention,
                              new_chr_convention)
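The final step hands the table to `relabel_chr_column`, whose source is not shown here. As a rough sketch of what such a conversion could look like, the stand-in below assumes `chrmap_df` holds one column per naming convention and one row per chromosome. The function name `relabel_chr_sketch`, the convention names `ucsc` and `numbered`, and the sample values are all hypothetical, and the real implementation may differ.

```python
import pandas as pd


def relabel_chr_sketch(df: pd.DataFrame,
                       chrmap_df: pd.DataFrame,
                       curr_chr_convention: str,
                       new_chr_convention: str,
                       chr_col: str = "chr") -> pd.DataFrame:
    """Hypothetical stand-in for relabel_chr_column, for illustration only."""
    # Build a lookup from the current names to the target names.
    lookup = dict(zip(chrmap_df[curr_chr_convention],
                      chrmap_df[new_chr_convention]))
    # Refuse to silently produce NaN for chromosomes the map does not cover.
    unknown = set(df[chr_col]) - set(lookup)
    if unknown:
        raise ValueError(
            f"chromosomes missing from chrmap_df: {sorted(unknown)}")
    # Work on a copy so the caller's DataFrame is left unchanged.
    out = df.copy()
    out[chr_col] = out[chr_col].map(lookup)
    return out


# Assumed layout: one column per convention, one row per chromosome.
chrmap_df = pd.DataFrame({"ucsc": ["chrI", "chrII"],
                          "numbered": ["1", "2"]})
events = pd.DataFrame({"chr": ["chrII", "chrI"],
                       "chipexo_start": [250, 100],
                       "chipexo_end": [251, 101]})
relabeled = relabel_chr_sketch(events, chrmap_df, "ucsc", "numbered")
```

The sketch returns a copy and raises on unmapped chromosome names; whether the real function behaves the same way in either respect is not stated in this section.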