combine_data

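Read in multiple data files and combine the effect and pvalue columns across files using the specified functions. By default, effects are combined with the arithmetic mean (np.mean) and pvalues with combine_pvals_detect_logged, which the docstring describes as a log mean.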
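To make the "log mean" idea concrete, here is a minimal sketch of averaging p-values on the log scale, which is the same as taking their geometric mean. The helper `log_mean_pvalue` is hypothetical and is not the library's `combine_pvals_detect_logged`, whose implementation is not shown on this page; it only illustrates why a log-scale average behaves differently from an arithmetic one.

```python
import math
from typing import Iterable


def log_mean_pvalue(pvals: Iterable[float]) -> float:
    """Hypothetical helper: average p-values on the log scale.

    This is the geometric mean of the inputs. It is an illustration only,
    not the library's combine_pvals_detect_logged.
    """
    values = list(pvals)
    if not values:
        raise ValueError("need at least one p-value")
    if any(p <= 0 or p > 1 for p in values):
        raise ValueError("p-values must lie in (0, 1]")
    # Mean of the natural logs, mapped back to the probability scale.
    mean_log = sum(math.log(p) for p in values) / len(values)
    return math.exp(mean_log)


# Made-up replicate p-values for a single feature.
replicates = [1e-2, 1e-4]
combined = log_mean_pvalue(replicates)
arithmetic = sum(replicates) / len(replicates)

print(f"log mean: {combined:.3g}, arithmetic mean: {arithmetic:.3g}")
```

On the log scale the two replicates contribute equally, whereas the arithmetic mean is dominated by the larger p-value.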

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data_paths | List[str] | List of data file paths | required |
| identifier_col | str | Name of the feature identifier column | required |
| effect_col | str | Name of the effect column | required |
| pval_col | str | Name of the pvalue column | required |
| source | str | Source of the data | required |
| data_type | str | Type of data, either ‘binding’ or ‘expression’ | required |
| combine_effect_func | Callable | Function to combine effect columns | mean |
| combine_pval_func | Callable | Function to combine pvalue columns | combine_pvals_detect_logged |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | pd.DataFrame: Combined dataframe with averaged effect and pvalue |

Source code in callingcardstools/Analysis/yeast/read_in_data.py
def combine_data(
    data_paths: List[str],
    identifier_col: str,
    effect_col: str,
    pval_col: str,
    source: str,
    data_type: Literal["binding", "expression"],
    combine_effect_func: Callable[[pd.Series], float] = np.mean,
    combine_pval_func: Callable[[pd.Series], float] = combine_pvals_detect_logged,
) -> pd.DataFrame:
    """
    Read in multiple data files and combine the effect and pvalue columns
    using specified functions (defaults are additive mean for effect and
    log mean for pvalue).

    Args:
        data_paths (List[str]): List of data file paths
        identifier_col (str): Name of the feature identifier column
        effect_col (str): Name of the effect column
        pval_col (str): Name of the pvalue column
        source (str): Source of the data
        data_type (str): Type of data, either 'binding' or 'expression'
        combine_effect_func (Callable): Function to combine effect columns
        combine_pval_func (Callable): Function to combine pvalue columns

    Returns:
        pd.DataFrame: Combined dataframe with averaged effect and pvalue
    """
    logger.info(f"combining data for data type {data_type} from {data_paths}")
    all_dfs = []

    for data_path in data_paths:
        df = read_in_data(
            data_path=data_path,
            identifier_col=identifier_col,
            effect_col=effect_col,
            pval_col=pval_col,
            source=source,
            data_type=data_type,
        )
        all_dfs.append(df)

    combined_df = (
        pd.concat(all_dfs)
        .groupby(["feature", f"{data_type}_source"])
        .agg(
            {
                f"{data_type}_effect": combine_effect_func,
                f"{data_type}_pvalue": combine_pval_func,
            }
        )
        .reset_index()
    )

    return combined_df
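
The concat, groupby, and agg step above can be tried in isolation. The sketch below uses two made-up replicate tables in place of the frames returned by `read_in_data`. The column names (`feature`, `binding_source`, `binding_effect`, `binding_pvalue`) are inferred from the groupby and agg keys in the listing, and `geometric_mean` is a hypothetical stand-in for `combine_pvals_detect_logged`, so treat it as an illustration and not as the library's behavior.

```python
import numpy as np
import pandas as pd

data_type = "binding"

# Two made-up replicate tables, shaped the way the groupby keys above imply.
rep1 = pd.DataFrame(
    {
        "feature": ["geneA", "geneB"],
        "binding_source": ["demo", "demo"],
        "binding_effect": [2.0, 0.5],
        "binding_pvalue": [1e-2, 0.5],
    }
)
rep2 = pd.DataFrame(
    {
        "feature": ["geneA", "geneB"],
        "binding_source": ["demo", "demo"],
        "binding_effect": [4.0, 1.5],
        "binding_pvalue": [1e-4, 0.5],
    }
)


def geometric_mean(pvals: pd.Series) -> float:
    """Hypothetical p-value combiner: mean on the log scale."""
    return float(np.exp(np.log(pvals).mean()))


combined = (
    pd.concat([rep1, rep2])
    .groupby(["feature", f"{data_type}_source"])
    .agg(
        {
            # "mean" is pandas' built-in arithmetic mean for each group.
            f"{data_type}_effect": "mean",
            f"{data_type}_pvalue": geometric_mean,
        }
    )
    .reset_index()
)

print(combined)
```

Each feature and source pair collapses to a single row, so the output has one row per feature no matter how many replicate files were read.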