read_in_experiment_data
Read in experiment (hops) data from a qbed file. The qbed file may be
plain text or gzipped and may or may not have column headers. If column
headers are present, they must be in the following order: chr, start,
end, strand, depth. If column headers are not present, the columns must
be in the same order and number. Datatypes are checked but not coerced:
a ValueError is raised if they do not match the expected datatypes. The
chr column is relabeled from the curr_chr_name_convention to the
new_chr_name_convention using the chrmap_df.
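The reading step can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: read_qbed_rows and EXPECTED_COLUMNS are hypothetical names, gzip is detected from the file's magic bytes rather than its extension, and a header is assumed to be present whenever the start field of the first row is not an integer. Datatype checking of the values is left out of this sketch.

```python
import gzip
from typing import List

# Required column order, taken from the description above
EXPECTED_COLUMNS = ["chr", "start", "end", "strand", "depth"]


def read_qbed_rows(path: str) -> List[List[str]]:
    """Return the data rows of a plain-text or gzipped qbed file.

    Hypothetical helper, not the callingcardstools implementation.
    """
    # Gzip files begin with the two magic bytes 0x1f 0x8b
    with open(path, "rb") as fh:
        is_gzipped = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzipped else open
    with opener(path, "rt") as fh:
        rows = [line.rstrip("\n").split("\t") for line in fh if line.strip()]

    # Every row, header or not, must have exactly five fields
    for row in rows:
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(
                f"expected {len(EXPECTED_COLUMNS)} columns, got {len(row)}"
            )

    # Assumption: a first row whose start field is not an integer is a header
    if rows and not rows[0][1].isdigit():
        if rows[0] != EXPECTED_COLUMNS:
            raise ValueError(f"unexpected column headers: {rows[0]}")
        rows = rows[1:]
    return rows
```

Sniffing the magic bytes means a gzipped file is handled correctly even if it lacks a `.gz` suffix.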
Additional keyword arguments:
- genomic_only (bool): Whether to return only records with
  type == "genomic". See relabel_chr_column for more information.
  Defaults to True.
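A minimal pandas sketch of the relabeling follows, with the chrmap_df column names passed in the same way as in the example. The name relabel_chr is hypothetical and is not relabel_chr_column; in particular, the genomic_only filter here assumes a `type` column in chrmap_df, which the source does not confirm. Raising on chromosomes missing from the map is also a choice made by this sketch.

```python
import pandas as pd


def relabel_chr(
    df: pd.DataFrame,
    chrmap_df: pd.DataFrame,
    curr_chr_name_convention: str,
    new_chr_name_convention: str,
    genomic_only: bool = True,
) -> pd.DataFrame:
    """Hypothetical sketch: relabel df['chr'] using chrmap_df."""
    # Series mapping current chromosome names to the new convention
    mapping = chrmap_df.set_index(curr_chr_name_convention)[new_chr_name_convention]

    # Assumption: chrmap_df has a 'type' column marking genomic chromosomes
    if genomic_only and "type" in chrmap_df.columns:
        genomic = chrmap_df.loc[
            chrmap_df["type"] == "genomic", curr_chr_name_convention
        ]
        df = df[df["chr"].isin(genomic)]

    # This sketch refuses to silently produce NaN for unmapped chromosomes
    unmapped = set(df["chr"]) - set(mapping.index)
    if unmapped:
        raise ValueError(f"chromosomes missing from chrmap_df: {sorted(unmapped)}")

    out = df.copy()
    out["chr"] = out["chr"].map(mapping)
    return out
```

Series.map with a Series argument looks each value up in that Series' index, which is why the mapping is built with set_index on the current convention.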
:param experiment_data_path: Path to the qbed file, plain text or gzipped,
with or without column headers
:type experiment_data_path: str
:param curr_chr_name_convention: The current chromosome name convention.
:type curr_chr_name_convention: str
:param new_chr_name_convention: The new chromosome name convention.
:type new_chr_name_convention: str
:param chrmap_df: The chrmap dataframe.
:type chrmap_df: pd.DataFrame
:param deduplicate: Whether to deduplicate the experiment data based on
    chr, start, and end, such that if an insertion occurs at the same
    coordinate but on opposite strands, only one record is retained.
:type deduplicate: bool
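Deduplicating on chr, start, and end maps naturally onto pandas `drop_duplicates` with a column subset. In this sketch, deduplicate_insertions is a hypothetical name, and keeping the first record (with its own strand and depth; depths are not summed) is an assumption, since the source does not say which record is retained.

```python
import pandas as pd


def deduplicate_insertions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one record per (chr, start, end), regardless of strand.

    Sketch only: the first occurrence is kept and depth is not aggregated.
    """
    return (
        df.drop_duplicates(subset=["chr", "start", "end"], keep="first")
        .reset_index(drop=True)
    )
```

Two insertions at chr1:10-11 on the + and - strands collapse to the single + record.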
:return: A tuple of the experiment data as a dataframe, with the chr
    column relabeled to the new_chr_name_convention, and the total
    number of hops (experiment_total_hops in the example)
:rtype: tuple(pd.DataFrame, int)
:raises ValueError: If the experiment_data_path does not exist or is not
    a file; if the column headers exist but do not match expectation; or
    if the datatypes do not match expectation.
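The path and datatype checks might look like the following sketch. check_path, check_dtypes, and the EXPECTED_DTYPES table are hypothetical; the source says datatypes are checked but does not list them, so string chr/strand and integer start/end/depth are assumptions. The point it illustrates is that columns are only inspected, never cast.

```python
import os

import pandas as pd
from pandas.api.types import is_integer_dtype, is_string_dtype

# Assumed expected datatypes; the source does not list them
EXPECTED_DTYPES = {
    "chr": is_string_dtype,
    "start": is_integer_dtype,
    "end": is_integer_dtype,
    "strand": is_string_dtype,
    "depth": is_integer_dtype,
}


def check_path(path: str) -> None:
    """Raise ValueError if path is missing or is not a regular file."""
    # os.path.isfile is False both for missing paths and for directories
    if not os.path.isfile(path):
        raise ValueError(f"{path!r} does not exist or is not a file")


def check_dtypes(df: pd.DataFrame) -> None:
    """Raise ValueError on any column whose dtype is unexpected.

    Columns are inspected only; nothing is cast with astype().
    """
    for column, is_expected in EXPECTED_DTYPES.items():
        if not is_expected(df[column]):
            raise ValueError(
                f"column {column!r} has unexpected dtype {df[column].dtype}"
            )
```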
:Example:
>>> import pandas as pd
>>> import os
>>> import tempfile
>>> tmp_qbed = tempfile.NamedTemporaryFile(suffix='.qbed').name
>>> with open(tmp_qbed, 'w') as f:
...     _ = f.write('chr\tstart\tend\tstrand\tdepth\n')
...     _ = f.write('chr1\t1\t2\t+\t1\n')
>>> # create a temporary chrmap file
>>> tmp_chrmap = tempfile.NamedTemporaryFile(suffix='.csv').name
>>> chrmap_df = pd.DataFrame({'curr_chr_name_convention':
...                           ['chr1', 'chr2', 'chr3'],
...                           'new_chr_name_convention':
...                           ['chrI', 'chrII', 'chrIII']})
>>> # read in the data
>>> experiment_df, experiment_total_hops = read_in_experiment_data(
...     tmp_qbed,
...     'curr_chr_name_convention',
...     'new_chr_name_convention',
...     chrmap_df)
>>> list(experiment_df.columns) == ['chr', 'start', 'end', 'depth',
...                                 'strand']
True
>>> experiment_total_hops
1
Source code in callingcardstools/PeakCalling/yeast/read_in_data.py