read_in_promoter_data
Read in the promoter data. The promoter data should be a tab-separated file with the extension .bed or .bed.gz and should have the following columns: chr, start, end, name, score, strand. The chr column is converted from the curr_chr_name_convention to the new_chr_name_convention using the chrmap_df.
:param promoter_data_path: Path to the promoter bed file (plus column names).
:type promoter_data_path: str
:param curr_chr_name_convention: The current chromosome name convention.
:type curr_chr_name_convention: str
:param new_chr_name_convention: The new chromosome name convention.
:type new_chr_name_convention: str
:param chrmap_df: The chrmap dataframe.
:type chrmap_df: pd.DataFrame
:return: The promoter data as a dataframe with the chr column converted to the new_chr_name_convention.
:rtype: pd.DataFrame
:raises ValueError: If the promoter_data_path does not exist or is not a file, if the column headers exist but do not match expectation, or if the datatypes do not match expectation.
:Example:
>>> import pandas as pd
>>> import os
>>> import tempfile
>>> tmp_bed = tempfile.NamedTemporaryFile(suffix='.bed').name
>>> with open(tmp_bed, 'w') as f:
...     _ = f.write('chr\tstart\tend\tname\tscore\tstrand\n')
...     _ = f.write('chr1\t1\t2\ttest\t1\t+\n')
>>> chrmap_df = pd.DataFrame({'curr_chr_name_convention':
...                           ['chr1', 'chr2', 'chr3'],
...                           'new_chr_name_convention':
...                           ['chrI', 'chrII', 'chrIII']})
>>> promoter_df = read_in_promoter_data(
...     tmp_bed,
...     'curr_chr_name_convention',
...     'new_chr_name_convention',
...     chrmap_df)
>>> list(promoter_df.columns) == ['chr', 'start', 'end', 'name',
...                               'score', 'strand']
True
>>> len(promoter_df) == 1
True
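The chromosome name conversion described above can be illustrated with a minimal sketch. This is not the library's implementation: remap_chr_names is a hypothetical helper written for this example, and raising a ValueError for chromosomes missing from chrmap_df is an assumption, not documented behavior of read_in_promoter_data.

```python
import pandas as pd


def remap_chr_names(df, chrmap_df, curr_col, new_col):
    """Hypothetical helper: translate df['chr'] from curr_col names to new_col names."""
    # Assumption: an unmapped chromosome is treated as an error.
    missing = set(df["chr"]) - set(chrmap_df[curr_col])
    if missing:
        raise ValueError(f"chromosomes not found in chrmap_df: {sorted(missing)}")
    # Build a lookup from the current names to the new names, then apply it.
    lookup = dict(zip(chrmap_df[curr_col], chrmap_df[new_col]))
    out = df.copy()  # leave the caller's dataframe unchanged
    out["chr"] = out["chr"].map(lookup)
    return out


chrmap_df = pd.DataFrame({
    "curr_chr_name_convention": ["chr1", "chr2", "chr3"],
    "new_chr_name_convention": ["chrI", "chrII", "chrIII"],
})
promoter_df = pd.DataFrame({
    "chr": ["chr1", "chr3"],
    "start": [1, 10],
    "end": [2, 20],
    "name": ["test", "test2"],
    "score": [1, 5],
    "strand": ["+", "-"],
})
remapped = remap_chr_names(
    promoter_df,
    chrmap_df,
    "curr_chr_name_convention",
    "new_chr_name_convention",
)
```

Building a plain dictionary from the two chrmap_df columns keeps the lookup explicit; a merge on the current-convention column would be an alternative when more mapping columns need to be carried along.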
Source code in callingcardstools/PeakCalling/yeast/read_in_data.py