read_in_background_data
Read in background (hops) data from a qbed file. The qbed file may be
plain text or gzipped, and may or may not have column headers. If column
headers are present, they must be, in this order: chr, start, end, strand,
depth. If column headers are not present, the file must have the same
number of columns in the same order. Datatypes are checked but not
coerced: an error is raised if they do not match the expected datatypes.
The chr column is relabeled from curr_chr_name_convention to
new_chr_name_convention using chrmap_df.

NOTE: unlike the experiment data frame, there is no option to deduplicate,
because the background file is expected to be the combination of multiple
experiments at this point.
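The file checks described above can be sketched with the standard library alone. This is an illustrative sketch, not the code in callingcardstools: the helper name read_qbed_rows, the gzip detection by magic bytes, and the rule used to decide whether a header row is present are all assumptions made for the example.

```python
import csv
import gzip
import os

# Column order as stated in the description above.
EXPECTED_COLUMNS = ["chr", "start", "end", "strand", "depth"]
INTEGER_COLUMNS = ("start", "end", "depth")


def _is_plain_integer(text):
    """True only for strings made of ASCII digits, e.g. '42'."""
    return text.isascii() and text.isdigit()


def read_qbed_rows(path):
    """Hypothetical helper: return validated qbed rows as a list of dicts."""
    if not os.path.isfile(path):
        raise ValueError(f"{path} does not exist or is not a file")

    # Detect gzip by its two magic bytes rather than by file extension.
    with open(path, "rb") as raw:
        is_gzipped = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzipped else open
    with opener(path, "rt", newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]

    # Assumption: the first row is a header when its second field
    # (the start position) is not an integer.
    if rows and not (len(rows[0]) > 1 and _is_plain_integer(rows[0][1])):
        if rows[0] != EXPECTED_COLUMNS:
            raise ValueError(f"unexpected column headers: {rows[0]}")
        rows = rows[1:]

    records = []
    for fields in rows:
        if len(fields) != len(EXPECTED_COLUMNS):
            raise ValueError(f"expected {len(EXPECTED_COLUMNS)} columns: {fields}")
        record = dict(zip(EXPECTED_COLUMNS, fields))
        for name in INTEGER_COLUMNS:
            # Reject bad values instead of silently coercing them.
            if not _is_plain_integer(record[name]):
                raise ValueError(f"{name} is not an integer: {record[name]!r}")
            record[name] = int(record[name])
        records.append(record)
    return records
```

Every failure mode raises ValueError, mirroring the documented behavior: a missing path, a header that does not match, a wrong column count, or a non-integer start, end, or depth.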
Additional keyword arguments
- genomic_only (bool): Whether to return only records with type == "genomic". See relabel_chr_column for more information. Defaults to True.
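To make the relabeling and the genomic_only filter concrete, here is a minimal pandas sketch. It is not the implementation of relabel_chr_column: the function name relabel_chr, the convention names ucsc and roman, and the assumption that chrmap_df carries a type column are all hypothetical. In this sketch, chromosomes that have no entry in the map are dropped.

```python
import pandas as pd


def relabel_chr(df, chrmap_df, curr, new, genomic_only=True):
    """Hypothetical sketch: relabel df['chr'] from one naming convention to another."""
    usable = chrmap_df
    if genomic_only:
        # Assumption: chrmap_df has a 'type' column marking genomic chromosomes.
        usable = chrmap_df[chrmap_df["type"] == "genomic"]
    mapping = dict(zip(usable[curr], usable[new]))

    # Keep only rows whose chromosome can be mapped, then relabel a copy
    # so the caller's frame is left untouched.
    out = df[df["chr"].isin(list(mapping))].copy()
    out["chr"] = out["chr"].map(mapping)
    return out.reset_index(drop=True)


chrmap = pd.DataFrame({
    "ucsc": ["chr1", "chrM"],
    "roman": ["chrI", "chrmt"],
    "type": ["genomic", "mito"],
})
hops = pd.DataFrame({"chr": ["chr1", "chrM", "chr1"], "start": [1, 5, 9]})

genomic = relabel_chr(hops, chrmap, "ucsc", "roman")
everything = relabel_chr(hops, chrmap, "ucsc", "roman", genomic_only=False)
```

With genomic_only left at True, the chrM row is filtered out before relabeling; with genomic_only=False, all three rows are kept and relabeled.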
:param background_data_path: Path to the background data qbed file, plain text or gzipped, with or without column headers
:type background_data_path: str
:param curr_chr_name_convention: The current chromosome name convention
:type curr_chr_name_convention: str
:param new_chr_name_convention: The new chromosome name convention
:type new_chr_name_convention: str
:param chrmap_df: The chrmap dataframe
:type chrmap_df: pd.DataFrame
:return: The background data.
:rtype: pd.DataFrame
:raises ValueError: If background_data_path does not exist or is not a file; if the column headers exist but do not match expectation; or if the datatypes do not match expectation.
:Example:
>>> import pandas as pd
>>> import os
>>> import tempfile
>>> tmp_qbed = tempfile.NamedTemporaryFile(suffix='.qbed').name
>>> with open(tmp_qbed, 'w') as f:
...     _ = f.write('chr\tstart\tend\tstrand\tdepth\n')
...     _ = f.write('chr1\t1\t2\t+\t1\n')
>>> # create a temporary chrmap file
>>> chrmap_df = pd.DataFrame({'curr_chr_name_convention':
...                           ['chr1', 'chr2', 'chr3'],
...                           'new_chr_name_convention':
...                           ['chrI', 'chrII', 'chrIII']})
>>> # read in the data
>>> background_df, background_total_hops = read_in_background_data(
...     tmp_qbed,
...     'curr_chr_name_convention',
...     'new_chr_name_convention',
...     chrmap_df)
>>> list(background_df.columns) == ['chr', 'start', 'end', 'depth',
...                                 'strand']
True
>>> background_total_hops
1
Source code in callingcardstools/PeakCalling/yeast/read_in_data.py