This is an R package which provides access to the code which I use to process the LLFS RNAseq data.
Release data will now have a version, eg 1.0.0
which
refers to the version of this software. When a new release is made, the
major version number will increment. For example, if
the current version is 1.0.0
, then the release data will be
called share_llfs_data_v1.0.0
. On the next release, I will
increase the version to v2.0.0
, make a new software
release, and the shared llfs data will be
share_llfs_data_v2.0.0
.
If there are any corrections to the code which affect the
data, but no new data is released, then I will increment the
minor version number. For example, v1.1.0
would
mean that there is no new data from v1.0.0
, but there was a
correction to code which affects the released data. There would then be
a data release called share_llfs_data_v1.1.0
.
Finally, I will increment the patch version number if I make
a change to the code which does not affect the data. For example if I
just clean up some code in the vignettes or clarify some documentation.
This would look like v1.0.1
, and there will be no new data
release associated with the change in the change in the patch
number.
Any change in the software will be described in detail in the CHANGELOG
** No LLFS Data, or anything which could be used to generate LLFS Data or IDs, are shared in this repository. This is code only **
To get the data, talk to your lab and contact your points of contact for the LLFS data administration. You do not need the code to use the data, but if you are curious about how the data was processed from fastq to release, then it is here.
Briefly, the steps are:
The llfs_rnaseq_database.sqlite is now distributed along with the released LLFS RNAseq data. It is intended to be a formally normalized database that allows me (and you, if you want to use it) to re-name samples without losing any information about what the original label was, or lose any connection to the metrics which were generated under the original label. This database also serves as a single collection of information related to these samples, such as the “phantom sample” control sample information. I suggest interacting with the database using any one (or more) of the following:
You may simply be interested in how this data is processed. In which case, looking through the Articles (see the navbar) might be of interest. You may want to process data yourself, or wish to run my code. In that case, you should install the package onto your machine. I suggest creating an R project to do this, but it is not necessary.
# note, decrease the number of CPUs as appropriate for your system
# this is the official, or as official as anything in R gets, R development
# toolset. It installs a lot of dependencies, and can take some time.
install.packages('usethis', Ncpus=10)
# you can name this whatever you like. If you enter it like this, R will create
# a project directory in your current working directory. If you would like to
# put the project directory somewhere else, provide a relative or absolute path
usethis::create_project('llfs_rnaseq_dataprocessing')
# at this point, R will launch a new session which is in the package. The only
# thing this means is that it is a dedicated directory with a .Rproj file. It
# is more or less the same as just making a directory and putting all the stuff
# related to whatever the 'project' is there.
# this installs the llfsRnaseq R package
remotes::install_github('https://github.com/cmatKhan/llfsRnaseq')
One thing you might want to do is open one of the vignettes. You can see the available vignettes, after installing, with:
vignette(package='llfsRnaseq')
You can pick and choose which you’re interested and copy/paste the code into your own script or notebook. You can also visit the github page and copy/paste the entire notebook if you wish (they are in the vignettes/ directory). You will also have access to the functions described in the Documentation’s API section.
In this case, just git clone
the repo. The difference
between a “package” and a “project” is minimal – a “package” more or
less just has a DESCRIPTION
file which allows the
remotes::install_github()
command above to work. You can
treat this like a project directory and run the the notebooks in the
vignette, and source
the code in the R
directory.