Thanks for taking the time to do this before the course!

This document means to be an extended version of our installation script located at configs/installation_scRNA.R. Feel free to just go ahead and run this.

If, at any time, you have Questions or Feedback, send us a message via Slack. There’s this channel where you can find us, #r-elated (account registration on Slack will be automatically approved when using your @ie-freiburg.mpg.de e-mail address!)

source("../configs/installation_scRNA.R")

In-depth Installation Instrucctions

There are two options, “skip” or “not to skip”. The latter, would be the preferred way for students to whom analyzing single-cell datasets is a core part of their research project(s).

Or not to skip

Having your own package installation is highly recommended since it enables you to keep updating the libraries to use the latest versions with all their enhancements and bug fixes. This is strategic if you know are going to have a single cell dataset of your own in the upcoming weeks or months.

If you choose this path, please make sure you allow for ~40 minutes to complete all the steps. If all goes well, and you’re working on our Workbench, it could be ~10 minutes. The exact time will depend on network performance, and the current state of your package library (e.g., previous old packages that were already installed).

Steps to Package Installation

Important Notes

  • Ensure you’re running R 4.2.3. The following procedure wasn’t tested with most recent versions.
  • Run each of the code blocks manually, and ensure there were no errors before moving forward.
  • Watch out for possible errors. It would be wise to keep an eye on the text output at all time. It shouldn’t be a surprise if you have a compilation error message in the middle of the whole text output. Be prepared to scroll the walls of text.
  • If asked to update packages, answer ‘none’. We’ll take care of package updates near the end.
  • If asked to compile packages, answer ‘no’.
  • All code blocks may be executed more than once, if some -but not all- packages were installed, there’s no increase in the total duration of this process.
  • If you see errors, re-run the code block again.

NOT SKIPPING BUT STILL ACCELERATING

So, you chose not to skip. You may still accelerate the package installation process by A LOT with the following shell command that will copy the same package library that was offered for skipping. We’ll put this in the default library location (libPaths().)

Run this in a Terminal inside the server (on RStudio IDE, you can open this using the ‘Tools’ menu).

cp -r /scratch/local/rseurat/pkg-lib-4.2.3 /rstudio/${USER}/rstudio/R/workbench-library/4.2

The package library you just copied over is a snapshot taken just before the course.

For safety, you should still run the code blocks. Only missing packages are really downloaded, compiled (sometimes), and installed.

Bioconductor

Our first installation is in regards to the core packages of Bioconductor, a repository of bioinformatic packages, just like CRAN (which also has bioinformatic packages). The difference is that instead of GNU Public License, Bioc packages are under a license that allows commercial usage, widening its user-base (e.g. including private hospitals).

if (!"BiocManager" %in% installed.packages()) install.packages("BiocManager")

With BiocManager, we can install Bioconductor 3.16. If you are using R 4.4 or higher, then you’d be looking for a newest release, see the official release announcements to find your matching version. If you really need it, you may try the latest R with bioconductor 3.18, but these lectures were only tested with the versions we are recommending (4.2.3 & 3.16).

BiocManager::install(version = "3.16")

You can ignore the following message (it will also show different version numbers and dates to you, but the important bit is ‘path not writeable’):

Bioconductor version 3.14 (BiocManager
  1.30.20), R 4.1.3 (2022-03-10)
Installation paths not writeable, unable to
  update packages
  path: /opt/R/4.1.3/lib/R/library
  packages:
    boot, class, cluster, codetools, foreign,
    MASS, Matrix, mgcv, nlme, nnet, rpart,
    spatial, survival

Core & Dependencies

The next code block defines a function, retrieve_namespaces(), that takes a character vector with package names, to be installed. If the execution of the whole function is not interrupted by an error, it will simply return TRUE. This is convenient, as it provides a checkpoint that we’ll be using soon…

retrieve_namespaces <- function(list_of_packages) {
  lapply(list_of_packages,
         function(x) {
           if (!x %in% installed.packages()) {
             suppressMessages(BiocManager::install(
               x, ask = FALSE, update = FALSE))
           }
         })
  TRUE
}

Once the function is defined, we can use it. We are going to install Seurat as well as many other tools and dependencies that we will need throughout this course. The whole process is going to take 10-15 minutes… and, most probably, more than just one single execution.

retrieve_namespaces(
  list_of_packages = c(
    # Core
    "remotes",
    "tidyverse",
    "future",
    "Seurat",
    "sctransform",
    # DE
    "metap",
    "multtest",
    "glmGamPoi",
    "DESeq2",
    "limma",
    "MAST",
    "enrichR",
    # Viz
    "RColorBrewer",
    "patchwork",
    "pheatmap"
  )
)

Sometimes, there are errors while installing packages in bulk that are easily solved by re-iterating the command. These error messages are difficult to track, since we get so much output from the ongoing process. Installing 10-20 packages may look as a simple activity, but that’s not the case when we consider all the dependencies among them.

Re-run the previous code block 2-3 times until its only output is: TRUE.

On workbench, after a couple of re-runs, we needed to remove locks manually, you may do so from your Terminal:

rm -rf /rstudio/${USER}/{.,}rstudio/R/workbench-library/**/00LOCK-*

It may be that re-running the retrieve_namespaces function only returns TRUE after using this Linux shell command. This is detailed on our FAQ too!

Remember the above command is for the Linux Terminal, and not the R Console (to further complicate things, is common to use terminal and console without any disambiguation, as if the terms were 1oo% interchangeable). These are two different ‘Tabs’ in the Rstudio IDE. Go to the menu “View” and select “Move Focus to Terminal”. Over there you can run this rm Linux shell command.

Consistency

  • Finally, we want our installation to be coherent:
BiocManager::valid()

If the above code block outputs anything different to TRUE, it’s because according to BiocManager our current state is not valid. To fix it, you should run the BiocManager::install() command as it’s stated in the output message.

Check Installation

library(Seurat)
packageVersion("Seurat")

Congrats!

You made it. One last thing, we’ll be using some educational datasets. The next part is also required, but way more simpler

Datasets

R packages bundle data with them, usually for testing purposes.

In the case of Seurat, there is a package (SeuratData) specifically designed to download some datasets. You may use SeuratData::AvailableData() to get a table of all these ‘educational’ datasets. Let’s get a dataset:

if (!"SeuratData" %in% installed.packages()) remotes::install_github("satijalab/seurat-data")
if (!"ifnb.SeuratData" %in% installed.packages()) SeuratData::InstallData("ifnb")
if (!"pbmc3k.SeuratData" %in% installed.packages()) SeuratData::InstallData("pbmc3k")

Download preprocessed datasets from zenodo

For the later part of the course, we have preprocessed a couple of large datasets. Depending on where you will be working, there are two alternative ways now. So, if you are a Workbench user, go to the next subtitle.

If you will be using RStudio from your laptop, then you may download from zenodo:

  1. “datasets/preprocessed_rds/panc_sub.RDS”

Depending on your bandwidth, the downloads may take a couple of minutes.

Workbench Users

The datasets are already provided under /scratch/local/rseurat/datasets/preprocessed_rds. You can copy from this location, and skip the download.

Please note the use of working directory environment variable ($PWD) in the next command. This is supposed to be run inside the repository ./rmd/ subfolder. Adjust accordingly, or use mv to fix it.

cp -r /scratch/local/rseurat/datasets/preprocessed_rds ${PWD}/datasets/preprocessed_rds

HDF Files & Core System Dependencies

On the fourth day we’ll be having a hands-on session with public data. It may be the case that we give you a matrix in the H5 file format. To load this, you will need to install hdf5r package.

For this package, you will need a system dependency, that is a software library that needs to be installed on your Operative System. Again, workbench users have an advantage because the system has been carefully tuned already.

In any case, don’t sweat it. Following instructions are not mandatory, and in the case of OSX or Windows users, they’re mostly recommended as a way to handle the difficulty (installing system dependencies) that will probably come up multiple times during your work.

OS-specific instructions

On Linux, you’d need to install either libhdf5-dev (ubuntu) or hdf5-devel (centos), depending on your distribution. The actual name of the package could also change with your release distribution (e.g. ubuntu 18/ bionic versus 20/ focal).

On OSX, you’d need to install Homebrew first, and then run $ brew install hdf5.

On Windows, there are two options: one would be installing miniconda. Or, the most straightforward would probably be using WSL2, and then following instructions for Ubuntu Linux (you may choose another distro, but we recommend you start with Ubuntu).

End

sessionInfo

It’s common practice in the R-lang communities to include a sessionInfo for the means of debugging and reproducibility. You can safely ignore the following.

sessionInfo()