Reads Per Kilobase per Million mapped reads (RPKM) and Transcripts Per Million (TPM) are two of the most widely used normalisation methods in RNA-seq experiments. This function converts raw read counts into either RPKM or TPM values.

Usage

normalise_counts(x, .vars = NULL, method = "TPM")

Arguments

x

A data frame of raw counts that also contains the mandatory columns GeneID, Chr, Start, End, Strand and Length.

.vars

A character vector of column names from x. Normalisation is performed only on these columns. If NULL (default), normalisation is performed on all columns other than the mandatory ones.

method

A character string, either "TPM" (default) or "RPKM".

Value

A dataframe with all mandatory columns along with columns mentioned in .vars. Remaining columns from x will be dropped.

Details

The implementations of RPKM and TPM can be seen in the functions get_rpkm() and get_tpm(), respectively.
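
As a rough illustration of what the two methods compute, the sketch below applies the commonly used RPKM and TPM definitions to a single sample. The helper names `rpkm_sketch()` and `tpm_sketch()` and the toy counts and lengths are assumptions made for this example. They are not the package's get_rpkm() or get_tpm(), which may differ in details such as input checks.

```r
# Hypothetical helpers for illustration only; not the package's own functions.
# `counts`  : raw read counts for one sample (one value per gene)
# `lengths` : gene lengths in bases, in the same order as `counts`

rpkm_sketch <- function(counts, lengths) {
  per_million <- sum(counts) / 1e6   # library size in millions of reads
  kilobases   <- lengths / 1e3       # gene length in kilobases
  counts / kilobases / per_million
}

tpm_sketch <- function(counts, lengths) {
  rate <- counts / (lengths / 1e3)   # reads per kilobase of gene
  rate / sum(rate) * 1e6             # rescale so the sample sums to one million
}

# Toy data (assumed, not taken from the package)
counts  <- c(100, 200, 700)
lengths <- c(1000, 2000, 1000)

rpkm_values <- rpkm_sketch(counts, lengths)
tpm_values  <- tpm_sketch(counts, lengths)

rpkm_values
tpm_values
sum(tpm_values)
```

With these toy inputs the first two genes have the same read density, so they receive the same RPKM and the same TPM. Under these definitions the TPM values of a sample always sum to one million, whereas the RPKM values carry no such constraint.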

Examples

library(magrittr)  # provides the %<>% assignment pipe used below
set.seed(123)
tt <- tibble::tibble(gene_id = c(paste("Gene_",1:5,sep = "")),
                     chr = "Chr1",
                     start = sample(1:100, 5),
                     end = sample(100:200,5),
                     strand = sample(c("+" ,"-"), 5, replace = TRUE),
                     length = (end - start )+ 1 )

tt %<>% dplyr::mutate(sample_1 = sample(c(1:100),5)*10 ,
                      sample_2 = sample(c(1:100),5)*10 ,
                      sample_3 = sample(c(1:100),5)*100,
                      sample_4 = sample(c(1:100),5)*100 )

normalise_counts(x = tt ,method = "RPKM")
#> # A tibble: 5 × 10
#>   gene_id chr   start   end strand length sample_1 sample_2 sample_3 sample_4
#>   <chr>   <chr> <int> <int> <chr>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1 Gene_1  Chr1     31   141 -         111    2397.    3626.     283.    2239.
#> 2 Gene_2  Chr1     79   149 +          71    2347.    4122.    4073.    6187.
#> 3 Gene_3  Chr1     51   142 -          92    2924.    1149.    1363.     942.
#> 4 Gene_4  Chr1     14   113 +         100     263.     285.    2718.    1850.
#> 5 Gene_5  Chr1     67   124 +          58    4688.    2944.    4866.     698.
normalise_counts(x = tt, .vars = c("sample_1","sample_2") ,method = "TPM")
#> # A tibble: 5 × 8
#>   gene_id chr   start   end strand length sample_1 sample_2
#>   <chr>   <chr> <int> <int> <chr>   <dbl>    <dbl>    <dbl>
#> 1 Gene_1  Chr1     31   141 -         111  189945.  299019.
#> 2 Gene_2  Chr1     79   149 +          71  186006.  339986.
#> 3 Gene_3  Chr1     51   142 -          92  231691.   94749.
#> 4 Gene_4  Chr1     14   113 +         100   20852.   23468.
#> 5 Gene_5  Chr1     67   124 +          58  371505.  242777.
normalise_counts(x = tt, .vars = c("sample_1","sample_2") ,method = "RPKM")
#> # A tibble: 5 × 8
#>   gene_id chr   start   end strand length sample_1 sample_2
#>   <chr>   <chr> <int> <int> <chr>   <dbl>    <dbl>    <dbl>
#> 1 Gene_1  Chr1     31   141 -         111    2397.    3626.
#> 2 Gene_2  Chr1     79   149 +          71    2347.    4122.
#> 3 Gene_3  Chr1     51   142 -          92    2924.    1149.
#> 4 Gene_4  Chr1     14   113 +         100     263.     285.
#> 5 Gene_5  Chr1     67   124 +          58    4688.    2944.
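
A quick sanity check on TPM output is that each sample column should sum to roughly one million. The sketch below retypes the rounded TPM values printed above for sample_1 and sample_2. The variable names are chosen only for this illustration, and the small shortfall from 1e6 is assumed to come from the rounding in the printed tibble.

```r
# Rounded TPM values copied from the printed output above
tpm_sample_1 <- c(189945, 186006, 231691, 20852, 371505)
tpm_sample_2 <- c(299019, 339986, 94749, 23468, 242777)

# Column totals; both should be close to 1e6
tpm_totals <- c(sample_1 = sum(tpm_sample_1),
                sample_2 = sum(tpm_sample_2))
tpm_totals

# Allow a few units of slack for the rounding in the printed values
all(abs(tpm_totals - 1e6) < 5)
```

The same check does not apply to RPKM columns, whose totals depend on the gene lengths in the data.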