Reads Per Kilobase per Million (RPKM) and Transcripts Per Million (TPM) are two of the most widely used normalisation methods in RNA-seq experiments. This function converts raw read counts into either RPKM or TPM values.
Arguments
- x
A dataframe of raw counts along with the mandatory columns GeneID, Chr, Start, End, Strand and Length.
- .vars
A character vector of column names from x. Normalisation will be performed only on these columns. If NULL (default), normalisation will be performed on all non-mandatory columns.
- method
A character string, default TPM. One of TPM or RPKM.
Value
A dataframe with all mandatory columns along with the columns named in .vars. The remaining columns from x will be dropped.
Details
The implementations of RPKM and TPM can be seen in the functions get_rpkm() and get_tpm(), respectively.
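For orientation, the two normalisations can be sketched in a few lines of base R. This is a minimal illustration of the standard definitions, not the package's own get_rpkm() or get_tpm() code. The helper names rpkm_sketch() and tpm_sketch() and the toy counts are hypothetical, and the sketch assumes that the library size is the sum of the counts supplied for that sample.

```r
# Hypothetical helpers for illustration only; NOT the package's
# get_rpkm() / get_tpm() implementations.
# counts: numeric vector of raw read counts for one sample
# len:    numeric vector of gene lengths in bases (same order as counts)

rpkm_sketch <- function(counts, len) {
  # Reads per kilobase of gene, per million reads in the sample.
  # Assumption: library size = sum of the supplied counts.
  counts / (len / 1e3) / (sum(counts) / 1e6)
}

tpm_sketch <- function(counts, len) {
  # Length-normalise first, then rescale so the sample sums to one million.
  rate <- counts / len
  rate / sum(rate) * 1e6
}

# Toy data (made up for this sketch)
counts <- c(100, 200, 300)
len    <- c(1000, 2000, 1000)

rpkm_sketch(counts, len)
tpm_sketch(counts, len)

# TPM values within a sample sum to one million (up to floating-point error)
all.equal(sum(tpm_sketch(counts, len)), 1e6)
```

The practical difference is the order of operations: TPM divides by gene length before scaling by the sample total, so TPM columns always share the same sum, whereas RPKM columns generally do not.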
Examples
set.seed(123)
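# Toy annotation table: five genes with coordinates, strand and length
tt <- tibble::tibble(gene_id = paste0("Gene_", 1:5),
                     chr = "Chr1",
                     start = sample(1:100, 5),
                     end = sample(100:200, 5),
                     strand = sample(c("+", "-"), 5, replace = TRUE),
                     length = (end - start) + 1)
# Add four columns of simulated raw counts (plain assignment, so magrittr's %<>% is not needed)
tt <- dplyr::mutate(tt,
                    sample_1 = sample(c(1:100), 5) * 10,
                    sample_2 = sample(c(1:100), 5) * 10,
                    sample_3 = sample(c(1:100), 5) * 100,
                    sample_4 = sample(c(1:100), 5) * 100)
# Normalise every sample column to RPKM
normalise_counts(x = tt, method = "RPKM")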
#> # A tibble: 5 × 10
#> gene_id chr start end strand length sample_1 sample_2 sample_3 sample_4
#> <chr> <chr> <int> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Gene_1 Chr1 31 141 - 111 2397. 3626. 283. 2239.
#> 2 Gene_2 Chr1 79 149 + 71 2347. 4122. 4073. 6187.
#> 3 Gene_3 Chr1 51 142 - 92 2924. 1149. 1363. 942.
#> 4 Gene_4 Chr1 14 113 + 100 263. 285. 2718. 1850.
#> 5 Gene_5 Chr1 67 124 + 58 4688. 2944. 4866. 698.
normalise_counts(x = tt, .vars = c("sample_1", "sample_2"), method = "TPM")
#> # A tibble: 5 × 8
#> gene_id chr start end strand length sample_1 sample_2
#> <chr> <chr> <int> <int> <chr> <dbl> <dbl> <dbl>
#> 1 Gene_1 Chr1 31 141 - 111 189945. 299019.
#> 2 Gene_2 Chr1 79 149 + 71 186006. 339986.
#> 3 Gene_3 Chr1 51 142 - 92 231691. 94749.
#> 4 Gene_4 Chr1 14 113 + 100 20852. 23468.
#> 5 Gene_5 Chr1 67 124 + 58 371505. 242777.
normalise_counts(x = tt, .vars = c("sample_1", "sample_2"), method = "RPKM")
#> # A tibble: 5 × 8
#> gene_id chr start end strand length sample_1 sample_2
#> <chr> <chr> <int> <int> <chr> <dbl> <dbl> <dbl>
#> 1 Gene_1 Chr1 31 141 - 111 2397. 3626.
#> 2 Gene_2 Chr1 79 149 + 71 2347. 4122.
#> 3 Gene_3 Chr1 51 142 - 92 2924. 1149.
#> 4 Gene_4 Chr1 14 113 + 100 263. 285.
#> 5 Gene_5 Chr1 67 124 + 58 4688. 2944.
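As a quick sanity check on the printed results, the TPM values of a sample should total roughly one million, and rescaling that sample's RPKM values to sum to one million should approximately reproduce its TPM values. The sketch below uses the sample_1 numbers copied from the output above. Because those numbers are rounded by the tibble printing, the agreement is only approximate. The variable names are chosen here for illustration.

```r
# sample_1 values copied from the printed RPKM and TPM outputs above
# (rounded by tibble printing, so comparisons are approximate).
rpkm_sample_1 <- c(2397, 2347, 2924, 263, 4688)
tpm_sample_1  <- c(189945, 186006, 231691, 20852, 371505)

# TPM within one sample should total roughly one million.
sum(tpm_sample_1)

# Rescaling RPKM so it sums to one million should approximately give TPM;
# the small residual comes from the rounding of the printed values.
tpm_from_rpkm <- rpkm_sample_1 / sum(rpkm_sample_1) * 1e6
max(abs(tpm_from_rpkm - tpm_sample_1) / tpm_sample_1)
```

On real output, the same check can be run on the unrounded columns returned by normalise_counts(), where the agreement should be much tighter.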