Prepare a matrix of normalised gene expression values.
Source:R/rnaseq_related.R
get_normalised_expression_matrix.Rd
This function returns a dataframe having first column gene names and subsequent columns are
normalised gene expression values for the samples passed through argument samples
. Internally it filters
the data from the norm_counts
of argument x
. For more details on how norm_counts
created refer to
the documentation of run_deseq_analysis()
.
Usage
get_normalised_expression_matrix(
x,
samples = NULL,
genes = NULL,
summarise_replicates = FALSE,
summarise_method = "median"
)
Arguments
- x
an abject of class "parcutils". This is an output of the function
run_deseq_analysis()
.- samples
a character vector denoting samples for which normalised gene expression values to be derived, Default NULL. If NULL it returns all samples in x
- genes
a character vector denoting gene names for which normalised gene expression values to be derived, Default NULL. If NULL it returns all samples in x.
- summarise_replicates
logical, default FALSE, indicating whether gene expression values summarised by mean or median between replicates.
- summarise_method
a character string either "mean" or "median" by which normalised gene expression values will be summarised between replicates.
Examples
count_file <- system.file("extdata","toy_counts.txt" , package = "parcutils")
count_data <- readr::read_delim(count_file, delim = "\t")
#> Rows: 5000 Columns: 10
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (1): gene_id
#> dbl (9): control_rep1, control_rep2, control_rep3, treat1_rep1, treat1_rep2,...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sample_info <- count_data %>% colnames() %>% .[-1] %>%
tibble::tibble(samples = . , groups = rep(c("control" ,"treatment1" , "treatment2"), each = 3) )
res <- run_deseq_analysis(counts = count_data ,
sample_info = sample_info,
column_geneid = "gene_id" ,
group_numerator = c("treatment1", "treatment2") ,
group_denominator = c("control"))
#> ℹ Running DESeq2 ...
#> converting counts to integer mode
#> Warning: some variables in design formula are characters, converting to factors
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> ✔ Done.
#> ℹ Summarizing DEG ...
#> ✔ Done.
get_normalised_expression_matrix(x = res) %>% print()
#> # A tibble: 4,034 × 10
#> gene_id treat1_rep1 treat1_rep2 treat1_rep3 control_rep1 control_rep2
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ENSG0000015434… 440. 461. 456. 694. 687.
#> 2 ENSG0000019641… 3370. 3222. 3478. 3591. 3616.
#> 3 ENSG0000017302… 707. 590. 709. 739. 781.
#> 4 ENSG0000014059… 72.6 78.3 71.3 92.7 72.7
#> 5 ENSG0000013926… 1403. 1484. 1568. 1463. 1781.
#> 6 ENSG0000016589… 2815. 3559. 3404. 3875. 4514.
#> 7 ENSG0000017945… 12679. 12542. 13690. 13257. 13358.
#> 8 ENSG0000018886… 461. 434. 459. 540. 592.
#> 9 ENSG0000024915… 0 2.03 0.990 0 11.0
#> 10 ENSG0000011085… 2.18 9.15 9.90 11.7 5.97
#> # … with 4,024 more rows, and 4 more variables: control_rep3 <dbl>,
#> # treat2_rep1 <dbl>, treat2_rep2 <dbl>, treat2_rep3 <dbl>
# summarise replicates by median
get_normalised_expression_matrix(x = res ,summarise_replicates = TRUE, summarise_method = "median") %>% print()
#> # A tibble: 4,034 × 4
#> gene_id control treatment1 treatment2
#> <chr> <dbl> <dbl> <dbl>
#> 1 ENSG00000000971:CFH 7349. 5727. 9849.
#> 2 ENSG00000001461:NIPAL3 134. 126. 68.2
#> 3 ENSG00000001497:LAS1L 9282. 8228. 7678.
#> 4 ENSG00000001631:KRIT1 23294. 29183. 20375.
#> 5 ENSG00000002746:HECW1 66.0 31.9 24.9
#> 6 ENSG00000003056:M6PR 6538. 7394. 6694.
#> 7 ENSG00000003436:TFPI 248. 225. 200.
#> 8 ENSG00000003509:NDUFAF7 462. 495. 384.
#> 9 ENSG00000004766:VPS50 40.2 38.6 121.
#> 10 ENSG00000004864:SLC25A13 370. 396. 844.
#> # … with 4,024 more rows