Prepare a matrix of normalised gene expression values. — get_normalised_expression

This function returns a dataframe having first column gene names and subsequent columns are normalised gene expression values for the samples passed through argument samples. Internally it filters the data from the norm_counts of argument x. For more details on how norm_counts created refer to the documentation of run_deseq_analysis().

Usage

get_normalised_expression_matrix(
  x,
  samples = NULL,
  genes = NULL,
  summarise_replicates = FALSE,
  summarise_method = "median"
)

Arguments

x: an abject of class "parcutils". This is an output of the function run_deseq_analysis().
samples: a character vector denoting samples for which normalised gene expression values to be derived, Default NULL. If NULL it returns all samples in x
genes: a character vector denoting gene names for which normalised gene expression values to be derived, Default NULL. If NULL it returns all samples in x.
summarise_replicates: logical, default FALSE, indicating whether gene expression values summarised by mean or median between replicates.
summarise_method: a character string either "mean" or "median" by which normalised gene expression values will be summarised between replicates.

Value

a dataframe.

Examples

count_file <- system.file("extdata","toy_counts.txt" , package = "parcutils")
count_data <- readr::read_delim(count_file, delim = "\t")
#> Rows: 5000 Columns: 10
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (1): gene_id
#> dbl (9): control_rep1, control_rep2, control_rep3, treat1_rep1, treat1_rep2,...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

sample_info <- count_data %>% colnames() %>% .[-1]  %>%
 tibble::tibble(samples = . , groups = rep(c("control" ,"treatment1" , "treatment2"), each = 3) )


res <- run_deseq_analysis(counts = count_data ,
                         sample_info = sample_info,
                         column_geneid = "gene_id" ,
                         group_numerator = c("treatment1", "treatment2") ,
                         group_denominator = c("control"))
#> ℹ Running DESeq2 ...
#> converting counts to integer mode
#> Warning: some variables in design formula are characters, converting to factors
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> ✔ Done.
#> ℹ Summarizing DEG ...
#> ✔ Done.


get_normalised_expression_matrix(x = res) %>% print()
#> # A tibble: 4,034 × 10
#>    gene_id         treat1_rep1 treat1_rep2 treat1_rep3 control_rep1 control_rep2
#>    <chr>                 <dbl>       <dbl>       <dbl>        <dbl>        <dbl>
#>  1 ENSG0000015434…      440.        461.       456.           694.        687.  
#>  2 ENSG0000019641…     3370.       3222.      3478.          3591.       3616.  
#>  3 ENSG0000017302…      707.        590.       709.           739.        781.  
#>  4 ENSG0000014059…       72.6        78.3       71.3           92.7        72.7 
#>  5 ENSG0000013926…     1403.       1484.      1568.          1463.       1781.  
#>  6 ENSG0000016589…     2815.       3559.      3404.          3875.       4514.  
#>  7 ENSG0000017945…    12679.      12542.     13690.         13257.      13358.  
#>  8 ENSG0000018886…      461.        434.       459.           540.        592.  
#>  9 ENSG0000024915…        0           2.03       0.990          0          11.0 
#> 10 ENSG0000011085…        2.18        9.15       9.90          11.7         5.97
#> # … with 4,024 more rows, and 4 more variables: control_rep3 <dbl>,
#> #   treat2_rep1 <dbl>, treat2_rep2 <dbl>, treat2_rep3 <dbl>

# summarise replicates by median

get_normalised_expression_matrix(x = res ,summarise_replicates = TRUE, summarise_method = "median") %>% print()
#> # A tibble: 4,034 × 4
#>    gene_id                  control treatment1 treatment2
#>    <chr>                      <dbl>      <dbl>      <dbl>
#>  1 ENSG00000000971:CFH       7349.      5727.      9849. 
#>  2 ENSG00000001461:NIPAL3     134.       126.        68.2
#>  3 ENSG00000001497:LAS1L     9282.      8228.      7678. 
#>  4 ENSG00000001631:KRIT1    23294.     29183.     20375. 
#>  5 ENSG00000002746:HECW1       66.0       31.9       24.9
#>  6 ENSG00000003056:M6PR      6538.      7394.      6694. 
#>  7 ENSG00000003436:TFPI       248.       225.       200. 
#>  8 ENSG00000003509:NDUFAF7    462.       495.       384. 
#>  9 ENSG00000004766:VPS50       40.2       38.6      121. 
#> 10 ENSG00000004864:SLC25A13   370.       396.       844. 
#> # … with 4,024 more rows