
Run Differential Expression Analysis with DESeq2, edgeR, or limma-voom
These functions encapsulate the standard RNA-seq analysis workflow using
DESeq2 (run_deseq_analysis), edgeR (run_edger_analysis), or
limma-voom (run_limma_analysis), including:
gene filtering, design matrix setup, normalization, model fitting, differential testing,
DEG classification ("Up", "Down", "Other"), and result formatting.
All three functions return output in a harmonized structure ready for downstream use in create_vista or standalone DEG summaries.
Usage
run_deseq_analysis(
counts,
sample_info,
column_geneid,
group_column,
group_numerator,
group_denominator,
covariates = NULL,
design_formula = NULL,
min_counts = 10,
min_replicates = 1,
log2fc_cutoff = 1,
pval_cutoff = 0.05,
p_value_type = "padj"
)
run_edger_analysis(
counts,
sample_info,
column_geneid,
group_column,
group_numerator,
group_denominator,
covariates = NULL,
design_formula = NULL,
min_counts = 10,
min_replicates = 1,
log2fc_cutoff = 1,
pval_cutoff = 0.05,
p_value_type = "FDR"
)
run_limma_analysis(
counts,
sample_info,
column_geneid,
group_column,
group_numerator,
group_denominator,
covariates = NULL,
design_formula = NULL,
min_counts = 10,
min_replicates = 1,
log2fc_cutoff = 1,
pval_cutoff = 0.05,
p_value_type = "FDR"
)
Arguments
- counts
A data frame or matrix of raw counts with one gene per row. Must include a column defined by
column_geneid, and column names must match entries in sample_info$sample_names.
- sample_info
A data frame with sample metadata. Must contain
sample_names and the specified grouping column.
- column_geneid
A string identifying the column name containing gene identifiers.
- group_column
The name of the column in
sample_info that defines experimental groups.
- group_numerator
A character vector of numerator group(s) for fold-change comparisons.
- group_denominator
A character vector of denominator group(s) for fold-change comparisons.
- covariates
Optional character vector of additional sample_info columns to adjust for.
- design_formula
Optional model formula (or formula string). When provided, it overrides automatic design construction from
group_column + covariates. Must include group_column.
- min_counts
Minimum total read count across all samples to retain a gene. Default:
10.
- min_replicates
Minimum number of replicates within each group that must exceed
min_counts. Default: 1.
- log2fc_cutoff
Absolute log2 fold-change threshold to define DEGs. Default:
1.
- pval_cutoff
P-value or adjusted p-value cutoff for significance. Default:
0.05.
- p_value_type
For DESeq2: one of
"padj" or "pvalue". For edgeR/limma: one of "FDR" or "PValue".
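The input requirements above can be illustrated with a small base-R sketch. The toy objects (toy_counts, toy_info) and the batch covariate are made up for illustration, and the formula construction is only an assumption about what building a design from group_column + covariates might look like, not the package's internal code.

```r
# Toy inputs (hypothetical data, for illustration only)
toy_counts <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  ctrl_1  = c(10L, 0L, 250L),
  ctrl_2  = c(12L, 1L, 300L),
  kd_1    = c(40L, 0L, 120L),
  kd_2    = c(35L, 2L, 110L)
)

toy_info <- data.frame(
  sample_names = c("ctrl_1", "ctrl_2", "kd_1", "kd_2"),
  groups       = c("ctrl", "ctrl", "kd", "kd"),
  batch        = c("b1", "b2", "b1", "b2")
)

column_geneid <- "gene_id"
group_column  <- "groups"
covariates    <- "batch"

# Every count column other than the gene ID column should name a sample
count_samples <- setdiff(colnames(toy_counts), column_geneid)
inputs_match  <- setequal(count_samples, toy_info$sample_names)

# One plausible way to assemble a design from covariates + group_column
# (assumed, not taken from the package source)
auto_design <- reformulate(c(covariates, group_column))

# A user-supplied design_formula must mention group_column
user_design   <- as.formula("~ batch + groups")
has_group_col <- group_column %in% all.vars(user_design)
```

Checking inputs_match and has_group_col before calling any of the three functions is a cheap way to catch mismatched sample names or a design formula that omits the grouping column.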
Value
A named list with components:
- norm_counts: Matrix of normalized expression values (CPM for edgeR and limma-voom; DESeq2-normalized counts for DESeq2).
- sample_info: Updated sample metadata.
- row_data: Gene-level metadata, including mean expression.
- comparisons: Named list of DEG result tibbles (one per comparison), each containing standardized columns: gene_id, log2fc, pvalue, p.adj, and regulation.
- deg_summary: List of summary tables showing DEG regulation counts.
Details
Perform differential expression (DE) analysis across multiple group comparisons
using DESeq2, edgeR, or limma-voom. These functions process raw count data,
normalize it, execute pairwise group-level tests, and return standardized DEG outputs
compatible with VISTA-based visualization and analysis.
For DESeq2, normalization is performed via DESeq(), and DE testing uses results().
For edgeR, normalization uses calcNormFactors(), and testing uses glmLRT().
For limma, normalization uses calcNormFactors() + voom(), and testing uses eBayes().
Low-abundance filtering is applied before model fitting.
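As an illustration of low-abundance filtering, the base-R sketch below uses a hypothetical helper, filter_low_abundance(). The min_counts and min_replicates arguments admit more than one reading; this sketch assumes a gene is kept when, in at least one group, at least min_replicates samples have a count of min_counts or more. The rule actually applied by these functions may differ.

```r
# Hypothetical helper: keep genes with enough well-covered replicates
# in at least one group (an assumed rule, not the package's own code)
filter_low_abundance <- function(count_mat, groups,
                                 min_counts = 10, min_replicates = 1) {
  stopifnot(ncol(count_mat) == length(groups))
  # For each group, count the samples per gene with count >= min_counts
  passing <- sapply(unique(groups), function(g) {
    rowSums(count_mat[, groups == g, drop = FALSE] >= min_counts)
  })
  # Force a genes x groups matrix, also when there is a single gene
  passing <- matrix(passing, nrow = nrow(count_mat))
  keep <- rowSums(passing >= min_replicates) > 0
  count_mat[keep, , drop = FALSE]
}

# Toy count matrix (made-up values)
toy_mat <- matrix(
  c( 10,  12,  40,  35,
      0,   1,   0,   2,
    250, 300, 120, 110),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ctrl_1", "ctrl_2", "kd_1", "kd_2"))
)

filtered <- filter_low_abundance(toy_mat, groups = c("ctrl", "ctrl", "kd", "kd"))
rownames(filtered)
```

With the default thresholds, geneB never reaches 10 reads in any sample and is dropped, while geneA and geneC are retained.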
Gene regulation status is determined via .categorize_deg_results() based on user thresholds.
All output comparison results are internally standardized via .tidy_de_results() to ensure
a uniform column schema compatible with VISTA plotting tools.
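The threshold-based classification into "Up", "Down", and "Other" can be sketched in base R as follows. The function classify_regulation() is hypothetical and only mirrors the documented log2fc_cutoff and pval_cutoff arguments; it is not the body of .categorize_deg_results(), and details such as whether the cutoffs are inclusive are assumptions.

```r
# Hypothetical classifier based on the documented thresholds.
# Assumes: significant when p < pval_cutoff, and |log2fc| >= log2fc_cutoff.
classify_regulation <- function(log2fc, pval,
                                log2fc_cutoff = 1, pval_cutoff = 0.05) {
  # Missing p-values or fold changes are never called significant
  significant <- !is.na(pval) & pval < pval_cutoff
  has_fc      <- !is.na(log2fc)
  ifelse(significant & has_fc & log2fc >= log2fc_cutoff, "Up",
    ifelse(significant & has_fc & log2fc <= -log2fc_cutoff, "Down", "Other"))
}

# Toy result table using the standardized column names (made-up values)
toy_res <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "geneD"),
  log2fc  = c(2.1, -1.8, 0.3, 3.0),
  p.adj   = c(0.001, 0.02, 0.0001, NA)
)

toy_res$regulation <- classify_regulation(toy_res$log2fc, toy_res$p.adj)
table(toy_res$regulation)
```

Here geneC is significant but falls below the fold-change threshold, and geneD has a missing adjusted p-value, so both are labeled "Other".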
Examples
if (FALSE) { # \dontrun{
deseq_results <- run_deseq_analysis(
counts = counts_small,
sample_info = subset_samples,
column_geneid = "gene_id",
group_column = "groups",
group_numerator = c("ZNF219_sh5KD", "ZNF219_sh6KD"),
group_denominator = c("shCTRL_1", "shCTRL_1"),
min_counts = 5,
min_replicates = 1
)
names(deseq_results$comparisons)
} # }