
Derive starter sample metadata from count sample names
Source: R/input_preparation.R
derive_vista_metadata() creates a starter sample_info table from count
sample names. It is intended for projects where users have count columns but
do not yet have a separate metadata sheet. The derived table can be edited,
passed through read_vista_metadata(), and then aligned with
match_vista_inputs().
Arguments
- counts
Count input accepted by read_vista_counts(), or the list returned by read_vista_counts().
- column_geneid
Optional gene identifier column for raw tabular count inputs. Ignored when counts is the list output of read_vista_counts().
- sample_names
Optional explicit sample names to derive metadata from. When supplied, these override names extracted from counts.
- parser
Metadata parsing mode. "auto" tries a simple delimiter-based split when sample names have a consistent structure. "split" uses split explicitly. "regex" uses pattern. "none" returns only the sample_names column.
- split
Delimiter used when parser = "split" or when "auto" chooses split-based parsing.
- fields
Optional field names for parsed metadata columns. When omitted, VISTA uses part_1, part_2, etc.
- pattern
Regular expression used when parser = "regex". Capture groups are mapped to fields in order.
- sample_column
Name of the sample identifier column in the returned metadata. Default is "sample_names".
- repair_sample_names
Strategy passed to read_vista_counts() when sample names are taken from counts. One of "auto" or "none".
- return_type
Return "data.frame" (default) or "template". Both return a data frame; "template" adds empty placeholder columns for group and batch.
- verbose
Logical; print an informational derivation summary.
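The mapping of capture groups to fields described for pattern can be pictured with base R alone. The sketch below is not VISTA's implementation: the helper map_groups_to_fields() and the sample names are hypothetical, and the handling of non-matching names (filled with NA) is an assumption made for illustration.

```r
# Hypothetical helper: NOT part of VISTA, only illustrates group-to-field mapping.
map_groups_to_fields <- function(sample_names, pattern, fields) {
  # regexec() reports the full match followed by one entry per capture group.
  matches <- regmatches(sample_names, regexec(pattern, sample_names))
  rows <- lapply(matches, function(m) {
    # Assumption: names that do not match the pattern get NA in every field.
    if (length(m) == length(fields) + 1L) m[-1] else rep(NA_character_, length(fields))
  })
  parsed <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(parsed) <- fields  # first group -> first field, second -> second, ...
  cbind(data.frame(sample_names = sample_names, stringsAsFactors = FALSE), parsed)
}

# Hypothetical sample names encoding a condition and a replicate number.
meta_sketch <- map_groups_to_fields(
  c("treated_rep1", "control_rep2"),
  pattern = "^([a-z]+)_rep([0-9]+)$",
  fields = c("condition", "replicate")
)
```

Here the first capture group fills condition and the second fills replicate, so the order of fields has to follow the order of the groups in the pattern.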
Examples
data("count_data", package = "VISTA")
data("sample_metadata", package = "VISTA")
counts_in <- count_data[seq_len(8), c("gene_id", sample_metadata$sample_names[seq_len(6)]), drop = FALSE]
meta <- derive_vista_metadata(
  counts_in,
  column_geneid = "gene_id",
  parser = "regex",
  pattern = "SRR(\\d+)",
  fields = "run_id"
)
#> Derived metadata for 6 samples using parser "regex".
head(meta)
#>   sample_names  run_id
#> 1   SRR1039508 1039508
#> 2   SRR1039509 1039509
#> 3   SRR1039512 1039512
#> 4   SRR1039513 1039513
#> 5   SRR1039516 1039516
#> 6   SRR1039517 1039517
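The example above covers the regex parser. For split-based parsing and the "template" return type, the following base R sketch gives a rough picture of the kind of table to expect. It does not call VISTA and is not its implementation: the sample names are hypothetical, and representing the empty group and batch placeholders as NA is an assumption.

```r
# Hypothetical sample names with a consistent "<condition>_<replicate>" structure.
sample_names <- c("treated_rep1", "treated_rep2", "control_rep1", "control_rep2")

# Split each name on a literal delimiter; every name must yield the same
# number of parts for a rectangular metadata table to make sense.
parts <- strsplit(sample_names, "_", fixed = TRUE)
stopifnot(length(unique(lengths(parts))) == 1L)

parsed <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
# Default-style field names (part_1, part_2, ...) when no fields are given.
names(parsed) <- paste0("part_", seq_len(ncol(parsed)))

starter <- cbind(
  data.frame(sample_names = sample_names, stringsAsFactors = FALSE),
  parsed
)

# Template-style placeholders to be filled in by hand (assumed to be NA here).
starter$group <- NA_character_
starter$batch <- NA_character_
```

A table like starter is meant to be edited by hand, for example by renaming part_1 to a meaningful field and filling in group and batch, before it is used as sample metadata.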