1 Data Manager Module
1.1 Overview
The data_manager.R module provides core data loading functionality for GDS, SV, and CNV inputs with pre-load validation and on-disk filtering (SeqArray). Loaders return NULL on validation failure (with a warning); otherwise they return a data.frame (potentially empty).
Location: app/logic/data_manager.R
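Because every loader can yield three distinct outcomes (NULL, an empty data.frame, or a populated one), calling code usually needs to tell them apart. The following is a minimal sketch of one way to do that; `classify_load_result()` is a hypothetical helper introduced here for illustration and is not part of data_manager.R.

```r
# Hypothetical helper (not part of data_manager.R): classify what a loader returned.
classify_load_result <- function(x) {
  if (is.null(x)) {
    "validation_failed"   # loader warned and returned NULL
  } else if (nrow(x) == 0) {
    "no_matches"          # valid input, but nothing passed the filters
  } else {
    "ok"
  }
}

# Stand-ins for the three possible loader results
status_null  <- classify_load_result(NULL)
status_empty <- classify_load_result(data.frame())
status_ok    <- classify_load_result(data.frame(Chromosome = "1", Position = 12345L))
```

Checking `is.null()` before `nrow()` matters, since `nrow(NULL)` is itself NULL and would make the second condition fail.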
1.2 Exported Functions
1.2.1 load_gds_data()
Loads SNV/Indel data from GDS files with optional filtering and ranking by IMPACT score. Uses on-disk filtering via SeqArray to minimize memory usage.
Parameters:
- gds_path (character): Path to GDS file
- num_variants (integer): Maximum number of variants to return (default: 1000)
- filters (named list): Optional filters with keys: clinvar, tier, genes, genotype (default: list())
- bravo_thresh (numeric): Optional threshold for BRAVO allele frequency (variants with AF ≤ threshold retained)
Returns:
- data.frame containing top N variants with annotations, or
- empty data.frame() if no variants match filters
- NULL with warning if file validation fails
Details: Applies filters on the GDS handle (on-disk), ranks by annotation/info/impact_score (descending; NAs last), then retrieves top N variants and extracts a fixed set of annotations.
The returned data.frame includes (at least) the following columns:
- Chromosome, Position
- IMPACT_Score, IMPACT_Calc
- VarInfo, Genes, Tier
- ClinVar, ClinVar_Disease
- REF, ALT, Shape
- Bravo_AF, ALoFT_Prediction, ALoFT_Score
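The ranking step described in Details (descending by impact score, NAs last, then top N) can be sketched in base R as follows. The score vector is made-up illustrative data standing in for values read from annotation/info/impact_score; the real loader's internals may differ.

```r
# Made-up scores standing in for annotation/info/impact_score (assumption)
scores <- c(0.2, NA, 0.9, 0.5, NA, 0.7)
num_variants <- 3L

# Descending order with missing scores pushed to the end
ranked <- order(scores, decreasing = TRUE, na.last = TRUE)

# Indices of the top N variants to retrieve
top_idx <- head(ranked, num_variants)
top_scores <- scores[top_idx]
```

Here `top_idx` holds the positions of the three highest-scoring variants; variants with a missing score are only reached once all scored variants are exhausted.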
Example:
box::use(app/logic/data_manager[load_gds_data])

# Load up to 1000 top-ranked variants, no filtering
snv_data <- load_gds_data(
  gds_path = "app/data/sample_1_SNV_IMPACT.gds",
  num_variants = 1000
)

# Load with Tier filtering and a BRAVO AF threshold
snv_filtered <- load_gds_data(
  gds_path = "app/data/sample_1_SNV_IMPACT.gds",
  num_variants = 500,
  filters = list(tier = c("1", "2")),
  bravo_thresh = 0.01
)
Error Handling: Returns NULL and emits a warning if GDS file validation fails.
1.2.2 apply_gds_filters()
Applies filter criteria to an open GDS file handle in-place. Modifies the GDS filter mask (on-disk filtering, no memory load).
Parameters:
- gds_h (SeqArray GDS object): Open GDS file handle
- filters (named list): Filter criteria with optional keys:
  - clinvar: Character vector of ClinVar terms to match (e.g., "Pathogenic")
  - tier: Character vector of tier values (e.g., c("1", "2"))
  - genes: Character vector of gene names to filter by
  - genotype: One of "alt1" (heterozygous) or "alt_ge2" (homozygous/compound)
- bravo_thresh (numeric): BRAVO AF upper threshold
Returns: NULL (invisibly). Modifies GDS filter in-place.
Details: Applies filters sequentially with logical AND. ClinVar terms are normalized for matching (case-insensitive, punctuation-tolerant). This is an internal function typically used by load_gds_data().
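To illustrate what "sequentially with logical AND" means, here is a small in-memory sketch that narrows a logical mask one filter at a time. It operates on a made-up data.frame rather than a GDS handle, and the column names are assumptions chosen to mirror the filter keys; the real function applies the equivalent logic on disk via SeqArray.

```r
# Made-up variants; column names are illustrative assumptions
variants <- data.frame(
  tier       = c("1", "2", "3", "1"),
  dosage_alt = c(1L, 2L, 1L, 0L),
  bravo_af   = c(0.001, NA, 0.0005, 0.2),
  stringsAsFactors = FALSE
)

keep <- rep(TRUE, nrow(variants))

# Each filter can only remove variants, never re-admit them (logical AND)
keep <- keep & variants$tier %in% c("1", "2")           # tier filter
keep <- keep & variants$dosage_alt == 1L                # genotype = "alt1"
keep <- keep & (is.na(variants$bravo_af) |              # missing AF is retained
                  variants$bravo_af <= 0.01)            # bravo_thresh = 0.01

filtered <- variants[keep, , drop = FALSE]
```

Only the first row survives all three filters. Because the conditions are combined with AND, the order in which they are applied does not change the result.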
Filter semantics:
- filters$clinvar: splits each variant’s ClinVar string on common separators (pipe/comma/whitespace), normalizes terms, and matches if any term equals any selected normalized term.
- filters$tier: keeps variants where annotation/info/tier is in the provided tier set.
- filters$genes: uses regex matching against annotation/info/FunctionalAnnotation/genecode_comprehensive_info.
- filters$genotype: supports "alt1" for heterozygous ($dosage_alt == 1) and "alt_ge2" for homozygous/compound ($dosage_alt >= 2) using an efficient subset filter.
- bravo_thresh: keeps variants with missing Bravo AF, or bravo_af <= bravo_thresh.
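The ClinVar matching rule can be sketched in base R as below. The normalization shown (lowercasing and dropping every non-alphanumeric character) is an assumption about what "case-insensitive, punctuation-tolerant" means; `normalize_term()` and `clinvar_matches()` are hypothetical names, not functions exported by the module.

```r
# Assumed normalization: lowercase, then drop anything that is not a letter or digit
normalize_term <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# TRUE for each ClinVar string in which any term equals any selected term
clinvar_matches <- function(clinvar_strings, selected) {
  wanted <- normalize_term(selected)
  vapply(clinvar_strings, function(s) {
    if (is.na(s)) return(FALSE)
    # Split on pipe, comma, or whitespace, then normalize each term
    terms <- normalize_term(strsplit(s, "[|,[:space:]]+")[[1]])
    any(terms %in% wanted)
  }, logical(1), USE.NAMES = FALSE)
}

clinvar <- c(
  "Pathogenic|Likely_pathogenic",
  "Benign",
  NA,
  "likely_pathogenic, Uncertain_significance"
)
matched <- clinvar_matches(clinvar, c("Pathogenic", "Likely Pathogenic"))
```

Under this assumption, "Likely Pathogenic" and "likely_pathogenic" both normalize to "likelypathogenic", so the selection matches regardless of case or separator style, while "Benign" and missing values do not match.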
Example:
box::use(app/logic/data_manager[apply_gds_filters])
box::use(SeqArray[seqOpen, seqClose])

gds <- seqOpen("sample_1_SNV_IMPACT.gds")
apply_gds_filters(
  gds,
  filters = list(clinvar = c("Pathogenic", "Likely Pathogenic")),
  bravo_thresh = 0.01
)
seqClose(gds)
1.2.3 load_sv_data()
Loads structural variant data from AnnotSV TSV files with validation.
Parameters:
- sv_path (character): Path to AnnotSV TSV file
Returns:
- data.frame containing SV annotations, or
- NULL if validation fails
Details: Validates TSV structure before loading. Expects AnnotSV format with standard columns: AnnotSV_ID, SV_chrom, SV_start, SV_end, SV_type, Samples_ID, and 150+ additional annotation columns.
Notes: The loader is permissive beyond validation: it reads the TSV via readr::read_tsv(..., na = c(".", "")) and returns the resulting data.frame.
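The validate-then-read contract can be approximated in base R as follows. This is only a sketch: it uses `utils::read.delim()` with `na.strings` as a stand-in for the `readr::read_tsv(..., na = c(".", ""))` call named above, it checks columns after reading rather than before, and `read_sv_sketch()` is a hypothetical name. The sample row is made-up data.

```r
# Standard AnnotSV columns listed in Details
required_cols <- c("AnnotSV_ID", "SV_chrom", "SV_start", "SV_end", "SV_type", "Samples_ID")

# Hypothetical stand-in for load_sv_data(): NULL + warning on failure, data.frame otherwise
read_sv_sketch <- function(path) {
  if (!file.exists(path)) {
    warning("SV file not found: ", path)
    return(NULL)
  }
  # "." and "" are both treated as missing, mirroring na = c(".", "")
  df <- utils::read.delim(path, na.strings = c(".", ""),
                          check.names = FALSE, stringsAsFactors = FALSE)
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    warning("Missing required columns: ", paste(missing_cols, collapse = ", "))
    return(NULL)
  }
  df
}

# Write a tiny made-up AnnotSV-like file and read it back
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  paste(required_cols, collapse = "\t"),
  paste(c("1_100_200_DEL_1", "1", "100", "200", "DEL", "."), collapse = "\t")
), tmp)
sv <- read_sv_sketch(tmp)
```

In the result, the "." in Samples_ID is read as NA, which is the practical effect of the `na` setting the loader uses.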
Example:
box::use(app/logic/data_manager[load_sv_data])
sv_data <- load_sv_data("app/data/sample_1_SV_IMPACT.tsv")
Error Handling: Returns NULL and emits a warning if file validation fails.
1.2.4 load_cnv_data()
Loads copy number variant data from IMPACT-CNV TXT files (headerless, 6 tab-separated columns).
Parameters:
- cnv_path (character): Path to CNV TXT file
- include_failed (logical): Include CNVs with "Failed" interpretation (default: FALSE)
Returns:
- data.frame with columns: CNV_Identifier, Sample_Interval, Interpretation, User_Note, Date_Time, Username, or
- NULL if validation fails
Details: Parses headerless IMPACT-CNV format. By default, only "Passed" interpretations are returned. Column structure: (1) cnv_identifier, (2) sample_interval, (3) interpretation, (4) user_note, (5) date_and_time, (6) username.
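A minimal base-R sketch of parsing this headerless six-column layout is shown below. `read_cnv_sketch()` is a hypothetical name, the two rows are made-up data, and the sketch assumes that `include_failed = TRUE` simply skips the "Passed" filter; the real loader also performs validation that is omitted here.

```r
# Output column names as documented under Returns
cnv_cols <- c("CNV_Identifier", "Sample_Interval", "Interpretation",
              "User_Note", "Date_Time", "Username")

# Hypothetical stand-in for load_cnv_data() without the validation step
read_cnv_sketch <- function(path, include_failed = FALSE) {
  df <- utils::read.delim(path, header = FALSE, col.names = cnv_cols,
                          quote = "", stringsAsFactors = FALSE)
  if (!include_failed) {
    # %in% treats a missing interpretation as "not Passed" instead of yielding NA rows
    df <- df[df$Interpretation %in% "Passed", , drop = FALSE]
  }
  df
}

# Made-up two-row file in the headerless, tab-separated layout
tmp_cnv <- tempfile(fileext = ".txt")
writeLines(c(
  "chr1:100-200\tS1_int\tPassed\tok\t2024-01-01 10:00\talice",
  "chr2:300-400\tS1_int\tFailed\t\t2024-01-01 10:05\tbob"
), tmp_cnv)

cnv_passed <- read_cnv_sketch(tmp_cnv)
cnv_all    <- read_cnv_sketch(tmp_cnv, include_failed = TRUE)
```

With the default, only the "Passed" row is returned; with `include_failed = TRUE`, both rows come back.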
Example:
box::use(app/logic/data_manager[load_cnv_data])
# Load only passed CNVs
cnv_passed <- load_cnv_data("app/data/sample_1_CNV_IMPACT.txt")
# Load all CNVs including failed
cnv_all <- load_cnv_data(
  "app/data/sample_1_CNV_IMPACT.txt",
  include_failed = TRUE
)
Error Handling: Returns NULL and emits a warning if file validation fails.
1.3 See Also
Validators for data validation functions
Error Handler for error reporting