SV Processing
1 Overview
This page documents SV processing as implemented in the upstream IMPACT-SV module. IMPACT-VIS receives preprocessed, AnnotSV-annotated TSV files in which variants have already been filtered and prioritized using phenotype-aware candidate-gene and HPO term criteria.
Upstream Repository: IMPACT-SV (companion to IMPACT-VIS)
Expected Output: {sample_id}_SV_IMPACT.tsv
For reproducibility details, see the IMPACT-SV Documentation.
2 Introduction
The IMPACT-SV module leverages AnnotSV for comprehensive annotation and phenotype-aware filtering of structural variants (SVs). This section details the SV preprocessing workflow, annotation strategy, phenotype-specific filtering, ACMG classification, and output format for integration with IMPACT-VIS.
3 Workflow Overview
The IMPACT-SV module implements a multi-step pipeline:
- VCF Processing: SNV/indel integration for breakpoint validation
- AnnotSV Annotation: Comprehensive functional and clinical annotation
- Phenotype-Aware Filtering: Candidate gene and HPO term filtering
- ACMG Classification: Standardized pathogenicity assessment
- Integration with IMPACT-VIS: TSV output for interactive visualization
4 Structural Variant Annotation
4.1 AnnotSV Configuration
Each structural variant VCF file is processed with AnnotSV using parameters optimized for phenotype-specific prioritization:
Key Parameters:
| Parameter | Value | Purpose |
|---|---|---|
| candidateGenesFile | Curated gene list (Open Targets) | Retain only variants near phenotype-associated genes |
| hpoTermFile | HPO term ID | Enable phenotype-linked annotation (e.g., HP:0000365 for hearing loss) |
| candidateSnvIndelFiles | SNV/indel VCF | Cross-reference small variants for breakpoint confidence |
| outputFormat | Full annotation | Produce detailed variant-level information |
4.2 Functional Impact Classification
AnnotSV assigns each SV a functional classification based on predicted impact:
| Impact Category | Examples | Clinical Significance |
|---|---|---|
| Exonic | Within gene coding region | Directly affects protein-coding sequence |
| Intronic | Within gene introns | May affect splicing; context-dependent |
| Regulatory | Promoter, enhancer region | Potential transcriptional regulation |
| Intergenic | Between genes | Low priority unless within regulatory elements |
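The priority ordering in this table can be made concrete with a toy classifier. It is not AnnotSV's internal logic; it is a simplified sketch that assumes a single hypothetical `GeneModel` with exon and promoter intervals, treats any exon overlap as Exonic, and returns the most severe category the SV interval touches.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical simplified gene model; coordinates are 1-based, inclusive."""
    start: int
    end: int
    exons: list[tuple[int, int]]
    promoter: tuple[int, int]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def classify_sv(sv_start: int, sv_end: int, gene: GeneModel) -> str:
    """Return the most severe impact category the SV interval touches."""
    if any(overlaps(sv_start, sv_end, s, e) for s, e in gene.exons):
        return "Exonic"
    if overlaps(sv_start, sv_end, gene.start, gene.end):
        return "Intronic"  # inside the gene body but missing every exon
    if overlaps(sv_start, sv_end, *gene.promoter):
        return "Regulatory"
    return "Intergenic"
```

Checking categories from most to least severe means an SV spanning both a promoter and the first exon is reported as Exonic, mirroring the table's ordering.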
4.3 ACMG Classification
AnnotSV integrates the American College of Medical Genetics and Genomics (ACMG) classification guidelines for structural variants. Deletions and duplications are assigned one of five ACMG pathogenicity categories on the basis of curated evidence:
| ACMG Category | Interpretation |
|---|---|
| Pathogenic | Strong evidence for pathogenicity |
| Likely Pathogenic | Moderate evidence for pathogenicity |
| Uncertain Significance | Insufficient evidence for classification |
| Likely Benign | Moderate evidence for benignity |
| Benign | Strong evidence for benignity |
Classification Basis:
- Overlap with dosage-sensitive genes (ClinGen dosage sensitivity maps)
- Inheritance patterns
- Population frequency (gnomAD)
- Clinical evidence (ClinVar, literature)
Current Limitations: ACMG scoring is currently omitted for translocations, inversions, and insertions in AnnotSV, reflecting limitations in available evidence and consensus criteria for these variant types.
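When reading the output, the ACMG category and the SV-type limitation can be handled together. This sketch assumes the five categories are encoded as integers 1 (Benign) to 5 (Pathogenic) in an `ACMG_class` column and that SV types appear as `DEL`, `DUP`, `INV`, `BND` and `INS`; both are our understanding of AnnotSV's conventions and should be checked against the actual output. `acmg_label` is a hypothetical helper.

```python
from typing import Optional

# Assumed integer encoding of the five ACMG categories.
ACMG_LABELS = {
    1: "Benign",
    2: "Likely Benign",
    3: "Uncertain Significance",
    4: "Likely Pathogenic",
    5: "Pathogenic",
}

# Only deletions and duplications receive ACMG scoring.
ACMG_SCORED_TYPES = {"DEL", "DUP"}


def acmg_label(sv_type: str, acmg_class: str) -> Optional[str]:
    """Map a raw ACMG class value to its category, or None if not scored."""
    if sv_type.upper() not in ACMG_SCORED_TYPES:
        return None  # inversions, translocations (BND), insertions
    try:
        return ACMG_LABELS[int(acmg_class)]
    except (ValueError, KeyError):
        return None  # e.g. "NA" or an out-of-range value
```

Returning `None` rather than "Uncertain Significance" for unscored types keeps "not assessed" distinct from "assessed, evidence insufficient".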
5 Phenotype-Specific Filtering
5.1 Candidate Gene Filtering
The candidateGenesFile parameter is set to a curated gene list derived from the Open Targets platform, ensuring that only variants within or proximal to genes with non-zero gene–disease association (GDA) scores are retained.
Gene List Generation:
1. Query the Open Targets platform with the phenotype term (e.g., “hearing loss”)
2. Extract genes with non-zero GDA scores
3. Generate a text file with one gene per line
4. Supply the file to AnnotSV via the candidateGenesFile parameter
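Steps 2 and 3 can be sketched as follows. The sketch does not call Open Targets; it assumes the association results have already been exported and parsed into `(gene_symbol, score)` pairs. `format_candidate_genes` is a hypothetical helper.

```python
from typing import Iterable


def format_candidate_genes(associations: Iterable[tuple[str, float]]) -> str:
    """Return candidateGenesFile contents: one symbol per line, non-zero scores only."""
    genes = sorted({
        symbol.strip()
        for symbol, score in associations
        if symbol.strip() and score > 0  # step 2: drop zero-score genes
    })
    return "".join(f"{gene}\n" for gene in genes)  # step 3: one gene per line
```

For example, `Path("hearing_loss_genes.txt").write_text(format_candidate_genes(rows))` would produce the file for step 4 (the file name is illustrative). Deduplicating and sorting keeps the file stable across re-runs, which makes it easy to diff gene lists between Open Targets releases.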
Distance Threshold: Variants with at least one breakpoint within 10 kb of a phenotype-associated gene are retained.
Rationale: This filtering prioritizes variants near genes with known disease associations, reducing the number of phenotype-irrelevant candidates and focusing manual review on potentially actionable variants.
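The 10 kb breakpoint rule can be expressed as a small interval check. This is a sketch of the stated rule, not AnnotSV's implementation; it assumes intrachromosomal SVs with both breakpoints on one chromosome, and `nearby_candidate_genes` and the gene coordinates are hypothetical.

```python
from typing import Mapping

WINDOW_BP = 10_000  # distance threshold stated in the text (10 kb)


def breakpoint_near_gene(bp: int, gene_start: int, gene_end: int,
                         window: int = WINDOW_BP) -> bool:
    """True if `bp` lies inside the gene or within `window` bp of either end."""
    return gene_start - window <= bp <= gene_end + window


def nearby_candidate_genes(chrom: str, sv_start: int, sv_end: int,
                           genes: Mapping[str, tuple[str, int, int]],
                           window: int = WINDOW_BP) -> list[str]:
    """Candidate genes near at least one SV breakpoint; empty means drop the SV."""
    hits = []
    for symbol, (g_chrom, g_start, g_end) in genes.items():
        if g_chrom != chrom:
            continue
        if any(breakpoint_near_gene(bp, g_start, g_end, window)
               for bp in (sv_start, sv_end)):
            hits.append(symbol)
    return hits
```

A strict breakpoint rule drops a large deletion that engulfs a whole gene while both breakpoints lie more than 10 kb away. If such events should be retained, an interval-overlap test between the SV and the gene can be added alongside the breakpoint test.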
5.2 HPO-Based Annotation
The Human Phenotype Ontology (HPO) term corresponding to the trait under investigation (e.g., HP:0000365 for “hearing loss”) is provided to AnnotSV, enabling:
- Appending phenotype-linked annotations to each variant
- Cross-reference with phenotype-disease relationships
- Enrichment of clinical context in output
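HPO term IDs have the form `HP:` followed by seven digits, so a quick format check before the run catches mistakes such as a free-text phrase supplied where an ID is expected. `normalize_hpo_terms` is a hypothetical helper that accepts comma-, semicolon- or whitespace-separated lists.

```python
import re

HPO_ID = re.compile(r"HP:\d{7}")


def normalize_hpo_terms(raw: str) -> list[str]:
    """Split a term list, validate each HPO ID, and drop duplicates in order."""
    terms = [t for t in re.split(r"[,;\s]+", raw.strip()) if t]
    bad = [t for t in terms if not HPO_ID.fullmatch(t)]
    if bad:
        raise ValueError(f"Not HPO term IDs: {bad}")
    return list(dict.fromkeys(terms))  # dict preserves first-seen order
```

Failing early on a malformed term is cheaper than discovering missing phenotype annotation after a full AnnotSV run.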
5.3 SNV/Indel Cross-Referencing
The candidateSnvIndelFiles parameter is supplied with the SNV/indel VCF for each sample, allowing AnnotSV to:
- Cross-reference small variants when assessing SV zygosity
- Improve confidence in genotype interpretation for deletions overlapping coding regions
- Reduce false positives through concordance checking
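Example: A deletion spanning a gene is supported when no heterozygous SNVs are called within the deleted region. A heterozygous deletion leaves a single copy and a homozygous deletion leaves none, so heterozygous calls inside the interval suggest a false-positive deletion call.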
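The genotype logic behind this check can be sketched as follows: inside a true deletion at most one copy remains, so heterozygous small-variant calls argue against it. This is a simplified illustration, not AnnotSV's implementation; it assumes genotypes have already been parsed from the VCF `GT` field into `(position, genotype)` pairs and handles only biallelic genotype strings. `snv_support_for_deletion` is a hypothetical helper.

```python
from typing import Iterable

HET = {"0/1", "1/0", "0|1", "1|0"}
HOM_ALT = {"1/1", "1|1"}


def snv_support_for_deletion(del_start: int, del_end: int,
                             calls: Iterable[tuple[int, str]]) -> dict:
    """Count genotypes inside a deletion and summarize their concordance."""
    het = hom = 0
    for pos, gt in calls:
        if del_start <= pos <= del_end:
            if gt in HET:
                het += 1
            elif gt in HOM_ALT:
                hom += 1
    if het:
        verdict = "conflicting"    # two distinct alleles imply two copies
    elif hom:
        verdict = "consistent"     # compatible with a single remaining copy
    else:
        verdict = "uninformative"  # no calls; compatible with loss of both copies
    return {"het": het, "hom": hom, "verdict": verdict}
```

Homozygous-looking calls alone cannot prove a deletion, because they also occur in normal diploid regions, which is why the sketch reports them only as "consistent".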
6 Output Format
The final output of the IMPACT-SV module consists of per-sample {sample_id}_SV_IMPACT.tsv files in AnnotSV format, consolidating:
- AnnotSV annotations: Comprehensive variant-level functional information
- Phenotype-aware filtering: Candidate genes and HPO annotations applied
- ACMG classifications: Standardized pathogenicity assessment
- Sample-level genotype information: Zygosity and supporting evidence
File Structure: Tab-separated values with AnnotSV standard columns plus sample-specific genotype fields
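Downstream tools can read these files with the standard `csv` module. This sketch assumes AnnotSV's column naming, in particular an `Annotation_mode` column that distinguishes one "full" row per SV from per-gene "split" rows; files without that column are treated as all-"full". `load_sv_impact` is a hypothetical helper.

```python
import csv
from typing import Iterable


def load_sv_impact(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse an AnnotSV-format TSV, keeping one row per SV."""
    # QUOTE_NONE: annotation fields may contain quote characters that must
    # not be interpreted as CSV quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Assumed convention: "full" rows describe the whole SV, "split" rows
    # repeat it once per overlapped gene.
    return [row for row in reader if row.get("Annotation_mode", "full") == "full"]
```

Usage: `with open("S1_SV_IMPACT.tsv", newline="") as fh: rows = load_sv_impact(fh)` (the file name is illustrative). Keying rows by column name rather than position keeps the reader robust when AnnotSV versions add or reorder columns.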
6.1 Integration with IMPACT-VIS
SV_IMPACT.tsv files are formatted for direct input into the IMPACT-VIS visualization module, enabling:
- Unified variant review: SVs visualized alongside SNVs, indels, and CNVs
- Phenotype-aware prioritization: High-priority variants automatically highlighted
- Interactive exploration: Click-through to extended annotations and external databases
- Systematic curation: Per-sample classifications and notes persisted for reproducibility
7 Computational Considerations
File Size: AnnotSV output typically contains 50+ columns per variant, so files range from 500 KB to several MB depending on SV count.
Processing Time: Full AnnotSV annotation (100-500 SVs per sample) typically requires 30-120 seconds depending on annotation database performance.
Reproducibility: All AnnotSV parameters and gene lists are documented alongside output files, enabling downstream regeneration of annotations if needed.
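One way to document parameters and gene lists alongside each output is a small JSON manifest that records the parameters and a SHA-256 checksum of the gene list, so a later re-run can confirm it used the identical list. The manifest layout, the `.manifest.json` naming and `write_run_manifest` are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_run_manifest(output_tsv: Path, params: dict, gene_list: Path) -> Path:
    """Write <output stem>.manifest.json next to the TSV and return its path."""
    manifest = {
        "output": output_tsv.name,
        "annotsv_parameters": params,
        "candidate_gene_list": {"file": gene_list.name,
                                "sha256": sha256_of(gene_list)},
    }
    out = output_tsv.with_suffix(".manifest.json")
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return out
```

Sorting keys makes manifests from identical runs byte-identical, so they can be compared with a plain diff.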