SV Processing

Methods for structural variant annotation and phenotype-aware prioritization via the IMPACT-SV module

Authors and Affiliations

Nicholas Boehler, University of Toronto Mississauga

Hai-Ying Mary Cheng, University of Toronto Mississauga

Published

June 1, 2026

1 Overview

IMPACT-SV Pipeline Integration

This page documents SV processing as implemented in the upstream IMPACT-SV module. IMPACT-VIS receives preprocessed, AnnotSV-annotated TSV files in which variants have already been filtered and prioritized by candidate gene and HPO term.

Upstream Repository: IMPACT-SV (companion to IMPACT-VIS)

Expected Output: {sample_id}_SV_IMPACT.tsv

For reproducibility details, see the IMPACT-SV Documentation.

2 Introduction

The IMPACT-SV module leverages AnnotSV for comprehensive annotation and phenotype-aware filtering of structural variants (SVs). This section details the SV preprocessing workflow, annotation strategy, phenotype-specific filtering, ACMG classification, and output format for integration with IMPACT-VIS.

3 Workflow Overview

The IMPACT-SV module implements a five-step pipeline:

VCF Processing: SV VCF input paired with the sample's SNV/indel VCF for genotype cross-referencing
AnnotSV Annotation: Comprehensive functional and clinical annotation
Phenotype-Aware Filtering: Candidate gene and HPO term filtering
ACMG Classification: Standardized pathogenicity assessment
Integration with IMPACT-VIS: TSV output for interactive visualization

4 Structural Variant Annotation

4.1 AnnotSV Configuration

Each structural variant VCF file is processed with AnnotSV using parameters optimized for phenotype-specific prioritization:

Key Parameters:

Parameter	Value	Purpose
`candidateGenesFile`	Curated gene list (Open Targets)	Retain only variants near phenotype-associated genes
`hpoTermFile`	HPO term ID	Enable phenotype-linked annotation (e.g., HP:0000365 for hearing loss)
`candidateSnvIndelFiles`	SNV/indel VCF	Cross-reference small variants to support SV zygosity and genotype interpretation
`outputFormat`	Full annotation	Produce detailed variant-level information
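
As an illustration of how these parameters might be passed to AnnotSV, the sketch below assembles a command line without running it. The flag names (`-SVinputFile`, `-outputFile`, `-genomeBuild`, `-candidateGenesFile`, `-candidateGenesFiltering`, `-hpo`, `-candidateSnvIndelFiles`, `-annotationMode`) follow the AnnotSV command-line interface and should be verified against the installed version. The mapping from `hpoTermFile` to `-hpo` and from `outputFormat` to `-annotationMode full`, the genome build, the file names, and the helper `build_annotsv_command` are assumptions, not the IMPACT-SV implementation.

```python
from pathlib import Path


def build_annotsv_command(
    sv_vcf: Path,
    out_tsv: Path,
    candidate_genes: Path,
    hpo_terms: list[str],
    snv_indel_vcf: Path,
    genome_build: str = "GRCh38",  # assumption: the build is not stated in this document
) -> list[str]:
    """Assemble an AnnotSV argument list; nothing is executed here."""
    return [
        "AnnotSV",
        "-SVinputFile", str(sv_vcf),
        "-outputFile", str(out_tsv),
        "-genomeBuild", genome_build,
        "-candidateGenesFile", str(candidate_genes),
        "-candidateGenesFiltering", "1",   # keep only annotations overlapping listed genes
        "-hpo", ",".join(hpo_terms),       # assumed equivalent of `hpoTermFile`
        "-candidateSnvIndelFiles", str(snv_indel_vcf),
        "-annotationMode", "full",         # assumed equivalent of `outputFormat`
    ]


# Hypothetical file names for a sample called S1.
cmd = build_annotsv_command(
    sv_vcf=Path("S1.sv.vcf"),
    out_tsv=Path("S1_SV_IMPACT.tsv"),
    candidate_genes=Path("hearing_loss_genes.txt"),
    hpo_terms=["HP:0000365"],
    snv_indel_vcf=Path("S1.snv_indel.vcf"),
)
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` once the flags have been confirmed for the AnnotSV release in use.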

4.2 Functional Impact Classification

AnnotSV assigns each SV a functional classification based on predicted impact:

Impact Category	Examples	Clinical Significance
Exonic	Within gene coding region	Directly affects protein-coding sequence
Intronic	Within gene introns	May affect splicing; context-dependent
Regulatory	Promoter, enhancer region	Potential transcriptional regulation
Intergenic	Between genes	Low priority unless within regulatory elements

4.3 ACMG Classification

AnnotSV integrates the American College of Medical Genetics and Genomics (ACMG) classification guidelines for structural variants. Deletions and duplications are assigned one of five ACMG pathogenicity categories according to curated evidence:

ACMG Category	Interpretation
Pathogenic	Strong evidence for pathogenicity
Likely Pathogenic	Moderate evidence for pathogenicity
Uncertain Significance	Insufficient evidence for classification
Likely Benign	Moderate evidence for benignity
Benign	Strong evidence for benignity

Classification Basis:

Overlap with dosage-sensitive genes (ClinGen dosage sensitivity maps)
Inheritance patterns
Population frequency (gnomAD)
Clinical evidence (ClinVar, literature)

Current Limitations: ACMG scoring is currently omitted for translocations, inversions, and insertions in AnnotSV, reflecting limitations in available evidence and consensus criteria for these variant types.
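
To make the relationship between an evidence score and the five categories concrete, the following sketch maps a numeric score to a category and reports unscored SV types separately. The thresholds (0.99 and 0.90, mirrored for benign evidence) are the point cut-offs of the ACMG/ClinGen copy-number variant scoring framework. They are not stated in this document, so they are treated as an assumption here, as are the function name and the SV type codes.

```python
# SV types that receive an ACMG category, per the limitation noted above.
SCORED_SV_TYPES = {"DEL", "DUP"}


def acmg_category(sv_type: str, score: float) -> str:
    """Map an evidence score to an ACMG category (assumed thresholds)."""
    if sv_type not in SCORED_SV_TYPES:
        return "Not scored"
    if score >= 0.99:
        return "Pathogenic"
    if score >= 0.90:
        return "Likely Pathogenic"
    if score > -0.90:
        return "Uncertain Significance"
    if score > -0.99:
        return "Likely Benign"
    return "Benign"


for sv_type, score in [("DEL", 1.0), ("DUP", 0.45), ("DEL", -0.95), ("INV", 1.0)]:
    print(sv_type, score, acmg_category(sv_type, score))
```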

5 Phenotype-Specific Filtering

5.1 Candidate Gene Filtering

The candidateGenesFile parameter is set to a curated gene list derived from the Open Targets platform, ensuring that only variants within or proximal to genes with non-zero gene–disease association (GDA) scores are retained.

Gene List Generation:

1. Query Open Targets platform with phenotype term (e.g., “hearing loss”)
2. Extract genes with non-zero GDA scores
3. Generate text file with one gene per line
4. Supply to AnnotSV via candidateGenesFile parameter
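
Steps 2 and 3 of the gene list generation can be sketched as below. The input is a list of already-downloaded association records. The record keys (`symbol`, `score`), the gene symbols, the scores, and the helper name `write_candidate_genes` are hypothetical placeholders, and no Open Targets API call is shown.

```python
import tempfile
from pathlib import Path


def write_candidate_genes(associations: list[dict], out_path: Path) -> list[str]:
    """Keep genes with a non-zero association score and write one per line."""
    genes = sorted({rec["symbol"] for rec in associations if rec["score"] > 0})
    out_path.write_text("\n".join(genes) + "\n", encoding="utf-8")
    return genes


# Placeholder records; symbols and scores are illustrative only.
records = [
    {"symbol": "GENE_B", "score": 0.12},
    {"symbol": "GENE_A", "score": 0.87},
    {"symbol": "GENE_C", "score": 0.0},   # zero score: excluded
]

with tempfile.TemporaryDirectory() as tmp:
    gene_file = Path(tmp) / "candidate_genes.txt"
    genes = write_candidate_genes(records, gene_file)
    written = gene_file.read_text(encoding="utf-8")

print(genes)
```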

Distance Threshold: Variants with at least one breakpoint within 10 kb of a phenotype-associated gene are retained.

Rationale: Restricting output to variants near genes with known disease associations reduces the number of irrelevant candidates and focuses manual review on actionable ones.
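
The 10 kb distance threshold can be read as a simple interval test. The sketch below is one literal interpretation, assuming the SV and the gene lie on the same chromosome and share a coordinate system. The function name and the example coordinates are hypothetical.

```python
def breakpoint_near_gene(
    sv_start: int,
    sv_end: int,
    gene_start: int,
    gene_end: int,
    max_dist: int = 10_000,
) -> bool:
    """True if at least one SV breakpoint lies within max_dist bp of the gene."""

    def distance_to_gene(pos: int) -> int:
        if pos < gene_start:
            return gene_start - pos
        if pos > gene_end:
            return pos - gene_end
        return 0  # breakpoint falls inside the gene

    return min(distance_to_gene(sv_start), distance_to_gene(sv_end)) <= max_dist


print(breakpoint_near_gene(95_000, 96_000, 100_000, 120_000))
print(breakpoint_near_gene(10_000, 50_000, 100_000, 120_000))
```

Under this literal reading, an SV that spans the whole gene with both breakpoints more than 10 kb away would not be retained. Whether IMPACT-SV also keeps such overlapping variants is not specified here.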

5.2 HPO-Based Annotation

The Human Phenotype Ontology (HPO) term corresponding to the trait under investigation (e.g., HP:0000365 for “hearing loss”) is provided to AnnotSV, enabling:

Appending of phenotype-linked annotations
Cross-reference with phenotype-disease relationships
Enrichment of clinical context in output
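
Because a malformed identifier would silently weaken phenotype-linked annotation, it can help to check the term before it is supplied. The sketch below assumes the standard HPO identifier format of `HP:` followed by seven digits. The helper name is hypothetical, and the check does not confirm that the term exists in the ontology.

```python
import re

# HPO identifiers take the form "HP:" plus seven digits, e.g. HP:0000365.
HPO_ID = re.compile(r"HP:\d{7}")


def is_valid_hpo_id(term: str) -> bool:
    """Check the identifier format only, not its presence in the ontology."""
    return HPO_ID.fullmatch(term) is not None


print(is_valid_hpo_id("HP:0000365"))
print(is_valid_hpo_id("HP:365"))
```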

5.3 SNV/Indel Cross-Referencing

The candidateSnvIndelFiles parameter is supplied with the SNV/indel VCF for each sample, allowing AnnotSV to:

Cross-reference small variants when assessing SV zygosity
Improve confidence in genotype interpretation for deletions overlapping coding regions
Reduce false positives through concordance checking

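Example: A deletion call spanning a gene is supported when no heterozygous SNVs are found within the deleted region. A heterozygous deletion leaves a single copy, so small variants there are called homozygous, and a homozygous deletion leaves no sequence to call variants from. Heterozygous SNVs inside the region instead suggest a false-positive deletion call.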
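
The concordance check in the example above can be sketched as counting heterozygous and homozygous small variants inside a deletion interval. This is a simplified illustration, not AnnotSV's own logic. The `SmallVariant` type, the function names, and the example genotypes are hypothetical.

```python
from typing import NamedTuple


class SmallVariant(NamedTuple):
    chrom: str
    pos: int
    genotype: str  # VCF GT string such as "0/1" or "1|1"


def count_genotypes_in_region(
    chrom: str, start: int, end: int, variants: list[SmallVariant]
) -> tuple[int, int]:
    """Return (heterozygous, homozygous-alternate) counts within the interval."""
    het = hom = 0
    for v in variants:
        if v.chrom != chrom or not (start <= v.pos <= end):
            continue
        alleles = v.genotype.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype: uninformative
        if len(set(alleles)) > 1:
            het += 1
        elif alleles[0] != "0":
            hom += 1
    return het, hom


def deletion_is_concordant(het_count: int) -> bool:
    """A deletion call is supported only if no heterozygous SNVs fall inside it."""
    return het_count == 0


# Placeholder small variants for illustration.
calls = [
    SmallVariant("1", 1500, "1/1"),
    SmallVariant("1", 2500, "0/1"),
    SmallVariant("1", 9000, "0/1"),   # outside the interval
    SmallVariant("2", 1500, "0/1"),   # other chromosome
]
het, hom = count_genotypes_in_region("1", 1000, 5000, calls)
print(het, hom, deletion_is_concordant(het))
```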

6 Output Format

The final output of the IMPACT-SV module is one {sample_id}_SV_IMPACT.tsv file per sample in AnnotSV format, consolidating:

AnnotSV annotations: Comprehensive variant-level functional information
Phenotype-aware filtering: Candidate genes and HPO annotations applied
ACMG classifications: Standardized pathogenicity assessment
Sample-level genotype information: Zygosity and supporting evidence

File Structure: Tab-separated values with AnnotSV standard columns plus sample-specific genotype fields
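
A minimal sketch of reading such a file and selecting high-priority rows is shown below. The column names (`AnnotSV_ID`, `SV_type`, `Annotation_mode`, `Gene_name`, `ACMG_class`) follow AnnotSV v3 output conventions and are assumed to be present in SV_IMPACT.tsv. The numeric class encoding (1 = benign through 5 = pathogenic, `NA` when unscored), the rows, and the helper names are assumptions for illustration.

```python
import csv
import io

# Assumed column names; verify against a real SV_IMPACT.tsv header.
COLUMNS = ["AnnotSV_ID", "SV_chrom", "SV_start", "SV_end", "SV_type",
           "Annotation_mode", "Gene_name", "ACMG_class"]

# Placeholder rows; these are not real variants.
ROWS = [
    ["1_1000_5000_DEL_1", "1", "1000", "5000", "DEL", "full", "GENE_A", "4"],
    ["1_1000_5000_DEL_1", "1", "1000", "5000", "DEL", "split", "GENE_A", "full=4"],
    ["2_7000_9000_DUP_1", "2", "7000", "9000", "DUP", "full", "GENE_B", "2"],
    ["3_100_200_INV_1", "3", "100", "200", "INV", "full", "GENE_C", "NA"],
]
EXAMPLE_TSV = "\n".join("\t".join(r) for r in [COLUMNS, *ROWS]) + "\n"


def load_sv_rows(handle) -> list[dict]:
    """Parse tab-separated AnnotSV-style rows into dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def select_high_priority(rows: list[dict], min_class: int = 4) -> list[dict]:
    """Keep one row per SV ('full' mode) with ACMG class >= min_class."""
    selected = []
    for row in rows:
        if row["Annotation_mode"] != "full":
            continue
        try:
            acmg = int(row["ACMG_class"])
        except ValueError:
            continue  # "NA" or other non-numeric value: unscored
        if acmg >= min_class:
            selected.append(row)
    return selected


rows = load_sv_rows(io.StringIO(EXAMPLE_TSV))
priority_ids = [r["AnnotSV_ID"] for r in select_high_priority(rows)]
print(priority_ids)
```

For a file on disk, `open(path, newline="", encoding="utf-8")` can replace the `io.StringIO` handle.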

6.1 Integration with IMPACT-VIS

SV_IMPACT.tsv files are formatted for direct input into the IMPACT-VIS visualization module, enabling:

Unified variant review: SVs visualized alongside SNVs, indels, and CNVs
Phenotype-aware prioritization: High-priority variants automatically highlighted
Interactive exploration: Click-through to extended annotations and external databases
Systematic curation: Per-sample classifications and notes persisted for reproducibility

7 Computational Considerations

File Size: AnnotSV output typically contains 50+ columns per variant, so files range from 500 KB to several MB depending on SV count.

Processing Time: Full AnnotSV annotation of a sample with 100-500 SVs typically takes 30-120 seconds, depending on annotation database performance.

Reproducibility: All AnnotSV parameters and gene lists are documented alongside output files, enabling downstream regeneration of annotations if needed.
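
One way to keep parameters and gene lists alongside an output file is a small JSON manifest, sketched below. The manifest layout, the file naming, and the helper names are hypothetical and do not describe how IMPACT-SV actually records this information.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum used to confirm the gene list has not changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(output_tsv: Path, gene_list: Path, params: dict) -> Path:
    """Write a JSON manifest next to the output TSV and return its path."""
    manifest = {
        "output": output_tsv.name,
        "gene_list": gene_list.name,
        "gene_list_sha256": sha256_of(gene_list),
        "annotsv_parameters": params,
    }
    manifest_path = output_tsv.with_name(output_tsv.stem + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True),
                             encoding="utf-8")
    return manifest_path


with tempfile.TemporaryDirectory() as tmp:
    gene_list = Path(tmp) / "candidate_genes.txt"
    gene_list.write_text("GENE_A\nGENE_B\n", encoding="utf-8")  # placeholder genes
    out_tsv = Path(tmp) / "S1_SV_IMPACT.tsv"                    # hypothetical sample
    manifest_path = write_manifest(out_tsv, gene_list, {"hpoTermFile": "HP:0000365"})
    manifest_name = manifest_path.name
    loaded = json.loads(manifest_path.read_text(encoding="utf-8"))

print(manifest_name, loaded["gene_list_sha256"][:12])
```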