SV Processing

Methods for structural variant annotation and phenotype-aware prioritization via the IMPACT-SV module
Authors

  • Nicholas Boehler, University of Toronto Mississauga
  • Hai-Ying Mary Cheng, University of Toronto Mississauga

Published: December 18, 2025

1 Overview

Note: IMPACT-SV Pipeline Integration

This page documents SV processing as implemented in the IMPACT-SV upstream module. IMPACT-VIS receives preprocessed AnnotSV-annotated TSV files with variants already filtered and prioritized via phenotype-aware gene and HPO term filtering.

Upstream Repository: IMPACT-SV (companion to IMPACT-VIS)

Expected Output: {sample_id}_SV_IMPACT.tsv

For reproducibility details, see the IMPACT-SV Documentation.

2 Introduction

The IMPACT-SV module leverages AnnotSV for comprehensive annotation and phenotype-aware filtering of structural variants (SVs). This section details the SV preprocessing workflow, annotation strategy, phenotype-specific filtering, ACMG classification, and output format for integration with IMPACT-VIS.

3 Workflow Overview

The IMPACT-SV module implements a multi-step pipeline:

  1. VCF Processing: SNV/indel integration for breakpoint validation
  2. AnnotSV Annotation: Comprehensive functional and clinical annotation
  3. Phenotype-Aware Filtering: Candidate gene and HPO term filtering
  4. ACMG Classification: Standardized pathogenicity assessment
  5. Integration with IMPACT-VIS: TSV output for interactive visualization

4 Structural Variant Annotation

4.1 AnnotSV Configuration

Each structural variant VCF file is processed with AnnotSV using parameters optimized for phenotype-specific prioritization:

Key Parameters:

| Parameter | Value | Purpose |
|---|---|---|
| candidateGenesFile | Curated gene list (Open Targets) | Retain only variants near phenotype-associated genes |
| hpoTermFile | HPO term ID | Enable phenotype-linked annotation (e.g., HP:0000365 for hearing loss) |
| candidateSnvIndelFiles | SNV/indel VCF | Cross-reference small variants for breakpoint confidence |
| outputFormat | Full annotation | Produce detailed variant-level information |

4.2 Functional Impact Classification

AnnotSV assigns each SV a functional classification based on predicted impact:

| Impact Category | Examples | Clinical Significance |
|---|---|---|
| Exonic | Within gene coding region | Directly affects protein-coding sequence |
| Intronic | Within gene introns | May affect splicing; context-dependent |
| Regulatory | Promoter, enhancer region | Potential transcriptional regulation |
| Intergenic | Between genes | Low priority unless within regulatory elements |
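As a rough illustration, the categories in the table above can be turned into a review order. This is a minimal sketch, not part of IMPACT-SV: the `impact` key, the `IMPACT_PRIORITY` ranks, and the relative order of intronic and regulatory variants are assumptions made for the example.

```python
# Hypothetical priority ranks (lower = reviewed first). Only "intergenic is
# low priority" comes from the table; the other ranks are assumptions.
IMPACT_PRIORITY = {"exonic": 0, "intronic": 1, "regulatory": 2, "intergenic": 3}


def sort_by_impact(variants):
    """Return variants ordered from highest to lowest review priority.

    Each variant is a dict with an 'impact' key (hypothetical field name).
    Unknown categories sort last; ties keep their input order.
    """
    lowest = len(IMPACT_PRIORITY)
    return sorted(
        variants,
        key=lambda v: IMPACT_PRIORITY.get(v["impact"].lower(), lowest),
    )
```

Because `sorted` is stable, variants that share a category stay in their original order, so any upstream ranking is preserved within each group.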

4.3 ACMG Classification

AnnotSV integrates American College of Medical Genetics and Genomics (ACMG) classification guidelines for structural variants. Deletions and duplications are assigned one of five ACMG pathogenicity categories according to curated evidence:

| ACMG Category | Interpretation |
|---|---|
| Pathogenic | Strong evidence for pathogenicity |
| Likely Pathogenic | Moderate evidence for pathogenicity |
| Uncertain Significance | Insufficient evidence for classification |
| Likely Benign | Moderate evidence for benignity |
| Benign | Strong evidence for benignity |

Classification Basis:

  • Overlap with dosage-sensitive genes (ClinGen dosage sensitivity maps)
  • Inheritance patterns
  • Population frequency (gnomAD)
  • Clinical evidence (ClinVar, literature)

Current Limitations: ACMG scoring is currently omitted for translocations, inversions, and insertions in AnnotSV, reflecting limitations in available evidence and consensus criteria for these variant types.

5 Phenotype-Specific Filtering

5.1 Candidate Gene Filtering

The candidateGenesFile parameter is set to a curated gene list derived from the Open Targets platform, ensuring that only variants within or proximal to genes with non-zero gene–disease association (GDA) scores are retained.

Gene List Generation:

  1. Query Open Targets platform with phenotype term (e.g., “hearing loss”)
  2. Extract genes with non-zero GDA scores
  3. Generate text file with one gene per line
  4. Supply to AnnotSV via candidateGenesFile parameter
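Steps 2 and 3 of the gene list generation can be sketched as follows. The example assumes the Open Targets query has already been run and its results reduced to `(gene_symbol, score)` pairs; that input shape and the `write_candidate_genes` helper are hypothetical, and no Open Targets API call is shown.

```python
from pathlib import Path


def write_candidate_genes(associations, out_path):
    """Write gene symbols with a non-zero score to out_path, one per line.

    associations: iterable of (gene_symbol, score) pairs (assumed shape).
    Returns the sorted, de-duplicated list of symbols that was written.
    """
    genes = sorted({symbol for symbol, score in associations if score > 0})
    Path(out_path).write_text(
        "".join(f"{gene}\n" for gene in genes), encoding="utf-8"
    )
    return genes
```

Sorting and de-duplicating the symbols makes the file deterministic, so the same query result always produces the same gene list.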

Distance Threshold: Variants with at least one breakpoint within 10 kb of a phenotype-associated gene are retained.
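The distance rule can be illustrated with a small check. This sketch is not the IMPACT-SV implementation: it assumes the SV and the gene are on the same chromosome and that coordinates are in base pairs, and `within_threshold` is a hypothetical helper.

```python
def within_threshold(sv_start, sv_end, gene_start, gene_end, max_dist=10_000):
    """Return True if either SV breakpoint lies within max_dist bp of the gene.

    A breakpoint inside the gene interval has distance 0.
    Assumes the SV and gene share a chromosome.
    """
    def distance(pos):
        if pos < gene_start:
            return gene_start - pos
        if pos > gene_end:
            return pos - gene_end
        return 0

    return min(distance(sv_start), distance(sv_end)) <= max_dist
```

For example, `within_threshold(100_000, 105_000, 112_000, 120_000)` is true because the end breakpoint is 7 kb upstream of the gene. Under this literal reading, an SV that spans an entire gene with both breakpoints more than 10 kb away would not pass, so a production filter may also need an overlap test.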

Rationale: This filter restricts the output to variants near genes with known disease associations, which reduces incidental findings and focuses manual review on actionable candidates.

5.2 HPO-Based Annotation

The Human Phenotype Ontology (HPO) term corresponding to the trait under investigation (e.g., HP:0000365 for “hearing loss”) is provided to AnnotSV, enabling:

  • Appending phenotype-linked annotations to each variant
  • Cross-referencing phenotype-disease relationships
  • Enriching the clinical context of the output
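Because a malformed term would silently disable phenotype-linked annotation, it can help to validate the identifier before launching a run. HPO identifiers consist of the prefix `HP:` followed by seven digits; the `is_valid_hpo_id` helper below is a hypothetical pre-flight check, not part of AnnotSV or IMPACT-SV.

```python
import re

# HPO identifiers: the prefix "HP:" followed by exactly seven digits.
HPO_ID = re.compile(r"HP:\d{7}")


def is_valid_hpo_id(term):
    """Return True if term is a syntactically valid HPO identifier."""
    return HPO_ID.fullmatch(term) is not None
```

This only checks syntax. It does not confirm that the term exists in the current HPO release.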

5.3 SNV/Indel Cross-Referencing

The candidateSnvIndelFiles parameter is supplied with the SNV/indel VCF for each sample, allowing AnnotSV to:

  • Cross-reference small variants when assessing SV zygosity
  • Improve confidence in genotype interpretation for deletions overlapping coding regions
  • Reduce false positives through concordance checking

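Example: For a called deletion, the absence of heterozygous SNVs within the deleted region supports the call. A heterozygous deletion leaves a single allele, so SNVs in that region should appear homozygous, while a homozygous deletion should contain no SNV calls at all.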
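The concordance idea can be sketched independently of AnnotSV: heterozygous SNV calls inside a called deletion argue against the call, since a deleted region cannot carry two different alleles. The record layout (`chrom`, `pos`, `gt`) and both helper names are assumptions made for this example.

```python
# Genotype strings treated as heterozygous (diploid, biallelic calls only).
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}


def count_het_snvs(snvs, chrom, start, end):
    """Count heterozygous SNV calls inside [start, end] on chrom.

    snvs: iterable of dicts with 'chrom', 'pos', and 'gt' keys (assumed layout).
    """
    return sum(
        1
        for snv in snvs
        if snv["chrom"] == chrom
        and start <= snv["pos"] <= end
        and snv["gt"] in HET_GENOTYPES
    )


def deletion_is_concordant(snvs, chrom, start, end):
    """Return True if no heterozygous SNV contradicts the called deletion."""
    return count_het_snvs(snvs, chrom, start, end) == 0
```

A real check would usually tolerate a small number of heterozygous calls to allow for genotyping error, rather than requiring exactly zero.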

6 Output Format

The final output of the IMPACT-SV module consists of per-sample {sample_id}_SV_IMPACT.tsv files in AnnotSV format, consolidating:

  • AnnotSV annotations: Comprehensive variant-level functional information
  • Phenotype-aware filtering: Candidate genes and HPO annotations applied
  • ACMG classifications: Standardized pathogenicity assessment
  • Sample-level genotype information: Zygosity and supporting evidence

File Structure: Tab-separated values with AnnotSV standard columns plus sample-specific genotype fields
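A downstream consumer can load such a file with the standard library alone. The sketch below assumes the header contains the AnnotSV columns `AnnotSV_ID`, `SV_chrom`, `SV_start`, `SV_end`, and `SV_type`; the exact column set depends on the AnnotSV version, and `load_sv_rows` is a hypothetical helper.

```python
import csv

# Columns assumed to be present in the header; adjust to your AnnotSV version.
REQUIRED_COLUMNS = ("AnnotSV_ID", "SV_chrom", "SV_start", "SV_end", "SV_type")


def load_sv_rows(handle, required=REQUIRED_COLUMNS):
    """Read a tab-separated annotation file into a list of dicts.

    handle: an open text file (or any iterable of lines).
    Raises ValueError if any required column is missing from the header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in required if column not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    return list(reader)
```

Typical usage would be `with open(path, newline="", encoding="utf-8") as fh: rows = load_sv_rows(fh)`. All values are returned as strings, so coordinates need an explicit `int()` conversion.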

6.1 Integration with IMPACT-VIS

SV_IMPACT.tsv files are formatted for direct input into the IMPACT-VIS visualization module, enabling:

  • Unified variant review: SVs visualized alongside SNVs, indels, and CNVs
  • Phenotype-aware prioritization: High-priority variants automatically highlighted
  • Interactive exploration: Click-through to extended annotations and external databases
  • Systematic curation: Per-sample classifications and notes persisted for reproducibility

7 Computational Considerations

File Size: AnnotSV output typically contains 50+ columns per variant, so files range from 500 KB to several MB depending on SV count.

Processing Time: Full AnnotSV annotation of a typical sample (100-500 SVs) requires roughly 30-120 seconds, depending on annotation database performance.

Reproducibility: All AnnotSV parameters and gene lists are documented alongside output files, enabling downstream regeneration of annotations if needed.
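One way to document parameters and gene lists alongside an output file is a small JSON sidecar. This is a sketch of the idea only: the record layout, the field names, and the `build_provenance` helper are hypothetical and do not describe the format IMPACT-SV actually writes.

```python
import hashlib
import json


def build_provenance(params, gene_list_text):
    """Build a JSON-serialisable provenance record for one annotation run.

    params: dict of parameter names to values used for the run.
    gene_list_text: full contents of the candidate gene file.
    """
    genes = [line for line in gene_list_text.splitlines() if line.strip()]
    digest = hashlib.sha256(gene_list_text.encode("utf-8")).hexdigest()
    return {
        "parameters": dict(sorted(params.items())),
        "gene_count": len(genes),
        "gene_list_sha256": digest,
    }


def provenance_json(params, gene_list_text):
    """Serialise the provenance record as indented JSON text."""
    return json.dumps(build_provenance(params, gene_list_text), indent=2)
```

Storing a hash of the gene list rather than only its file name makes it possible to detect when a regenerated list differs from the one used for the original annotation.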