SV File Format Specification

Detailed specification for AnnotSV structural variant files

1 SV File Format Specification

Note

This page provides detailed field-level documentation for the AnnotSV format. For a quick reference, see Overview.

1.1 Overview

Structural variant data is stored in the AnnotSV output format, a tab-separated values (TSV) file.

- Format: Tab-separated values (TSV)
- Extension: .tsv
- Tool: AnnotSV (https://lbgi.fr/AnnotSV/)

1.2 Required Columns

1.2.1 Coordinate Columns

| Column | Type | Description | Example |
|---|---|---|---|
| SV_chrom | character | Chromosome | chr1 |
| SV_start | integer | Start position (1-based) | 12345678 |
| SV_end | integer | End position | 12456789 |
| SV_length | integer | SV length in bp | 111111 |
| SV_type | character | SV type | DEL, DUP, INV, BND, INS |

1.2.2 Annotation Columns

| Column | Type | Description | Values |
|---|---|---|---|
| Annotation_mode | character | Split or full annotation | split, full |
| Gene_name | character | Overlapping gene(s) | TP53, BRCA1 |
| Location | character | Functional location | exonic, intronic, intergenic |
| Exon_count | integer | Number of exons affected | 1, 2, … |

1.2.3 Impact Scores

| Column | Type | Description | Range |
|---|---|---|---|
| ACMG_class | integer | ACMG classification | 1-5 |
| AnnotSV_ranking_score | numeric | Pathogenicity score | 0-1 |
| AnnotSV_ranking_criteria | character | Ranking justification | Text |
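
To make the ACMG_class scale concrete, here is a minimal sketch that labels rows and keeps the likely pathogenic and pathogenic ones. The label wording follows the standard five-tier ACMG scheme. The `rows` list and the helper name `pathogenic_rows` are illustrative assumptions, not part of AnnotSV or of any loader described on this page.

```python
# Standard five-tier ACMG labels, keyed by the integer stored in ACMG_class.
ACMG_LABELS = {
    1: "benign",
    2: "likely benign",
    3: "uncertain significance",
    4: "likely pathogenic",
    5: "pathogenic",
}


def pathogenic_rows(rows, min_class=4):
    """Return rows whose ACMG_class is at least min_class (hypothetical helper)."""
    return [row for row in rows if int(row["ACMG_class"]) >= min_class]


# Illustrative rows only; values are strings, as they would be when read from a TSV.
rows = [
    {"Gene_name": "TP53", "ACMG_class": "5", "AnnotSV_ranking_score": "0.95"},
    {"Gene_name": "BRCA1", "ACMG_class": "3", "AnnotSV_ranking_score": "0.65"},
]

for row in pathogenic_rows(rows):
    print(row["Gene_name"], ACMG_LABELS[int(row["ACMG_class"])])
```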

1.2.4 Frequency Data

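| Column | Type | Description | Source |
|---|---|---|---|
| DGV_GAIN_n_samples_tested | integer | Number of samples tested for gains in DGV | DGV |
| DGV_LOSS_n_samples_tested | integer | Number of samples tested for losses in DGV | DGV |
| gnomAD_pLI | numeric | Loss-of-function (LoF) intolerance score | gnomAD |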

1.2.5 Genotype Information

| Column | Type | Description | Values |
|---|---|---|---|
| Samples_ID | character | Sample identifier(s) | Comma-separated |
| FORMAT | character | Genotype format | GT:DP:GQ |
| Sample columns | character | Per-sample genotypes | 0/1:25:99 |
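
The FORMAT column names the colon-separated fields found in each sample column. As a minimal sketch, the two strings can be zipped into a dictionary. The function name `parse_genotype` is hypothetical, and the sketch assumes every sample value carries all keys listed in FORMAT.

```python
def parse_genotype(format_string, sample_value):
    """Pair FORMAT keys with one sample's values (hypothetical helper)."""
    keys = format_string.split(":")
    values = sample_value.split(":")
    if len(keys) != len(values):
        # Assumption of this sketch: no fields are dropped from the sample value.
        raise ValueError(
            f"expected {len(keys)} fields, got {len(values)}: {sample_value!r}"
        )
    return dict(zip(keys, values))


# Values taken from the table above.
genotype = parse_genotype("GT:DP:GQ", "0/1:25:99")
depth = int(genotype["DP"])
print(genotype["GT"], depth, genotype["GQ"])
```

The values stay as strings after parsing, so numeric fields such as DP need an explicit conversion before any comparison.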

1.3 Optional Columns

| Column | Type | Description |
|---|---|---|
| CytoBand | character | Cytogenetic band |
| Overlapped_CDS_percent | numeric | % CDS overlap |
| Promoter | character | Promoter overlap |
| Phenotype_HPO | character | HPO terms |
| OMIM_morbid | character | OMIM disease genes |

1.4 File Example

```text
SV_chrom    SV_start    SV_end  SV_length   SV_type Gene_name   Location    ACMG_class  AnnotSV_ranking_score
chr1    12345678    12456789    111111  DEL TP53    exonic  5   0.95
chr2    23456789    23567890    111101  DUP BRCA1   intronic    3   0.65
chr17   41196311    41277500    81189   DEL BRCA1   exonic  5   0.99
```
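
As a minimal sketch of how such a file can be read, the snippet below parses the three example rows with Python's standard `csv` module and converts the numeric columns. The function name `read_sv_rows` and the in-memory `EXAMPLE_TSV` string are assumptions made for illustration. A real file would be opened with `open(path, newline="", encoding="utf-8")` instead.

```python
import csv
import io

# The rows from the file example, joined with real tab characters.
HEADER = ["SV_chrom", "SV_start", "SV_end", "SV_length", "SV_type",
          "Gene_name", "Location", "ACMG_class", "AnnotSV_ranking_score"]
RECORDS = [
    ["chr1", "12345678", "12456789", "111111", "DEL", "TP53", "exonic", "5", "0.95"],
    ["chr2", "23456789", "23567890", "111101", "DUP", "BRCA1", "intronic", "3", "0.65"],
    ["chr17", "41196311", "41277500", "81189", "DEL", "BRCA1", "exonic", "5", "0.99"],
]
EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in [HEADER] + RECORDS) + "\n"

INTEGER_COLUMNS = ("SV_start", "SV_end", "SV_length", "ACMG_class")
NUMERIC_COLUMNS = ("AnnotSV_ranking_score",)


def read_sv_rows(handle):
    """Parse a tab-separated handle into dicts with typed numeric fields."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        for name in INTEGER_COLUMNS:
            record[name] = int(record[name])
        for name in NUMERIC_COLUMNS:
            record[name] = float(record[name])
        rows.append(record)
    return rows


rows = read_sv_rows(io.StringIO(EXAMPLE_TSV))
for row in rows:
    # In the example rows, SV_length equals SV_end minus SV_start.
    print(row["SV_chrom"], row["SV_type"], row["SV_end"] - row["SV_start"])
```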

1.5 Validation Rules

Important

SV files must pass validation before loading:

  1. Required Columns: All coordinate and annotation columns are present
  2. Data Types: Numeric columns contain valid numbers
  3. Coordinate Validity: Start < End, and positions fall within chromosome bounds
  4. SV Type: One of DEL, DUP, INV, BND, INS
  5. ACMG Class: Integer from 1 to 5 (if present)
  6. File Encoding: UTF-8 or ASCII
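
The row-level rules above can be sketched as a small checker. This is an illustration under stated assumptions, not the validator used by any loader: `validate_row` is a hypothetical name, rows are dictionaries of strings as produced by `csv.DictReader`, and the chromosome-bounds and file-encoding checks are left out because they need a reference genome and the raw file.

```python
# Coordinate and annotation columns from sections 1.2.1 and 1.2.2.
REQUIRED_COLUMNS = (
    "SV_chrom", "SV_start", "SV_end", "SV_length", "SV_type",
    "Annotation_mode", "Gene_name", "Location", "Exon_count",
)
ALLOWED_SV_TYPES = {"DEL", "DUP", "INV", "BND", "INS"}
ALLOWED_ACMG_CLASSES = {"1", "2", "3", "4", "5"}


def validate_row(row):
    """Return a list of error messages for one row (empty list means valid)."""
    missing = [name for name in REQUIRED_COLUMNS if name not in row]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    errors = []
    try:
        start = int(row["SV_start"])
        end = int(row["SV_end"])
    except (TypeError, ValueError):
        errors.append("SV_start and SV_end must be integers")
    else:
        if not start < end:
            errors.append("SV_start must be less than SV_end")

    if row["SV_type"] not in ALLOWED_SV_TYPES:
        errors.append("unknown SV_type: " + repr(row["SV_type"]))

    # ACMG_class is only checked when a value is present.
    acmg = row.get("ACMG_class")
    if acmg not in (None, "") and acmg not in ALLOWED_ACMG_CLASSES:
        errors.append("ACMG_class must be an integer from 1 to 5")
    return errors


example_row = {
    "SV_chrom": "chr1", "SV_start": "12345678", "SV_end": "12456789",
    "SV_length": "111111", "SV_type": "DEL", "Annotation_mode": "full",
    "Gene_name": "TP53", "Location": "exonic", "Exon_count": "1",
    "ACMG_class": "5",
}
print(validate_row(example_row))
```

Collecting messages instead of raising on the first failure makes it possible to report every problem in a file in one pass.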

1.6 Creating AnnotSV Files

```bash
# Run AnnotSV on a VCF file
AnnotSV \
  -SVinputFile input.vcf \
  -outputFile output.tsv \
  -genomeBuild GRCh38 \
  -annotationsDir /path/to/annotations \
  -SVminSize 50
```
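
When the same command has to run from a pipeline script, the argument list can be assembled in Python. The sketch below only builds the list, using the flags shown above. The function name `build_annotsv_command` and its defaults are assumptions made for illustration.

```python
def build_annotsv_command(vcf_path, output_path, annotations_dir,
                          genome_build="GRCh38", min_size=50):
    """Assemble the AnnotSV argument list (hypothetical helper)."""
    return [
        "AnnotSV",
        "-SVinputFile", str(vcf_path),
        "-outputFile", str(output_path),
        "-genomeBuild", genome_build,
        "-annotationsDir", str(annotations_dir),
        "-SVminSize", str(min_size),
    ]


command = build_annotsv_command("input.vcf", "output.tsv", "/path/to/annotations")
print(" ".join(command))
```

On a machine where AnnotSV is installed, the list can be passed to `subprocess.run(command, check=True)`. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.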

1.7 File Size Considerations

| SV Count | Columns | File Size |
|---|---|---|
| 100 | 50 | ~500 KB |
| 1,000 | 50 | ~5 MB |
| 10,000 | 50 | ~50 MB |
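
The figures above scale linearly at roughly 5 KB per SV for 50 columns (500 KB / 100 SVs). A rough estimator under that assumption is sketched below. The constant and the function name `estimate_size_kb` are illustrative, and real sizes depend on how much annotation text each row carries.

```python
# Derived from the table: about 500 KB for 100 SVs at 50 columns.
KB_PER_SV_AT_50_COLUMNS = 5.0


def estimate_size_kb(sv_count, kb_per_sv=KB_PER_SV_AT_50_COLUMNS):
    """Linear size estimate in KB (rough approximation, hypothetical helper)."""
    return sv_count * kb_per_sv


for count in (100, 1_000, 10_000):
    # Using 1 MB = 1000 KB, as the table does.
    print(count, "SVs:", estimate_size_kb(count) / 1000, "MB")
```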

1.8 Performance Tips

  1. Filtering: Pre-filter the VCF before running AnnotSV to reduce output file size
  2. Split Mode: Use split annotation for gene-level resolution
  3. Column Selection: Remove unused columns to reduce memory use
  4. Compression: Gzip TSV files for storage (gunzip before loading)
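
Tips 3 and 4 can be sketched with the Python standard library: one helper decompresses a gzipped TSV back to plain text before loading, and another writes a copy that keeps only selected columns. Both function names are hypothetical, and the sketch assumes every requested column exists in the input header.

```python
import csv
import gzip
import shutil


def gunzip_tsv(gz_path, tsv_path):
    """Decompress a .tsv.gz file to a plain .tsv file."""
    with gzip.open(gz_path, "rb") as src, open(tsv_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def keep_columns(in_path, out_path, columns):
    """Write a copy of a TSV that contains only the listed columns."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dst,
            fieldnames=columns,
            delimiter="\t",
            extrasaction="ignore",  # silently drop columns not listed
            lineterminator="\n",
        )
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

A typical call sequence would be `gunzip_tsv("output.tsv.gz", "output.tsv")` followed by `keep_columns("output.tsv", "slim.tsv", ["SV_chrom", "SV_start", "SV_end", "SV_type"])`. Both helpers stream the data, so memory use stays low even for large files.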

1.9 See Also

- [SV Processing Methods](../methods/03-sv-processing.qmd)
- [Data Preparation Guide](../guides/02-data-preparation.qmd)
- [Data Manager API](../reference/data_manager.qmd)