SV File Format Specification
Detailed specification for AnnotSV structural variant files
Note
This page provides detailed field-level documentation for the AnnotSV format. For quick reference, see Overview.
1.1 Overview
Structural variant data uses AnnotSV output format (TSV).
- Format: Tab-separated values (TSV)
- Extension: .tsv
- Tool: AnnotSV (https://lbgi.fr/AnnotSV/)
1.2 Required Columns
1.2.1 Coordinate Columns
| Column | Type | Description | Example |
|---|---|---|---|
| SV_chrom | character | Chromosome | chr1 |
| SV_start | integer | Start position (1-based) | 12345678 |
| SV_end | integer | End position | 12456789 |
| SV_length | integer | SV length in bp | 111111 |
| SV_type | character | SV type | DEL, DUP, INV, BND |
1.2.2 Annotation Columns
| Column | Type | Description | Values |
|---|---|---|---|
| Annotation_mode | character | Split or full annotation | split, full |
| Gene_name | character | Overlapping gene(s) | TP53, BRCA1 |
| Location | character | Functional location | exonic, intronic, intergenic |
| Exon_count | integer | Number of exons affected | 1, 2, … |
1.2.3 Impact Scores
| Column | Type | Description | Range |
|---|---|---|---|
| ACMG_class | integer | ACMG classification | 1-5 |
| AnnotSV_ranking_score | numeric | Pathogenicity score | 0-1 |
| AnnotSV_ranking_criteria | character | Ranking justification | Text |
1.2.4 Frequency Data
| Column | Type | Description | Source |
|---|---|---|---|
| DGV_GAIN_n_samples_tested | integer | DGV sample count | DGV |
| DGV_LOSS_n_samples_tested | integer | DGV sample count | DGV |
| gnomAD_pLI | numeric | LoF intolerance | gnomAD |
1.2.5 Genotype Information
| Column | Type | Description | Values |
|---|---|---|---|
| Samples_ID | character | Sample identifier(s) | Comma-separated |
| FORMAT | character | Genotype format | GT:DP:GQ |
| Sample columns | character | Per-sample genotypes | 0/1:25:99 |
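
The FORMAT column and each per-sample column pair up key by key. The following is a minimal Python sketch of that pairing, assuming colon-separated fields as in the example values above. `parse_genotype` is a hypothetical helper name and is not part of AnnotSV.

```python
def parse_genotype(format_str, sample_str):
    """Pair FORMAT keys (e.g. GT:DP:GQ) with one sample's values (e.g. 0/1:25:99)."""
    keys = format_str.split(":")
    values = sample_str.split(":")
    # Assumption: every key has a value. Data that omits trailing
    # fields would need a more lenient rule than this one.
    if len(keys) != len(values):
        raise ValueError(
            f"FORMAT has {len(keys)} keys but sample has {len(values)} values"
        )
    return dict(zip(keys, values))


genotype = parse_genotype("GT:DP:GQ", "0/1:25:99")
print(genotype)  # {'GT': '0/1', 'DP': '25', 'GQ': '99'}
```

Values are left as strings here; converting DP and GQ to integers is a choice for the caller.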
1.3 Optional Columns
| Column | Type | Description |
|---|---|---|
| CytoBand | character | Cytogenetic band |
| Overlapped_CDS_percent | numeric | % CDS overlap |
| Promoter | character | Promoter overlap |
| Phenotype_HPO | character | HPO terms |
| OMIM_morbid | character | OMIM disease genes |
1.4 File Example
SV_chrom SV_start SV_end SV_length SV_type Gene_name Location ACMG_class AnnotSV_ranking_score
chr1 12345678 12456789 111111 DEL TP53 exonic 5 0.95
chr2 23456789 23567890 111101 DUP BRCA1 intronic 3 0.65
chr17 41196311 41277500 81189 DEL BRCA1 exonic 5 0.99
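
As an illustration of reading a file shaped like the example above, here is a minimal sketch using Python's standard `csv` module. It assumes tab-separated input with the header row shown. `read_sv_rows` is a hypothetical helper, and the inline data repeats two of the example rows with explicit tab separators.

```python
import csv
import io

# Two rows from the example above, written with explicit tab separators.
EXAMPLE_TSV = (
    "SV_chrom\tSV_start\tSV_end\tSV_length\tSV_type\tGene_name\tLocation\t"
    "ACMG_class\tAnnotSV_ranking_score\n"
    "chr1\t12345678\t12456789\t111111\tDEL\tTP53\texonic\t5\t0.95\n"
    "chr17\t41196311\t41277500\t81189\tDEL\tBRCA1\texonic\t5\t0.99\n"
)


def read_sv_rows(handle):
    """Yield one dict per SV row, converting the numeric columns used here."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for name in ("SV_start", "SV_end", "SV_length", "ACMG_class"):
            row[name] = int(row[name])
        row["AnnotSV_ranking_score"] = float(row["AnnotSV_ranking_score"])
        yield row


rows = list(read_sv_rows(io.StringIO(EXAMPLE_TSV)))
for row in rows:
    # In these example rows, SV_length equals SV_end - SV_start.
    print(row["SV_chrom"], row["SV_type"], row["SV_end"] - row["SV_start"])
```

For a file on disk, the same function can be given a handle from `open(path, newline="", encoding="utf-8")`.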
1.5 Validation Rules
Important
SV files must pass validation before loading:
- Required Columns: All coordinate and annotation columns present
- Data Types: Numeric columns contain valid numbers
- Coordinate Validity: Start < End, positions within chromosome bounds
- SV Type: One of: DEL, DUP, INV, BND, INS
- ACMG Class: Integer 1-5 (if present)
- File Encoding: UTF-8 or ASCII
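
Some of the rules above can be sketched as a per-row check. This is an illustrative Python sketch, not the loader's actual validator: `validate_sv_row` is a hypothetical name, and the sketch covers only the coordinate order, SV type, and ACMG class rules. Required-column, chromosome-bound, and encoding checks are left out because they need information beyond a single row.

```python
VALID_SV_TYPES = {"DEL", "DUP", "INV", "BND", "INS"}
VALID_ACMG_CLASSES = {"1", "2", "3", "4", "5"}


def validate_sv_row(row):
    """Return a list of rule violations for one row (empty list means it passed)."""
    errors = []
    try:
        start, end = int(row["SV_start"]), int(row["SV_end"])
    except (KeyError, TypeError, ValueError):
        errors.append("SV_start and SV_end must be present and integer")
    else:
        if start >= end:
            errors.append("SV_start must be less than SV_end")
    if row.get("SV_type") not in VALID_SV_TYPES:
        errors.append(f"unsupported SV_type: {row.get('SV_type')!r}")
    # ACMG_class is only checked when a value is present.
    acmg = row.get("ACMG_class")
    if acmg not in (None, "") and str(acmg).strip() not in VALID_ACMG_CLASSES:
        errors.append(f"ACMG_class must be an integer 1-5, got {acmg!r}")
    return errors


good = {"SV_start": "100", "SV_end": "200", "SV_type": "DEL", "ACMG_class": "5"}
bad = {"SV_start": "200", "SV_end": "100", "SV_type": "CNV", "ACMG_class": "7"}
print(validate_sv_row(good))  # []
print(len(validate_sv_row(bad)))  # 3
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in a row at once.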
1.6 Creating AnnotSV Files
# Run AnnotSV on VCF
AnnotSV \
-SVinputFile input.vcf \
-outputFile output.tsv \
-genomeBuild GRCh38 \
-annotationsDir /path/to/annotations \
-SVminSize 50
1.7 File Size Considerations
| SV Count | Columns | File Size |
|---|---|---|
| 100 | 50 | ~500 KB |
| 1,000 | 50 | ~5 MB |
| 10,000 | 50 | ~50 MB |
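
The figures in the table work out to roughly 5 KB per SV at 50 columns (500 KB / 100 SVs). A small sketch of that arithmetic follows. It assumes the size keeps scaling linearly with SV count, which the table suggests but does not guarantee, and `estimate_size_kb` is a hypothetical helper.

```python
# Approximate per-row size implied by the table: 500 KB / 100 SVs at 50 columns.
KB_PER_SV_AT_50_COLUMNS = 5


def estimate_size_kb(sv_count):
    """Rough file size in KB for a 50-column file, assuming linear scaling."""
    return sv_count * KB_PER_SV_AT_50_COLUMNS


for count in (100, 1_000, 10_000):
    print(count, "SVs ->", estimate_size_kb(count), "KB")
```

Actual sizes depend on which annotation columns are populated, so treat the result as an order-of-magnitude guide.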
1.8 Performance Tips
- Filtering: Pre-filter VCF before AnnotSV to reduce file size
- Split Mode: Use split annotation for gene-level resolution
- Column Selection: Remove unused columns to reduce memory
- Compression: Gzip TSV files for storage (gunzip before loading)
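
The column-selection and compression tips can be combined in one pass. The following is a minimal sketch using Python's standard `gzip` and `csv` modules, with assumptions stated: `keep_columns` is a hypothetical helper, the data is an in-memory stand-in with illustrative values, and the output is the uncompressed, trimmed TSV.

```python
import csv
import gzip
import io


def keep_columns(src, dst, wanted):
    """Copy a TSV stream to dst, keeping only the named columns in the given order."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst,
        fieldnames=wanted,
        delimiter="\t",
        extrasaction="ignore",  # silently drop columns not listed in `wanted`
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# In-memory stand-in for a gzipped TSV file (illustrative values only).
raw = "SV_chrom\tSV_start\tSV_end\tCytoBand\nchr1\t12345678\t12456789\tp36.22\n"
compressed = gzip.compress(raw.encode("utf-8"))

out = io.StringIO()
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8", newline="") as src:
    keep_columns(src, out, ["SV_chrom", "SV_start", "SV_end"])
print(out.getvalue())
```

With files on disk, `gzip.open("input.tsv.gz", "rt", ...)` and a regular `open("output.tsv", "w", newline="")` take the place of the in-memory buffers. Both paths are placeholders.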
1.9 See Also
- [SV Processing Methods](../methods/03-sv-processing.qmd)
- [Data Preparation Guide](../guides/02-data-preparation.qmd)
- [Data Manager API](../reference/data_manager.qmd)