GDS File Format Specification
Detailed specification for SNV/Indel GDS files
1 GDS File Format Specification
Note
This page provides detailed field-level documentation for GDS format. For quick reference, see Overview.
1.1 Overview
GDS (Genomic Data Structure) files store SNV and Indel data in a hierarchical, compressed format based on SeqArray.
- Format: GDS binary container (CoreArray `gdsfmt`; not HDF5)
- Extension: `.gds`
- Library: SeqArray (Bioconductor)
1.2 Required Structure
1.2.1 Variant-Level Annotations
| Field | Type | Description | Example |
|---|---|---|---|
| `variant.id` | integer | Unique variant identifier | 1, 2, 3, ... |
| `position` | integer | Chromosomal position (1-based) | 12345678 |
| `chromosome` | integer/character | Chromosome identifier (SeqArray) | 1, 2, …, 23 (=X) |
| `allele` | character | Reference/alternate alleles | A,G |
| `annotation/info/impact_score` | numeric | Numerical IMPACT severity score (0–100) | 85.5 |
| `annotation/info/impact_score_calc` | character | Scoring method | `80 + 20 * 0.275` |
| `annotation/info/tier` | integer | Tier classification (1–4) | 1 |
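The example values in the table are consistent with each other: `80 + 20 * 0.275` evaluates to 85.5. The following is a minimal sketch of cross-checking the two fields, assuming (the specification does not state this) that `impact_score_calc` always holds a plain arithmetic expression. `check_impact_calc` is a hypothetical helper, not part of SeqArray or IMPACT-VIS.

```r
# Hypothetical helper: re-evaluate an impact_score_calc string and compare it
# with the stored impact_score. Assumes the string is plain arithmetic.
check_impact_calc <- function(score, calc, tol = 1e-8) {
  # Accept only digits, spaces, decimal points, parentheses and + - * /
  # so that arbitrary R code is never evaluated.
  if (!grepl("^[0-9 .()*/+-]+$", calc)) {
    stop("unexpected characters in impact_score_calc: ", calc)
  }
  recomputed <- eval(parse(text = calc))
  isTRUE(all.equal(recomputed, score, tolerance = tol))
}

ok <- check_impact_calc(85.5, "80 + 20 * 0.275")
print(ok)
```

The character whitelist is a deliberately conservative choice: a string containing anything other than arithmetic raises an error instead of being evaluated.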
1.2.2 Genotype Data
| Field | Type | Description |
|---|---|---|
| `genotype` | integer | Genotype array (0/1/2 encoding, SeqArray standard) |
| `sample.id` | character | Sample identifiers |
| `$dosage` | integer | Dosage matrix used by IMPACT-VIS for genotype-derived fields |
| `$dosage_alt` | integer | Alternate allele dosage used for genotype filtering |
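In SeqArray, `$dosage` (reference allele dosage) and `$dosage_alt` (alternate allele dosage) are computed by `seqGetData()` from the stored genotypes instead of being read from separate nodes. The sketch below reads the genotype-related fields from the example file bundled with SeqArray, which is an assumption made for illustration: that file is not an IMPACT-annotated GDS.

```r
library(SeqArray)

# Example GDS file shipped with SeqArray (not an IMPACT-annotated file).
gds <- seqOpen(seqExampleFileName("gds"))

sample_ids <- seqGetData(gds, "sample.id")    # character vector of sample IDs
geno       <- seqGetData(gds, "genotype")     # array: ploidy x sample x variant
dos_alt    <- seqGetData(gds, "$dosage_alt")  # matrix: sample x variant

# Inspect the shapes to confirm samples and variants line up.
print(dim(geno))
print(dim(dos_alt))

seqClose(gds)
```

For large files, applying `seqSetFilter()` before these calls keeps the returned arrays small.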
1.2.3 Functional Annotations
| Field | Type | Description | Source |
|---|---|---|---|
| `annotation/info/FunctionalAnnotation/VarInfo` | character | FAVOR variant identifier string used as a stable per-variant key | FAVOR |
| `annotation/info/FunctionalAnnotation/Consequence` | character | Variant consequence terms | Ensembl VEP (via favorannotator) |
| `annotation/info/FunctionalAnnotation/clnsig` | character | ClinVar clinical significance | ClinVar (via favorannotator) |
| `annotation/info/FunctionalAnnotation/clndn` | character | ClinVar disease name | ClinVar (via favorannotator) |
| `annotation/info/FunctionalAnnotation/bravo_af` | numeric | BRAVO allele frequency | BRAVO (via favorannotator) |
| `annotation/info/FunctionalAnnotation/gnomad_af` | numeric | gnomAD allele frequency | gnomAD (via favorannotator) |
1.3 Optional Annotations
| Field | Type | Description |
|---|---|---|
| `annotation/info/CADD_score` | numeric | CADD deleteriousness |
| `annotation/info/REVEL_score` | numeric | REVEL pathogenicity |
| `annotation/info/SIFT_pred` | character | SIFT prediction |
| `annotation/info/PolyPhen_pred` | character | PolyPhen prediction |
1.4 Validation Rules
Important
GDS files must pass validation before loading:
- File Structure: Must be valid SeqArray GDS format
- Required Fields: Must include SeqArray core nodes, plus IMPACT annotations (`annotation/info/impact_score` or `annotation/info/impact_score_calc`, and `annotation/info/FunctionalAnnotation/VarInfo`)
- Data Types: Correct types for each field
- Coordinate Validity: Positions within chromosome bounds
- Allele Format: REF,ALT comma-separated
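Some of these rules can be checked on values after they have been read from the file. The sketch below is illustrative only and is not the validator used by IMPACT-VIS: `validate_fields` is a hypothetical helper, the 0–100 and 1–4 ranges come from the tables above, and the chromosome-bounds check is reduced to "position is at least 1" because chromosome lengths depend on the genome build.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
validate_fields <- function(position, allele, impact_score, tier) {
  issues <- character(0)
  # Positions are 1-based, so anything below 1 (or missing) is invalid.
  if (!is.numeric(position) || any(is.na(position) | position < 1)) {
    issues <- c(issues, "position must be a positive 1-based integer")
  }
  # REF followed by one or more comma-separated ALT alleles, e.g. "A,G".
  if (!is.character(allele) || !all(grepl("^[^,]+(,[^,]+)+$", allele))) {
    issues <- c(issues, "allele must be REF,ALT comma-separated")
  }
  if (!is.numeric(impact_score) ||
      any(impact_score < 0 | impact_score > 100, na.rm = TRUE)) {
    issues <- c(issues, "impact_score must be numeric in 0-100")
  }
  if (any(!tier %in% 1:4)) {
    issues <- c(issues, "tier must be 1, 2, 3 or 4")
  }
  issues
}

good <- validate_fields(12345678L, "A,G", 85.5, 1L)
bad  <- validate_fields(0L, "AG", 120, 5L)
print(length(good))
print(bad)
```

Returning a vector of messages instead of stopping at the first failure lets a caller report every problem in one pass.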
1.5 Creating GDS from VCF
```r
library(SeqArray)

# Convert VCF to GDS
seqVCF2GDS(
  vcf.fn = "input.vcf.gz",
  out.fn = "output.gds",
  storage.option = "LZMA_RA",  # LZMA compression with random access
  verbose = TRUE
)

# Verify structure
gds <- seqOpen("output.gds")
seqSummary(gds)
seqClose(gds)
```
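A file converted directly from VCF contains the SeqArray core nodes but not the IMPACT annotations, which have to be added in a separate step. As a rough sketch of checking for the required nodes listed under Validation Rules, the code below uses `index.gdsn()` from gdsfmt with `silent = TRUE`, which returns `NULL` for a missing path. SeqArray's bundled example file stands in for `output.gds`, and `has_node` is a hypothetical helper.

```r
library(SeqArray)  # also attaches gdsfmt, which provides index.gdsn()

# Node paths taken from the tables on this page.
required <- c(
  "variant.id", "position", "chromosome", "allele", "genotype", "sample.id",
  "annotation/info/FunctionalAnnotation/VarInfo"
)
# At least one of these two must be present.
score_nodes <- c("annotation/info/impact_score",
                 "annotation/info/impact_score_calc")

# Stand-in for "output.gds": has the core nodes, no IMPACT annotations.
gds <- seqOpen(seqExampleFileName("gds"))

# Hypothetical helper: TRUE if the node path exists in the open file.
has_node <- function(path) !is.null(index.gdsn(gds, path, silent = TRUE))

missing_required <- required[!vapply(required, has_node, logical(1))]
has_score <- any(vapply(score_nodes, has_node, logical(1)))

seqClose(gds)

print(missing_required)
print(has_score)
```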