1 CNV File Format Specification
This page provides detailed, field-level documentation for the CNV file format. For a quick reference, see Overview.
1.1 Overview
Copy number variant data uses a tab-delimited curation format (TXT, no header). This format persists per-sample CNV annotations and classifications generated during the IMPACT-CNV preprocessing and manual review workflow.
- Format: Tab-delimited text (no header row)
- Extension: .txt
- Delimiter: Tab (\t)
- Source: IMPACT-CNV output, curated via the IMPACT-VIS interface
1.2 Column Structure
The CNV file contains exactly 7 tab-separated columns (no header):
| Column # | Field Name | Type | Description | Example |
|---|---|---|---|---|
| 1 | Timestamp | numeric | Unix timestamp of record creation | 1752094007.21964 |
| 2 | Sample_Interval | character | Variant identifier: sample.chr.start.end.type | test_sample.12.8863035.8865835.DEL |
| 3 | Interpretation | character | QC result: Passed or Failed | Passed |
| 4 | Classification | character | Curation classification (see below) | Further Review - Potentially Reportable |
| 5 | Evidence | character | Curated evidence summary or NA | Het DEL of CACNA1C. Further Review needed |
| 6 | Date_Time | character | Last modification timestamp (ISO 8601) | 2025-07-09T16:17:19Z |
| 7 | Username | character | Curator username | nboehler |
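The column layout above maps directly onto a typed record. The Python sketch below reads lines into such a record; the names CnvRecord, parse_row and read_cnv_lines are illustrative, not part of IMPACT-VIS, and mapping the literal NA in column 5 to None is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CnvRecord:
    """One row of a CNV curation file (7 tab-separated columns, no header)."""
    timestamp: float           # column 1: Unix timestamp
    sample_interval: str       # column 2: sample.chr.start.end.type
    interpretation: str        # column 3: Passed or Failed
    classification: str        # column 4: curation classification
    evidence: Optional[str]    # column 5: free text; None when the file says NA
    date_time: str             # column 6: ISO 8601 text, left unparsed here
    username: str              # column 7: curator username


def parse_row(line: str) -> CnvRecord:
    # Strip only the line terminator so that empty trailing fields survive.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    ts, interval, interp, classif, evidence, date_time, user = fields
    return CnvRecord(
        timestamp=float(ts),
        sample_interval=interval,
        interpretation=interp,
        classification=classif,
        evidence=None if evidence == "NA" else evidence,
        date_time=date_time,
        username=user,
    )


def read_cnv_lines(lines: Iterable[str]) -> List[CnvRecord]:
    # There is no header row, so every non-blank line is a data record.
    # Accepts any iterable of lines, e.g. a handle from
    # `with open(path, encoding="utf-8") as handle:`.
    return [parse_row(line) for line in lines if line.strip()]
```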
1.3 Classification Values
The Classification column (column 4) contains curation classifications:
| Classification | Description | QC Status |
|---|---|---|
| Ruled Out - Quality Inadequate / Difficult to Assess | CNV failed quality checks | Failed |
| Ruled Out - Incorrect Boundary, Fully Intronic | Breakpoints incorrect, no exonic overlap | Failed |
| Ruled Out - Population Variation | Common CNV in gnomAD or similar database | Passed |
| Further Review - Not Likely Reportable | Likely benign; documented for completeness | Passed |
| Further Review - Potentially Reportable | Requires additional review; possible pathogenicity | Passed |
| Not Evaluated | CNV not yet curated | Failed |
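The table can be read as a lookup from each classification to the QC status it is listed with. A minimal Python sketch of a consistency check, assuming every row is expected to agree with this table (the pairs are listed here, but the document does not say they are enforced); the names are hypothetical.

```python
from typing import Optional

# Classification -> QC status, transcribed from the Classification Values table.
# Treating this pairing as a rule to enforce is an assumption of this sketch.
EXPECTED_QC_STATUS = {
    "Ruled Out - Quality Inadequate / Difficult to Assess": "Failed",
    "Ruled Out - Incorrect Boundary, Fully Intronic": "Failed",
    "Ruled Out - Population Variation": "Passed",
    "Further Review - Not Likely Reportable": "Passed",
    "Further Review - Potentially Reportable": "Passed",
    "Not Evaluated": "Failed",
}


def qc_status_consistent(classification: str, interpretation: str) -> Optional[bool]:
    """True/False for classifications in the table, None for anything else."""
    expected = EXPECTED_QC_STATUS.get(classification)
    if expected is None:
        return None  # unlisted value: inspect manually rather than guess
    return expected == interpretation
```

Returning None keeps unlisted values, such as the QC Review Pending placeholder in the curation workflow example, apart from genuine mismatches.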
1.4 File Example
1752094007.21964	test_sample.12.8863035.8865835.DEL	Failed	Ruled Out - Quality Inadequate / Difficult to Assess	NA	2025-07-09T16:46:47Z	nboehler
1752092239.12654	test_sample.12.2051740.2054739.DEL	Passed	Further Review - Potentially Reportable	Het DEL of CACNA1C. Further Review needed	2025-07-09T16:17:19Z	nboehler
1752091645.91913	test_sample.5.76790253.76840052.DUP	Passed	Further Review - Potentially Reportable	DUP of F2RL1. Further Review needed.	2025-07-09T16:07:25Z	nboehler
1752062920.41666	test_sample.7.33090339.33148281.DUP	Passed	Further Review - Potentially Reportable	DUP overlaps BBS9, RP9. No clear pathogenicity known.	2025-07-09T08:08:40Z	nboehler
1.5 Validation Rules
CNV files must pass validation before loading:
- Column Count: Exactly 7 tab-separated columns (no header)
- Delimiter: Tab character (\t) separating all fields
- No Header Row: The first row is data, not column names
- Timestamp Format: Column 1 is a Unix timestamp (numeric)
- Sample_Interval Format: Column 2 must match sample.chr.start.end.type, where:
  - sample: Sample ID (must match the filename prefix)
  - chr: Chromosome (1-22, X, Y)
  - start, end: Integer genomic coordinates
  - type: CNV type (DEL, DUP, etc.)
- Interpretation: Column 3 is Passed or Failed
- Data Types:
  - Column 1 (Timestamp): Numeric (Unix time)
  - Columns 2-7: Character/string
- Sample Consistency: All records should belong to the same sample (extracted from the filename)
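A validator for these rules might look like the following Python sketch. It checks the column count, a numeric timestamp, the Sample_Interval pattern, the chromosome set, integer coordinates and the Interpretation values. The function name and the expected_sample argument (the sample ID you derive from the filename) are hypothetical, and splitting Sample_Interval from the right, so that a dot inside a sample ID cannot shift the other fields, is a choice of this sketch.

```python
from typing import Iterable, List

VALID_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y"}
VALID_INTERPRETATIONS = {"Passed", "Failed"}


def validate_lines(lines: Iterable[str], expected_sample: str) -> List[str]:
    """Return a list of problems; an empty list means every line passed."""
    problems: List[str] = []
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 7:
            problems.append(f"line {n}: expected 7 tab-separated columns, found {len(fields)}")
            continue
        # Column 1: a header row fails here, since "Timestamp" is not numeric.
        try:
            float(fields[0])
        except ValueError:
            problems.append(f"line {n}: timestamp {fields[0]!r} is not numeric")
        # Column 2: sample.chr.start.end.type, split from the right.
        parts = fields[1].rsplit(".", 4)
        if len(parts) != 5:
            problems.append(f"line {n}: Sample_Interval {fields[1]!r} has too few parts")
        else:
            sample, chrom, start, end, _cnv_type = parts  # type list is open-ended ("etc.")
            if sample != expected_sample:
                problems.append(f"line {n}: sample {sample!r} does not match {expected_sample!r}")
            if chrom not in VALID_CHROMS:
                problems.append(f"line {n}: chromosome {chrom!r} is not 1-22, X or Y")
            if not (start.isdigit() and end.isdigit()):
                problems.append(f"line {n}: start/end are not integers")
        # Column 3: only Passed or Failed.
        if fields[2] not in VALID_INTERPRETATIONS:
            problems.append(f"line {n}: Interpretation {fields[2]!r} is not Passed or Failed")
    return problems
```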
1.6 Sample_Interval Parsing
The Sample_Interval field (column 2) encodes genomic and variant information:
sample_id.chr.start.end.type
├─ sample_id: Matches filename prefix (e.g., "test_sample")
├─ chr: Chromosome (1, 2, ..., 22, X, Y)
├─ start: Start position (1-based, integer)
├─ end: End position (1-based, integer)
└─ type: Variant type (DEL for deletion, DUP for duplication)
Parsing Example:
test_sample.12.8863035.8865835.DEL → sample = test_sample, chr = 12, start = 8863035, end = 8865835, type = DEL
This encoding allows IMPACT-VIS to extract the genomic coordinates and the variant type (DEL/DUP) without requiring separate coordinate columns.
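Because the last four components never contain a dot, the field can be split from the right, leaving any remaining dots in the sample ID. A small Python sketch; the Interval type is hypothetical, and computing the length with an inclusive end is an assumption, since the document states only that both coordinates are 1-based.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    sample: str
    chrom: str
    start: int      # 1-based
    end: int        # 1-based
    cnv_type: str   # e.g. DEL or DUP


def parse_sample_interval(value: str) -> Interval:
    # rsplit(".", 4) splits at most four times from the right; a value with
    # fewer than four dots fails to unpack and raises ValueError.
    sample, chrom, start, end, cnv_type = value.rsplit(".", 4)
    return Interval(sample, chrom, int(start), int(end), cnv_type)


def length_bp(iv: Interval) -> int:
    # Assumes both coordinates are inclusive.
    return iv.end - iv.start + 1
```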
1.7 QC Status & Classification
The Interpretation field indicates QC pass/fail status:
- Passed: CNV passed quality checks; it may still be ruled out during curation
- Failed: CNV failed quality checks; flagged for manual review or exclusion
The Classification field provides detailed curation reasoning (see table above).
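Since Interpretation and Classification are separate columns, the overall outcome of a record depends on both. The Python sketch below uses hypothetical outcome labels and relies only on the Ruled Out and Further Review prefixes that appear in the classification table.

```python
def curation_outcome(interpretation: str, classification: str) -> str:
    """Summarize a record from its Interpretation (col 3) and Classification (col 4)."""
    if interpretation == "Failed":
        return "failed QC"
    if interpretation != "Passed":
        raise ValueError(f"unexpected Interpretation: {interpretation!r}")
    # A Passed record can still be ruled out during curation.
    if classification.startswith("Ruled Out"):
        return "passed QC, ruled out by curation"
    if classification.startswith("Further Review"):
        return "passed QC, kept for further review"
    return "passed QC, not yet classified"
```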
1.8 Creating and Curating CNV Files
1.8.1 CNV File Generation Workflow
CNV files are generated by the IMPACT-CNV preprocessing module:
- VCF to SCIP Conversion: CNV VCF → SCIP-compatible coordinates (chr, start, end, CN)
- Quality Validation: Read-depth verification via CRAM inspection
- Phenotype-Aware Prioritization: SCIP backend filtering and ranking
- Output Generation: Initial CNV_IMPACT.txt with Passed/Failed QC status
See IMPACT-CNV Documentation for preprocessing details.
1.8.2 Manual Curation in IMPACT-VIS
Once CNV files are loaded into IMPACT-VIS:
- Interactive Review: Users inspect each CNV in the CNV panel
- Classification: Assign curation classification (Ruled Out, Further Review, etc.)
- Evidence Entry: Document reasoning and supporting evidence
- Persistence: Classifications saved to RDS state files (one per sample)
- Export: Final curated CNV_IMPACT.txt files (with updated timestamps and classifications) are written out for downstream analysis
File Format Note: The CNV_IMPACT.txt file is a curation record, not raw variant data. Each row represents one curator action (QC pass/fail, classification assignment), with a timestamp and the curator's username for an audit trail.
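If each row is one curator action, the same Sample_Interval can appear more than once, and the current state of a variant is its most recent row. A Python sketch of that reduction, assuming the largest column-1 timestamp marks the latest action (on a tie, the first row seen is kept); the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def latest_by_interval(rows: Iterable[List[str]]) -> Dict[str, List[str]]:
    """Map each Sample_Interval (column 2) to its most recent row.

    Rows are the 7 fields of a line, already split on tabs. "Most recent"
    means the largest Unix timestamp in column 1; on a tie the first row wins.
    """
    latest: Dict[str, List[str]] = {}
    for row in rows:
        key = row[1]
        if key not in latest or float(row[0]) > float(latest[key][0]):
            latest[key] = row
    return latest
```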
1.8.3 Example Curation Workflow
Original QC Output (from IMPACT-CNV):
1751234567.00000	test_sample.1.12345678.12456789.DEL	Failed	QC Review Pending	NA	2025-01-01T00:00:00Z	system
After Curator Review & Classification:
1752094007.21964	test_sample.1.12345678.12456789.DEL	Passed	Ruled Out - Population Variation	Common in gnomAD; benign	2025-07-09T16:46:47Z	curator_name
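A curated row like the one above can also be produced programmatically. In this Python sketch the helper name is hypothetical, the five decimal places in column 1 mirror the examples, and filling both time columns from a single clock reading (column 6 rendered in UTC with a trailing Z) is a simplification, since the format defines column 1 as record creation and column 6 as last modification.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def curation_row(sample_interval: str, interpretation: str, classification: str,
                 evidence: Optional[str], username: str,
                 now: Optional[float] = None) -> str:
    """Format one curator action as a tab-separated line (hypothetical helper)."""
    if interpretation not in ("Passed", "Failed"):
        raise ValueError("interpretation must be Passed or Failed")
    ts = time.time() if now is None else now
    # Column 6: ISO 8601 in UTC, derived from the same instant as column 1.
    stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    fields = [f"{ts:.5f}", sample_interval, interpretation, classification,
              evidence if evidence else "NA", stamp, username]
    # Tabs or newlines inside a field would break the 7-column layout.
    if any("\t" in f or "\n" in f for f in fields):
        raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields) + "\n"
```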
1.9 File Size Considerations
| Segments | File Size |
|---|---|
| 100 | ~5 KB |
| 1,000 | ~50 KB |
| 10,000 | ~500 KB |
CNV files are typically much smaller than SNV/SV files.
1.10 Performance Tips
- Merging: Merge adjacent segments with the same copy number (CN) to reduce file size
- Filtering: Remove copy-neutral segments (CN = 2) to reduce file size
- Quality: Pre-filter low-quality segments before loading
- Sorting: Sort by chr/start for efficient loading
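These tips apply to segment-level data that carries a copy number, such as the (chr, start, end, CN) coordinates produced during IMPACT-CNV preprocessing, rather than to the 7-column curation rows. A Python sketch under those assumptions: the tuple layout and function names are hypothetical, coordinates are treated as 1-based inclusive, and only same-chromosome segments with equal CN that overlap or abut are merged.

```python
from typing import List, Tuple

# Hypothetical segment layout: (chrom, start, end, cn), 1-based inclusive.
Segment = Tuple[str, int, int, int]


def chrom_key(chrom: str) -> Tuple[int, str]:
    # Autosomes sort numerically (2 before 10); X, Y and anything else follow.
    return (int(chrom), "") if chrom.isdigit() else (23, chrom)


def compact_segments(segments: List[Segment], drop_neutral: bool = True) -> List[Segment]:
    """Sort by chr/start, optionally drop CN = 2, then merge runs of equal CN."""
    ordered = sorted(segments, key=lambda s: (chrom_key(s[0]), s[1]))
    merged: List[Segment] = []
    for chrom, start, end, cn in ordered:
        if drop_neutral and cn == 2:
            continue  # copy-neutral (normal) segments are dropped
        if merged:
            p_chrom, p_start, p_end, p_cn = merged[-1]
            # Adjacent = same chromosome and CN, overlapping or directly abutting.
            if p_chrom == chrom and p_cn == cn and start <= p_end + 1:
                merged[-1] = (chrom, p_start, max(p_end, end), cn)
                continue
        merged.append((chrom, start, end, cn))
    return merged
```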
1.11 Clinical Interpretation
1.11.1 Pathogenic CNVs
Focus on:
- Deletions (CN = 0 or 1) in tumor suppressor genes
- Amplifications (CN ≥ 4) in oncogenes
- Large segments (>1 Mb) affecting multiple genes
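These criteria can be expressed as simple rules over annotated segments. In the Python sketch below, the segment layout (chrom, start, end, CN, overlapped genes) and the caller-supplied tumor suppressor and oncogene sets are hypothetical; no particular gene lists are implied, and the thresholds are the ones listed above.

```python
from typing import List, Set, Tuple

# Hypothetical layout: (chrom, start, end, cn, genes overlapped), 1-based inclusive.
AnnotatedSegment = Tuple[str, int, int, int, Set[str]]


def pathogenic_flags(seg: AnnotatedSegment, tumor_suppressors: Set[str],
                     oncogenes: Set[str]) -> List[str]:
    """Return the focus criteria a segment meets; the gene sets come from the caller."""
    _chrom, start, end, cn, genes = seg
    flags: List[str] = []
    if cn in (0, 1) and genes & tumor_suppressors:
        flags.append("deletion in tumor suppressor gene")
    if cn >= 4 and genes & oncogenes:
        flags.append("amplification in oncogene")
    if end - start + 1 > 1_000_000 and len(genes) > 1:
        flags.append("large multi-gene segment")
    return flags
```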
1.11.2 Common Artifacts
Watch for:
- Centromeric/telomeric regions (often noisy)
- Short segments (<10 kb; may be false positives)
- Segments with very few probes (<10)
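The same kind of rule-based check covers the artifact patterns. In this Python sketch, the probe count and the map of noisy centromeric/telomeric windows are hypothetical inputs that you would supply from your own data and reference annotation; coordinates are treated as 1-based inclusive, and the thresholds are the ones listed above.

```python
from typing import Dict, List, Tuple

# Hypothetical input: per-chromosome (start, end) windows considered noisy,
# e.g. centromeres and telomeres from your reference genome annotation.
NoisyRegions = Dict[str, List[Tuple[int, int]]]


def artifact_flags(chrom: str, start: int, end: int, n_probes: int,
                   noisy: NoisyRegions) -> List[str]:
    """Return the artifact patterns a segment matches."""
    flags: List[str] = []
    # Two inclusive intervals overlap when each starts before the other ends.
    if any(start <= r_end and r_start <= end for r_start, r_end in noisy.get(chrom, [])):
        flags.append("overlaps centromeric/telomeric region")
    if end - start + 1 < 10_000:
        flags.append("short segment (<10 kb)")
    if n_probes < 10:
        flags.append("few probes (<10)")
    return flags
```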