CNV File Format Specification

Detailed specification for copy number variant files

1 CNV File Format Specification

Note

This page provides detailed field-level documentation for the CNV format. For a quick reference, see Overview.

1.1 Overview

Copy number variant data uses a tab-delimited curation format (TXT, no header). This format persists per-sample CNV annotations and classifications generated during the IMPACT-CNV preprocessing and manual review workflow.

  • Format: Tab-delimited text (no header row)
  • Extension: .txt
  • Delimiter: Tab (\t)
  • Source: IMPACT-CNV output, curated via the IMPACT-VIS interface

1.2 Column Structure

The CNV file contains exactly 7 tab-separated columns (no header):

| Column # | Field Name | Type | Description | Example |
|---|---|---|---|---|
| 1 | Timestamp | numeric | Unix timestamp of record creation | 1752094007.21964 |
| 2 | Sample_Interval | character | Variant identifier: sample.chr.start.end.type | test_sample.12.8863035.8865835.DEL |
| 3 | Interpretation | character | QC result: Passed or Failed | Passed |
| 4 | Classification | character | Curation classification (see below) | Further Review - Potentially Reportable |
| 5 | Evidence | character | Curated evidence summary or NA | Het DEL of CACNA1C. Further Review needed |
| 6 | Date_Time | character | Last modification timestamp (ISO 8601) | 2025-07-09T16:17:19Z |
| 7 | Username | character | Curator username | nboehler |
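
As an illustration of the column layout, the following minimal Python sketch reads a headerless, tab-delimited file into named records. The names `CnvRecord` and `read_cnv_records` are hypothetical and not part of IMPACT-CNV or IMPACT-VIS; the sketch only assumes the seven columns described above.

```python
import csv
import io
from typing import NamedTuple


class CnvRecord(NamedTuple):
    """One row of a CNV curation file (hypothetical container)."""
    timestamp: float        # column 1, Unix time
    sample_interval: str    # column 2, sample.chr.start.end.type
    interpretation: str     # column 3, Passed or Failed
    classification: str     # column 4
    evidence: str           # column 5, free text or NA
    date_time: str          # column 6, ISO 8601 string
    username: str           # column 7


def read_cnv_records(handle):
    """Parse headerless, tab-delimited rows into CnvRecord tuples."""
    # QUOTE_NONE keeps quotation marks in the evidence text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 7:
            raise ValueError(f"line {line_no}: expected 7 columns, got {len(row)}")
        records.append(CnvRecord(float(row[0]), *row[1:]))
    return records


sample = (
    "1752092239.12654\ttest_sample.12.2051740.2054739.DEL\tPassed\t"
    "Further Review - Potentially Reportable\t"
    "Het DEL of CACNA1C. Further Review needed\t2025-07-09T16:17:19Z\tnboehler\n"
)
records = read_cnv_records(io.StringIO(sample))
print(records[0].classification)
```

For a file on disk, the same function can be called with a handle opened via `open(path, newline="")`.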

1.3 Classification Values

The Classification column (column 4) contains curation classifications:

| Classification | Description | QC Status |
|---|---|---|
| Ruled Out - Quality Inadequate / Difficult to Assess | CNV failed quality checks | Failed |
| Ruled Out - Incorrect Boundary, Fully Intronic | Breakpoints incorrect, no exonic overlap | Failed |
| Ruled Out - Population Variation | Common CNV in gnomAD or similar database | Passed |
| Further Review - Not Likely Reportable | Likely benign; documented for completeness | Passed |
| Further Review - Potentially Reportable | Requires additional review; possible pathogenicity | Passed |
| Not Evaluated | CNV not yet curated | Failed |
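
One way to use this table programmatically is as a lookup from classification to QC status. The sketch below is an assumption about how such a check could look, not a description of what IMPACT-VIS does: it treats the QC Status column as the expected value of the Interpretation field, and the names `EXPECTED_QC_STATUS` and `check_classification` are hypothetical.

```python
# Classification strings and QC statuses copied from the table above.
EXPECTED_QC_STATUS = {
    "Ruled Out - Quality Inadequate / Difficult to Assess": "Failed",
    "Ruled Out - Incorrect Boundary, Fully Intronic": "Failed",
    "Ruled Out - Population Variation": "Passed",
    "Further Review - Not Likely Reportable": "Passed",
    "Further Review - Potentially Reportable": "Passed",
    "Not Evaluated": "Failed",
}


def check_classification(classification, interpretation):
    """Return a list of problems; the list is empty when the pair matches the table."""
    expected = EXPECTED_QC_STATUS.get(classification)
    if expected is None:
        return [f"unknown classification: {classification!r}"]
    if expected != interpretation:
        return [
            f"{classification!r} is listed with QC status {expected!r}, "
            f"got {interpretation!r}"
        ]
    return []


print(check_classification("Ruled Out - Population Variation", "Passed"))
print(check_classification("Not Evaluated", "Passed"))
```

A check of this kind would report any classification string that is not listed in the table as unknown, so it should be relaxed if other values are permitted.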

1.4 File Example

1752094007.21964	test_sample.12.8863035.8865835.DEL	Failed	Ruled Out - Quality Inadequate / Difficult to Assess	NA	2025-07-09T16:46:47Z	nboehler
1752092239.12654	test_sample.12.2051740.2054739.DEL	Passed	Further Review - Potentially Reportable	Het DEL of CACNA1C. Further Review needed	2025-07-09T16:17:19Z	nboehler
1752091645.91913	test_sample.5.76790253.76840052.DUP	Passed	Further Review - Potentially Reportable	DUP of F2RL1. Further Review needed.	2025-07-09T16:07:25Z	nboehler
1752062920.41666	test_sample.7.33090339.33148281.DUP	Passed	Further Review - Potentially Reportable	DUP overlaps BBS9, RP9. No clear pathogenicity known.	2025-07-09T08:08:40Z	nboehler

1.5 Validation Rules

Important

CNV files must pass validation before loading:

  1. Column Count: Exactly 7 tab-separated columns (no header)
  2. Delimiter: Tab character (\t) separating all fields
  3. No Header Row: First row is data, not column names
  4. Timestamp Format: Column 1 is Unix timestamp (numeric)
  5. Sample_Interval Format: Column 2 must match sample.chr.start.end.type
    • sample: Sample ID (must match filename prefix)
    • chr: Chromosome (1-22, X, Y)
    • start, end: Integer genomic coordinates
    • type: CNV type (DEL, DUP, etc.)
  6. Interpretation: Column 3 is Passed or Failed
  7. Data Types:
    • Column 1 (Timestamp): Numeric (Unix time)
    • Columns 2-7: Character/string
  8. Sample Consistency: All records should belong to the same sample (the sample ID extracted from the filename)
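
The rules above can be sketched as a per-line check in Python. This is a minimal illustration under stated assumptions, not the validator used by IMPACT-VIS: the function name `validate_line` is hypothetical, the expected sample ID is passed in explicitly because the filename convention is not defined here, and the CNV type is accepted as any alphabetic token since the list of types is open-ended ("DEL, DUP, etc.").

```python
import re

# sample.chr.start.end.type, with chr limited to 1-22, X, Y.
INTERVAL_RE = re.compile(
    r"(?P<sample>.+)\.(?P<chrom>[1-9]|1[0-9]|2[0-2]|X|Y)"
    r"\.(?P<start>\d+)\.(?P<end>\d+)\.(?P<type>[A-Za-z]+)"
)


def validate_line(line, expected_sample=None):
    """Return a list of rule violations for one line (empty list means valid)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        return [f"expected 7 columns, got {len(fields)}"]

    errors = []
    try:
        float(fields[0])  # a header row such as "Timestamp" fails here
    except ValueError:
        errors.append(f"column 1 is not numeric: {fields[0]!r}")

    match = INTERVAL_RE.fullmatch(fields[1])
    if match is None:
        errors.append(f"column 2 does not match sample.chr.start.end.type: {fields[1]!r}")
    elif expected_sample is not None and match.group("sample") != expected_sample:
        errors.append(f"sample {match.group('sample')!r} != {expected_sample!r}")

    if fields[2] not in ("Passed", "Failed"):
        errors.append(f"column 3 must be Passed or Failed: {fields[2]!r}")
    return errors


good = (
    "1752092239.12654\ttest_sample.12.2051740.2054739.DEL\tPassed\t"
    "Further Review - Potentially Reportable\t"
    "Het DEL of CACNA1C. Further Review needed\t2025-07-09T16:17:19Z\tnboehler"
)
print(validate_line(good, expected_sample="test_sample"))
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation in a file at once.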

1.6 Sample_Interval Parsing

The Sample_Interval field (column 2) encodes genomic and variant information:

sample_id.chr.start.end.type

├─ sample_id: Matches filename prefix (e.g., "test_sample")
├─ chr: Chromosome (1, 2, ..., 22, X, Y)
├─ start: Start position (1-based, integer)
├─ end: End position (1-based, integer)
└─ type: Variant type (DEL for deletion, DUP for duplication)

Parsing Example:

test_sample.12.8863035.8865835.DEL
     ↓      ↓     ↓       ↓     ↓
  sample   chr  start    end   type

This encoding allows IMPACT-VIS to extract the genomic coordinates and the variant type (DEL or DUP) without requiring separate coordinate columns.
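
A minimal Python sketch of this parsing step is shown below. The names `Interval` and `parse_sample_interval` are hypothetical. Splitting from the right is a design choice made here on the assumption that a sample ID could itself contain dots, while the last four components never do.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Components of a Sample_Interval value (hypothetical container)."""
    sample_id: str
    chrom: str
    start: int
    end: int
    cnv_type: str


def parse_sample_interval(value):
    """Split sample.chr.start.end.type into typed components."""
    # Split on the last four dots only, so dots inside the sample ID survive.
    parts = value.rsplit(".", 4)
    if len(parts) != 5:
        raise ValueError(f"expected sample.chr.start.end.type, got {value!r}")
    sample_id, chrom, start, end, cnv_type = parts
    return Interval(sample_id, chrom, int(start), int(end), cnv_type)


interval = parse_sample_interval("test_sample.12.8863035.8865835.DEL")
print(interval.chrom, interval.start, interval.end, interval.cnv_type)
```

Non-integer coordinates raise `ValueError` from `int()`, which a caller can treat as a validation failure.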

1.7 QC Status & Classification

The Interpretation field indicates QC pass/fail status:

  • Passed: CNV passed quality checks; may still be ruled out by curation
  • Failed: CNV failed quality checks; flagged for manual review or exclusion

The Classification field provides detailed curation reasoning (see table above).

1.8 Creating and Curating CNV Files

1.8.1 CNV File Generation Workflow

CNV files are generated by the IMPACT-CNV preprocessing module:

  1. VCF to SCIP Conversion: CNV VCF → SCIP-compatible coordinates (chr, start, end, CN)
  2. Quality Validation: Read-depth verification via CRAM inspection
  3. Phenotype-Aware Prioritization: SCIP backend filtering and ranking
  4. Output Generation: Initial CNV_IMPACT.txt with Passed/Failed QC status

See IMPACT-CNV Documentation for preprocessing details.

1.8.2 Manual Curation in IMPACT-VIS

Once CNV files are loaded into IMPACT-VIS:

  1. Interactive Review: Users inspect each CNV in the CNV panel
  2. Classification: Assign curation classification (Ruled Out, Further Review, etc.)
  3. Evidence Entry: Document reasoning and supporting evidence
  4. Persistence: Classifications saved to RDS state files (one per sample)
  5. Export: Final curated CNV_IMPACT.txt files (with updated timestamps and classifications) exported for downstream analysis

File Format Note: The CNV_IMPACT.txt file is a curation record, not raw variant data. Each row represents one curator action (QC pass/fail, classification assignment) and carries a timestamp and curator name to provide an audit trail.

1.8.3 Example Curation Workflow

Original QC Output (from IMPACT-CNV):
1751234567.00000	test_sample.1.12345678.12456789.DEL	Failed	QC Review Pending	NA	2025-01-01T00:00:00Z	system

After Curator Review & Classification:
1752094007.21964	test_sample.1.12345678.12456789.DEL	Passed	Ruled Out - Population Variation	Common in gnomAD; benign	2025-07-09T16:46:47Z	curator_name
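
Because each row records one curator action, a consumer may want only the most recent action per CNV. The Python sketch below shows one way to do that. It assumes that a file can contain several rows for the same Sample_Interval and that the column 1 timestamp orders them, neither of which is stated explicitly in this specification; the function name `latest_actions` is hypothetical.

```python
def latest_actions(rows):
    """Map each Sample_Interval to its row with the highest column 1 timestamp.

    rows: iterable of 7-element lists, one per line of the CNV file.
    """
    latest = {}
    for row in rows:
        key = row[1]  # Sample_Interval
        if key not in latest or float(row[0]) > float(latest[key][0]):
            latest[key] = row
    return latest


# The two rows from the curation example above.
rows = [
    ["1751234567.00000", "test_sample.1.12345678.12456789.DEL", "Failed",
     "QC Review Pending", "NA", "2025-01-01T00:00:00Z", "system"],
    ["1752094007.21964", "test_sample.1.12345678.12456789.DEL", "Passed",
     "Ruled Out - Population Variation", "Common in gnomAD; benign",
     "2025-07-09T16:46:47Z", "curator_name"],
]
current = latest_actions(rows)
print(current["test_sample.1.12345678.12456789.DEL"][3])
```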

1.9 File Size Considerations

| Segments | File Size |
|---|---|
| 100 | ~5 KB |
| 1,000 | ~50 KB |
| 10,000 | ~500 KB |

CNV files are typically much smaller than SNV/SV files.

1.10 Performance Tips

  1. Merging: Merge adjacent segments with the same CN to produce smaller files
  2. Filtering: Remove segments with CN = 2 (normal) to reduce size
  3. Quality: Pre-filter low-quality segments before loading
  4. Sorting: Sort by chr/start for efficient loading
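
The sorting tip can be sketched in Python as a sort key derived from the Sample_Interval field. This is an illustration only: the function name `chrom_sort_key` is hypothetical, and placing X and Y after chromosome 22 is an assumed convention rather than a documented requirement.

```python
def chrom_sort_key(sample_interval):
    """Sort key (chromosome rank, start) from sample.chr.start.end.type."""
    _sample, chrom, start, _end, _cnv_type = sample_interval.rsplit(".", 4)
    # Assumed ordering: autosomes numerically, then X, then Y.
    special = {"X": 23, "Y": 24}
    chrom_rank = special[chrom] if chrom in special else int(chrom)
    return (chrom_rank, int(start))


intervals = [
    "test_sample.12.8863035.8865835.DEL",
    "test_sample.12.2051740.2054739.DEL",
    "test_sample.5.76790253.76840052.DUP",
    "test_sample.7.33090339.33148281.DUP",
]
for item in sorted(intervals, key=chrom_sort_key):
    print(item)
```

A plain string sort would place chromosome 12 before chromosome 5, which is why the chromosome is converted to a number first.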

1.11 Clinical Interpretation

1.11.1 Pathogenic CNVs

Focus on:

  • Deletions (cn=0,1) in tumor suppressor genes
  • Amplifications (cn≥4) in oncogenes
  • Large segments (>1 Mb) affecting multiple genes

1.11.2 Common Artifacts

Watch for:

  • Centromeric/telomeric regions (often noisy)
  • Short segments (<10 kb; may be false positives)
  • Segments with very few probes (<10)
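
Of these checks, only segment length can be derived from the Sample_Interval field alone. The Python sketch below flags short segments under two assumptions that this specification does not state: coordinates are inclusive at both ends, and "10 kb" means 10,000 bp. The names `interval_length`, `is_short`, and `SHORT_SEGMENT_BP` are hypothetical.

```python
SHORT_SEGMENT_BP = 10_000  # assumed meaning of the 10 kb threshold


def interval_length(sample_interval):
    """Length in bp of a sample.chr.start.end.type interval (inclusive ends assumed)."""
    _sample, _chrom, start, end, _cnv_type = sample_interval.rsplit(".", 4)
    return int(end) - int(start) + 1


def is_short(sample_interval, threshold=SHORT_SEGMENT_BP):
    """True when the segment is shorter than the threshold."""
    return interval_length(sample_interval) < threshold


for item in (
    "test_sample.12.8863035.8865835.DEL",
    "test_sample.5.76790253.76840052.DUP",
):
    print(item, interval_length(item), is_short(item))
```

Centromeric or telomeric location and probe counts need external annotation and are outside the scope of this sketch.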

1.12 See Also