What Is Raw DNA Data? A Complete Beginner's Guide

If you have ever taken a DNA test from a service like 23andMe, AncestryDNA, or Helixline, your results go far beyond the colorful ancestry pie chart on your screen. Behind those visual reports lies something far more fundamental: your raw DNA data. This file is the actual output of your genetic test, and understanding what it is and how to use it opens up a world of possibilities for exploring your genome.

In this guide, we will break down exactly what raw DNA data is, what it looks like when you open the file, how different testing companies format it, how to download it, and what you can do with it once you have it in your hands. Whether you are a first-time DNA test customer or a genetics enthusiast looking to go deeper, this guide has everything you need.

Key Takeaway: Raw DNA data is a plain text file containing hundreds of thousands of rows, each representing a single genetic marker (SNP) in your genome. It records which two letters (alleles) you carry at each tested position on your chromosomes. This file is the foundation for all the ancestry, health, and trait reports you receive from your DNA testing company.

What Exactly Is Raw DNA Data?

When you spit into a tube or swab your cheek for a DNA test, the laboratory does not read your entire genome. Instead, it uses a technology called SNP genotyping (SNP is pronounced "snip") to read specific positions in your DNA that are known to vary between people. These variable positions are called single nucleotide polymorphisms, or SNPs.

Your genome is made up of about 3.2 billion base pairs of DNA, but any two humans share roughly 99.9% of their DNA sequence. The remaining 0.1% or so -- on the order of 3 to 5 million positions -- is where the interesting variation occurs. SNP genotyping chips typically test between 600,000 and 900,000 of these variable positions, selecting the ones that are most informative for ancestry, health, and trait analysis.

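The raw DNA data file is simply the complete output of this genotyping process. It is a text file where each row represents one SNP and records four things: the SNP's identifier (rsID), the chromosome it sits on, its position on that chromosome, and your genotype (the two alleles you carry there).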

What Does Raw DNA Data Look Like?

When you open a raw DNA data file in a text editor, you will see something that looks like a very long spreadsheet without the gridlines. Here is an example of what a typical 23andMe-format raw data file looks like:

# This data file generated by 23andMe at: Thu Feb 12 2026 10:30:00 UTC
#
# This file contains raw genotype data, including data that is not used in
# 23andMe reports. This data has undergone a general quality review however
# only a subset of markers have been individually validated for accuracy.
#
# rsid chromosome position genotype
rs12564807 1 734462 AA
rs3131972 1 752721 AG
rs12124819 1 776546 AA
rs1110052 1 838555 GG
rs1815606 1 844113 AG
rs7537756 1 854250 AG
rs13302982 1 861808 GG
rs1799945 6 26091179 CG
rs1800562 6 26093141 GG
rs4988235 2 136608646 CT
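
A file laid out like the excerpt above can be read with a few lines of Python. The following is a minimal sketch, assuming tab- or space-separated columns in the order rsid, chromosome, position, genotype. The names `Snp` and `parse_raw_dna` are illustrative and not part of any testing company's tooling.

```python
from typing import Iterable, Iterator, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_dna(lines: Iterable[str]) -> Iterator[Snp]:
    """Yield one Snp per data row, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # splits on tabs or runs of spaces
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # not a four-column data row; skip rather than guess
        rsid, chromosome, position, genotype = fields
        yield Snp(rsid, chromosome, int(position), genotype)


# Two rows taken from the excerpt above, with tabs as separators.
sample = """\
# rsid\tchromosome\tposition\tgenotype
rs3131972\t1\t752721\tAG
rs4988235\t2\t136608646\tCT
"""
snps = list(parse_raw_dna(sample.splitlines()))
print(snps[0].rsid, snps[0].genotype)
```

To read a real download, the same function can be given an open file handle instead of `sample.splitlines()`.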

Line-by-Line Explanation

Let us walk through several of these lines to understand exactly what the data is telling us:

Understanding Genotypes: At every SNP position, you have two alleles. If both are the same (e.g., AA or GG), you are homozygous at that position. If they differ (e.g., AG or CT), you are heterozygous. Taken together across many SNPs, these combinations influence traits such as eye color, contribute to disease risk, and carry the signal used to estimate your ancestry composition.
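
The homozygous/heterozygous distinction is easy to express in code. This is a small sketch, assuming two-letter genotype strings as they appear in the file. The function name `zygosity` is made up for this example, and anything that is not a pair of A, C, G, or T letters is reported as "other".

```python
def zygosity(genotype: str) -> str:
    """Classify a two-letter genotype call."""
    alleles = genotype.strip().upper()
    if len(alleles) != 2 or any(a not in "ACGT" for a in alleles):
        return "other"  # no-call, insertion/deletion code, or single-allele call
    return "homozygous" if alleles[0] == alleles[1] else "heterozygous"


for call in ("AA", "AG", "--"):
    print(call, zygosity(call))
```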

Raw Data Formats Across DNA Testing Companies

Not all DNA testing companies produce raw data in the same format. While the fundamental information is the same (rsID, chromosome, position, genotype), the file structure, column order, number of SNPs tested, and reference genome build can differ significantly. Understanding these differences is important when you want to upload your data to third-party analysis tools.

Company | File Format | SNPs Tested | File Size | Genome Build | Compatibility
23andMe (v5) | .txt (tab-separated) | ~640,000 | ~15 MB (unzipped) | GRCh37 (hg19) | Widely accepted by most third-party tools
AncestryDNA | .txt (tab-separated) | ~700,000 | ~18 MB (unzipped) | GRCh37 (hg19) | Accepted by most tools; some require format conversion
MyHeritage | .csv (comma-separated) | ~720,000 | ~19 MB (unzipped) | GRCh37 (hg19) | Good compatibility; CSV format differs slightly
FamilyTreeDNA | .csv (comma-separated) | ~700,000 | ~17 MB (unzipped) | GRCh37 (hg19) / GRCh38 | Widely compatible; some older files use build 36
Helixline | .txt (tab-separated) | ~850,000 | ~22 MB (unzipped) | GRCh38 (hg38) | Compatible with all major tools; includes South Asian-specific markers
Whole Genome (VCF) | .vcf (VCF format) | 4,000,000+ | 1-5 GB | GRCh38 (hg38) | Research-grade; requires bioinformatics tools to process
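
When a tool expects tab-separated input but your file is comma-separated, a small conversion step is usually enough. The sketch below assumes a CSV layout with four quoted fields per row (rsID, chromosome, position, genotype) and a header row. That layout is an assumption for illustration, so check your own file before relying on it. The function name `csv_to_tab_rows` is hypothetical.

```python
import csv
import io
from typing import List


def csv_to_tab_rows(csv_text: str) -> List[str]:
    """Convert four-column CSV genotype rows into tab-separated rows."""
    rows = []
    # Drop comment lines before handing the text to the CSV parser.
    data_lines = (ln for ln in io.StringIO(csv_text) if not ln.startswith("#"))
    for fields in csv.reader(data_lines):
        # Skip blank lines and the header row (its position is not a number).
        if len(fields) != 4 or not fields[2].strip().isdigit():
            continue
        rows.append("\t".join(field.strip() for field in fields))
    return rows


example = (
    "# example comment line\n"
    "RSID,CHROMOSOME,POSITION,RESULT\n"
    '"rs3131972","1","752721","AG"\n'
)
print(csv_to_tab_rows(example))
```

Note that this only changes the delimiter. It does not convert coordinates between genome builds, which needs a dedicated liftover tool.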

Key Differences Between Formats

While all raw data files contain essentially the same type of information, there are important differences to be aware of:

Understanding VCF Format

If you have had whole genome sequencing (WGS) rather than SNP genotyping, your raw data will be in VCF (Variant Call Format) rather than a simple text file. VCF is the standard file format used in bioinformatics for storing gene sequence variations. It is significantly more complex than SNP genotyping output:

For most consumer genetic testing purposes, SNP genotyping raw data is sufficient. Whole genome sequencing provides more complete data but requires specialized tools and expertise to analyze effectively.
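
To give a feel for the difference, here is a hedged sketch that turns one VCF data row into the two-letter genotype used in SNP files. It relies on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample) and the GT field, which stores allele indices such as 0/1. It assumes a single-sample file and simple single-base variants. The function name and the example row are made up for illustration.

```python
from typing import Tuple


def vcf_row_to_genotype(row: str) -> Tuple[str, str, int, str]:
    """Return (id, chromosome, position, genotype) for a single-sample VCF row."""
    fields = row.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt = fields[:5]
    format_keys = fields[8].split(":")
    sample_values = fields[9].split(":")
    gt = sample_values[format_keys.index("GT")]
    alleles = [ref] + alt.split(",")          # index 0 is REF, 1+ are ALT alleles
    indices = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if "." in indices:
        return var_id, chrom, int(pos), "--"  # missing call
    genotype = "".join(alleles[int(i)] for i in indices)
    return var_id, chrom, int(pos), genotype


# A made-up row, not taken from any real file.
row = "1\t1000\trs0000001\tA\tG\t.\tPASS\t.\tGT\t0/1"
print(vcf_row_to_genotype(row))
```

A VCF file normally lists only positions where the sample differs from the reference, so positions absent from the file cannot be converted this way.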

How to Download Your Raw DNA Data

Every major DNA testing company allows you to download your raw data. Here are step-by-step instructions for each platform:

Downloading from 23andMe

  1. Log in to your 23andMe account at 23andme.com
  2. Click on your name in the top-right corner and select Settings
  3. Scroll down to the 23andMe Data section
  4. Click Download Raw Data
  5. Re-enter your password for security verification
  6. Complete the two-step verification if enabled
  7. Select "Submit Request" -- 23andMe will email you when the file is ready
  8. Return to the same page and click the download link (available for 30 days)
  9. The file downloads as a .zip archive containing a single .txt file

Downloading from AncestryDNA

  1. Log in to your Ancestry account at ancestry.com
  2. Click the DNA tab in the top navigation
  3. Click Settings (gear icon) on your DNA home page
  4. Scroll to Download DNA Data under the Actions section
  5. Click Get Started
  6. Confirm your identity by re-entering your password
  7. Ancestry will send a confirmation email -- click the link in the email
  8. Return to the settings page and click Download DNA Raw Data
  9. The file downloads as a .zip archive containing a .txt file

Downloading from Helixline

  1. Log in to your Helixline account at helixline.in
  2. Navigate to your Dashboard
  3. Click My DNA in the sidebar menu
  4. Select Download Raw Data
  5. Verify your identity through password re-entry
  6. Choose your preferred format: Helixline native format or 23andMe-compatible format
  7. Click Download -- the file is generated immediately
  8. The file downloads as a .zip archive containing your raw data file

Important Note: Always store your downloaded raw DNA data file in a secure location on your computer or in an encrypted cloud storage service. This file contains sensitive genetic information. Treat it with the same care you would give to your medical records or financial documents.
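
One practical way to limit stray copies is to read the data straight from the downloaded .zip archive instead of extracting it to disk. The following is a minimal sketch using Python's standard `zipfile` module. The function name `read_raw_data_from_zip` is hypothetical, and the demonstration builds a tiny archive in memory rather than using a real download.

```python
import io
import zipfile
from typing import List


def read_raw_data_from_zip(zip_source) -> List[str]:
    """Return the lines of the first .txt or .csv file inside the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        names = [n for n in archive.namelist()
                 if n.lower().endswith((".txt", ".csv"))]
        if not names:
            raise ValueError("no .txt or .csv file found in archive")
        with archive.open(names[0]) as member:
            return member.read().decode("utf-8").splitlines()


# Build a small in-memory archive to show the call; a real path works too.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("genome.txt", "# rsid\tchromosome\tposition\tgenotype\n"
                                "rs3131972\t1\t752721\tAG\n")
lines = read_raw_data_from_zip(buffer)
print(len(lines), "lines read")
```

Reading in place does not replace encryption of the archive itself. It only avoids leaving an extra unpacked copy behind.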

What Can You Do With Raw DNA Data?

Once you have downloaded your raw data file, you can use it for a wide range of analyses that go beyond what your original testing company provides. Here are the most popular use cases:

1. Health and Trait Analysis

Several third-party services accept raw DNA data uploads and provide detailed health and trait reports:

2. Genetic Genealogy and Finding Relatives

Uploading your raw data to genetic genealogy databases can help you find relatives who tested with different companies:

3. Ancestry and Population Analysis

Beyond the standard ancestry report from your testing company, raw data enables deeper population-level analysis:

4. Upload to Helixline for India-Specific Analysis

If you tested with another company but want Indian-specific insights, Helixline accepts raw data uploads from 23andMe, AncestryDNA, MyHeritage, and FamilyTreeDNA. When you upload your data to Helixline, you receive:

Get More From Your DNA Data

Already tested with another company? Upload your raw DNA data to Helixline for India-specific ancestry, haplogroup, and wellness insights.

Privacy Considerations When Sharing Raw DNA Data

Your raw DNA data is among the most sensitive personal information you possess. Unlike a password or credit card number, you cannot change your DNA if it is compromised. Before sharing or uploading your raw data anywhere, carefully consider these privacy factors:

What Your DNA Data Can Reveal

Best Practices for DNA Data Privacy

  1. Read privacy policies carefully before uploading to any third-party service. Look for clear statements about data storage, sharing with third parties, and whether your data is used for research.
  2. Check deletion options: Only use services that allow you to delete your uploaded data and account at any time. Verify that deletion is genuine and not just hiding your data from view.
  3. Use strong passwords and enable two-factor authentication on any account that stores your genetic data.
  4. Be cautious with public databases: Some genealogy databases are publicly searchable. Understand the privacy settings available and choose the level of visibility you are comfortable with.
  5. Consider pseudonymous uploads: Some services allow you to upload under a pseudonym or alias. This provides a layer of privacy while still enabling analysis.
  6. Never post raw data on public forums or social media, even partial snippets. A small subset of SNPs can be enough to identify an individual or infer sensitive health information.
  7. Encrypt your stored files: If you store your raw data file on your computer or cloud storage, consider encrypting it. Most operating systems offer built-in encryption options.

How Helixline Handles Raw DNA Data

At Helixline, we take the security and privacy of your genetic data extremely seriously. Here is how we handle raw DNA data:

Helixline Privacy Promise: Your DNA belongs to you. We believe you should have full control over your genetic data, including the ability to download it, upload it elsewhere, or delete it permanently. We are custodians of your data, not owners.

Common Questions About Raw DNA Data File Structure

What Do the Chromosome Numbers Mean?

Humans have 23 pairs of chromosomes. In your raw data file, the 22 pairs of autosomes are numbered 1 through 22, the sex chromosomes appear as X and Y, and mitochondrial DNA is labeled MT.

What Do "No Call" or "--" Entries Mean?

You may notice some entries in your raw data that show "--" or "00" or "NC" instead of a normal genotype like "AA" or "AG". These are called no-calls and mean that the genotyping chip was unable to determine your genotype at that particular position. This can happen due to:

No-calls are normal and typically affect only 1-3% of the SNPs in your raw data. They do not indicate a problem with your DNA; they simply mean the measurement was not reliable enough to report. Most analyses will simply skip these positions.
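
You can check the no-call rate of your own file with a short script. This sketch assumes the three no-call codes mentioned above ("--", "00", and "NC") and a list of genotype strings already read from the file. The function name `no_call_rate` is illustrative.

```python
from typing import Iterable

NO_CALL_CODES = {"--", "00", "NC"}


def no_call_rate(genotypes: Iterable[str]) -> float:
    """Fraction of genotype entries that are no-calls (0.0 for empty input)."""
    calls = list(genotypes)
    if not calls:
        return 0.0
    missing = sum(1 for g in calls if g.strip().upper() in NO_CALL_CODES)
    return missing / len(calls)


rate = no_call_rate(["AA", "--", "AG", "GG"])
print(f"{rate:.1%} of entries are no-calls")
```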

What Is the "i" Prefix in Some SNP Names?

In 23andMe raw data files, you may notice SNP identifiers that start with "i" instead of "rs" (for example, i3000027). The number after the "i" is 23andMe's own and does not correspond to an rsID with the same digits. These are internal identifiers used by 23andMe for SNPs that either:

These "i-number" SNPs may not be recognized by all third-party analysis tools, as they are specific to 23andMe's platform. However, many popular tools like Promethease and GEDmatch can interpret them.
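
If a tool rejects your file because of these identifiers, one option is to separate them out first. This is a small sketch, assuming identifiers follow the two patterns described above ("rs" plus digits, or "i" plus digits). The function name `split_by_id_type` is made up for this example.

```python
from typing import Iterable, List, Tuple


def split_by_id_type(ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate standard rsIDs from 23andMe-style internal i-numbers."""
    standard, internal = [], []
    for identifier in ids:
        if identifier.startswith("rs") and identifier[2:].isdigit():
            standard.append(identifier)
        elif identifier.startswith("i") and identifier[1:].isdigit():
            internal.append(identifier)
    return standard, internal


standard, internal = split_by_id_type(["rs3131972", "i3000027", "rs4988235"])
print(len(standard), "rsIDs and", len(internal), "internal IDs")
```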

Understanding Genotype Notation

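The genotype column in your raw data uses the four standard nucleotide letters (A, C, G, and T). Some files, including 23andMe's, also use D and I to mark deletions and insertions; these two letters are a file convention rather than standard nucleotide codes. Here is what each letter means and the genotypes you can encounter:

Letter | Meaning | Description | Example Genotypes
A | Adenine | A purine base | AA, AG, AC, AT
G | Guanine | A purine base | GG, GA, GC, GT
C | Cytosine | A pyrimidine base | CC, CA, CG, CT
T | Thymine | A pyrimidine base | TT, TA, TG, TC
D | Deletion | One or more bases were deleted | DD, DI
I | Insertion | One or more bases were inserted | II, ID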

Most SNPs in your raw data will involve only two possible alleles (for example, A or G at a particular position). The three possible genotypes for such a SNP would be AA (homozygous for the A allele), AG (heterozygous), and GG (homozygous for the G allele). Which allele is considered "reference" and which is "alternate" depends on the reference genome.
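
The three-genotype pattern can be generated mechanically. The sketch below also normalizes allele order, since in unphased genotype data "AG" and "GA" describe the same call. Both function names are illustrative.

```python
from typing import List


def possible_genotypes(allele_a: str, allele_b: str) -> List[str]:
    """List the three genotypes of a two-allele SNP in alphabetical form."""
    first, second = sorted((allele_a, allele_b))
    return [first + first, first + second, second + second]


def normalize(genotype: str) -> str:
    """Put the two alleles in alphabetical order so AG and GA compare equal."""
    return "".join(sorted(genotype))


print(possible_genotypes("G", "A"))
print(normalize("GA") == normalize("AG"))
```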

How Raw DNA Data Powers Your Reports

Understanding how testing companies transform your raw data into meaningful reports helps you appreciate both the power and the limitations of consumer genetic testing:

Ancestry Reports

Ancestry analysis works by comparing your genotypes at thousands of ancestry-informative markers (AIMs) against reference panels of populations with known geographic origins. The algorithm calculates the statistical likelihood that your DNA pattern at each marker came from various reference populations, then combines these probabilities across all markers to produce your ancestry composition percentages.

Different companies use different reference panels and algorithms, which is why your ancestry results may vary between services. Helixline uses a reference panel that includes detailed representation of Indian subpopulations, providing more granular South Asian ancestry results than services that group all of India into a single category.
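
To make the likelihood idea concrete, here is a deliberately simplified sketch. It is not the algorithm used by Helixline or any other company. It assumes each marker has a known alternate-allele frequency in each reference population, that genotypes follow Hardy-Weinberg proportions, that markers are independent, and that all populations are equally likely beforehand. The population names and frequencies are invented for illustration.

```python
import math
from typing import Dict, List


def genotype_probability(alt_count: int, alt_freq: float) -> float:
    """Hardy-Weinberg probability of carrying 0, 1, or 2 alternate alleles."""
    p = alt_freq
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}[alt_count]


def population_posteriors(alt_counts: List[int],
                          freqs: Dict[str, List[float]]) -> Dict[str, float]:
    """Combine per-marker likelihoods into one probability per population."""
    log_likelihoods = {
        pop: sum(math.log(genotype_probability(count, freq))
                 for count, freq in zip(alt_counts, pop_freqs))
        for pop, pop_freqs in freqs.items()
    }
    peak = max(log_likelihoods.values())  # subtract the peak for numerical stability
    weights = {pop: math.exp(ll - peak) for pop, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {pop: weight / total for pop, weight in weights.items()}


# Invented frequencies for three markers in two hypothetical populations.
reference = {"pop_A": [0.8, 0.5, 0.1], "pop_B": [0.2, 0.5, 0.6]}
my_alt_counts = [2, 1, 0]  # copies of the alternate allele at each marker
posteriors = population_posteriors(my_alt_counts, reference)
print({pop: round(value, 3) for pop, value in posteriors.items()})
```

This sketch scores the whole set of markers against one population at a time. Producing composition percentages for a person of mixed ancestry requires admixture models that estimate proportions, often segment by segment along each chromosome.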

Health Reports

Health-related insights are derived by looking up specific SNPs in your raw data that have been associated with health conditions in published scientific research (genome-wide association studies, or GWAS). For each health-related SNP, the report checks which genotype you carry and references the scientific literature to determine what that genotype is associated with.

It is important to understand that most health-related SNPs identified through GWAS contribute only a small amount of risk for any given condition. Having a "risk" genotype does not mean you will develop a condition; it means your statistical probability may be slightly higher or lower than average.
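
The lookup step itself is simple, as this sketch suggests. The annotation table here is entirely hypothetical: the rsIDs and the wording are placeholders, not real findings. A real report would draw on curated scientific literature and would also have to account for which DNA strand each genotype is reported on.

```python
from typing import Dict

# Hypothetical annotations for illustration only. Keys use alphabetical
# allele order so that "GA" and "AG" resolve to the same entry.
ANNOTATIONS: Dict[str, Dict[str, str]] = {
    "rs0000001": {
        "AA": "typical odds",
        "AG": "slightly increased odds",
        "GG": "increased odds",
    },
}


def lookup(genotypes: Dict[str, str],
           annotations: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the annotation for each SNP that appears in the table."""
    report = {}
    for rsid, genotype in genotypes.items():
        by_genotype = annotations.get(rsid)
        if by_genotype is None:
            continue  # this SNP is not covered by the annotation table
        key = "".join(sorted(genotype))
        report[rsid] = by_genotype.get(key, "no annotation for this genotype")
    return report


report = lookup({"rs0000001": "GA", "rs0000002": "CC"}, ANNOTATIONS)
print(report)
```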

Relative Matching

When two people upload their raw data to the same platform, the system compares their genotypes across all shared SNPs. If two people share long stretches of matching genotypes (interpreted as identical-by-descent segments, or IBD), it suggests they share a recent common ancestor. The total amount of shared DNA (measured in centimorgans, or cM) indicates the likely relationship.
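
The core comparison can be sketched in a few lines. This simplified example counts the longest run of consecutive SNPs at which two people share at least one allele, which is the usual test for unphased data. It assumes both genotype lists cover the same SNPs in the same chromosome order. Real matching tools go further: they measure segments in centimorgans using a genetic map, apply minimum length thresholds, and tolerate occasional genotyping errors.

```python
from typing import List


def shares_allele(genotype_a: str, genotype_b: str) -> bool:
    """True when the two genotypes have at least one allele in common."""
    return bool(set(genotype_a) & set(genotype_b))


def longest_matching_run(person_a: List[str], person_b: List[str]) -> int:
    """Length, in SNPs, of the longest stretch with a shared allele."""
    longest = current = 0
    for g1, g2 in zip(person_a, person_b):
        if "-" in g1 or "-" in g2:
            continue  # no-calls neither extend nor break a run
        if shares_allele(g1, g2):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


person_a = ["AA", "AG", "GG", "CC", "TT"]
person_b = ["AG", "GG", "AG", "TT", "TT"]
print(longest_matching_run(person_a, person_b))
```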

Frequently Asked Questions

What does raw DNA data look like?

Raw DNA data is a plain text file that you can open in any text editor or spreadsheet application, such as Notepad, TextEdit, or Excel. Each row contains four pieces of information: an rsID (a unique identifier like rs1234567), a chromosome number (1-22, X, Y, or MT), a position (numerical coordinate on the chromosome), and your genotype (two letters like AA, AG, or GG). A typical file contains 600,000 to 900,000 such rows, with comment lines at the top preceded by a "#" symbol. The file is usually 15 to 25 megabytes when unzipped.

Can I download my raw DNA data?

Yes, all major DNA testing companies provide the option to download your raw DNA data. With 23andMe, go to Settings, then 23andMe Data, then Download Raw Data. With AncestryDNA, open the DNA tab, then Settings, then Download DNA Data. With Helixline, navigate to your Dashboard, click My DNA, and select Download Raw Data. The process typically requires re-entering your password for security. Some services generate the file immediately, while others first send a confirmation email or notify you when the file is ready. The download comes as a compressed .zip file containing your raw data as a text file. You own this data and have the right to download it at any time.

Is it safe to share raw DNA data?

Sharing raw DNA data carries meaningful privacy risks and should be done with careful consideration. Your DNA data can reveal sensitive health information (like BRCA gene variants), family secrets (non-paternity, unknown siblings), ethnic ancestry, and can potentially be used to identify you or your relatives. Only upload your data to reputable services with clear privacy policies, data encryption, and deletion options. Never post raw data on public forums or social media. That said, sharing data with trusted third-party tools like Promethease, GEDmatch, or Helixline can provide valuable insights when done thoughtfully. Always read the privacy policy and terms of service before uploading.

What can I do with raw DNA data?

Raw DNA data unlocks a wide range of analyses beyond your original testing company's reports. You can upload it to health analysis tools like Promethease for detailed health and trait reports, or to genetic genealogy platforms like GEDmatch to find DNA matches from people who tested with different companies. You can explore alternative ancestry calculators like HarappaWorld for more detailed South Asian ancestry breakdowns. You can check for carrier status on specific genetic conditions using tools like Codegen.eu. And if you tested with another company, you can upload your raw data to Helixline for India-specific ancestry, haplogroup, and wellness analysis tailored to South Asian genetics.

Conclusion

Your raw DNA data file is the most fundamental output of any genetic test. While the ancestry pie charts and health reports are easier to understand at a glance, the raw data file is where the real power lies. It is a portable record of hundreds of thousands of your genetic variants that you can take with you to any analysis platform, now or in the future.

Understanding what this file contains -- the rsIDs, chromosomes, positions, and genotypes that make up your unique genetic fingerprint -- empowers you to make informed decisions about how to use your genetic information. Whether you want to explore your deep ancestry through admixture calculators, find long-lost relatives through genetic genealogy, investigate health-related traits, or simply keep a copy of your genetic data for future use, the raw data file is your starting point.

As genetic science advances and new analysis tools are developed, having your raw DNA data on hand means you can always take advantage of the latest discoveries without needing to take another test. Your DNA does not change, but our ability to interpret it improves every year.

Ready to explore your DNA? Order your Helixline DNA kit or upload your existing raw data to get India-specific ancestry and wellness insights that go beyond the generic.