International HapMap Project

General Background 
In 2002, the International HapMap Consortium launched the International HapMap Project to develop a haplotype map (“HapMap”) of the human genome, a resource that describes the common patterns of human DNA sequence variation. The HapMap has become an important tool for researchers seeking genes that affect health, disease, and response to drugs and environmental factors. All HapMap data are freely available to the public through the database dbSNP. The Project is described in Nature 426: 789-796, 2003 [PMID: 14685227]. The associated ethical issues are described in Nature Reviews Genetics 5: 467-475, 2004 [PMID: 15153999]. The general process of community engagement used in connection with the collection of samples for the Project is described in Community Genetics 10(3): 186-198, 2007 [PMID: 17575464].

Phase I 
In 2005, the International HapMap Consortium released the Phase I HapMap, a resource consisting of genotypes for more than one million SNPs in 269 individuals from four geographically diverse populations: Yoruba in Ibadan, Nigeria; Japanese in Tokyo, Japan; Han Chinese in Beijing, China; and the CEPH collection (U.S. Utah residents with ancestry from northern and western Europe). The Phase I HapMap includes data from ten 500-kb regions (the “HapMap ENCODE I regions”) that were sequenced to assess the genotyping. The Phase I HapMap documents the generality of recombination hotspots, a block-like structure of linkage disequilibrium, and low haplotype diversity, which together lead to substantial correlations between SNPs and many of their neighbors. This resource has contributed to the design and analysis of genetic association studies. It has also shed light on structural variation and recombination and contributed to the identification of loci that may have been subject to natural selection during human evolution. The Phase I HapMap is described in Nature 437: 1299-1320, 2005 [PMID: 16255080].

Phase II 
In 2007, the International HapMap Consortium released the Phase II HapMap, which added over 2.1 million SNPs in the same 269 individuals. The Phase II HapMap enabled a better understanding of how well association studies capture patterns of genetic variation and showed the potential to increase the power of such studies through imputation. In addition, it improved the resolution of the fine-scale genetic map and of the locations of recombination hotspots, and it provided new information about the influence of natural selection on protein-changing variants. The Phase II HapMap is described in Nature 449: 851-861, 2007 [PMID: 17943122].

Analysis of Samples from Additional Populations – HapMap 3 
U.K. and U.S. investigators expanded the Phase I/II HapMap by genotyping and sequencing samples contributed by seven additional populations: Maasai in Kinyawa, Kenya; Luhya in Webuye, Kenya; Chinese in metropolitan Denver, CO, USA; Gujarati Indians in Houston, TX, USA; Toscani in Italia (Tuscans in Italy); African ancestry in the Southwest USA; and Mexican ancestry in Los Angeles, CA, USA. Most of these samples were genotyped for 1.6 million SNPs. A subset of these samples was sequenced across 2 Mb of the ENCODE II regions (20 regions of 100 kb each). This combination of genotyping and sequencing broadened the comparison of genome-wide patterns of variation across populations. The HapMap 3 results are described in Nature 467: 52-58, 2010 [PMID: 20811451].

HapMap Samples 
No identifying or phenotype information is available for the HapMap samples that are housed in the NHGRI Repository. All of the samples were collected with extensive community engagement, including discussions with members of the donor communities about the ethical and social implications of human genetic variation research. Donors gave broad consent to future uses of the samples, including their use for extensive genotyping and sequencing, gene expression and proteomics studies, and all other types of genetic variation research, with the data publicly released. An example HapMap consent document is also available. Investigators can order the HapMap samples as individual DNA samples or individual cell cultures. The biomaterials currently available are shown in the table below:

Populations Included in Phase I/II HapMap | DNA Samples | Cell Cultures
Yoruba in Ibadan, Nigeria [YRI] | 120
Han Chinese in Beijing, China [CHB] | 120
Japanese in Tokyo, Japan [JPT] | 120
CEPH Collection [CEU] | Samples are available from the NIGMS Human Genetic Cell Repository at Coriell.

Additional Populations
Population | DNA Samples | Cell Cultures
Maasai in Kinyawa, Kenya [MKK] | 205 | 205
Luhya in Webuye, Kenya [LWK] | 122 | 122
Chinese in Metropolitan Denver, CO, USA [CHD] | 129 | 129
Gujarati Indians in Houston, TX, USA [GIH] | 117 | 117
Toscani in Italia [TSI] | 117 | 117
Mexican Ancestry in LA, CA, USA [MXL] | 104 | 104
African Ancestry in SW USA [ASW] | 107 | 107