Two interesting articles, the Economist's "More special than you thought" and Nature's "Global variation in copy number in the human genome" (PDF), describe a new appreciation for variation in the human genome.
From the Economist:
To find out how much things vary between people, they had previously focused on places in the string of molecular “letters” of which DNA is made where one letter is replaced by another in a significant fraction of the population. (These are known as single-nucleotide polymorphisms.) But it has been known for a long time that there is another sort of variation around. During the process by which DNA is copied, entire blocks of DNA can be accidentally deleted or multiplied. (Following such multiplied and deleted regions between the generations was an important genetic technique before the invention of cheap and rapid gene-sequencing technology.)
What the team has shown in the latest report is that duplication and deletion are much more widespread than was previously realised. Also, the duplication and deletion often involve active genes as well as the so-called non-coding part of the DNA, which is not translated into the protein molecules that keep cells running.
Does all that matter? The absence of a gene has obvious implications. Unless it is covered up by the presence of that gene on a sister chromosome (for chromosomes come in pairs; one from the mother and one from the father), trouble is likely to ensue. Multiple copies of a gene may bring more subtle problems. Genes pass their orders to the rest of the cell via messenger molecules copied from their DNA. Too many copies of a gene might mean too many messengers and thus too much protein. That might, in turn, cause disease.
The news is that copy number variation (CNV) is significant and more widespread than previously thought. The HapMap collection of 270 individuals from four populations, with ancestry in Europe, Asia, and Africa, is only the beginning.
As an aside: we (CBRI) built a version of the HapMap data as a database. It took three days to load the phase I and phase II data. We hope our local researchers will find this a valuable resource.
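For readers curious what loading this kind of data looks like, here is a minimal sketch rather than the actual CBRI loader. It assumes a simplified, whitespace-delimited genotype file whose header row holds a SNP id, alleles, chromosome, and position, followed by one column per individual, and it assumes "NN" marks a missing call. The table names, column names, and sample rows are hypothetical, and SQLite stands in for whatever database engine is really used.

```python
import sqlite3

# Hypothetical sample in a simplified HapMap-like layout (not real data):
# SNP id, alleles, chromosome, position, then one genotype per individual.
SAMPLE = """\
rs# alleles chrom pos NA0001 NA0002
rs0000001 A/G chr1 1000 AA AG
rs0000002 C/T chr1 2000 CT NN
"""


def load_genotypes(conn, lines):
    """Load genotype lines into two tables; return the number of calls stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snp "
        "(rs_id TEXT PRIMARY KEY, alleles TEXT, chrom TEXT, pos INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genotype "
        "(rs_id TEXT, individual TEXT, gt TEXT)"
    )
    lines = iter(lines)
    # Everything after the first four header fields is an individual's id.
    individuals = next(lines).split()[4:]
    stored = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rs_id, alleles, chrom, pos = fields[:4]
        conn.execute(
            "INSERT INTO snp VALUES (?, ?, ?, ?)",
            (rs_id, alleles, chrom, int(pos)),
        )
        # Assumption: "NN" means no call was made, so it is not stored.
        rows = [
            (rs_id, individual, gt)
            for individual, gt in zip(individuals, fields[4:])
            if gt != "NN"
        ]
        conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", rows)
        stored += len(rows)
    conn.commit()
    return stored


conn = sqlite3.connect(":memory:")
n_calls = load_genotypes(conn, SAMPLE.splitlines())
n_snps = conn.execute("SELECT COUNT(*) FROM snp").fetchone()[0]
```

Storing one row per SNP per individual keeps queries simple, but it multiplies the row count by the number of individuals, which is one plausible reason a full load can take days.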