Rice audience offered look into genome mapping

BY LIA UNRAU
Rice News Staff

On the heels
of the public announcement of the mapping of the human genome,
Gene Myers, vice president of informatics research for Celera
Genomics, spoke to a packed audience in Anne and Charles
Duncan Hall’s McMurtry Auditorium Feb. 21.

In his talk, “Assembling the Whole Genome,” sponsored by Rice’s
Computer and Information Technology Institute, Myers explained
Celera’s approach to the problem, which relied heavily
on computers.

Celera researchers,
who believe the company’s gene assembly methods have
produced a more accurate sequence of the genome than the
Human Genome Project (HGP), used public data produced by
the HGP and “shredded,” or disassembled, it.

“Why did
we shred it?” asked Myers. “We wanted to overcome
the artifacts that appeared in the public data.” He
stated that more than 3.19 percent of the rough draft’s
assembled contiguous sequences of DNA were misassembled.
More than 2.2 percent of bacterial artificial chromosome
(BAC) entries, or bacterial DNA spliced with a medium-sized
fragment of a genome, contained data from foreign BACs. He
said the rough draft also included more than 100 million
base pairs of contaminated sequences. “By shredding
we let our whole genome assembler rectify these errors,” he said.

As a point of
pride, he said, Celera used the public data only as additional
fuel for the whole genome shotgun sequencing approach.

“How do
you solve a 43-million-piece puzzle? I think it’s fairly
intuitive once you see what we did,” said Myers. “The
problem is that 25 percent of the genome is basically the
same sequence over and over. Our strategy was simple: Let’s
figure out which are the repetitive pieces, and let’s
not use them.”

Celera built
an assembler designed to make as few mistakes as possible,
one that would detect repeats and so avoid being misled
by them.

In the same
way many people begin doing a jigsaw puzzle, Celera researchers
wanted to make all the “sure thing” moves first
and look for pieces that go together in a unique way. On
average, 30 pieces come together to form a segment. The
real problem, Myers said, was not to be seduced by the repeated
segments that look like they belong together.
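
The "sure thing" idea can be illustrated with a toy sketch. This is not Celera's assembler: the function names, the exact-match overlap test and the `min_len` threshold are all assumptions made for illustration. The sketch joins two pieces only when each is the other's sole overlap partner, so a piece with several candidate neighbors, as a repeat would have, is left alone.

```python
from collections import defaultdict


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len characters exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def sure_thing_joins(reads, min_len=3):
    """Return (i, j, n) joins that are unambiguous in both directions.

    A join is kept only if read i has exactly one right-hand neighbor (j)
    and read j has exactly one left-hand neighbor (i). Hypothetical helper,
    not taken from the article.
    """
    right = defaultdict(list)  # read index -> reads that can follow it
    left = defaultdict(list)   # read index -> reads that can precede it
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = overlap(a, b, min_len)
                if n:
                    right[i].append((j, n))
                    left[j].append((i, n))

    joins = []
    for i, neighbors in right.items():
        if len(neighbors) == 1:
            j, n = neighbors[0]
            if len(left[j]) == 1:
                joins.append((i, j, n))
    return joins


if __name__ == "__main__":
    print(sure_thing_joins(["ATGCGT", "CGTTAC", "TACGGA"]))
    print(sure_thing_joins(["ATGCGT", "CGTTAC", "CGTGGA"]))
```

In the first toy input each piece has a single possible neighbor, so both joins are reported. In the second, the first piece could be followed by either of the other two, so the sketch declines to join anything.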

Celera built
a statistical discriminator that provides odds that their
coverage is correct versus incorrect. Through computer simulations,
they could then identify with a high degree of certainty
which segments were correct. They were able to get a 30-fold
reduction in the number of pieces, he said.
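
The article does not give the discriminator's formula, so the following is only a sketch of how coverage-based odds of this kind could be computed. It assumes that pieces land on the genome at random (a Poisson model) and compares two hypotheses for an assembled segment: it is a single unique region, or it is two copies of a repeat collapsed together and therefore holds about twice as many pieces as expected. The function and parameter names are hypothetical.

```python
import math


def unique_vs_repeat_log_odds(pieces_in_segment, segment_len,
                              total_pieces, genome_len):
    """Natural-log odds that a segment is unique rather than a 2-copy repeat.

    Under the assumed Poisson model, a unique segment of this length should
    hold about lam pieces and a collapsed two-copy repeat about 2 * lam.
    The ratio of the two Poisson probabilities for the observed count k is
        (lam**k * e**-lam) / ((2*lam)**k * e**(-2*lam)),
    whose logarithm simplifies to lam - k * ln(2).
    """
    lam = segment_len * total_pieces / genome_len  # expected pieces if unique
    return lam - pieces_in_segment * math.log(2)


if __name__ == "__main__":
    # Hypothetical numbers: 10 pieces are expected in this segment.
    print(unique_vs_repeat_log_odds(10, 10_000, 1_000_000, 1_000_000_000))
    print(unique_vs_repeat_log_odds(20, 10_000, 1_000_000, 1_000_000_000))
```

A positive score favors "correct" (unique) and a negative score favors "collapsed repeat". With these made-up numbers, a segment holding the expected 10 pieces scores positive, while one holding 20 scores negative.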

Celera’s
was a computer-intensive effort, with the shotgun sequencing
using 20,000 computer hours on 160 processors. By comparison,
Celera used 177.44 computer hours in identifying the Drosophila
genome.

Among the findings:
fewer genes were found than expected, about 30,000
rather than the 100,000 once thought.
