Thursday, July 26, 2007

Passing the Test

For the first time in recent memory, I had a test -- two days in a row! Yikes!

This is one side effect of being in bioinformatics. Unlike professional fields such as medicine or law, there is no concept of formal continuing education for us run-of-the-mill biotechies. Since leaving Harvard I've taken a couple of outside courses and a bunch of Millennium-sponsored workshops and such, but none had formal grading.

Which is fine by me. The stuff that matters I get tested on in the most rigorous way possible -- on-the-job. I'm actually historically pretty good at tests & don't have much anxiety, but I've also spit the bit more than a few times during my academic career. Tests are really not fun.

Now these tests weren't too bad, but I did have to (1) learn a bunch of new (and semi-new) vocabulary (2) pass a practical exam requiring dexterity & patience (two traits I have -- in clubs) & (3) dust off some once burned in but very rusty information. However, it was worth it to get my license, which means I can now menace innocent bystanders all over the country.

Well, if they get too close to a sailboat I'm piloting, which primarily means if they are in the sailboat I am piloting. I took the beta unit out one very windy Sunday & skipped ahead to the not-yet-covered capsize-and-recovery technique. After three times in the drink (laughing hysterically each time), he'd had enough. Not that I'd had any trouble righting the boat -- one place where a not-quite-slim physique really comes in handy. The written test wasn't bad, but the practical took some time. Most things went well, but nearly half-a-dozen tries were required to get down the precision sailing (turn around a U-shaped dockage without touching the sides).

Sailing has two classes of moments: calm, easy times when everything goes right & moments of sheer thrill when you get close to going over. It is a real rush getting the boat up nearly 90 degrees and racing ahead -- so long as you don't complete the flip. If you liked driving your grade school teachers crazy by balancing your chair on two legs, that ain't nothing. Of course, it is one thing to try it on a small pond with a lifeguard ready to fire up a motorboat; I really wouldn't want to go over in the shipping channel of Boston Harbor.

In the calmer moments, one can be contemplative. This is a lot closer to where I thought my interest in biology would take me than what I actually do. I originally planned, when deciding on a biology major, that I would go into ecology or wildlife biology. No, it isn't all glamour, but field work does take place in, well, fields. Later, I thought my graduate career would be in plant genetics, where I might at least spend a lot of time in greenhouses and perhaps in experimental plots.

Bioinformatics really doesn't mix well with sunlight -- if the shade is right I can work on the buggy side of the house via Wi-Fi, but normally the laptop is too washed out in the day & the buzzers too thick at night. If only I could somehow get funded to go sailing on a genomics mission, in my own private yacht. Nah, could never happen -- nobody has enough chutzpah to attempt that.

Kicking the Media

One short newswire article, three spikes in my blood pressure. Impressive!

Just before heading off on a short vacation last week I spotted the news item about the two new genetic association studies which report on restless legs syndrome.

The lead paragraph drove the first spike: "suggesting the twitching condition...is biologically based". Now, I've elided the pop cultural reference for clarity, not because it was the problem (I'm actually a huge Seinfeld fan). I was left wondering what other causes were ascribed to a condition which is treatable by medication which has passed at least one double-blind placebo-controlled trial? Poltergeists? Ah, perhaps they're suggesting it is purely psychosomatic?

While away I saw in another paper a longer version of the same item -- with still no explanation of what else, besides physiology, might result in restless legs.

But going further, spike number two. The article mentions that Kari Stefansson was an author. I don't have an inherent bias against company-sponsored or company-driven research, but why wasn't the fact he is the head of DeCode mentioned? That's important background information -- DeCode has succeeded again, but also has inherent financial conflicts of interest.

The final two paragraphs gave the kicker: a doctor pooh-poohing the results by email, complaining that the work is "overhyped" and "doesn't pin down what the condition is, who has it, or what medication is needed". Gimme complete solutions or shut up, in other words. Now, I am somewhat surprised that DeCode got their paper into the New England Journal of Medicine; given the small number of genetics papers published there, it is striking that a relatively routine linkage study for a non-fatal disorder made the cut. But editors get to pick what they like.

Via a post on Freakonomics I finally discovered some of the background missing from the newspaper items. The same doctor (Steven Woloshin) quoted in the newspaper item had recently published in PLoS Medicine an article claiming that restless legs syndrome is a poster child for "disease mongering" by pharmaceutical companies and their dupes/comrades in the media.

If one steps back from the dust & smoke, the papers are intriguing (well, the abstracts -- I don't normally have access to either journal though NEJM is apparently, at least at the moment, making the full text freely available) first because they each found the same gene (though the second paper found two more). BTBD9 is not a well-characterized gene, but it contains a BTB domain, a protein domain involved in protein-protein interactions. So, one clear path forward is to identify the interaction partners of BTBD9.

Each abstract has some additional, apparently unique information, which is intriguing. DeCode reports that the BTBD9 variant is also linked to reduced serum ferritin levels and that ferritin levels have been previously implicated in restless legs syndrome. They also report higher levels of other movements during sleep in individuals carrying the variant. The Nature Genetics paper reports linkages to one gene and an intergenic region, with the one gene (MEIS1) previously implicated in limb development.

Hints & suggestions: no, it doesn't tell Dr. Woloshin how to treat or prescribe, but it does suggest a route towards understanding the pathology, which will probably not include poltergeists.

Wednesday, July 18, 2007

Turn Right on Main, Then Left at Chromosome 4

It's apparently been up since April, but I just stumbled on the Cambridge Genome Trail. Running down the main commercial spine of Cambridge from Harvard to MIT and through much of biotech country (but far enough away from my current office that I didn't see it sooner), the trail consists of large wrap-around banners on lampposts with descriptive text at street level.

The Boston area also has a permanent scale model of the solar system. I don't believe there is an atom or periodic table; perhaps they will show up in the future. Truly Quixotic would be to attempt to model the protein interactome of even a small creature -- too many interactions which are being added to too quickly!

Tuesday, July 17, 2007

New Breast Cancer Molecular Diagnostic

The Cancer Genetics blog has a post on the approval of Veridex's new RT-PCR test for breast cancer spread.

What is striking, and what the Globe article emphasized, is that this test can potentially be performed while the patient is still on the operating table, avoiding a delay between the screening test & initiating follow-up testing. If this holds true, then this is an example of molecular diagnostics really having a big impact on a major health problem. As with any diagnostic test, the key question is specificity & sensitivity, i.e. how many false positives and false negatives it yields. The key study had 300ish patients in it, which is just a small puddle compared to the ocean of breast cancer patients.
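
To make the sensitivity/specificity point concrete, here is a minimal sketch of how the two numbers fall out of a table of true/false positives and negatives, plus a rough confidence interval to show how much wiggle room a 300ish-patient study leaves. The counts below are invented purely for illustration; they are NOT the figures from the actual Veridex study, and the function names are my own.

```python
from math import sqrt
from typing import Tuple


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly positive cases the test catches (few false negatives = high)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly negative cases the test clears (few false positives = high)."""
    return tn / (tn + fp)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion successes/n."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts for a 300-patient study (made up for this example).
tp, fn, tn, fp = 80, 10, 200, 10

print("sensitivity:", round(sensitivity(tp, fn), 3))
print("specificity:", round(specificity(tn, fp), 3))
print("95% interval on sensitivity:", wilson_interval(tp, tp + fn))
```

Note that the interval on sensitivity is computed only from the patients who truly have the disease, which is a fraction of the 300; that is why a study of this size can still leave the estimate fairly fuzzy.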

Veridex, which is owned by J&J, has some other cool technologies cooking, including some to sift tiny numbers of cancer cells from the bloodstream, cells which have escaped from the primary tumor or metastases. Since getting clinical samples can be a serious challenge, this technology is pretty amazing.

Thursday, July 12, 2007

David Copperfield's Favorite Database

An interesting paper in BMC Bioinformatics led me to a database I hadn't heard of, and one which is very unusual. Most databases grow over time, often exponentially. This is a database intended to disappear.

The database is ORENZA, a database of orphan enzyme activities. These are enzyme activities which have been described in the literature, but not yet linked to a cloned protein. In other words, it is a big punchlist for our understanding of metabolism. This is the mirror image of all those lists of ORFs lacking known function out there; this is the list of identified functions lacking known ORFs.
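
The mirror-image idea boils down to a pair of set differences. The sketch below is only an illustration of that logic, with placeholder labels rather than real EC numbers or ORF names; it makes no claim about how ORENZA itself is built.

```python
def orphan_activities(described_activities, activities_with_sequence):
    """Activities reported in the literature but not tied to any sequence."""
    return sorted(set(described_activities) - set(activities_with_sequence))


def orphan_orfs(all_orfs, orfs_with_function):
    """The mirror image: ORFs with no known function."""
    return sorted(set(all_orfs) - set(orfs_with_function))


# Placeholder labels, invented for the example.
described = ["activity_A", "activity_B", "activity_C"]
sequenced = ["activity_B"]
print(orphan_activities(described, sequenced))

orfs = ["orf1", "orf2", "orf3"]
annotated = ["orf1", "orf3"]
print(orphan_orfs(orfs, annotated))
```

Every time an orphan activity is matched to a sequence it moves into the second argument and drops out of the result, which is exactly why this database is meant to shrink.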

I have found one puzzle in the paper which has me scratching my head; I wish a reviewer had insisted on an explanation. In the list of validated orphans, one entry is for EC 5.1.3.17 (Heparosan-N-sulfate-glucuronate 5-epimerase), an enzyme I claim no mental familiarity with (though apparently I routinely take advantage of this activity). The note for it says
"Involved in the biosynthesis of heparan sulfate, which binds proteins to modulate signaling events in embryogenesis. Mouse gene knock-out results in late lethal phenotype"

Huh????? How do you knock out a gene for an orphan enzyme? Indeed, there would seem to be a paper describing the cloned mouse gene in J Biol Chem from 2001. The protein seems to be annotated with the activity in UniProt. I'm clearly missing something here -- perhaps only the bacterial activities are orphans?

If I were behind ivy-covered walls, I would see this as a grand opportunity for projects for advanced undergraduate students in biochemistry / molecular biology / systems biology and so forth. Assign each student a bunch of activities from ORENZA and have them prepare a report on what is known about them. If the students can propose a good candidate, then beaucoup extra credit!

It is unlikely that many of these will be deorphaned by literature searches alone; biochemical slogging will be required. An interesting approach was just published in Nature in which an ORF was assigned a biochemical function by first experimentally determining its three-dimensional structure (via a structural genomics effort) and then bombarding it computationally with various small molecules. Successful docking of a number of adenine analogs gave a short list of candidate substrates and even a possible reaction. That latter trick is neat: by docking compounds that represent high-energy (transiently present) intermediates, the possible reaction can be guessed. In this case, the ORF was successfully shown to be a deaminase for several adenosine-like molecules (including adenosine itself).

Since the crystal structure had already been determined, determining the structure with one of the docked compounds was tractable, and it gave an excellent match to the docking prediction. The authors performed further docking to propose extending this annotation to 78 eubacterial and archaeal ORFs.

There is a nice bit at the end describing some of the conditions that helped this effort to succeed and how general or specific they are. For example, the ORF in question belonged to a large enzyme family by sequence similarity, which narrowed the list of candidate reactions. Your commonplace ORF-that-looks-like-nothing-but-ORFs won't be helped by that. Also the enzyme did not undergo gross structural rearrangements on binding substrate, a phenomenon that would certainly confound this approach. The enzyme also functioned on well-characterized metabolites; enzymes that work on uncharacterized compounds may remain mysteries. However, even with these caveats, this approach is likely to yield further fruit, particularly since the structural genomics projects are really cranking out the structures.

Wednesday, July 11, 2007

The Devil in the Deep Blue Sea?

An open-access paper in PNAS is interesting on at least two scores.

First, it illustrates how bacterial genome sequencing is becoming a routine tool: two new bacterial genomes packed into one short paper.

Second is the key thrust of the paper. They sequenced two species isolated from deep hydrothermal vents in the ocean. These bacteria are related to a number of bacteria from up here on the surface, including such pathogens as Helicobacter (stomach ulcers & cancer) and Campylobacter (food poisoning).

What is striking is that they find genes in these deep sea vents which are very, very similar to important virulence genes in the terrestrial nasties. A proffered explanation is that these bacteria may engage in symbioses with eukaryotes living in the vent communities.

Oceans have long enchanted and terrified humanity. The focus for the latter has usually been big things: storms & man-eating sharks. Now we must shift some of our anxiety to the very small things which live deep in Davy Jones' locker.

Tuesday, July 10, 2007

Restriction Endonuclease Reverie

One of the first molecular biology techniques I learned as an undergraduate was restriction enzyme mapping. It's simple and beautiful; at the end you have neat bands of orange glowing in the darkroom.

Molecular biology involves a lot of incubations, giving one time to read, think or work on other projects. An easy way to pass some time was to pull out the New England Biolabs catalog and browse. NEB sells a lot of reagents, but their selection of restriction enzymes has always been a key point. In addition to the enzymes themselves, there were the restriction maps of common vectors in the back.

Restriction enzymes are simply amazing, nature's gift to molecular biology. Each enzyme recognizes a short DNA sequence with incredible specificity, cleaving only on or near the appropriate sequence. All sorts of interesting variations on the theme exist. Some are blocked by methylation of nucleotides in their recognition site, others require methylation. Some cleave in a region of precise length but undefined sequence between the two halves of their recognition site; some cleave a select distance away, and a few clip out an island of DNA centered on their recognition site. The taxonomy of these enzymes simply grows & grows as new variants are identified.

During my graduate years I didn't work with restriction enzymes, other than one concept that never got beyond the idea stage. At Millennium it was totally outside my scope.

But now, in the synthetic biology world, I get to play again. I'm again browsing through the lists of enzymes, though now I do so with REBASE. How many other databases are labors of love by a Nobel laureate? As an undergraduate some of those outside and island cutters seemed to be oddities; now they are opportunities.

In particular, the Type IIS restriction enzymes, those which cut adjacent to their asymmetric sites, have really moved into their own due to their utility in manipulating DNA. By ligating a IIS site to unknown sequence, one can clip out a short tag easily sequenced, such as in SAGE. In synthetic biology, designing IIS sites into a sequence can be used to generate a huge variety of sticky ends, yet also leave no 'scar' in the final sequence.
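
Here is a minimal sketch of why cutting outside the recognition site is so handy: the sticky end is whatever sequence happens to sit next to the site, so the designer gets to choose it. I've used the commonly cited BsaI pattern, GGTCTC with cuts 1 nt (top strand) and 5 nt (bottom strand) downstream, as the default; treat the site and offsets as assumptions to check against REBASE or the supplier for any real design. The function names are my own.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def type_iis_overhangs(seq, site="GGTCTC", top_offset=1, bottom_offset=5):
    """Find Type IIS sites on the given strand and report the overhang each leaves.

    Returns (site_start, overhang) pairs, where overhang is the stretch
    between the top-strand and bottom-strand cuts, read on the given strand.
    Defaults are an assumed BsaI-like enzyme.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        cut_top = start + len(site) + top_offset
        cut_bottom = start + len(site) + bottom_offset
        if cut_bottom <= len(seq):  # skip sites too close to the end to cut
            hits.append((start, seq[cut_top:cut_bottom]))
        start = seq.find(site, start + 1)
    return hits


# Invented test sequence: site, one spacer base, then a chosen 4-nt overhang.
example = "AA" + "GGTCTC" + "A" + "ACGT" + "TTT"
print(type_iis_overhangs(example))
```

To catch sites on the other strand, run the same function on `reverse_complement(seq)`. Because the recognition site sits to one side of the cut, it can be placed on the fragment that gets discarded, which is where the no-'scar' property comes from.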

Of course, one can never be satisfied. Enzymes with very rarely occurring sites are useful for a lot of genomics research, but very few restriction enzymes with long (and therefore rare) recognition sites have been found. There are only limited numbers of methylation-dependent enzymes, or IIS enzymes. Not only do enzymes vary in their recognition sequence, but even enzymes with the same recognition sequence can cleave at different positions (using different enzymatic mechanisms), which can be useful -- but for many sites only one cleavage pattern is available.
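
The link between site length and rarity is simple arithmetic, sketched below under two loud assumptions: the sequence is random and all four bases are equally frequent. Real genomes violate both, so take the numbers as back-of-the-envelope only.

```python
def expected_site_spacing(site_length):
    """Average distance in bp between occurrences of a fully specified site,
    assuming random sequence with equal base frequencies."""
    return 4 ** site_length


def expected_site_count(genome_length, site_length):
    """Rough expected number of sites in a genome under the same assumptions."""
    return genome_length / expected_site_spacing(site_length)


for n in (4, 6, 8):
    print(n, "bp site: about one every", expected_site_spacing(n), "bp")
```

Each extra specified base makes the site four times rarer, so going from a six-cutter (one per 4,096 bp on average) to an eight-cutter (one per 65,536 bp) is a sixteen-fold jump.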

Ah, no matter how impressive the toy chest, we still have a wish list!

Monday, July 09, 2007

Cancer: Genes, Chromosomes or both

The Gene Sherpa recently posted on the chromosomal instability theory of cancer, which he sees as an emerging paradigm shift, displacing the dominant gene-centric model of cancer. I'd like to point out some recent results that paint a much more complicated picture & suggest that both theories have a lot to contribute.

It's worth reviewing some background on the two-hit model. Knudson described in 1971 a statistical model to explain different patterns of retinoblastoma, including the inherited familial form. The model proved true in retinoblastoma, with the responsible gene (Rb) being cloned and sequenced. Other familial cancer syndromes also appear to fit Knudson's model.
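
The intuition behind two hits can be shown with a toy calculation. This is my own simplified sketch, not the 1971 statistical analysis: it assumes every cell is independent, each copy of the gene is knocked out with the same small probability, and the cell count and hit probability are made-up round numbers.

```python
from math import expm1, log1p


def p_any_tumor(n_cells, p_hit, hits_needed):
    """Probability that at least one of n_cells acquires all the hits it needs.

    hits_needed=1 models the inherited form (one bad copy already present);
    hits_needed=2 models the sporadic form (both copies must be hit).
    """
    p_cell = p_hit ** hits_needed
    # 1 - (1 - p_cell) ** n_cells, written to stay accurate for tiny p_cell
    return -expm1(n_cells * log1p(-p_cell))


# Hypothetical round numbers, chosen only to show the contrast.
n_cells, p_hit = 1_000_000, 1e-6
inherited = p_any_tumor(n_cells, p_hit, hits_needed=1)
sporadic = p_any_tumor(n_cells, p_hit, hits_needed=2)
print("inherited:", inherited)
print("sporadic:", sporadic)
```

With these invented numbers, needing only one more hit makes a tumor likely, while needing two makes it very rare, which is the qualitative difference between the familial and sporadic patterns.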

The key question is how well does this model work in general. This is truly an important question: huge amounts of cancer research in both academia and industry are focused around the oncogene / tumor suppressor model of cancer.

Two competing theories are the cellular disorganization theory and a central role for aneuploidy. Each of these holds that biological disorganization, either at the level of cells or chromosomes, is what drives cancer.

There are probably few biologists who believe that one of these hypotheses utterly trumps the others; the question is which comes first and which should we focus our efforts on.

A paper in Nature last month (alas, you'll need a Nature subscription) nicely illustrates the interplay, but also would favor single genetic events leading to aneuploidy and not necessarily the other way round.

The authors present a transgenic mouse model of cancer. These mice carry inactivating mutations in three key genes, Atm, Terc and p53. Atm encodes a protein kinase important for turning on many DNA damage repair genes. Terc encodes the RNA component of telomerase, the enzyme which maintains telomeres, the special structures which protect the ends of chromosomes. p53 is another gene critical to DNA repair and the growth arrest of deranged cells. Inactivating mutations in p53 are found in roughly half of all human cancers, and ATM is also often mutated. Mice lacking Atm function develop lymphomas, an effect suppressed if the mouse is also knocked out for Terc.

The triple mutant mice develop tumors much like those mutant only for Atm, suggesting that the tumor suppression in Terc null mice is effected by p53. They also have high levels of aneuploidy, much more pronounced than in Atm null only mice.

So, high levels of aneuploidy can be driven by knockouts in a few key genes, a point for genes before aneuploidy.

Using genomic arrays the precise regions of aneuploidy, meaning those DNA segments amplified or reduced in copy number, can be determined. DNA sequencing can identify point mutants in selected genes. An important point about this paper is that many of the changes observed parallel those seen in human lymphomas. Mutations in Notch, Fbxw7 and the Pten/Akt pathway were all observed as well as many other changes. So the mouse model, driven by three genetic changes, mimics the genetic changes seen in human tumors.
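
For a feel of how array data turns into regions of gain and loss, here is a bare-bones sketch: each probe gets a log2 ratio of tumor to normal signal, probes beyond a threshold are called gained or lost, and runs of neighboring probes with the same call are merged. The threshold of 0.3 and the probe values are hypothetical, and real analyses use proper segmentation algorithms rather than this naive merge.

```python
from itertools import groupby


def call_probe(log2_ratio, threshold=0.3):
    """Classify one probe as gain, loss or neutral (threshold is an assumption)."""
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "neutral"


def call_segments(probes, threshold=0.3):
    """Merge adjacent probes with the same call into segments.

    probes: list of (position, log2_ratio) for one chromosome, sorted by position.
    Returns (start, end, state) tuples for the non-neutral runs only.
    """
    segments = []
    for state, run in groupby(probes, key=lambda p: call_probe(p[1], threshold)):
        run = list(run)
        if state != "neutral":
            segments.append((run[0][0], run[-1][0], state))
    return segments


# Invented probe data for illustration.
probes = [(100, 0.0), (200, 0.5), (300, 0.6), (400, 0.1), (500, -0.8)]
print(call_segments(probes))
```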

This is not the first paper in this vein. Last year there was a burst of papers showing that transgenic mouse models of cancer could recapitulate genomic alterations seen in human tumors, including breast, liver and melanoma. Many of these models used more traditional oncogenes such as RAS, which are not directly involved in chromosome maintenance. So again, gene changes can beget chromosome changes.

Any model claiming primacy of chromosomal events will need to incorporate these and many other observations. However, trying to claim complete primacy of genes would be silly as well. For example, events in a small number of genes might ignite aneuploidy, but it could easily be the case that restoring function to those genes later would be ineffective. Similarly, genetic events might initiate cellular disorganization, but chaos at the tissue level may eventually be self-sustaining.

Paradigm shift? Not from how I read Kuhn. Simple models being replaced by messy models reflecting the chaos of cancer; that's a sure bet.

Tuesday, July 03, 2007

Diabetes: Deja vu all over again

This week's Nature Genetics advance publication abstracts (you need a subscription to access the full text; I don't have one) brought more genetic association studies. These studies are coming in at a furious pace, with the rate expected only to increase.

A huge issue with association studies is whether they are correct. The field has been tainted by early studies that failed to hold up to later scrutiny. The sheer frequency of new genetic associations makes watching the field challenging, and I don't claim to keep up in general. Many of these studies turn up variants in genes which have been little if at all characterized, and the biological follow-up is often slow -- because it is slow, hard work.

What struck me about these two papers was first that they were both about common variants & diabetes. What is even more interesting is that in each case the study found common variants affecting diabetes risk that were in genes already strongly associated with diabetes.

A group including deCODE Genetics identified variants in TCF2 (aka HNF1-beta), a gene already associated with Maturity Onset Diabetes of the Young, or MODY. When I first came to Millennium there was a race on to find one of the MODY genes, which resulted in finding HNF1-alpha (albeit after the other group). Other members of the HNF family cause MODY when mutated.

The other group found protective mutations in the WFS1 gene, which when mutated causes Wolfram syndrome. Strikingly, among the major symptoms of Wolfram syndrome is diabetes, though with a bunch of nasty developmental defects thrown in. Now, it wasn't entirely surprising that this study nailed a known gene in diabetes, because they focused on genes with known relevance to pancreatic beta cell biology. But it still beats gene of unknown function #10,001.