Sunday, December 28, 2014

Holiday project: JavaScript

2:07 PM Posted by sandmann No comments
Working in a large bioinformatics department, I am lucky to be exposed to many different questions, topics, tools and approaches. For example, two of my colleagues - Alexandre Masselot (now at the Swiss Institute of Bioinformatics) and Kiran Mukhyala - built lightweight, customized visualizations of protein data using JavaScript. Their work was recently published in the journal Bioinformatics - check it out on GitHub here and here.

I had the chance to hear Kiran and Alexandre explain how they build modern web applications that allow both experimental and computational scientists to explore their data. So this year-end holiday, I decided to learn more about JavaScript myself, starting with the basics.

First, I started reading Douglas Crockford's book "JavaScript: The Good Parts". This (suspiciously?) short book contains a lot of important information, but I quickly realized that I needed a more basic introduction to the language itself before concentrating on "the good parts".

Next, I discovered Marijn Haverbeke's book "Eloquent JavaScript", published both as a free online tutorial and in print. Haverbeke provides a gentle introduction to JavaScript and, in passing, also includes valuable lessons on programming in general, including abstraction, modularization and object-oriented programming. The tutorial comes complete with examples, exercises and humor - and will keep me busy until the end of the year.


Afterward, I should be ready to return to "JavaScript: The Good Parts" and explore some JavaScript libraries, including lodash and d3. A first New Year's resolution...


Wednesday, October 29, 2014

From Aad to Zutshi - authorship and bias

9:19 AM Posted by sandmann No comments
On July 4th, 2012, scientists at the European Organization for Nuclear Research (CERN) in Switzerland reported the detection of an elusive elementary particle, the Higgs boson. Peter Higgs, who had predicted its existence in 1964, was awarded the Nobel Prize in Physics in the following year, prompting Neal Hartman to wonder: "Who really found the Higgs Boson [?]"

Investigating the story behind the discovery, Hartman realized just how many scientists are involved in modern experimental physics: more than 3000 scientists from the ATLAS team are listed as authors on the publication reporting the Higgs Boson's detection and every experiment involves the active participation of hundreds of scientists.

Considering that organizing scientists has been likened to herding cats, the achievement of coordinating such large teams probably comes close to outshining even this groundbreaking discovery. Important ATLAS publications list the authors in alphabetical order, instead of in order of importance, as is common in biomedical research where it is a constant source of aggravation.
A subset of the authors of a recent ATLAS publication

The researchers at CERN also take great care to avoid known psychological pitfalls of experimental research. "We don’t work with real data until the very last step," Kerstin Tackmann, a member of the Higgs to Gamma Gamma analysis group, explains. "Once we look at the real data,” says Tackmann, “we’re not allowed to change the analysis anymore." This precaution is similar to the blinded analysis of clinical trials in biomedical research, considered an essential tool in drug development.

Biomedical researchers appear to be less attuned to the risks of being misled by their own data, and MacLeod et al. recently estimated that "85% of research resources are wasted".

In my own experience, most researchers are careful and strive to include the necessary controls and safeguards. Yet, with most experiments yielding negative results and the competition for funding and positions increasing, we often overestimate the significance of exciting (though perhaps unlikely) results and tend to disregard contradictory data. While we easily spot overconfident colleagues, many scientists readily state that "I know the effect is there, I just don't have the data to show it, yet".

A few simple procedures can help to remove some of the nagging doubt about whether we are reading too much into our observations. Tried and tested methods include randomization (e.g. assigning animals randomly to cages instead of placing them on a "first come, first served" basis) and blinding, i.e. ensuring that the data collection is not biased by the expectations of the experimenter (e.g. by scoring microscopy images without knowing whether samples were treated or not).
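
Randomization of this kind takes only a few lines of code. The following is a minimal sketch in Python, not a validated study design tool: the function name assign_to_cages, the animal identifiers and the seed are all made up for illustration, and a real experiment may call for stratification (e.g. by sex or litter) that this sketch ignores.

```python
import random


def assign_to_cages(animal_ids, n_cages, seed=None):
    """Shuffle the animals, then deal them out to cages in turn.

    Hypothetical helper for illustration only. Passing a seed makes
    the assignment reproducible, so it can be documented afterwards.
    """
    rng = random.Random(seed)
    shuffled = list(animal_ids)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cages = {cage: [] for cage in range(1, n_cages + 1)}
    for i, animal in enumerate(shuffled):
        cages[(i % n_cages) + 1].append(animal)
    return cages


# Twelve hypothetical animals, distributed over three cages.
animals = [f"mouse_{i:02d}" for i in range(1, 13)]
cages = assign_to_cages(animals, n_cages=3, seed=2014)
for cage, members in cages.items():
    print(cage, members)
```

Dealing the shuffled animals out in turn keeps the cage sizes balanced, while the shuffle removes any link between the order in which animals arrived and the cage they end up in.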

Statistics [are used] in the same way that a drunk uses lamp-posts—for support rather than illumination (Andrew Lang)

Yet, I still vividly remember the blank look of disbelief I got from a colleague, when I helpfully offered to replace the names of her digital microscopy images (treated1.tif, treated2.tif, control1.tif, control2.tif, etc) with random labels before she counted the number of cells surviving a drug treatment. She clearly didn't think that her expecting a significant treatment effect could bias the results and didn't take any comfort in reducing that bias through 'blinding' (although I did offer to provide the original labels afterward - for free).
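
For the curious, here is roughly what that offer would have involved: a minimal Python sketch that builds a "blinding key" mapping each original file name to a neutral label. The function name make_blinding_key, the image_001-style labels and the seed are my own assumptions for illustration. The sketch only computes the mapping and does not touch any files.

```python
import random
from pathlib import Path


def make_blinding_key(filenames, seed=None):
    """Map each original file name to a neutral label in random order.

    Hypothetical helper for illustration only. The returned dictionary
    is the key needed to unblind the scores later, so it should be kept
    by someone other than the person doing the counting.
    """
    rng = random.Random(seed)
    shuffled = list(filenames)
    rng.shuffle(shuffled)
    return {
        original: f"image_{i:03d}{Path(original).suffix}"
        for i, original in enumerate(shuffled, start=1)
    }


originals = ["treated1.tif", "treated2.tif", "control1.tif", "control2.tif"]
key = make_blinding_key(originals, seed=42)
for original, blinded in sorted(key.items()):
    print(original, "->", blinded)
```

To apply the key, each file could be copied or renamed to its blinded label (for example with Path.rename), with the key saved to a separate file until the counting is done.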

During the experimental research process, statistical tools can contribute useful feedback by quantifying how confident we should be in our results. Yet, in my experience, these tools are most often used too late, e.g. when the data has been collected and statistical tests have to be added as an afterthought to pacify a reviewer. Facing last-minute judgment of their body of work by a t-test, many researchers are hard pressed to embrace statistics as a useful addition to their tool kit.

Yet, as new technologies - from single-cell sequencing to CyTOF - enable biomedical researchers to collect more and more data, relying on our intuition alone is going to mislead us. With the blessing of more data comes the "curse of dimensionality", requiring biologists (like myself) to learn new tricks and kindle their appreciation and use of statistics.





Tuesday, October 14, 2014

Bioconductor 3.0 released today

8:29 PM Posted by sandmann No comments
Today, the latest version (3.0) of the Bioconductor suite of R libraries was released, featuring nearly one thousand software packages as well as an equal number of annotation and experimental data sources.

Despite being a daily user of Bioconductor packages, I am still amazed by the breadth and quality of tools shared in the Bioconductor community, providing open source solutions for metabolomics, cheminformatics, mass spectrometry, genetics, cell biology and many other research areas.

When I first encountered R and Bioconductor as a PhD student at EMBL, I was looking over the shoulders of Wolfgang Huber, a member of the Bioconductor core team and co-author of its official publication in 2004. Back then, I was very impressed by how quickly my ChIP-on-chip data was transformed into scientific plots at Wolfgang's hands. Little did I know that I would spend many hours using R and Bioconductor packages and even contribute code myself in the years to come!

Bioconductor was initially conceived in 2001 to enable reproducible research, share statistical software for biological data analysis and provide training for its growing user base. It is supported by an international team of software developers and scientists based primarily at the Fred Hutchinson Cancer Research Center and other US and international research institutes.

All packages contributed to Bioconductor are reviewed to ensure adherence to established guidelines, including e.g. the availability of vignettes, unit tests, help pages and examples. In addition, both released packages and those under development are automatically tested in a continuous integration environment. A dedicated support site is available where questions are usually answered by package authors and experienced users within hours.

Today, Bioconductor provides important infrastructure for the research community worldwide, and packages are downloaded tens of thousands (sometimes hundreds of thousands) of times every year.

My previous and current research is heavily indebted to the many Bioconductor contributors, some of whom I am fortunate to count among friends and colleagues, and to the Bioconductor core team.

Many thanks for providing an outstanding open-source infrastructure!

Sunday, October 12, 2014

Replication stress links structural and numerical cancer chromosomal instability

7:59 PM Posted by sandmann No comments
At the recent "Tumor Heterogeneity: Implications for Targeted Therapy" conference, Charles Swanton (London Research Institute) presented how kidney, colon and lung tumors change and evolve over time. This work was published in a series of high-profile papers, including two back-to-back articles in Science this week. Here, I review a part of the talk with results published by Burrell et al in Nature in 2013.

In 2013, Swanton and co-workers examined why many tumors display chromosomal instability (CIN). While normal human cells are diploid and carry two copies of each gene, cancer cells from many solid tumors often gain additional copies of specific regions and lose others. This can, for example, lead to the loss of tumor suppressor genes and provides genetic variation that can fuel the evolution of sub-clones.

The balanced inheritance of genetic material is tightly controlled in normal cells, but seems to be error-prone in many human cancers. For example, colorectal cancer can broadly be categorized into chromosomally stable (CIN-) and unstable (CIN+) subtypes. Swanton and co-workers set out to understand the mechanisms specifically destabilizing CIN+ colorectal tumors.

Through careful microscopy imaging of dividing cells, they documented a high frequency of DNA replication artifacts in CIN+ cell lines. These cells seemed unable to duplicate their genome correctly before cell division and produced e.g. chromosomal fragments without centromeres, which were randomly distributed to the daughter cells. The root of the problem appeared to be a disruption of the DNA replication process itself, as the authors noticed that the replication forks in CIN+ cells progressed at a slower pace than in their CIN- counterparts. This is a sign of "replication stress", which was previously shown to cause DNA damage and chromosomal aberrations.

What could cause replication stress in colorectal cancers? To formulate specific hypotheses, Swanton and co-workers compared the cancer genomes of CIN+ and CIN- tumors. First, they checked known oncogenes and tumor suppressors. While the TP53 gene, which is frequently inactivated in human cancers, appeared to be more often mutated in CIN+ cases, its biological function did not explain the observed chromosomal instability.

Next, the scientists enumerated copy number variants (CNVs), looking for regions lost or gained in CIN+ but not CIN- tumors, and found a promising candidate: loss of a specific region of chromosome 18 (region 18q) was observed in 88% of aneuploid tumours and 80% of CIN+ cell lines. The researchers had detected a statistically significant correlation between 18q loss and chromosomal instability - but had they really identified a causal relationship?

If region 18q really contained genes important for the correct execution of replication, its loss should precede the onset of chromosomal instability. During colon cancer development, cells typically pass through a precursor stage, called adenoma, before progressing into malignant carcinomas.

Vogelgram Overview
A genetic model for colorectal tumorigenesis, Fearon & Vogelstein, Cell, 1990; Image source: Wikimedia Commons

Both 18q loss and chromosomal instability were found to be less frequent in adenomas than carcinoma samples from the same patients, consistent with a causal relationship between these two observations (but not proving it).

To elucidate the molecular consequences of 18q loss, the researchers systematically inactivated all of the protein-coding genes contained in this region of the genome. Targeting any one of three genes - PIGN, MEX3C or ZNF516 - produced a CIN+ phenotype in cell lines, including acentric chromosomes and anaphase bridges, activation of the DNA damage response and reduced replication fork speed. In addition, just like the naturally observed CIN+ phenotype, the consequences of inactivating these genes could be prevented by supplying additional DNA building blocks in the form of nucleosides to the cells.

Conclusions:
Recurrent loss of a specific genomic region, 18q, in colorectal cancers may be responsible for disrupting the normal replication process in cancer cells. This could trigger a cascade of subsequent losses or gains, increasing the genetic heterogeneity in the following generations of cancer cells and accelerating the emergence of resistance.

Thursday, October 9, 2014

How we got to now - at KQED

7:30 PM Posted by sandmann No comments
A fun evening at KQED in San Francisco: Steven Johnson presented a preview of his new PBS television show "How we got to now". Johnson shows how things we generally take for granted came into existence, e.g. electric light, clean water or radio. I especially loved the animations illustrating e.g. how the city of Chicago was lifted several feet to allow construction of the first sewer system in the US.

Mina Kim, Steven Johnson and Biz Stone at KQED (source: Tom Olson, KQED, via Twitter)

In the post-preview discussion at KQED, led by Mina Kim, Johnson was joined by Biz Stone, co-founder of Twitter, who contributed some of his own experience with innovation. While Johnson addressed more profound questions, e.g. the different time scales of technological and cultural change, Stone emphasized how his projects typically produce results very different from what he had imagined at the beginning: Twitter was originally a fun side project while his team was working on a podcasting platform (that was eventually abandoned).

The first episode of the television show will be broadcast on October 15th, 2014 and Johnson promises fun for eleven-year-olds ("It has sewers in it!") as well as grown-ups: "We present a different story about every five minutes, which you can tell at dinner parties. Just like I did for the last three years".

If you miss the TV show, you can also check out the book.

Clone wars

7:51 AM Posted by sandmann No comments
At the Tumor Heterogeneity: Implications for Targeted Therapy conference in Stanford on October 6th, 2014, Kornelia Polyak (Harvard Medical School) described how her lab used fluorescence in situ hybridization (iFISH) and allele-specific PCR-FISH (STAR-FISH) to image copy number variation and point mutations of breast cancer at the single-cell level. She highlighted how treatment with anti-HER2 antibodies induced changes in the composition of tumors in a breast cancer cohort, including a post-treatment enrichment of PI3K mutant cells.

 
To understand how clonal heterogeneity is maintained over time and study the role of interactions between clones, her lab performed xenograft experiments with combinations of cell lines engineered to over-express different non-cell autonomous drivers. Polyclonal tumors, containing a mixture of cell lines, grew faster than clonal xenografts and produced metastases earlier.

Among many other findings, the researchers found that IL11 expression by even a small number of cells in the xenograft increased its density of blood vessels. This promoted growth of the tumor as a whole, including that of the other sub-clones, providing direct experimental evidence for interactions between clonal sub-populations.

As pointed out during the lively discussion, these experiments focused on selected secreted signaling molecules and did not investigate the competitiveness of well-known driver oncogenes such as mutant KRAS. (KRAS mutant cell lines grew too fast to be included in the xenograft experiment.)

References:

Non-cell-autonomous driving of tumour growth supports sub-clonal heterogeneity; Marusyk et al, Nature, 2014

Tumor Heterogeneity: Implications for targeted therapy

7:43 AM Posted by sandmann No comments
Earlier this week, I had the chance to attend the "Tumor Heterogeneity: Implications for targeted therapy" conference at the Stanford Cancer Institute. (For a 6 min, 17 s summary of the topic, listen to Simon Tavare on ABC's Science Show.) Cancer is a complex disease - and tumors from different patients, multiple tumors from the same patient and even separate parts of the same tumor can look and behave markedly differently.
Treatment induces a bottleneck effect, where only some resistant sub-clones will survive and propagate to re-form a heterogeneous tumor. (Source: Wikipedia, by Lcchong - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 Unported license via Wikimedia Commons.)
For example, as Kimberly Allison (Stanford) pointed out, pathologists routinely look at breast cancer samples under the microscope to assess whether a tumor expresses high or low levels of the HER2 (ERBB2) protein. They struggle to summarize what they see in a single score, because they frequently observe regions with high and low levels of HER2 in the same section.

The conference provided a broad range of topics, including technologies to characterize tumors at the single-cell level, the study of tumor evolution and clinical reports from physicians. Here are some of the main points I took away:
  • late stage tumors contain many sub-clones, differing e.g. in the numbers and types of genetic lesions as well as response to treatment
  • the degree of heterogeneity within a tumor is itself prognostic, e.g. more heterogeneity is often associated with a worse outcome
  • a subset of cells may already carry mutations causing resistance to a specific cancer drug, and their expansion is associated with recurrence after single-agent therapy
  • every tumor follows a unique evolutionary path, starting with early 'trunk' mutations followed by branching into sub-clones, which may compete, cooperate or simply co-exist
  • new technologies to detect tumors and characterize them over time, e.g. through blood draws during the course of treatment, offer opportunities to study the dynamics of cancer progression
  • recent advances in the field of cancer immunotherapy, e.g. alerting the patient's immune cells to the presence of a tumor, may offer new therapeutic opportunities

In the coming days, I will summarize a few of my personal highlights from this meeting.

Friday, October 3, 2014

Ebola: Beyond the Headlines

7:30 PM Posted by sandmann No comments

City Arts & Lectures presented a conversation about the challenges of battling the Ebola pandemic. Host Roy Eisenhardt led four panelists, Drs. Paul Farmer, Dan Kelly, Raj Panjabi, and Ambassador Eric P. Goosby, in discussing the causes of the largest ever Ebola outbreak in West Africa and the challenges in fighting the disease in the affected countries.


All panelists pointed out that urgent action from governments and NGOs alike is required to fight the spread of the disease, which is hampered by lack of funding and especially personnel in Africa. They also called for better coordination between aid organizations and national governments.

Dr. Panjabi also highlighted how long-term investments in medical infrastructure to reduce the spread of polio in Nigeria are now helping to manage Ebola. Dr. Farmer stressed the importance of long-term partnerships with developing nations: together with UCSF and the Partners in Health NGO, Haiti’s Hinche Hospital has been transformed into a training site for doctors and nurses, some of whom are now answering the call for help in Sierra Leone.

A response to the Ebola pandemic requires the establishment of effective treatment centers, which provide hope to patients and affected communities. Much of the discussion focused on the structural problems and organizational challenges today and in the future. In the end, a member of the audience reminded both panelists and listeners that there is also an immediate need for action, as aid agencies like Doctors Without Borders have reached the limits of their capabilities.

Proceeds of the event supported Partners In Health’s Ebola relief efforts.
A recording of the discussion will be broadcast on KQED radio on 10/19/2014.

Invited speaker: Oliver Smithies on "Where ideas come from"

11:00 AM Posted by sandmann No comments



Oliver Smithies visited the campus today and gave a one-hour lecture titled "Where ideas come from". (You can watch a version of Prof. Smithies' talk, presented in Lindau, Germany, earlier this year, here.)

The 89-year-old Nobel laureate delighted a captivated audience with a journey through his long career. Recounting his childhood days in the UK, he highlighted the impact that good teachers have on the life and career of scientists. He encouraged researchers to seize the opportunity to teach, stressing how much he himself benefited from thoroughly dissecting the reading material for his courses:

Always teach from the original papers, don't just read reviews. Watson & Crick's paper is one page long and you can teach anybody, not just scientists, what it means.

"Oliver-smithies" by Mapos (Markus Pössel) - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

In the 1950s, Smithies invented a form of molecular sieving electrophoresis trying to separate insulin from precursors and other proteins, when he discovered that there were more than the previously described 5 proteins in human plasma. This discovery changed the direction of his research project, marked the transition to becoming an independent researcher, and culminated in the development of targeted gene replacement in mammalian cells (1982-85).

Throughout his career, Smithies frequently overcame technical challenges by being persistent. He encouraged scientists to take risks and to overcome their fear of failure:

 If you encounter a difficult problem, you have to learn and get instructions. You can do anything if you put your mind to it.

Smithies fondly remembered how he benefited from sharing scientific ideas and reagents, and encouraged his audience to do the same:

 Science is about sharing. Nine times out of ten it is productive and brings joy. One time out of ten you get screwed, but you get to enjoy the nine [other times].

At the end of the lecture, Smithies, who still performs experiments in his laboratory and flies planes in his spare time, advised younger researchers to follow their passion:

If you ever find yourself in a situation where you don't enjoy what you're doing every day, go talk to your advisor and ask him to work on something else. If he doesn't want you to do that, find another advisor. [...] You only live once.

Smithies' PhD work on an "extremely precise osmometer [...] has the dubious distinction of never being cited" and he only began his groundbreaking experiments on gene replacement at the age of 57. Leaving the lecture hall, I wondered how this outstanding researcher would have fared in today's research environment, in which young scientists need to publish early, frequently and with high impact to turn their passion for science into a career.

Disclaimer: Quotes are as I remember them and may not be perfectly accurate.

Tuesday, September 30, 2014

Illumina user meeting 2014: targeted NGS applications

8:55 AM Posted by sandmann No comments
On Tuesday, I attended the Illumina user meeting in Burlingame, CA, which featured speakers from Illumina as well as users from academia and industry. With sequencing costs not coming down as fast in the last two years as they used to, the focus has shifted to specific applications, e.g. clinical diagnostics, and accessing novel types of samples, e.g. FFPE-fixed material.

For example, I enjoyed Joe DeRisi's talk on applying next-generation sequencing to diagnose the various causes of encephalitis. In the same session, Lincoln Nadauld presented clinical case studies highlighting how exome sequencing is used to make targeted cancer treatment decisions at Intermountain Healthcare hospitals.

The technical presentations by Illumina representatives focused on library preparation strategies for targeted capture of specific genomic regions or transcripts of interest. Focusing on predefined subsets of sequences promises more economical use of the available reads, potentially increasing sensitivity and lowering cost at the same time.
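
A back-of-the-envelope calculation shows why targeting pays off. The Python sketch below uses the standard approximation that mean coverage equals the number of sequenced bases divided by the size of the target. All numbers are hypothetical round figures of my own choosing (read count, read length, target sizes and the 70% on-target fraction); none of them were presented at the meeting.

```python
def mean_coverage(n_reads, read_length_bp, target_size_bp, on_target_fraction=1.0):
    """Approximate mean depth: usable sequenced bases / target size."""
    return n_reads * read_length_bp * on_target_fraction / target_size_bp


# Hypothetical sequencing run: 100 million reads of 100 bp each.
N_READS = 100_000_000
READ_LENGTH = 100

# Spread across a ~3 Gb genome (assumed size).
whole_genome = mean_coverage(N_READS, READ_LENGTH, 3_000_000_000)

# Focused on a ~50 Mb captured target, assuming 70% of reads land on target.
targeted = mean_coverage(N_READS, READ_LENGTH, 50_000_000, on_target_fraction=0.7)

print(f"whole genome: {whole_genome:.1f}x, targeted: {targeted:.0f}x")
```

Under these assumptions, the same run yields roughly 3x coverage of a whole genome but about 140x coverage of the captured regions, which is where the gain in sensitivity comes from.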

For RNAseq applications, traditional poly-A enrichment strategies already do an excellent job at enriching for expressed exons from high-quality material. But many clinical samples suffer from RNA degradation and capturing merely the 3' tail of a transcript is not effective. As capture protocols for both DNAseq and RNAseq are adapted, there is hope to make hundreds of thousands of archived FFPE samples available for molecular analysis. At the same time, sensitive detection of tumor DNA in blood would open the door toward longitudinal data collection, e.g. following the response of patients to treatment through simple blood tests.

With ever more samples analyzed by NGS, multiple speakers pointed out that the bottleneck was not in sequencing any more, but in the subsequent computational and statistical analysis. Mike Snyder (Stanford), who presented an update on his personal omics project, estimated that the cost of sequencing a genome was < $2,000, but the cost of analyzing it was > $10,000.

Plenty of room for improvement!


References:
  • Actionable diagnosis of neuroleptospirosis by next-generation sequencing. Wilson MR, Naccache SN, Samayoa E, Biagtan M, Bashir H, Yu G, Salamat SM, Somasekar S, Federman S, Miller S, Sokolic R, Garabedian E, Candotti F, Buckley RH, Reed KD, Meyer TL, Seroogy CM, Galloway R, Henderson SL, Gern JE, DeRisi JL, Chiu CY. N Engl J Med. 2014 Jun 19;370(25):2408-17. doi: 10.1056/NEJMoa1401268. Epub 2014 Jun 4.
  • Personal omics profiling reveals dynamic molecular and medical phenotypes. Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HY, Miriami E, Karczewski KJ, Hariharan M, Dewey FE, Cheng Y, Clark MJ, Im H, Habegger L, Balasubramanian S, O'Huallachain M, Dudley JT, Hillenmeyer S, Haraksingh R, Sharon D, Euskirchen G, Lacroute P, Bettinger K, Boyle AP, Kasowski M, Grubert F, Seki S, Garcia M, Whirl-Carrillo M, Gallardo M, Blasco MA, Greenberg PL, Snyder P, Klein TE, Altman RB, Butte AJ, Ashley EA, Gerstein M, Nadeau KC, Tang H, Snyder M. Cell. 2012;148(6):1293-307. PMCID: PMC3341616.

Monday, September 22, 2014

Invited speaker: Michael Fischbach, UCSF, Small molecules from the human microbiota

9:27 PM Posted by sandmann No comments

Today, I had the chance to attend a talk by Michael Fischbach. He presented his group's work on exploring the metabolic abilities of microorganisms, using computational tools to identify operons encoding complex cascades of enzymes - and their chemical products. Dr. Fischbach presented a whirlwind tour of some of the natural products produced e.g. by the human microbiota and highlighted how little we know about the biosynthetic abilities of even the most widely used microbes. Great presentation and lots of new science! Check out the Fischbach lab's recent publications: