Sunday, December 28, 2014

Holiday project: JavaScript

2:07 PM Posted by sandmann No comments
Working in a large bioinformatics department, I am lucky to be exposed to many different questions, topics, tools and approaches. For example, two of my colleagues - Alexandre Masselot (now at the Swiss Institute of Bioinformatics) and Kiran Mukhyala - built lightweight, customized visualizations of protein data using JavaScript. Their work was recently published in the journal Bioinformatics - check it out on GitHub here and here.

I had the chance to hear Kiran and Alexandre explain how they build modern web applications that allow both experimental and computational scientists to explore their data. So this year-end holiday, I decided to learn more about JavaScript myself, starting with the basics.

First, I started reading Douglas Crockford's book "JavaScript: The Good Parts". This (suspiciously?) short book contains a lot of important information, but I quickly realized that I needed a more basic introduction to the language itself before focusing on "the good parts".

Next, I discovered Marijn Haverbeke's book "Eloquent JavaScript", published both as a free online tutorial and in print. Haverbeke provides a gentle introduction to JavaScript and, in passing, also includes valuable lessons on programming in general, including abstraction, modularization and object-oriented programming. The tutorial comes complete with examples, exercises and humor - and will keep me busy until the end of the year.


Afterward, I should be ready to return to "JavaScript: The Good Parts" and explore some JavaScript libraries, including lodash and d3. A first new year's resolution...


Wednesday, October 29, 2014

From Aad to Zutshi - authorship and bias

9:19 AM Posted by sandmann No comments
On July 4th, 2012, scientists at the European Organization for Nuclear Research (CERN) in Switzerland reported the detection of an elusive elementary particle, the Higgs Boson. Peter Higgs, who had predicted its existence in 1964, was awarded the Nobel Prize in Physics in the following year, prompting Neal Hartman to wonder "Who really found the Higgs Boson?"

Investigating the story behind the discovery, Hartman realized just how many scientists are involved in modern experimental physics: more than 3000 scientists from the ATLAS team are listed as authors on the publication reporting the Higgs Boson's detection and every experiment involves the active participation of hundreds of scientists.

Considering that organizing scientists has been likened to herding cats, the achievement of coordinating such large teams probably comes close to outshining even this groundbreaking discovery. Important ATLAS publications list the authors in alphabetical order, instead of in order of importance, as is common in biomedical research where it is a constant source of aggravation.
A subset of the authors of a recent ATLAS publication

The researchers at CERN also take great care to avoid known psychological pitfalls of experimental research. "We don't work with real data until the very last step," Kerstin Tackmann, a member of the Higgs to Gamma Gamma analysis group, explains. "Once we look at the real data," says Tackmann, "we're not allowed to change the analysis anymore." This precaution is similar to the blinded analysis of clinical trials in biomedical research, considered an essential tool in drug development.

Biomedical researchers appear to be less attuned to the risk of being misled by their own data, and MacLeod et al. recently estimated that "85% of research resources are wasted".

In my own experience, most researchers are careful and strive to include the necessary controls and safeguards. Yet, with most experiments yielding negative results and the competition for funding and positions increasing, we often overestimate the significance of exciting (though perhaps unlikely) results and tend to disregard contradictory data. While we easily spot overconfident colleagues, many scientists readily state that "I know the effect is there, I just don't have the data to show it, yet".

A few simple procedures can help to remove some of the nagging doubt about whether we read too much into our observations. Tried and tested methods include randomization (e.g. assigning animals randomly to cages instead of placing them on a "first come, first served" basis) and blinding, i.e. ensuring that the data collection is not biased by the expectations of the experimenter (e.g. by scoring microscopy images without knowing whether samples were treated or not).
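As a minimal sketch of the randomization step described above, the following Python snippet shuffles a list of animals and deals them into cages of (nearly) equal size. The function name `assign_to_cages`, the animal labels and the cage count are hypothetical and only serve as an illustration; a real study design may call for stratification (e.g. by sex or litter), which this sketch does not attempt.

```python
import random


def assign_to_cages(animal_ids, n_cages, seed=None):
    """Randomly assign animals to cages of (nearly) equal size.

    A fixed seed makes the assignment reproducible, so it can be
    documented alongside the experiment.
    """
    rng = random.Random(seed)
    shuffled = list(animal_ids)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)

    cages = {cage: [] for cage in range(1, n_cages + 1)}
    # Deal the shuffled animals round-robin, like cards, into the cages.
    for position, animal in enumerate(shuffled):
        cages[position % n_cages + 1].append(animal)
    return cages


# Hypothetical example: twelve mice, three cages.
animals = [f"mouse_{i:02d}" for i in range(1, 13)]
cages = assign_to_cages(animals, n_cages=3, seed=2014)

for cage, members in cages.items():
    print(cage, members)
```

Recording the seed together with the resulting assignment keeps the procedure transparent: anybody can re-run it and obtain the same cages.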

Statistics [are used] in the same way that a drunk uses lamp-posts—for support rather than illumination (Andrew Lang)

Yet, I still vividly remember the blank look of disbelief I got from a colleague when I helpfully offered to replace the names of her digital microscopy images (treated1.tif, treated2.tif, control1.tif, control2.tif, etc.) with random labels before she counted the number of cells surviving a drug treatment. She clearly didn't think that her expectation of a significant treatment effect could bias the results and didn't take any comfort in reducing that bias through 'blinding' (although I did offer to provide the original labels afterward - for free).
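The relabeling I offered could be sketched along the following lines in Python. This is only an illustration under stated assumptions: the function names `make_blinding_key` and `blind_copies`, the neutral `image_001.tif` naming scheme and the CSV key file are all hypothetical choices, not a description of any existing tool. The originals are copied rather than renamed, so nothing is lost.

```python
import csv
import random
import shutil
from pathlib import Path


def make_blinding_key(filenames, seed=None):
    """Map each original file name to a neutral label, e.g. 'image_003.tif'."""
    rng = random.Random(seed)
    shuffled = list(filenames)
    rng.shuffle(shuffled)  # random order, so the label reveals nothing
    return {
        original: f"image_{index:03d}{Path(original).suffix}"
        for index, original in enumerate(shuffled, start=1)
    }


def blind_copies(source_dir, blinded_dir, key_file, seed=None):
    """Copy all .tif images under neutral names and save the key as CSV."""
    source_dir, blinded_dir = Path(source_dir), Path(blinded_dir)
    blinded_dir.mkdir(parents=True, exist_ok=True)
    names = sorted(p.name for p in source_dir.glob("*.tif"))
    key = make_blinding_key(names, seed)
    for original, blinded in key.items():
        shutil.copy2(source_dir / original, blinded_dir / blinded)
    # The key should be kept away from the person scoring the images.
    with open(key_file, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["original", "blinded"])
        writer.writerows(key.items())
    return key


# Hypothetical example using the file names mentioned above.
key = make_blinding_key(
    ["treated1.tif", "treated2.tif", "control1.tif", "control2.tif"], seed=1
)
for original, blinded in key.items():
    print(original, "->", blinded)
```

Once the cells have been counted on the blinded copies, the key file restores the original labels for the statistical comparison.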

During the experimental research process, statistical tools can contribute useful feedback by quantifying how confident we should be in our results. Yet, in my experience, these tools are most often used too late, e.g. when the data has been collected and statistical tests have to be added as an afterthought to pacify a reviewer. Facing last-minute judgment of their body of work by a t-test, many researchers are hard pressed to embrace statistics as a useful addition to their tool kit.

Yet, as new technologies - from single-cell sequencing to CyTOF - enable biomedical researchers to collect more and more data, relying on our intuition alone is going to mislead us. With the blessing of more data comes the "curse of dimensionality", requiring biologists (like myself) to learn new tricks and kindle their appreciation and use of statistics.





Tuesday, October 14, 2014

Bioconductor 3.0 released today

8:29 PM Posted by sandmann No comments
Today, the latest version (3.0) of the Bioconductor suite of R libraries was released, featuring nearly one thousand software packages as well as an equal number of annotation and experimental data sources.

Despite being a daily user of Bioconductor packages, I am still amazed by the breadth and quality of tools shared in the Bioconductor community, providing open source solutions for metabolomics, cheminformatics, mass spectrometry, genetics, cell biology and many other research areas.

When I first encountered R and Bioconductor as a PhD student at EMBL, I was looking over the shoulder of Wolfgang Huber, a member of the Bioconductor core and co-author of its official publication in 2004. Back then, I was very impressed at how quickly my ChIP-on-chip data was transformed into scientific plots at Wolfgang's hands. Little did I know that I would spend many hours using R and Bioconductor packages and even contribute code myself in the years to come !

Bioconductor was initially conceived in 2001 to enable reproducible research, share statistical software for biological data analysis and provide training for its growing user base. It is supported by an international team of software developers and scientists based primarily at the Fred Hutchinson Cancer Research Center and other US and international research institutes.

All packages contributed to Bioconductor are reviewed to ensure adherence to established guidelines, including e.g. the availability of vignettes, unit tests, help pages and examples. In addition, both released packages and those under development are automatically tested in a continuous integration environment. A dedicated support site is available where questions are usually answered by package authors and experienced users within hours.

Today, Bioconductor provides important infrastructure for the research community worldwide, and packages are downloaded tens of thousands (sometimes hundreds of thousands) of times every year.

My previous and current research is heavily indebted to the many Bioconductor contributors, some of whom I am fortunate to count among friends and colleagues, and to the Bioconductor core team.

Many thanks for providing an outstanding open-source infrastructure !


Sunday, October 12, 2014

Replication stress links structural and numerical cancer chromosomal instability

7:59 PM Posted by sandmann No comments
At the recent "Tumor Heterogeneity: Implications for Targeted Therapy" conference, Charles Swanton (London Research Institute) presented how kidney, colon and lung tumors change and evolve over time. This work was published in a series of high-profile papers, including two back-to-back articles in Science this week. Here, I review a part of the talk, with results published by Burrell et al. in Nature in 2013.

In 2013, Swanton and co-workers examined why many tumors display chromosomal instability (CIN). While normal human cells are diploid and carry two copies of each gene, cancer cells from many solid tumors often gain additional copies of specific regions and lose others. This can, for example, lead to the loss of tumor suppressor genes and provides genetic variation that can fuel the evolution of sub-clones.

The balanced inheritance of genetic material is tightly controlled in normal cells, but seems to be error-prone in many human cancers. For example, colorectal cancer can broadly be categorized into chromosomally stable (CIN-) and unstable (CIN+) subtypes. Swanton and co-workers set out to understand the mechanisms specifically destabilizing CIN+ colorectal tumors.

Through careful microscopy imaging of dividing cells, they documented a high frequency of DNA replication artifacts in CIN+ cell lines. These cells seemed unable to duplicate their genome correctly before cell division and produced e.g. chromosomal fragments without centromeres, which were randomly distributed to the daughter cells. The root of the problem appeared to be a disruption of the DNA replication process itself, as the authors noticed that the replication forks in CIN+ cells progressed at a slower pace than in their CIN- counterparts. This is a sign of "replication stress", which was previously shown to cause DNA damage and chromosomal aberrations.

What could cause replication stress in colorectal cancers ? To formulate specific hypotheses, Swanton and co-workers compared the cancer genomes of CIN+ and CIN- tumors. First, they checked known oncogenes and tumor suppressors. While the TP53 gene, which is frequently deactivated in human cancers, appeared to be more often mutated in CIN+ cases, its biological function did not explain the observed chromosomal instability.

Next, the scientists enumerated copy number variants (CNVs), looking for regions lost or gained in CIN+ but not CIN- tumors, and found a promising candidate: loss of a specific region of chromosome 18 (region 18q) was observed in 88% of aneuploid tumours and 80% of CIN+ cell lines. The researchers had detected a statistically significant correlation between 18q loss and chromosomal instability - but had they really identified a causal relationship ?

If region 18q really contained genes important for the correct execution of replication, its loss should precede the onset of chromosomal instability. During colon cancer development, cells typically pass through a precursor stage, called adenoma, before progressing into malignant carcinomas.

Vogelgram Overview
A genetic model for colorectal tumorigenesis, Fearon & Vogelstein, Cell, 1990; Image source: Wikimedia Commons

Both 18q loss and chromosomal instability were found to be less frequent in adenomas than carcinoma samples from the same patients, consistent with a causal relationship between these two observations (but not proving it).

To elucidate the molecular consequences of 18q loss, the researchers systematically deactivated all of the protein-coding genes contained in this region of the genome. Targeting any one of three genes - PIGN, MEX3C or ZNF516 - produced a CIN+ phenotype in cell lines, including acentric chromosomes and anaphase bridges, activation of the DNA damage response and reduced replication fork speed. In addition, just like the naturally observed CIN+ phenotype, the consequences of inactivating these genes could be prevented by supplying additional DNA building blocks in the form of nucleosides to the cells.

Conclusions:
Recurrent loss of a specific genomic region, 18q, in colorectal cancers may be responsible for disrupting the normal replication process in cancer cells. This could trigger a cascade of subsequent losses or gains, increasing the genetic heterogeneity in the following generations of cancer cells and accelerating the emergence of resistance.


Thursday, October 9, 2014

How we got to now - at KQED

7:30 PM Posted by sandmann No comments
A fun evening at KQED in San Francisco: Steven Johnson presented a preview of his new PBS television show "How we got to now". Johnson presents how things we generally take for granted came into existence, e.g. electric light, clean water or radio. I specifically loved the animations illustrating e.g. how the city of Chicago was lifted several feet to allow construction of the first sewer system in the US.

Mina Kim, Steven Johnson and Biz Stone at KQED (source: Tom Olson, KQED, via Twitter)

In the post-preview discussion at KQED, led by Mina Kim, Johnson was joined by Biz Stone, co-founder of Twitter, who contributed some of his own experience with innovation. While Johnson addressed more profound questions, e.g. the different time scales of technological and cultural change, Stone emphasized how his projects typically produce results very different from what he had imagined at the beginning: Twitter was originally a fun side project while his team was working on a podcasting platform (that was eventually abandoned).

The first episode of the television show will be broadcast on October 15th, 2014 and Johnson promises fun for eleven-year-olds ("It has sewers in it !") as well as grown-ups: "We present a different story about every five minutes, which you can tell at dinner parties. Just like I did for the last three years".

If you miss the TV show, you can also check out the book.

Clone wars

7:51 AM Posted by sandmann No comments
At the "Tumor Heterogeneity: Implications for Targeted Therapy" conference at Stanford on October 6th, 2014, Kornelia Polyak (Harvard Medical School) described how her lab used fluorescence in situ hybridization (iFISH) and allele-specific PCR-FISH (STAR-FISH) to image copy number variation and point mutations in breast cancer at the single-cell level. She highlighted how treatment with anti-HER2 antibodies induced changes in the composition of tumors in a breast cancer cohort, including a post-treatment enrichment of PI3K mutant cells.

 
To understand how clonal heterogeneity is maintained over time and study the role of interactions between clones, her lab performed xenograft experiments with combinations of cell lines engineered to over-express different non-cell autonomous drivers. Polyclonal tumors, containing a mixture of cell lines, grew faster than clonal xenografts and produced metastases earlier.

Among many other findings, the researchers found that IL11 expression by even a small number of cells in the xenograft increased its density of blood vessels. This promoted growth of the tumor as a whole, including that of the other sub-clones, providing direct experimental evidence for interactions between clonal sub-populations.

As pointed out during the lively discussion, these experiments focused on selected secreted signaling molecules and did not investigate the competitiveness of well-known driver oncogenes such as mutant KRAS. (KRAS mutant cell lines grew too fast to be included in the xenograft experiment.)

References:

Non-cell-autonomous driving of tumour growth supports sub-clonal heterogeneity; Marusyk et al, Nature, 2014

Tumor Heterogeneity: Implications for targeted therapy

7:43 AM Posted by sandmann No comments
Earlier this week, I had the chance to attend the "Tumor Heterogeneity: Implications for targeted therapy" conference at the Stanford Cancer Institute. (For a 6 min, 17 s summary of the topic, listen to Simon Tavare on ABC's Science Show.) Cancer is a complex disease - and tumors from different patients, multiple tumors from the same patient and even separate parts of the same tumor can look and behave markedly differently.
Treatment induces a bottleneck effect, where only some resistant sub-clones will survive and propagate to re-form a heterogeneous tumor. (Source: Wikipedia, by Lcchong - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 Unported license via Wikimedia Commons.)
For example, as Kimberly Allison (Stanford) pointed out, pathologists routinely look at breast cancer samples under the microscope to assess whether a tumor expresses high or low levels of the HER2 (ERBB2) protein. They struggle to summarize what they see in a single score, because they frequently observe regions with high and low levels of HER2 in the same section.

The conference provided a broad range of topics, including technologies to characterize tumors at the single-cell level, the study of tumor evolution and clinical reports from physicians. Here are some of the main points I took away:
  • late stage tumors contain many sub-clones, differing e.g. in the numbers and types of genetic lesions as well as response to treatment
  • the degree of heterogeneity within a tumor is itself prognostic, e.g. more heterogeneity is often associated with a worse outcome
  • a subset of cells may already carry mutations causing resistance to any specific cancer drug, and their expansion is associated with recurrence after single-agent therapy
  • every tumor follows a unique evolutionary path, starting with early 'trunk' mutations followed by branching into sub-clones, which may compete, cooperate or simply co-exist
  • new technologies to detect tumors and characterize them over time, e.g. through blood draws during the course of treatment, offer opportunities to study the dynamics of cancer progression
  • recent advances in the field of cancer immunotherapy, e.g. alerting the patient's immune cells to the presence of a tumor, may offer new therapeutic opportunities

In the coming days, I will summarize a few of my personal highlights from this meeting.