DFCI Knowledge Systems Group

- Ethan Cerami

Our new team room. Long a dream of mine, the team room includes a large scrum board at one end and multiple "information radiators" displaying real-time updates on all our projects.

One year ago, I joined the Dana-Farber Cancer Institute to lead the newly created DFCI Knowledge Systems Group. We are an applied genomics software and data sciences group, focused on enabling cancer genomics research and precision cancer medicine at DFCI. We are also now part of a larger, new computational biology center, headed by Chris Sander, who has recently moved from Sloan Kettering Cancer Center to DFCI.

One year into the new position, here is a quick summary of what we are working on, who we are, how we work, and positions we are currently looking to fill.

Enabling Precision Cancer Medicine

The Knowledge Systems Group is focused on building new genomic information and knowledge systems and on analyzing cancer genomic data, all with the larger goal of enabling precision cancer medicine at DFCI.

A large part of our effort is focused on mining data generated by the Profile project, a multi-institution initiative between DFCI, Brigham and Women’s Hospital, and Boston Children’s Hospital. This is a large-scale, multi-year project providing tumor sequencing to any cancer patient across the three institutions, and we just reached the significant milestone of sequencing our 10,000th sample.

Overview of the DFCI Knowledge Systems Group.

The genomic data generated by Profile, linked to detailed clinical data, is an enormously valuable data set, and we are building new data mining tools and clinical decision support tools to enable both cancer genomic discovery and precision cancer medicine.

We also have a very strong open source focus, for two reasons: 1) it enables us to gain expertise from other institutions, and therefore build better tools; and 2) it enables our software to have an impact across multiple cancer centers. It also fits well with our overall scientific mission of distributing our work more widely throughout the scientific community.

Below is a quick summary of our current major projects:

cBioPortal for Cancer Genomics: <img style="float: right; margin: 20px" src="http://www.cbioportal.org/images/cbioportal_logo.png" width=250px>Chris Sander, Nikolaus Schultz and I founded the cBioPortal for Cancer Genomics ~6 years ago. Since then, it has grown into one of the most popular platforms for mining cancer genomics data, and is now fully open source. We continue to actively develop the platform, now within a distributed development team that spans MSKCC, DFCI, and Princess Margaret Cancer Center. We also maintain a local instance of cBioPortal, dubbed cBioPortal @ DFCI, where we enable approved researchers to mine all genomic data generated by the Profile project.

<img style="float: left; margin: 20px" src="https://raw.github.com/ecerami/ecerami.github.io/master/img/match_miner.png" width=250px> MatchMiner: Our newest creation is MatchMiner, a computational platform for matching genomic profiles to open clinical trials at DFCI. We are rolling this out in several stages. In stage 1, clinical trial investigators will be able to create and set their own genomic filters, and use these filters to forecast future clinical trial enrollment (based on two years of historical data), prospectively identify eligible patients, and get automatic notifications of newly matching patients. In stage 2, we will enable “patient-centric” trial matching, such that any clinician will be able to access genomic data for their specific patient and identify any algorithmically matched clinical trials.
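To make the stage 1 idea a bit more concrete, here is a minimal Python sketch of what a genomic filter and a naive enrollment forecast could look like. This is not MatchMiner's actual code: the class names (`PatientProfile`, `GenomicFilter`), the fields, the matching rule (one required gene plus an optional cancer type), and the sample data are all hypothetical, chosen only to illustrate filtering historical profiles and averaging matches over a two-year window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class PatientProfile:
    """Hypothetical, simplified genomic profile for one sequenced patient."""
    patient_id: str
    cancer_type: str
    mutated_genes: frozenset
    sequenced_on: date


@dataclass(frozen=True)
class GenomicFilter:
    """Hypothetical trial filter: a required gene plus an optional cancer type."""
    gene: str
    cancer_type: Optional[str] = None

    def matches(self, profile: PatientProfile) -> bool:
        if self.gene not in profile.mutated_genes:
            return False
        return self.cancer_type is None or self.cancer_type == profile.cancer_type


def matching_patients(flt: GenomicFilter, profiles: List[PatientProfile]) -> List[str]:
    """Return the IDs of all patients whose profile satisfies the filter."""
    return [p.patient_id for p in profiles if flt.matches(p)]


def forecast_annual_matches(flt: GenomicFilter, profiles: List[PatientProfile],
                            today: date, years: int = 2) -> float:
    """Naive forecast: average matches per year over the trailing window."""
    window_start = today - timedelta(days=365 * years)
    hits = [p for p in profiles
            if window_start <= p.sequenced_on <= today and flt.matches(p)]
    return len(hits) / years


# Made-up example data, for illustration only.
profiles = [
    PatientProfile("P1", "lung", frozenset({"EGFR", "TP53"}), date(2015, 6, 1)),
    PatientProfile("P2", "melanoma", frozenset({"BRAF"}), date(2015, 11, 15)),
    PatientProfile("P3", "lung", frozenset({"KRAS"}), date(2016, 1, 10)),
    PatientProfile("P4", "lung", frozenset({"EGFR"}), date(2013, 1, 1)),
]
egfr_lung = GenomicFilter(gene="EGFR", cancer_type="lung")

matched_ids = matching_patients(egfr_lung, profiles)
forecast = forecast_annual_matches(egfr_lung, profiles, today=date(2016, 3, 1))
print(matched_ids, forecast)
```

A real system would need far richer filters (specific variants, copy number changes, age, prior treatment) and a forecast that accounts for trends in sequencing volume; the sketch only shows the basic shape of the problem.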

AACR GENIE and Cloud Computing: Beyond sharing software, we are also strong believers in “big data” for cancer and in sharing genomic data across multiple institutions. DFCI is therefore a founding member of AACR Project GENIE, a new initiative for sharing genomic data across multiple cancer centers. This spring, we are also starting a new cloud-based initiative to enable secure, joint computation across multiple cancer centers, and optimization of next-gen sequencing pipelines.

Ideas Under Development: We also have a number of “ideas under development”, all currently in the prototype or idea generation phase. These include: 1) the DFCI Insight Engine: a platform of Python notebooks for automatically re-running data analyses across all Profile data sets; 2) an open access cancer knowledge base for annotating clinically actionable variants in cancer; and 3) a new web-based information platform for enabling clinicians to better interpret genomic variants within their patients.

Who We Are

There are currently three of us in the group: James Lindsay, Priti Kumari, and me. We all have backgrounds in genomics, data science, bioinformatics, or software engineering. Ersin Çiftçi will be joining us in a few weeks.

We also work very closely with the Hyve, an open source bioinformatics group headquartered in the Netherlands. This includes Sjoerd van Hagen, Pieter Lukasse, Sander de Ridder, and Fedde Schaeffer, all of whom work with us on the cBioPortal, and Bernd van der Veen, who is our front-end engineer extraordinaire for MatchMiner.


We have a number of open positions within the group, and are currently looking to fill the following roles:

– Front End Software Engineer: front-end focused engineer with strong AngularJS experience, plus strong data visualization skills.

– Senior Bioinformatics Engineer: strong bioinformatics engineer, ideally with experience building cancer genomics systems.

– Senior Research Scientist / Clinical Genomics Analyst: to help lead our open cancer knowledge base and help us curate DFCI clinical trials.

If you have an interest in any of the positions, please contact me directly.
