After many years of struggling with R, I have now made the transition to Python for data analysis. Python notebooks have also been a revelation, and there is no turning back now.
A few useful resources for those looking to take the plunge:
10 Minutes to Pandas: Good introduction with basic code examples.
Pandas Cookbook: Excellent introduction with sample Python notebooks you can work through yourself. I especially enjoyed the notebooks for analyzing New York City 311 calls.
Pandas, Matplotlib, and NumPy Cheat Sheets: Excellent, concise cheat sheets. Print them out and keep them handy.
Official Pandas Docs: Not the easiest for beginners, but handy nonetheless.
What’s New in Pandas: Release notes describing the new features in each version.
Matplotlib: Plotting in Python. So much easier than R!
Seaborn (statistical data visualization): For when Matplotlib doesn't cut it.
Plot.ly: Now fully open source; easily create interactive plots and visualizations.
Finally, for those looking for a quick, simple mash-up of pandas and Plotly served via Flask, here is a basic hello world application: https://github.com/ecerami/hello_flask.
On December 7 and 8, we organized our first-ever cBioPortal Hackathon. For those unfamiliar with cBioPortal, it's a web-based platform for analyzing and visualizing large-scale cancer genomic data sets. Earlier this year, cBioPortal went fully open source, and we now have a cross-institutional team from DFCI, MSKCC, Princess Margaret Cancer Centre, and The Hyve working on the continued development of the portal.
The Hackathon was hosted at DFCI in Boston. All told, there were 15 of us, split into four teams.
Pieter Lukasse from The Hyve and Adam Abeshouse from MSKCC presenting performance improvements to the cBioPortal.
Team 1: Data Validation and Importing. Tasked with improving the validation and loading of new data sets into the cBioPortal. Succeeded in extending the existing validator and adding an HTML report option, and made good progress toward a single overall script for validating and importing new data sets.
Team 2: Performance. Tasked with identifying performance bottlenecks in the current code and working their way through them. Succeeded in identifying and fixing a number of front-end performance bottlenecks.