Background Info

Data science is an interdisciplinary field that combines computer science, mathematics, and the scientific method to explore, visualize, and extract knowledge from data.
Why is this important?
Recent improvements in the collection and storage of big data have opened up new avenues for analytics. Industries such as health care, banking, education, and tech, to name a few, are seeking new ways to wrangle and gain insights from these massive data repositories. Data science is viewed as a tool for doing so, because it pools computer science and statistics.
Resources
- Data science wiki
- The vision for data science, Nature, volume 493, pages 473–475 (24 January 2013)
Data Science Pipeline

The goal of each post is to:
- Demonstrate how to conduct exploratory data analysis
- Demonstrate how to prepare data for analysis
- Use the scientific method to ask hypothesis-driven questions
- Conduct a statistical analysis
- Demonstrate the various libraries used to carry out this work
- Visualize data where appropriate
- Use machine learning where appropriate
- Discuss the analysis
- Provide follow-up resources
Each post will include a link to the GitHub repository that houses the original dataset, the code blocks used in the analysis, and any other relevant links.
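To make the goals listed above concrete, here is a minimal sketch of a few pipeline stages (preparation, exploration, and a statistical comparison) using only the Python standard library. It is an illustration under stated assumptions, not the method any particular post will follow: the dataset is synthetic, the group labels are invented, and the function names `prepare`, `explore`, and `welch_t` are hypothetical. Actual posts may use other libraries entirely.

```python
import random
import statistics

# Hypothetical, synthetic dataset: (group, measurement) records.
# None marks a missing measurement. Nothing here comes from a real study.
random.seed(0)
records = [("control", random.gauss(10.0, 2.0)) for _ in range(50)]
records += [("treated", random.gauss(11.0, 2.0)) for _ in range(50)]
records[3] = ("control", None)
records[60] = ("treated", None)


def prepare(rows):
    """Data preparation: drop rows with a missing measurement."""
    return [(group, value) for group, value in rows if value is not None]


def explore(rows):
    """Exploratory analysis: count, mean, and standard deviation per group."""
    summary = {}
    for name in sorted({group for group, _ in rows}):
        values = [value for group, value in rows if group == name]
        summary[name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
        }
    return summary


def welch_t(summary, a, b):
    """Statistical analysis: Welch's t statistic for the difference in means."""
    sa, sb = summary[a], summary[b]
    standard_error = (sa["stdev"] ** 2 / sa["n"] + sb["stdev"] ** 2 / sb["n"]) ** 0.5
    return (sa["mean"] - sb["mean"]) / standard_error


clean = prepare(records)
summary = explore(clean)
t_stat = welch_t(summary, "treated", "control")

for name, group_stats in summary.items():
    print(
        f"{name}: n={group_stats['n']} "
        f"mean={group_stats['mean']:.2f} stdev={group_stats['stdev']:.2f}"
    )
print(f"Welch t statistic (treated vs control): {t_stat:.2f}")
```

The hypothesis here would be that the two groups differ in their mean measurement. The t statistic alone does not settle that question; a full analysis would also report degrees of freedom and a p-value, and would visualize the two distributions.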