Data Science Blogs

Background Info

Data science is an interdisciplinary field that combines computer science, mathematics, and the scientific method to explore, visualize, and extract knowledge from data.

Why is this important?
Recent improvements in the collection and storage of big data have opened up new avenues for analytics. Industries such as health care, banking, education, and tech (to name a few) are seeking new ways to wrangle and gain insights from these massive data repositories. Data science is viewed as a tool for doing so, because it pools methods from computer science and statistics.

Resources

Data science wiki
A vision for data science, Nature volume 493, pages 473–475 (24 January 2013)

Data Science Pipeline

The goals of each post are to:

Demonstrate how to conduct exploratory data analysis
Demonstrate how to prepare data for analysis
Use the scientific method to ask hypothesis-driven questions
Conduct a statistical analysis
Demonstrate the various libraries used to carry out this work
Visualize data where appropriate
Use machine learning where appropriate
Discuss the analysis
Provide follow-up resources
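
To make these goals concrete, here is a minimal sketch of how a few of the steps (data preparation, exploratory summary, and a hypothesis-driven statistical test) might fit together. It is an illustration only, not the code from any particular post: the toy dataset, the group names, and the helper functions `prepare`, `summarize`, and `permutation_test` are all hypothetical, and it uses only the Python standard library, whereas the posts themselves may rely on other libraries.

```python
import random
import statistics

# Hypothetical toy data: (group, measurement). None marks a missing value.
raw_rows = [
    ("control", 4.1), ("control", 3.8), ("control", None), ("control", 4.4),
    ("control", 4.0), ("treated", 5.2), ("treated", 4.9), ("treated", 5.6),
    ("treated", None), ("treated", 5.1),
]


def prepare(rows):
    """Data preparation: drop rows whose measurement is missing."""
    return [(group, value) for group, value in rows if value is not None]


def summarize(rows):
    """Exploratory analysis: count, mean, and standard deviation per group."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    return {
        group: {
            "n": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
        }
        for group, values in groups.items()
    }


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


clean_rows = prepare(raw_rows)
summary = summarize(clean_rows)

# Hypothesis: the two groups have different mean measurements.
control = [value for group, value in clean_rows if group == "control"]
treated = [value for group, value in clean_rows if group == "treated"]
p_value = permutation_test(control, treated)

for group, stats in summary.items():
    print(f"{group}: n={stats['n']} mean={stats['mean']:.2f} stdev={stats['stdev']:.2f}")
print(f"permutation p-value: {p_value:.4f}")
```

A permutation test is used here because it needs no distributional assumptions and no third-party packages; a real analysis would choose its test based on the data and the question being asked.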

Each post will include a link to the GitHub repository that houses the original dataset and the code blocks used in the analysis, along with any other relevant links.
