Data Science Blogs

Background Info

Data science is an interdisciplinary field that combines computer science, mathematics, and the scientific method to explore, visualize, and extract knowledge from data.

Why is this important?
Recent improvements in the collection and storage of big data have opened up new avenues for analytics. Industries such as health care, banking, education, and tech, to name a few, are seeking new ways to wrangle and gain insights from these massive data repositories. Data science is viewed as a tool for doing so because it pools computer science and statistics.

Data Science Pipeline

The goal of each post is to:

  • Demonstrate how to conduct exploratory data analysis
  • Demonstrate how to prepare data for analysis
  • Use the scientific method to ask hypothesis-driven questions
  • Conduct a statistical analysis
  • Demonstrate the various libraries used to carry out this work
  • Visualize data where appropriate
  • Use machine learning where appropriate
  • Discuss the analysis
  • Provide follow-up resources
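
As a rough illustration of how a few of these steps fit together, here is a minimal sketch using only the Python standard library. The measurements are made-up placeholder values, and the helper names (`prepare`, `describe`, `welch_t`) are hypothetical; the posts themselves may use different libraries and methods.

```python
import math
import statistics

# Hypothetical raw measurements for two groups (placeholder values,
# not real data); None marks a missing observation.
raw = {
    "control": [5.1, 4.9, None, 5.0, 5.2, 4.8],
    "treatment": [5.6, 5.4, 5.7, None, 5.5, 5.8],
}


def prepare(values):
    """Data preparation: drop missing entries."""
    return [v for v in values if v is not None]


def describe(values):
    """Exploratory summary: count, mean, and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / standard_error


# Prepare, explore, then compute a test statistic for the question
# "does the treatment group differ from the control group on average?"
clean = {name: prepare(values) for name, values in raw.items()}
summary = {name: describe(values) for name, values in clean.items()}
t_stat = welch_t(clean["control"], clean["treatment"])

for name, stats in summary.items():
    print(name, stats)
print("Welch t statistic:", round(t_stat, 2))
```

The sketch stops at the test statistic; turning it into a p-value, visualizing the groups, or fitting a model would be the natural next steps in a full post.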

Each post will include a link to the GitHub repository that houses the original dataset, the code blocks used in the analysis, and any other relevant links.