Browsed by
Category: Data Science

Are you Correlating Correctly?


Well… are you? In a conversation among a few colleagues, the concept of correlation came up. I commented that we couldn’t be certain our software was applying the correlation formula correctly and that we therefore shouldn’t use it (a typical black box). To my surprise, a colleague asked me, “what’s the difference?”, not knowing that using certain correlation formulas on certain data sets is inappropriate and inaccurate. It prompted me to write this article about the four…
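As a rough illustration of why the choice of formula matters, here is a minimal sketch comparing two common coefficients, Pearson and Spearman, on made-up data. The data and the helper names (`pearson`, `ranks`, `spearman`) are assumptions for illustration only, and these two may or may not be among the formulas the full article covers.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation: measures how well a straight line fits."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1..n (this simple version assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: y always increases with x, but not in a straight line.
x = list(range(1, 11))
y = [v ** 5 for v in x]

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)
print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_r:.3f}")
```

On this data the relationship is perfectly monotonic, so Spearman reports a perfect correlation while Pearson comes out noticeably lower. Neither number is wrong; they answer different questions.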


What is a p-value?


In this video I provide a short explanation of what a p-value is and how it is used. P-values are essential to hypothesis testing in traditional statistics, but they are also meaningful in other ways, which I’ll explore in a future post. For now, I hope you enjoy this video.
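For readers who prefer code to video, here is a minimal sketch of one way a p-value can be computed. It is not taken from the video; the coin-flip scenario and the function name `binomial_p_value` are assumptions for illustration. The null hypothesis is that a coin is fair, and the p-value is the probability of seeing a result at least as extreme as the one observed if that hypothesis were true.

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical observation: 9 heads in 10 flips of a supposedly fair coin.
p = binomial_p_value(9, 10)
print(f"p-value: {p:.4f}")
```

A small p-value says the observation would be unusual under the null hypothesis. It does not give the probability that the coin is fair.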

Manage Python Environments with Anaconda


Anaconda is a great tool for managing Python packages and hosting multiple environments easily. It provides a ton of “standard” packages on the initial install, along with a simple platform for creating and removing different Python states. In this video I’ll demonstrate the installation with some critical options and show how to set up and move between environments.

How to use Python with databases


Today we are going to explore some of the ways you can connect to SQL databases with Python. A few reasons you might want to do this: querying large data sets without storing the data in an intermediate step; combining data from several disparate databases; using it as a wrapper for an application (SQLite, MariaDB, etc.); and extracting data from an application with a hosted database. Let’s dive in and explore the different ways…
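To give a flavor of what connecting looks like, here is a minimal sketch using `sqlite3` from Python's standard library with an in-memory database. The `orders` table and its rows are hypothetical, and the full article may well use different libraries or databases.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, for illustration only.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)],
)
conn.commit()

# Let the database do the aggregation and fetch only the result.
rows = conn.execute(
    "SELECT customer, SUM(total) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

The `?` placeholders let the driver handle quoting, which is safer than building SQL strings by hand.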


The Apriori algorithm, an example of machine learning in Python.


Machine learning isn’t a mythical beast that only appears every six years when you park a 2014 Toyota Prius next to a dumpster in Buford, Wyoming. It is just a form of statistics that has gathered a lot of hype. It is about finding patterns in massive data sets and then applying those patterns to something. There are a lot of different ways to find patterns, and they are not necessarily complicated. Today I’m going to provide a…
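As a taste of how uncomplicated pattern finding can be, here is a minimal sketch of the frequent-itemset part of Apriori in plain Python. It is not the code from the full article; the function name `apriori`, the shopping baskets, and the 0.6 support threshold are all assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Start with every single item as a candidate.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that are frequent enough.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(baskets, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is the heart of the idea: an itemset can only be frequent if all of its smaller subsets are frequent, so most combinations never need to be counted.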


Why Visualize Data? Anscombe’s quartet and the many lives that could have been saved…


Any intellectual worth their salt can clearly look at a data set and see trends, right? Why would we need to bother ourselves with the “prettying” of data, which provides the cold hard facts we are looking for? What if I told you that the difference between a good data presentation and a poor one was the difference between someone living or dying? We need to visualize data to see the whole picture. Whether we are looking at temperature data,…
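To see why summary numbers alone can mislead, here is a minimal sketch that computes the mean of y and the Pearson correlation for each of the four data sets in Anscombe's quartet. The data are the published values; the helper `pearson` and the variable names are my own assumptions, not code from the full article.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of y and correlation for each data set.
summary = {name: (mean(y), pearson(x, y)) for name, (x, y) in quartet.items()}
for name, (mean_y, r) in summary.items():
    print(f"{name:>3}: mean(y) = {mean_y:.2f}, r = {r:.2f}")
```

All four sets print essentially the same mean and correlation, yet plotting them reveals four very different shapes. That gap between the numbers and the picture is the argument for visualizing.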


Twitter and Network Data Visualizations: A Python Tutorial


Note to the readers: For this tutorial I will be posting and explaining multiple code blocks. I will not be posting code in a “Pythonic” or more advanced format. It is my opinion that longer, more laid-out blocks are easier to understand. If you disagree, please tell me why! Also, if you see code repeated, this is intentional. Social media has become one of the most prominent aspects of our culture; even a zeitgeist of the early 21st century….


Is Data Mining the Same as Collection?


Data mining is a different type of task

Data mining is a term often used out of context. For many, “data mining” is simply the act of acquiring and organizing data for cleaning, analysis, and presentation. I like to clarify the difference between the acquisition of data and actual data mining, where more advanced methods are applied. These methods are helpful in finding useful data in a large cluster. Patterns observed will help problem formulation and how you think…
