Category: Data Science

Riding the Waves: Ups and Downs of Gas Prices

This April gas price report is based on data through March 2024. All insights provided are taken from the Gas Price Tracker, which is updated monthly. Direction in Prices So Far: Gas prices have generally trended downward since reaching a peak in June 2022. Although there were increases in September 2023 and March 2024, longer-term trends indicate a declining price index. While prices are higher than they’ve been in the past few years, current rates are slightly lower than the…

Lazy Recruiting: Coding Tests for Analyst and Data Science Positions

In the competitive landscape of data analytics and science, the hiring process has become a battlefield not just for candidates, but also for companies vying for top talent. Amidst this, coding tests have emerged as a common hurdle. I argue that these tests are not just ineffective, but also a lazy approach to recruitment that could be doing more harm than good. When an interview process boils candidates down to a “top n”, you’ll soon get candidates that just prepare to…

Building a Gasoline Price Pipeline with EIA Data

In the dynamic realm of data science, constructing robust pipelines to collect, analyze, and forecast data is not just a necessity but a fundamental skill. In this guide, we’ll embark on a journey to set up an Extract, Transform, Load (ETL) pipeline utilizing the U.S. Energy Information Administration (EIA) API to gather gasoline price data for exploratory analysis and time-series modeling. The U.S. Energy Information Administration (EIA) serves as a cornerstone in providing impartial energy information critical for informed decision-making….
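
As a rough illustration of the transform and load steps of such a pipeline, here is a minimal sketch in Python. It assumes the extract step has already produced a list of records with `period` and `value` fields; that payload shape, the `gas_prices` table name, and the sample rows are all assumptions made for the example, not details taken from the guide or real EIA data.

```python
import sqlite3

def transform(records):
    """Drop missing observations, convert prices to float, sort by period."""
    rows = []
    for rec in records:
        value = rec.get("value")
        if value in (None, ""):
            continue  # skip periods with no reported price
        rows.append((rec["period"], float(value)))
    return sorted(rows)

def load(rows, conn):
    """Write (period, price) rows to a hypothetical gas_prices table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gas_prices (period TEXT PRIMARY KEY, price REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent if the pipeline is re-run.
    conn.executemany("INSERT OR REPLACE INTO gas_prices VALUES (?, ?)", rows)
    conn.commit()

# Made-up placeholder records standing in for the extract step (not real data).
sample = [
    {"period": "2000-01-08", "value": "2.0"},
    {"period": "2000-01-01", "value": "1.5"},
    {"period": "2000-01-15", "value": None},
]

conn = sqlite3.connect(":memory:")
rows = transform(sample)
load(rows, conn)
stored = conn.execute("SELECT COUNT(*) FROM gas_prices").fetchone()[0]
```

Keeping extract, transform, and load as separate functions makes it easy to swap the placeholder records for a real API call later without touching the rest.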

Coding Something Tricky? Try Excel, No Really.

In the realm of data science, where Python and R reign supreme, it might seem counterintuitive to bring Excel into the equation. However, Excel, a tool often associated with traditional business analysis, can serve as a powerful ally in the initial stages of algorithm development. This blog post explores the benefits of using Excel to work out pseudocode before transitioning to more sophisticated programming environments. Excel: A Visual and Interactive Sandbox. Excel’s grid layout provides a natural environment for…

R Vs Python: The Ultimate Showdown

When it comes to data analysis, both R and Python have become staples in the data science toolbox. But which one should you reach for when you’re about to dive into your next dataset? Let’s break it down. R: The OG of Data Analysis. Python: The Jack of All Trades. The Showdown: EDA Visualizations. R’s ggplot2 is arguably more intuitive and offers a lot of customization. Python’s matplotlib and seaborn are powerful but require more effort…

Are you Correlating Correctly?

Well…are you? In a conversation among a few colleagues, the concept of correlation came up. I made a comment that we couldn’t be certain that the correlation formula was being applied correctly by our software, and therefore, we shouldn’t use it (typical black box). To my surprise, my colleague asked me “what’s the difference?”, not knowing that using certain correlation formulas for certain data sets is inappropriate and inaccurate. It prompted me to write this article about the four…
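
To see why the choice of formula matters, here is a minimal sketch comparing two commonly used coefficients, Pearson and Spearman, on the same made-up data. These two are picked for illustration only and are not necessarily the ones the article covers; the helper names are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up data: y rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

On this data the relationship is perfectly monotonic, so Spearman is 1 (up to floating-point rounding), while Pearson comes out lower because the relationship is not linear.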

What is a p-value?

In this video I provide a short explanation of what a p-value is, and how it is used. P-values are essential in hypothesis testing using traditional statistics, but they are meaningful in other ways which I’ll explore in a future post. For now, I hope you enjoy this video.

Manage Python Environments with Anaconda

Anaconda is a great tool to manage Python packages and host multiple environments easily. It gives a ton of “standard” packages on the initial install, and provides a simple platform to create and remove different Python states. In this video I’ll demonstrate the installation with some critical options, as well as show how to set up and move between environments.

How to use python with databases

Today we are going to explore some of the ways you can connect to SQL databases with Python. A few examples of why you would want to do this are: querying large data sets without the need to store the data in an intermediate step; combining data from several disparate databases; using a database as a wrapper for an application (SQLite, MariaDB, etc.); and extracting data from an application with a hosted database. Let’s dive in and explore the different ways…
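
As one small example of the idea, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the rows are made up for illustration, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

def query_totals(conn):
    """Return (customer, total) rows, letting the database do the aggregation."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()

# In-memory database with a hypothetical table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",  # placeholders avoid SQL injection
    [("ann", 10.0), ("bob", 5.0), ("ann", 2.5)],
)
conn.commit()

totals = query_totals(conn)
```

Because the `GROUP BY` runs inside the database, only the summarized rows come back to Python, which is the point of querying large data sets without an intermediate copy.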

The apriori algorithm, an example of machine learning in python.

Machine learning isn’t a mythical beast that only appears every 6 years when you park a 2014 Toyota Prius next to a dumpster in Buford, Wyoming. It is just a form of statistics that has gathered a lot of hype. It is about finding patterns in massive data sets and then applying those patterns to something. There are a lot of different ways to find patterns and they are not necessarily complicated. Today I’m going to provide a…
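
For a feel of how small the core idea is, here is a minimal pure-Python sketch of frequent-itemset mining in the apriori style. It is an illustration under simple assumptions (support is a raw count, the baskets are made up), not the implementation from the post.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support baskets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many baskets contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-item subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Made-up shopping baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=2)
```

The prune step is what keeps the search manageable: an itemset can only be frequent if every smaller subset of it is frequent, so most combinations never need to be counted.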
