Browsed by
Category: Python

Building a Gasoline Price Pipeline with EIA Data

In the dynamic realm of data science, constructing robust pipelines to collect, analyze, and forecast data is not just a necessity but a fundamental skill. In this guide, we’ll embark on a journey to set up an Extract, Transform, Load (ETL) pipeline utilizing the U.S. Energy Information Administration (EIA) API to gather gasoline price data for exploratory analysis and time-series modeling. The U.S. Energy Information Administration (EIA) serves as a cornerstone in providing impartial energy information critical for informed decision-making….

Coding Something Tricky? Try Excel, No Really.

In the realm of data science, where Python and R reign supreme, it might seem counterintuitive to bring Excel into the equation. However, Excel, a tool often associated with traditional business analysis, can serve as a powerful ally in the initial stages of algorithm development. This blog post explores the benefits of using Excel to work out pseudocode before transitioning to more sophisticated programming environments.

Excel: A Visual and Interactive Sandbox

Excel’s grid layout provides a natural environment for…

R Vs Python: The Ultimate Showdown

When it comes to Data Analysis, both R and Python have become staples in the data science toolbox. But which one should you reach for when you’re about to dive into your next dataset? Let’s break it down. R: The OG of Data Analysis Pros: Cons: Python: The Jack of All Trades Pros: Cons: The Showdown: EDA Visualizations R’s ggplot2 is arguably more intuitive and offers a lot of customization. Python’s matplotlib and seaborn are powerful but require more effort…

Run Makefile on Windows (…kinda)

Makefiles are super cool: they help you create an executable process. Why run files manually or update everything when only a few files changed? Established in 1976 and not at all obsolete in 2022, Makefiles still have their place. Sounds great, right? It is, unless you are running Windows and don’t have access to a Unix (or Unix-like) operating system. I highly recommend getting a Raspberry Pi if you need a Unix system. It’s a flexible and lightweight solution…

Are you Correlating Correctly?

Well…are you? In a conversation with a few colleagues, the concept of correlation came up. I commented that we couldn’t be certain our software was applying the correlation formula correctly, and therefore we shouldn’t use it (typical black box). To my surprise, a colleague asked me “what’s the difference?”, not knowing that applying certain correlation formulas to certain data sets is inappropriate and inaccurate. It prompted me to write this article about the four…

Manage Python Environments with Anaconda

Anaconda is a great tool to manage Python packages and host multiple environments easily. It gives a ton of “standard” packages on the initial install, and provides a simple platform to create and remove different Python states. In this video I’ll demonstrate the installation with some critical options, as well as show how to set up and move between environments.

Simple Password Management with Python

Many uses of Python involve connections to APIs, databases, or hosted information. Embedding a password in a script is always a bad idea. You can easily forget it is there and inadvertently share confidential information with others. This is especially important if you regularly post content to a public GitHub repository.

CSV files

Storing important content doesn’t need to be complicated. Simply creating a CSV file that has the information you need in tabular format will certainly do the job….
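As a rough illustration of the CSV idea, here is a minimal sketch that keeps credentials in a separate file and looks them up by name at run time. The column names (`service`, `username`, `password`), the `load_credentials` helper, and the demo values are all assumptions made up for this example; they are not taken from the post. In practice the file would live outside the repository and be listed in `.gitignore`.

```python
import csv
import os
import tempfile


def load_credentials(csv_path):
    """Return {service: (username, password)} read from a CSV file with a header row.

    Hypothetical helper: the column names are an assumption for this sketch.
    """
    credentials = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            credentials[row["service"]] = (row["username"], row["password"])
    return credentials


# Demo only: write a throwaway credentials file so the sketch runs end to end.
# A real file would be created by hand and kept out of version control.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "credentials.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["service", "username", "password"])
        writer.writerow(["example_db", "demo_user", "not-a-real-password"])
    demo_credentials = load_credentials(demo_path)

# The script itself never contains the password, only the lookup key.
username, password = demo_credentials["example_db"]
```

The script then refers to a service name rather than a literal password, so sharing the script does not share the secret.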

Pandas Basics: Timeseries

Check out the new video on timeseries, which completes the tutorials on how to segment data for more effective aggregation. In the first two videos I showed how to do this in Excel and LibreOffice; this next video uses more technical methods. It uses the pandas library, with some help from datetime, to slice up our data following the previous example. Pandas will be a common tool for data munging and is very versatile. The methods shown in this…

How to use Python with databases

Today we are going to explore some of the ways you can connect to SQL databases with Python. A few examples of why you would want to do this are: querying large data sets without storing the data in an intermediate step; combining data from several disparate databases; using Python as a wrapper for an application (SQLite, MariaDB, etc.); and extracting data from an application with a hosted database. Let’s dive in and explore the different ways…
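One of the simplest ways to try this out is Python's built-in `sqlite3` module, which needs no server. The sketch below is only an illustration, not the method from the post: the `readings` table, its columns, and the inserted values are invented for the example, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# ":memory:" creates a temporary database that disappears when the connection closes.
connection = sqlite3.connect(":memory:")
try:
    # Hypothetical table and demo rows, used only to have something to query.
    connection.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    connection.executemany(
        "INSERT INTO readings (sensor, value) VALUES (?, ?)",  # ? placeholders avoid SQL injection
        [("a", 1.5), ("a", 2.5), ("b", 10.0)],
    )
    connection.commit()

    # Let the database do the aggregation instead of pulling every row into Python.
    rows = connection.execute(
        "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
    ).fetchall()
finally:
    connection.close()

print(rows)
```

Other databases follow the same connect, execute, fetch pattern through their own driver packages, though connection arguments and placeholder syntax differ between drivers.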

The apriori algorithm, an example of machine learning in Python.

Machine learning isn’t a mythical beast that only appears every 6 years when you park a 2014 Toyota Prius next to a dumpster in Buford, Wyoming. It is just a form of statistics that has gathered a lot of hype. It is about finding patterns in massive data sets: find a pattern and then apply it to something. There are a lot of different ways to find patterns, and they are not necessarily complicated. Today I’m going to provide a…
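To give a feel for how uncomplicated pattern finding can be, here is a minimal pure-Python sketch of the frequent-itemset step of the apriori algorithm. It is an assumption-laden illustration rather than the code from the post: the `frequent_itemsets` function name, the absolute `min_support` count, and the shopping baskets are all made up for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: merge pairs of frequent itemsets into candidates one item larger.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: a candidate survives only if all of its smaller subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Made-up baskets, purely for demonstration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(baskets, min_support=2)
```

The prune step is what makes the approach practical: an itemset can only be frequent if every smaller subset of it is frequent, so most large combinations are never counted at all.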
