Recent posts

Amazon Food Reviews, Part II

OK, now that we've explored the contents of the Amazon Fine Food Reviews data file, it's time to move on to the second task: How many reviews and products are perfectly positive and

Amazon Food Reviews, Part I

It takes time to take advantage of all the great data science/ML resources out there like Coursera, Udacity, HackerRank, and Kaggle competitions. I won't bore you with the details, but granted that

Analyzing U.S. Veterans Incomes

Using data from the American Community Survey, I analyzed the gap between male and female median income among veterans and non-veterans. Which states have the greatest income disparity between the sexes? This post

Faker and Feature Importance

I ran into a pretty neat fake name generator that also generates fake addresses in different languages, called Faker: https://pypi.python.org/pypi/Faker. Using Faker, I generated a set of fake

Classifying business documents

In a project two years ago, I wrote a basic Naive Bayes classification script, based largely on this tutorial from scikit-learn's site, which was pretty much all that I needed. I typically use