
OK, now that we've explored the contents of the Amazon Fine Food Reviews data file, it's time to move on to the second task: how many reviews and products are perfectly positive or perfectly negative in sentiment?

TextBlob assigns a polarity value from -1.0 to +1.0, and scores at these extremes are usually outliers. For this exercise, I'm going to analyze what makes up these outliers: first by counting the products and reviews involved, and then, in Part III, by looking at the key features mentioned in the comments.

boxplot

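#mean polarity and star score by ProductId
#(selecting several columns from a groupby needs a list, hence the double brackets)
data_pid = data2.groupby(['ProductId'])[['tb_polarity', 'Score']].mean().reset_index()

#mean polarity and star score by ProfileName
data_pn = data2.groupby(['ProfileName'])[['tb_polarity', 'Score']].mean().reset_index()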

#Perfect Review Sets
pf_negr = data2.loc[(data2['tb_polarity'] == -1.0)]
pf_posr = data2.loc[(data2['tb_polarity'] == 1.0)]

screenshot

Here, though, we can see that TextBlob's polarity score doesn't always reflect the text accurately. Notice that in row 106 the summary reads "disappointing" and "text_cln" reads "not what I was expecting in terms of the ...", and yet TextBlob assigns it a polarity score of 1.0.

#Perfect Review by ProductID
pf_pos_pid = data_pid.loc[(data_pid['tb_polarity'] == 1.0)]
pf_neg_pid = data_pid.loc[(data_pid['tb_polarity'] == -1.0)]
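One thing worth keeping in mind when reading the product-level numbers: because polarity is capped at 1.0, a product can only have a mean of exactly 1.0 if every one of its reviews scored 1.0, so products with very few reviews are the most likely to qualify. Here is a minimal sketch of that effect on a made-up toy frame. The `toy` data and the `n_reviews` column are assumptions for illustration only and are not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for data2 (hypothetical values, not the real reviews).
toy = pd.DataFrame({
    "ProductId": ["A", "A", "B", "C", "C"],
    "tb_polarity": [1.0, 1.0, 1.0, 1.0, 0.5],
})

# Mean polarity and number of reviews per product.
per_product = (
    toy.groupby("ProductId")
       .agg(tb_polarity=("tb_polarity", "mean"),
            n_reviews=("tb_polarity", "size"))
       .reset_index()
)

# Products whose mean polarity is exactly 1.0.
perfect_pos = per_product.loc[per_product["tb_polarity"] == 1.0]
print(perfect_pos)
```

Product C drops out as soon as one of its reviews scores below 1.0, while B qualifies on the strength of a single review. Carrying a review count alongside the mean makes it easy to see how much weight each "perfect" product deserves.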

Summary of Perfectly Positive and Perfectly Negative

#How many reviews are perfectly positive? 
print('The number of perfectly positive sentiment reviews are',len(pf_posr))
print(len(pf_posr)/len(data2), 'of all reviews have 1.0 positive sentiment')

#perfectly negative? 
print('The number of perfectly negative sentiment reviews are', len(pf_negr))

print(len(pf_negr)/len(data2), 'of all reviews have -1.0 negative sentiment')

#How many productid's are perfectly positive?
print('Products with perfectly positive average reviews are', len(pf_pos_pid))

print(len(pf_pos_pid)/data2['ProductId'].nunique(), 'of all products have a mean 1.0 positive sentiment')

#perfectly negative?
print('Products with perfectly negative average reviews are', len(pf_neg_pid))

print(len(pf_neg_pid)/data2['ProductId'].nunique(), 'of all products have a mean -1.0 negative sentiment')

Summary of Reviews

The number of perfectly positive sentiment reviews are 2888

0.00508044626302 of all reviews have 1.0 positive sentiment

The number of perfectly negative sentiment reviews are 303

0.000533024659867 of all reviews have -1.0 negative sentiment

Summary of Products

Products with perfectly positive average reviews are 190

0.0025586468798 of all products have a mean 1.0 positive sentiment

Products with perfectly negative average reviews are 24

0.000323197500606 of all products have a mean -1.0 negative sentiment

What this tells us is that there are about 9.5 times as many perfectly positive reviews as perfectly negative ones (2888 vs. 303), which is good. Likewise, about 7.9 times as many products have a perfectly positive average polarity as have a perfectly negative one (190 vs. 24).
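As a quick sanity check, the ratios can be recomputed from the counts printed in the summary above. This is just arithmetic on those printed numbers, with the variable names chosen here for illustration. It also shows the positive-review share as a percentage, which is a bit easier to read than the raw fraction.

```python
# Counts copied from the summary output above.
pos_reviews, neg_reviews = 2888, 303
pos_products, neg_products = 190, 24

# How many times more common the perfectly positive group is.
review_ratio = pos_reviews / neg_reviews
product_ratio = pos_products / neg_products
print(f"reviews: {review_ratio:.1f}x, products: {product_ratio:.1f}x")

# The share of perfectly positive reviews, as printed above, shown as a percentage.
pos_share = 0.00508044626302
print(f"{pos_share:.2%} of reviews have a polarity of exactly 1.0")
```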