INST728E - Module 8. Sentiment Analysis

A common task in social media analytics is sentiment analysis, where we measure how positive or negative a message is based on its text. Note that emoji are often omitted from this analysis despite their potential utility, though recent research is working to integrate that information.

For now though, we'll look into sentiment towards our event over time.

For that, we'll use the temporal methods from Module 5 and introduce some new packages for sentiment.

In [2]:
%matplotlib inline

import datetime
import json

import numpy as np

# For plotting
import matplotlib.pyplot as plt

# NLTK's sentiment code
import nltk
import nltk.sentiment.util
import nltk.sentiment.vader

# TextBlob provides its own sentiment analysis
from textblob import TextBlob

Carry Forward Disaster Data

In [3]:
crisisInfo = {
    "brussels": {
        "name": "Brussels Transit Attacks",
        "time": 1458629880, # Timestamp in seconds since 1/1/1970, UTC
                            # 22 March 2016, 6:58 UTC to 08:11 UTC
        "directory": "brussels",
        "keywords": ["brussels", "bomb", "belgium", "explosion"],
        "place" : [
            50.8503, # Latitude
            4.3517 # Longitude
        ],
        "box": {
            "lowerLeftLon": 2.54563,
            "lowerLeftLat": 49.496899,
            "upperRightLon": 6.40791,
            "upperRightLat": 51.5050810,
        }
    },
}
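
As a quick sanity check on the "time" field, the sketch below converts the epoch value from the cell above into a readable UTC datetime. The variable names `event_ts` and `event_time` are illustrative and not part of the original notebook.

```python
import datetime

# Event timestamp copied from crisisInfo["brussels"]["time"]
event_ts = 1458629880

# Interpret the value as seconds since the Unix epoch, in UTC
event_time = datetime.datetime.fromtimestamp(event_ts, tz=datetime.timezone.utc)
print(event_time.isoformat())  # 2016-03-22T06:58:00+00:00
```
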
In [4]:
# Replace the name below with your selected crisis
selectedCrisis = "brussels"

Reading Relevant Tweets

Re-read our relevant tweets...

In [5]:
in_file_path = "/Users/cbuntain/relevant_tweet_output.json" # Replace this as necessary

relevant_tweets = []
with open(in_file_path, "r", encoding="utf8") as in_file:
    for line in in_file:
        # Each line holds one JSON-encoded tweet
        relevant_tweets.append(json.loads(line))
        
print("Relevant Tweets:", len(relevant_tweets))
Relevant Tweets: 4687
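
If the tweet file might contain blank or truncated lines, a slightly more defensive reader can help. The following is a minimal sketch under that assumption; the helper name `read_json_lines` and the sample data are hypothetical and do not come from the original notebook.

```python
import io
import json

def read_json_lines(handle):
    """Parse one JSON object per line, skipping blank or malformed lines."""
    records, skipped = [], 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # blank lines are ignored, not counted as errors
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1
    return records, skipped

# Tiny in-memory stand-in for the tweet file (illustrative data only)
sample = io.StringIO('{"text": "a"}\n\n{"text": "b"}\n{"text": \n')
records, skipped = read_json_lines(sample)
print(len(records), skipped)  # 2 1
```

Counting the skipped lines rather than silently dropping them makes it easier to notice when a file is more damaged than expected.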

Temporal Ordering

In [6]:
# Twitter's time format, for parsing the created_at date
timeFormat = "%a %b %d %H:%M:%S +0000 %Y"

# Frequency map for tweet-times
rel_frequency_map = {}
for tweet in relevant_tweets:
    # Parse time
    currentTime = datetime.datetime.strptime(tweet['created_at'], timeFormat)

    # Truncate this tweet's time to the minute so tweets group per minute
    currentTime = currentTime.replace(second=0)

    # If our frequency map already has this time, use it, otherwise add
    extended_list = rel_frequency_map.get(currentTime, [])
    extended_list.append(tweet)
    rel_frequency_map[currentTime] = extended_list
    
# Fill in any gaps
times = sorted(rel_frequency_map.keys())
firstTime = times[0]
lastTime = times[-1]
thisTime = firstTime

# We want to look at per-minute data, so we fill in any missing minutes
timeIntervalStep = datetime.timedelta(minutes=1)    # One-minute time step
while ( thisTime <= lastTime ):

    rel_frequency_map[thisTime] = rel_frequency_map.get(thisTime, [])
        
    thisTime = thisTime + timeIntervalStep

# Count the number of minutes
print ("Start Time:", firstTime)
print ("Stop Time:", lastTime)
print ("Processed Times:", len(rel_frequency_map))
Start Time: 2016-03-22 07:01:00
Stop Time: 2016-03-22 11:59:00
Processed Times: 299
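
The bucketing and gap-filling steps above can also be wrapped in one reusable function. This is only a sketch of the same idea: the function name `bucket_by_minute` and the three sample tweets are made up for illustration, and the time format is the same one used in `timeFormat`.

```python
import datetime
from collections import defaultdict

TIME_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"  # same format as timeFormat above

def bucket_by_minute(tweets):
    """Group tweets by minute and add empty lists for minutes with no tweets."""
    buckets = defaultdict(list)
    for tweet in tweets:
        created = datetime.datetime.strptime(tweet["created_at"], TIME_FORMAT)
        buckets[created.replace(second=0)].append(tweet)

    if not buckets:
        return {}

    # Walk from the first to the last minute, filling gaps with empty lists
    step = datetime.timedelta(minutes=1)
    current, last = min(buckets), max(buckets)
    filled = {}
    while current <= last:
        filled[current] = buckets.get(current, [])
        current += step
    return filled

# Illustrative tweets (not from the dataset)
sample_tweets = [
    {"created_at": "Tue Mar 22 07:01:15 +0000 2016", "text": "first"},
    {"created_at": "Tue Mar 22 07:01:48 +0000 2016", "text": "second"},
    {"created_at": "Tue Mar 22 07:04:02 +0000 2016", "text": "third"},
]
per_minute = bucket_by_minute(sample_tweets)
print([len(v) for v in per_minute.values()])  # [2, 0, 0, 1]
```
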
In [7]:
fig, ax = plt.subplots()
fig.set_size_inches(11, 8.5)

plt.title("Tweet Frequencies")

sortedTimes = sorted(rel_frequency_map.keys())
postFreqList = [len(rel_frequency_map[x]) for x in sortedTimes]

smallerXTicks = range(0, len(sortedTimes), 30)
plt.xticks(smallerXTicks, [sortedTimes[x] for x in smallerXTicks], rotation=90)

xData = range(len(sortedTimes))

ax.plot(xData, postFreqList, color="blue", label="Posts")

ax.grid(True, which='major')  # positional form works across Matplotlib versions
ax.legend()

plt.show()

Content and Sentiment Analysis

"Sentiment analysis" is used to figure out how people feel about a specific topic. Some tools also provide measurements like subjectivity/objectivity of text content.

We'll cover:

  • Topically Relevant Filtering
  • Sentiment, Subjectivity, and Objectivity
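
As rough intuition for what a polarity score is, the toy sketch below averages hand-picked word weights. The lexicon, its weights, and the function `toy_polarity` are invented for illustration only; real tools such as TextBlob and VADER rely on far larger lexicons and additional rules, and this is not their algorithm.

```python
# Hypothetical mini-lexicon: these words and weights are invented for illustration
TOY_LEXICON = {"love": 0.8, "safe": 0.5, "sad": -0.5, "horrible": -1.0}

def toy_polarity(text):
    """Average the weights of known words; 0.0 if no word is in the lexicon."""
    words = [w.strip(".,!?#").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(toy_polarity("Horrible news from Brussels"))  # negative
print(toy_polarity("Stay safe, love you all"))      # positive
print(toy_polarity("Brussels airport"))             # no known words, so 0.0
```

Note that text with no recognized words scores 0.0, which is indistinguishable from genuinely neutral text. The same caveat applies when reading the plots in this module.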

Sentiment Analysis w/ TextBlob

TextBlob is a nice Python package that provides a number of useful text processing capabilities. We will use it for sentiment analysis to calculate polarity and subjectivity for each relevant tweet.

In [8]:
# Sentiment values
polarVals_tb = []
objVals = []

# For each minute, compute the mean TextBlob polarity and subjectivity of its tweets
for t in sortedTimes:
    tweets = rel_frequency_map[t]
    
    # For calculating averages
    localPolarVals = []
    localObjVals = []
    
    for tweet in tweets:
        tweetString = tweet["text"]

        blob = TextBlob(tweetString)
        polarity = blob.sentiment.polarity
        # Note: despite the variable name, this is TextBlob's subjectivity score
        objectivity = blob.sentiment.subjectivity
        
        localPolarVals.append(polarity)
        localObjVals.append(objectivity)
        
    # Store this minute's mean polarity and subjectivity (0.0 if there are no tweets)
    if ( len(tweets) > 0 ):
        polarVals_tb.append(np.mean(localPolarVals))
        objVals.append(np.mean(localObjVals))
    else:
        polarVals_tb.append(0.0)
        objVals.append(0.0)

        
# Now plot this sentiment data
fig, ax = plt.subplots()
fig.set_size_inches(11, 8.5)

plt.title("Sentiment")
plt.xticks(smallerXTicks, [sortedTimes[x] for x in smallerXTicks], rotation=90)

xData = range(len(sortedTimes))

# Polarity is scaled [-1, 1], for negative and positive polarity
ax.plot(xData, polarVals_tb, label="Polarity")

# Subjectivity is scaled [0, 1], with 0 = objective, 1 = subjective
ax.plot(xData, objVals, label="Subjectivity")

ax.legend()
ax.grid(True, which='major')  # positional form works across Matplotlib versions

plt.show()
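
The cell above records 0.0 for minutes without tweets, which looks the same on the plot as a truly neutral minute. One alternative, sketched below, is to record NaN for empty minutes so that Matplotlib leaves a gap in the line instead. The helper `mean_score_per_minute` and the stand-in scorer `fake_score` are hypothetical; in the notebook the scorer would be something like `lambda text: TextBlob(text).sentiment.polarity`.

```python
import math

def mean_score_per_minute(buckets, score_fn):
    """Return one mean score per minute; NaN marks minutes without tweets."""
    means = []
    for minute in sorted(buckets):
        scores = [score_fn(tweet["text"]) for tweet in buckets[minute]]
        means.append(sum(scores) / len(scores) if scores else math.nan)
    return means

# Hypothetical stand-in scorer used only to keep this example self-contained
def fake_score(text):
    return 1.0 if "good" in text else -1.0

# Integer keys stand in for the per-minute datetime keys of rel_frequency_map
buckets = {
    0: [{"text": "good"}, {"text": "bad"}],
    1: [],
    2: [{"text": "good"}],
}
means = mean_score_per_minute(buckets, fake_score)
print(means)  # [0.0, nan, 1.0]
```

Passing the scoring function as a parameter also means the same loop can serve both TextBlob and VADER.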

Sentiment Analysis with VADER

In [9]:
# Requires the VADER lexicon; if it is missing, run nltk.download("vader_lexicon") first
vader = nltk.sentiment.vader.SentimentIntensityAnalyzer()
In [10]:
# Sentiment values
polarVals_vader = []

# For each minute, compute the mean VADER compound score of its tweets
for t in sortedTimes:
    tweets = rel_frequency_map[t]
    
    # For calculating averages
    localPolarVals = []
    
    for tweet in tweets:
        tweetString = tweet["text"]

        polarity = vader.polarity_scores(tweetString)["compound"]
        
        localPolarVals.append(polarity)
        
    # Store this minute's mean polarity (0.0 if there are no tweets)
    if ( len(tweets) > 0 ):
        polarVals_vader.append(np.mean(localPolarVals))
    else:
        polarVals_vader.append(0.0)

        
# Now plot this sentiment data
fig, ax = plt.subplots()
fig.set_size_inches(11, 8.5)

plt.title("Sentiment")
plt.xticks(smallerXTicks, [sortedTimes[x] for x in smallerXTicks], rotation=90)

xData = range(len(sortedTimes))

# Polarity is scaled [-1, 1], for negative and positive polarity
ax.plot(xData, polarVals_vader, label="Polarity")

ax.legend()
ax.grid(True, which='major')  # positional form works across Matplotlib versions

plt.ylim((-0.95, 0.95))
plt.show()

VADER vs. TextBlob

In [11]:
# Now plot this sentiment data
fig, ax = plt.subplots()
fig.set_size_inches(11, 8.5)

plt.title("Sentiment")
plt.xticks(smallerXTicks, [sortedTimes[x] for x in smallerXTicks], rotation=90)

xData = range(len(sortedTimes))

# Polarity is scaled [-1, 1], for negative and positive polarity
ax.plot(xData, polarVals_vader, label="VADER")
ax.plot(xData, polarVals_tb, label="TextBlob")

ax.legend()
ax.grid(True, which='major')  # positional form works across Matplotlib versions

plt.ylim((-0.95, 0.95))
plt.show()
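
Beyond comparing the two curves by eye, one simple way to quantify how closely they track each other is a Pearson correlation between the per-minute series. The sketch below is an optional addition rather than part of the original analysis; the function `pearson` and the sample values are illustrative. In the notebook it could be called as `pearson(polarVals_vader, polarVals_tb)`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return math.nan
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant series has no defined correlation
    return cov / math.sqrt(var_x * var_y)

# Illustrative values only, not taken from the tweet data
series_a = [-0.4, -0.2, 0.0, 0.1]
series_b = [-0.2, -0.1, 0.0, 0.05]
print(round(pearson(series_a, series_b), 3))
```

A value near 1 would suggest the two tools rise and fall together even when their absolute scales differ.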

Sentiment Extremes

While sentiment analysis may seem a little silly given our focus on disaster, for some types of events, content that is especially negative or positive is interesting.

For example, who might be posting extremely positive tweets in the wake of a terrorist attack?

To this end, let us use VADER to find the 10 most positive and 10 most negative tweets in our dataset.

In [12]:
top_k = 10
In [13]:
# Calculate sentiment for each relevant tweet
sentiment_pairs = [(tweet, vader.polarity_scores(tweet["text"])["compound"]) 
                   for tweet in relevant_tweets]

sorted_tweets = sorted(sentiment_pairs, key=lambda x: x[1])
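
When only the extremes are needed, fully sorting every pair is not required. As an alternative sketch, the standard library's `heapq.nsmallest` and `heapq.nlargest` select the top-k items directly. The `pairs` list here is made-up data standing in for `sentiment_pairs`.

```python
import heapq

# Illustrative (label, score) pairs standing in for (tweet, sentiment) pairs
pairs = [("a", 0.2), ("b", -0.9), ("c", 0.7), ("d", -0.1), ("e", 0.95)]
k = 2

most_negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
most_positive = heapq.nlargest(k, pairs, key=lambda p: p[1])

print(most_negative)  # lowest score first
print(most_positive)  # highest score first
```

Unlike slicing the end of an ascending sort, `nlargest` returns the most positive item first.
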
In [14]:
# Most negative tweets
for tweet, sentiment in sorted_tweets[:top_k]:
    print("Author:", tweet["user"]["screen_name"], "Sentiment:", sentiment)
    print("Text:\n%s" % tweet["text"], "\n")
Author: ashiqhussainta2 Sentiment: -0.9744
Text:
RT @RanaHarbi: Lebanon's #Hezbollah condemns the terrorist attacks in #Brussels #Belgium
Describes the terrorists as evil, criminal &amp; blood… 

Author: Adel42Timmy Sentiment: -0.974
Text:
RT @pzf: TERROR ATTACK:
- 28 dead
- 55 injured
- 1 bomb at metro station
- 2 suicide bombers at Brussels Airport
https://t.co/qbW9ZaJfkE 

Author: chloexxoo50 Sentiment: -0.974
Text:
RT @pzf: TERROR ATTACK:
- 28 dead
- 55 injured
- 1 bomb at metro station
- 2 suicide bombers at Brussels Airport
https://t.co/qbW9ZaJfkE 

Author: katecarm Sentiment: -0.9719
Text:
RT @mashable: Latest on #Brussels:
-15 dead, 55 injured in metro attack
-13 dead in airport suicide attack
https://t.co/DoWCTwx9vi https://… 

Author: mari_osses Sentiment: -0.9719
Text:
RT @mashable: Latest on #Brussels:
-15 dead, 55 injured in metro attack
-13 dead in airport suicide attack
https://t.co/DoWCTwx9vi https://… 

Author: akundra1609 Sentiment: -0.9657
Text:
#Brussels
Memorise this line to live in "Secular"India
Terrorism has no Religion
Terrorism has no Religion.
Terrorism has no Religion. 

Author: wesleydenton Sentiment: -0.9601
Text:
RT @DailySignal: A tragic day in Belgium, terror attacks have killed at least 13, days following arrest of Paris attack fugative https://t.… 

Author: DrFerdowsi Sentiment: -0.959
Text:
RT @debraruh: Sad news. Hate seeing violence it hurts all of us. The violence in #Paris #Turkey &amp; now #Brussels. #PrayForBrussels https://t… 

Author: PickeringTj Sentiment: -0.9559
Text:
RT @BBCBreaking: "Many dead and injured" in Brussels attacks, airport blast "probably caused by suicide bomb" - officials  https://t.co/Nhd… 

Author: BBCPankajP Sentiment: -0.9559
Text:
RT @BBCBreaking: "Many dead and injured" in Brussels attacks, airport blast "probably caused by suicide bomb" - officials  https://t.co/Nhd… 
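
Several entries in the list above are retweets of the same message, so the top 10 contains fewer than 10 distinct texts. One possible heuristic, sketched below, is to strip the leading "RT @user:" prefix and keep only the first pair per distinct text before ranking. The names `RT_PREFIX` and `dedupe_by_text` and the sample pairs are hypothetical, and truncated retweets may still slip through.

```python
import re

# Matches a leading retweet marker such as "RT @BBCBreaking: "
RT_PREFIX = re.compile(r"^RT @\w+:\s*")

def dedupe_by_text(pairs):
    """Keep the first (tweet, score) pair for each distinct underlying text."""
    seen, unique = set(), []
    for tweet, score in pairs:
        key = RT_PREFIX.sub("", tweet["text"]).strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append((tweet, score))
    return unique

# Illustrative pairs shaped like sentiment_pairs (scores are made up)
pairs = [
    ({"text": "RT @news: Attack in Brussels"}, -0.6),
    ({"text": "RT @news: Attack in Brussels"}, -0.6),
    ({"text": "Attack in Brussels"}, -0.6),
    ({"text": "Stay safe everyone"}, 0.4),
]
print(len(dedupe_by_text(pairs)))  # 2
```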

In [15]:
# Most positive tweets
for tweet, sentiment in sorted_tweets[-top_k:]:
    print("Author:", tweet["user"]["screen_name"], "Sentiment:", sentiment)
    print("Text:\n%s" % tweet["text"], "\n")
Author: Voodoo_Child_ Sentiment: 0.8931
Text:
RT @mastersreality: Musicians/artists always treated with kindness &amp; respect in Belgium. We will never stop coming to celebrate music w you… 

Author: VangelisVNZ Sentiment: 0.8934
Text:
RT @MouvEuropeen_Fr: Our strongest support to @EMInternational &amp; @EMBelgium , #Europe will stay strong &amp; united at your side https://t.co/O… 

Author: hc_jadc Sentiment: 0.9097
Text:
RT @JLin7: @Nelly_Mo honored to be in your photobomb!! thanks for the love 

Author: SwellBeatyu Sentiment: 0.91
Text:
RT @forgottenkam: brussels has always been one of my favorite places to play, my heart is with all of you. stay safe. love you immensely. 

Author: PyriteSoulfox Sentiment: 0.9217
Text:
major relief hearing from my friends in Brussels they are ok; wishing everyone else waiting for news from loved ones the same. 

Author: alexshamiltons Sentiment: 0.9217
Text:
11:11 my parents alex eliza and john happy and safe 
kat wins the lotto 
My parents letting me transition
Everyone in brussels staying safe 

Author: annagumbau Sentiment: 0.9348
Text:
Dear friends, safe and sound in Brussels. Thanks to all for your kind thoughts! 

Author: Steeler777 Sentiment: 0.9378
Text:
RT @Benna80: Now My thoughts go out to all  persons concerned in Brussels. 
It would be nice to have more love, peace and respect! 😪 

Author: david_earp Sentiment: 0.9458
Text:
RT @MikeBrightman: On a sad day (re Brussels) what a lovely. hopeful, inspiring, generous and heart warming smile! Thank you! https://t.co/… 

Author: gbinscotland Sentiment: 0.9531
Text:
RT @GBinEurope: This morning, we're praying for #Belgium. May God provide wisdom, comfort, love &amp; peace to its nation. https://t.co/8EcIjFD… 

Compared to TextBlob

In [16]:
# Calculate sentiment for each relevant tweet
sentiment_pairs = [(tweet, TextBlob(tweet["text"]).sentiment.polarity) 
                   for tweet in relevant_tweets]

sorted_tweets = sorted(sentiment_pairs, key=lambda x: x[1])

print("Most Negative Tweets:")
# Most negative tweets
for tweet, sentiment in sorted_tweets[:top_k]:
    print("Author:", tweet["user"]["screen_name"], "Sentiment:", sentiment)
    print("Text:\n%s" % tweet["text"], "\n")
    
print("------------------------")
print("Most Positive Tweets:")
# Most positive tweets
for tweet, sentiment in sorted_tweets[-top_k:]:
    print("Author:", tweet["user"]["screen_name"], "Sentiment:", sentiment)
    print("Text:\n%s" % tweet["text"], "\n")
Most Negative Tweets:
Author: andycorkhill1 Sentiment: -1.0
Text:
Bombs going off in Brussels airport. Horrible cunts these terrorists 

Author: Andrewhall19892 Sentiment: -1.0
Text:
RT @NicolaSturgeon: Horrific news from Brussels - thoughts with everyone involved. 

Author: BlackburnJA Sentiment: -1.0
Text:
Horrific news from Brussels. Thoughts and prayers with Belgium. 

Author: Bilco9 Sentiment: -1.0
Text:
RT @NicolaSturgeon: Horrific news from Brussels - thoughts with everyone involved. 

Author: HanumeAmir Sentiment: -1.0
Text:
RT @Grybauskaite_LT: Terrible acts of terrorism in #Brussels. Stand in solidarity with all those affected. 

Author: abi_anike Sentiment: -1.0
Text:
RT @GavinROfficial: Oh my gosh. It's just terrifying. #Brussels airport .... Thoughts and well wishes to anyone caught up in the chaos. 

Author: annabing Sentiment: -1.0
Text:
Horrible news coming out of Brussels #staystrongbrussels 

Author: scott_96 Sentiment: -1.0
Text:
Shocking news,when is all this going to stop - a end needs put to it..! #Brussels 

Author: AaronLucasCoach Sentiment: -1.0
Text:
Thoughts are with everyone affected by the terrible events in #Brussels 

Author: mywriteidea Sentiment: -1.0
Text:
So horrible. My thoughts are with the people in Brussels... #BrusselsAirport explosion
https://t.co/XGjB7a7D9B 

------------------------
Most Positive Tweets:
Author: JarnoReyniers Sentiment: 0.85
Text:
RT @RealSexNotes: If she can cook, communicates well, has goals &amp; is faithful, beautiful to you &amp; has bomb pussy, just stop there &amp; give he… 

Author: HaldoConnell Sentiment: 0.9765625
Text:
RT @clubparadiseco: FREE GIVEAWAY!!

Just Retweet to enter!
There will be one winner! 

1 x Nike Bomber Jacket

https://t.co/0h1bUjDwMj htt… 

Author: alfbrown99 Sentiment: 1.0
Text:
Wish all but the best out to Brussels this morning👀👀 

Author: lovatospancakes Sentiment: 1.0
Text:
my friends who lives in brussels parents and best friend who was supposed to take a plane today aren’t responding i fucidlgjfn,vjdkfnxmc 

Author: iammc43 Sentiment: 1.0
Text:
Okay, but what about the terror attacs in turkey????? #media #brussels #paris also #PRAYFORTURKEY !!!! https://t.co/3JoaDthVYv 

Author: buscadorelectri Sentiment: 1.0
Text:
RT @RiKo_Rick12: Wonderful Views That Will Please Your Eyes https://t.co/PFUJ1H3P5E #brussels #LeaveYourMark #twitterinfivewords #TEAM1DIAN… 

Author: riroudiag Sentiment: 1.0
Text:
RT @jeffstinco: Now playing: "Perfect World"
https://t.co/pfcH3AfasX #Brussels #Bruxelles 

Author: May_Asana Sentiment: 1.0
Text:
RT @GoldenJK0901: ARMY BOMB SEA SO BEAUTIFUL!!! Kobe lets go!! (Cr _EATME) https://t.co/G5hhnXCMlX 

Author: SoyNino_ATM9 Sentiment: 1.0
Text:
RT @InfosFrancaises: ⚠ALERTE EXPLOSIONS BELGIQUE : D’autres bombes ont été découvertes dans l'aéroport de Bruxelles. https://t.co/2gBghD3cly 

Author: alexya_w Sentiment: 1.0
Text:
RT @L0oiic: Be brave Belgium! Courage aux victimes et a leur proches! Condoléances aux familles! #PrayForBruxelles #22Mars2016 
