Having extracted a set of relevant tweets in the previous module, we will now explore the media posted in these tweets.
%matplotlib inline
import datetime
import json
import sys
import os
# For displaying HTML
from IPython.display import HTML, display
We'll keep the same event description as before.
crisisInfo = {
    "brussels": {
        "name": "Brussels Transit Attacks",
        "time": 1458629880, # Timestamp in seconds since 1/1/1970, UTC
                            # 22 March 2016, 06:58 UTC to 08:11 UTC
        "directory": "brussels",
        "keywords": ["brussels", "bomb", "belgium", "explosion"],
        "box": {
            "lowerLeftLon": 2.54563,
            "lowerLeftLat": 49.496899,
            "upperRightLon": 6.40791,
            "upperRightLat": 51.5050810,
        }
    },
}
# Replace the name below with your selected crisis
selectedCrisis = "brussels"
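The description packs its time and location into raw numbers. A quick sanity check is to convert the timestamp into a readable UTC date and test whether a known point falls inside the bounding box. The sketch below uses a trimmed copy of the "brussels" entry so it runs on its own without overwriting `crisisInfo`. `crisis_start` and `in_box` are hypothetical helpers, and the Brussels and Paris coordinates are approximate, given in (longitude, latitude) order.

```python
import datetime

# Trimmed copy of the "brussels" entry above (not a replacement for crisisInfo)
example_crisis = {
    "time": 1458629880,  # seconds since 1/1/1970, UTC
    "box": {
        "lowerLeftLon": 2.54563,
        "lowerLeftLat": 49.496899,
        "upperRightLon": 6.40791,
        "upperRightLat": 51.5050810,
    },
}

def crisis_start(crisis):
    """Convert the crisis timestamp to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(crisis["time"], tz=datetime.timezone.utc)

def in_box(lon, lat, box):
    """Hypothetical helper: True if (lon, lat) lies inside the bounding box."""
    return (box["lowerLeftLon"] <= lon <= box["upperRightLon"]
            and box["lowerLeftLat"] <= lat <= box["upperRightLat"])

print(crisis_start(example_crisis))                 # 2016-03-22 06:58:00+00:00
print(in_box(4.35, 50.85, example_crisis["box"]))   # approx. central Brussels -> True
print(in_box(2.35, 48.86, example_crisis["box"]))   # approx. Paris -> False
```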
At the end of the last module, we saved the relevant tweets to a file. Let's read that file back into a list so we can use the data here.
in_file_path = "/Users/cbuntain/relevant_tweet_output.json" # Replace this as necessary
relevant_tweets = []
# Each line of the file holds one tweet as a JSON object
with open(in_file_path, "r", encoding="utf8") as in_file:
    for line in in_file:
        relevant_tweets.append(json.loads(line))
len(relevant_tweets)
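The cell above assumes every line holds one complete JSON object. The output file could instead contain blank lines or a truncated last line, for example if the previous module was interrupted. In that case a slightly more defensive reader helps. `read_json_lines` is a hypothetical helper, sketched on the assumption that the file is in JSON Lines format (one tweet per line). The usage lines write a small temporary file rather than touching real data.

```python
import json
import os
import tempfile

def read_json_lines(path):
    """Hypothetical helper: read one JSON object per line, skipping blank
    and malformed lines instead of failing on them."""
    records = []
    with open(path, "r", encoding="utf8") as in_file:
        for line_number, line in enumerate(in_file, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate empty lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                print("Skipping malformed line %d: %s" % (line_number, err))
    return records

# Usage on a throwaway file containing a blank line between two records
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf8") as tmp:
    tmp.write('{"id": 1}\n\n{"id": 2}\n')
tweets = read_json_lines(tmp.name)
os.remove(tmp.name)
print(len(tweets))  # 2
```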
Tweets that contain images or video have an associated "media" entity in their "entities" field. We'll use these entities to count how often each media item is shared and to get the URLs of the media files themselves. That lets us find the most frequently shared images.
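Twitter's v1.1 tweet format also documents a second field, "extended_entities". It lists every photo attached to a tweet and gives videos and GIFs their actual type, whereas "entities" lists only the first attached item. The notebook's loop reads only "entities". The generator below is a hedged variant, assuming v1.1-format JSON, that prefers "extended_entities" when it is present. `iter_media` is a hypothetical name, and the three tweets are hand-made toy data, not real tweets.

```python
def iter_media(tweet):
    """Hypothetical helper: yield a tweet's media entities.

    Assumes v1.1-format JSON: prefer "extended_entities" (all attached media)
    and fall back to "entities" (first item only) when it is absent.
    """
    for field in ("extended_entities", "entities"):
        media_list = tweet.get(field, {}).get("media")
        if media_list:
            yield from media_list
            return  # stop after the first field that has media

# Hand-made toy tweets (not real data)
tweet_a = {"entities": {"media": [{"id": 1, "type": "photo"}]}}
tweet_b = {"entities": {"media": [{"id": 2, "type": "photo"}]},
           "extended_entities": {"media": [{"id": 2, "type": "photo"},
                                           {"id": 3, "type": "photo"}]}}
tweet_c = {"entities": {}}

print([m["id"] for t in (tweet_a, tweet_b, tweet_c) for m in iter_media(t)])  # [1, 2, 3]
```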
# A map from media ID to the number of times it was shared
media_map = {}
# A map from media ID to its media entity (type, URLs, etc.)
media_info_map = {}
# For each tweet, check if it has a media entity
for tweet in relevant_tweets:
    # If there is no "media" field, skip this tweet
    if "media" not in tweet["entities"]:
        continue
    # Get the list of shared media
    mediaList = tweet["entities"]["media"]
    # For each piece of media, get its ID and update both maps
    for media in mediaList:
        media_id = media["id"]
        media_map[media_id] = media_map.get(media_id, 0) + 1
        media_info_map[media_id] = media
print("Unique Media:", len(media_map))
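The same counting can be written with `collections.Counter`, whose `most_common(n)` method also covers ranking. Below is a sketch on toy tweets. `count_media` is a hypothetical name, and like the loop above it reads only `tweet["entities"]["media"]`.

```python
from collections import Counter

def count_media(tweets):
    """Hypothetical helper: count shares per media ID and keep one entity per ID."""
    media_counts = Counter()
    media_info = {}
    for tweet in tweets:
        # Tweets without media contribute nothing
        for media in tweet.get("entities", {}).get("media", []):
            media_counts[media["id"]] += 1
            media_info[media["id"]] = media
    return media_counts, media_info

# Toy tweets (not real data)
toy_tweets = [
    {"entities": {"media": [{"id": 10, "type": "photo"}]}},
    {"entities": {"media": [{"id": 10, "type": "photo"}]}},
    {"entities": {"media": [{"id": 20, "type": "video"}]}},
    {"entities": {}},
]
counts, info = count_media(toy_tweets)
print("Unique Media:", len(counts))  # Unique Media: 2
print(counts.most_common(1))         # [(10, 2)]
```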
# What are the most frequently shared media?
sortedMedia = sorted(media_map, key=media_map.get, reverse=True)
print("Top Media:")
for media_id in sortedMedia[:30]:
    media = media_info_map[media_id]
    print("\tID:", media_id, "Count:", media_map[media_id], "Type:", media["type"])
    print("\t%s" % media["expanded_url"])
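Two details of the ranking step are worth knowing. First, media with equal counts keep the order in which they were first seen, because Python's `sorted()` is stable even with `reverse=True`. Second, when only the top 30 are needed, `heapq.nlargest` is documented as equivalent to `sorted(..., reverse=True)[:n]` and avoids sorting the whole map. Here is a small sketch on made-up counts. The name `toy_counts` is chosen so it does not overwrite `media_map`.

```python
import heapq

# Made-up counts (not real data); "a" and "c" are tied
toy_counts = {"a": 3, "b": 5, "c": 3, "d": 1}

# Full sort, as in the cell above: tied keys stay in insertion order
by_count = sorted(toy_counts, key=toy_counts.get, reverse=True)
print(by_count)  # ['b', 'a', 'c', 'd']

# Top k only, with the same ordering as the full sort
top_two = heapq.nlargest(2, toy_counts, key=toy_counts.get)
print(top_two)  # ['b', 'a']
```

If a fixed order is wanted regardless of the order tweets were read in, a key such as `lambda m: (-media_map[m], m)` (without `reverse=True`) breaks ties by media ID.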
# Display the top images
for media_id in sortedMedia[:30]:
    media = media_info_map[media_id]
    print("\tID:", media_id)
    display(HTML('<img src="%s"/>' % media["media_url"]))
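Calling `display` once per image stacks 30 full-size images vertically. A hedged alternative is to build one HTML string of thumbnails and display it once. `gallery_html` is a hypothetical helper, and the 200-pixel width is an arbitrary choice. `html.escape` keeps a stray quote in a URL from breaking the tag.

```python
import html

def gallery_html(urls, width=200):
    """Hypothetical helper: one HTML string showing each URL as a thumbnail."""
    tags = ['<img src="%s" width="%d" style="margin:2px"/>'
            % (html.escape(url, quote=True), width)
            for url in urls]
    return "<div>%s</div>" % "".join(tags)

# Placeholder URLs, not real media
snippet = gallery_html(["https://example.com/a.jpg", "https://example.com/b.jpg"])
print(snippet.count("<img"))  # 2
```

In the notebook, this could be used as `display(HTML(gallery_html([media_info_map[m]["media_url"] for m in sortedMedia[:30]])))`. Media entities in the v1.1 format also carry a `media_url_https` field, which may be preferable when the notebook is served over HTTPS.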