INST728E - Module 7. Geospatial Analysis

We've discussed temporal and textual relevance already, and now we'll move on to spatial relevance.

We can use geospatial data to answer:

  • What content is being posted from around the disaster's location?
  • How does this content differ from global tweets?
In [1]:
%matplotlib inline

import datetime
import json
import os

import numpy as np

# For plotting
import matplotlib
import matplotlib.pyplot as plt

# For mapping
from mpl_toolkits.basemap import Basemap

from nltk.tokenize import TweetTokenizer

Carry Forward our Event Description

We'll keep this for the GPS bounding box, which we will use below.

In [29]:
crisisInfo = {
    "brussels": {
        "name": "Brussels Transit Attacks",
        "time": 1458629880, # Timestamp in seconds since 1/1/1970, UTC
                            # 22 March 2016, 6:58 UTC to 08:11 UTC
        "directory": "brussels",
        "keywords": ["brussels", "bomb", "belgium", "explosion"],
#         "place" : [
#             50.8503, # Latitude
#             4.3517 # Longitude
#         ],
#         "box": {
#             "lowerLeftLon": 2.54563,
#             "lowerLeftLat": 49.496899,
#             "upperRightLon": 6.40791,
#             "upperRightLat": 51.5050810,
#         }
        # NOTE: The active "place" and "box" values below cover the
        #  Washington, DC area rather than Brussels; the Brussels
        #  values are the ones commented out above.
        
       "place" : [
           38.887548, # Latitude
           -77.015183 # Longitude
       ],
     "box": {
           "lowerLeftLon": -77.163578,
           "lowerLeftLat": 38.80102,
           "upperRightLon": -76.89304,
           "upperRightLat": 38.998206,
       }
    },
}
In [23]:
# Replace the name below with your selected crisis
selectedCrisis = "brussels"

# Get data about our crisis
crisisMoment = crisisInfo[selectedCrisis]["time"] # When did it occur by epoch time
crisisTime = datetime.datetime.utcfromtimestamp(crisisMoment) # Convert to datetime
crisisTime = crisisTime.replace(second=0) # Flatten to a specific minute

# Print converted time
print ("Crisis Time:", crisisTime)
Crisis Time: 2016-03-22 06:58:00
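
The same conversion can be sketched with a timezone-aware datetime, which makes the UTC assumption explicit in the printed value. This is a minimal sketch rather than the notebook's own method; the names crisis_moment and crisis_time are illustrative, and the timestamp is copied from the event description above.

```python
import datetime

# Epoch timestamp copied from the event description (seconds since 1/1/1970, UTC)
crisis_moment = 1458629880

# fromtimestamp with an explicit tz returns a timezone-aware datetime,
#  whereas utcfromtimestamp returns a naive one
crisis_time = datetime.datetime.fromtimestamp(crisis_moment, tz=datetime.timezone.utc)
crisis_time = crisis_time.replace(second=0)  # Flatten to a specific minute

print("Crisis Time:", crisis_time)
```

The aware value carries a +00:00 offset, so it cannot be silently compared against local times.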

Reading Relevant Tweets

Re-read our relevant tweets...

In [24]:
in_file_path = "/Users/cbuntain/relevant_tweet_output.json" # Replace this as necessary

relevant_tweets = []
# Open the file as UTF-8 text; json.loads can parse each line directly
with open(in_file_path, "r", encoding="utf8") as in_file:
    for line in in_file:
        relevant_tweets.append(json.loads(line))
        
print("Relevant Tweets:", len(relevant_tweets))
Relevant Tweets: 4687


Geographic Data

Twitter allows users to share their GPS locations when tweeting, but only about 2% of tweets have this information. We can extract this geospatial data to look at patterns in different locations.

  • General plotting
  • Filtering by a bounding box
  • Images from target location

Extracting GPS Data

Each tweet has a field called "coordinates" describing where the tweet was posted. The field is null if the tweet carries no exact location; otherwise it holds a point with GPS coordinates in (longitude, latitude) order. Coarser location data, such as a named place and its bounding box, lives in a separate "place" field. We want the tweets with GPS data first.

For more information on tweet JSON formats, check out https://dev.twitter.com/overview/api/tweets
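
As a minimal sketch of reading that field, the helper below pulls a (longitude, latitude) pair out of a tweet dictionary and returns None when no point is present. The function name extract_lon_lat and the hand-built sample dictionaries are assumptions for illustration only.

```python
def extract_lon_lat(tweet):
    """Return (lon, lat) for a GPS-coded tweet, or None if it has no point."""
    coords = tweet.get("coordinates")

    # The field may be missing or null when the tweet has no exact location
    if not coords or coords.get("type") != "Point":
        return None

    # Points are stored as [longitude, latitude], in that order
    lon, lat = coords["coordinates"]
    return (lon, lat)

# Hand-built sample tweets (hypothetical, trimmed to the relevant field)
gps_sample = {"coordinates": {"type": "Point", "coordinates": [4.50716995, 51.31918574]}}
no_gps_sample = {"coordinates": None}

print(extract_lon_lat(gps_sample))
print(extract_lon_lat(no_gps_sample))
```

Note the order: longitude comes first, which is the reverse of the (latitude, longitude) order used for "place" in the event description.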

In [6]:
# Keep only those tweets with a non-null tweet['coordinates'] field
def coordinate_filter(tweet):
    return tweet.get("coordinates") is not None

geo_tweets = list(filter(coordinate_filter, relevant_tweets))
geo_tweet_count = len(geo_tweets)

print ("Number of Geo Tweets:", geo_tweet_count)
Number of Geo Tweets: 6
In [7]:
# Keep only those tweets with a tweet['place'] field but no GPS coordinates
def place_filter(tweet):
    # .get avoids a KeyError when a tweet lacks either field entirely
    return tweet.get("place") is not None and tweet.get("coordinates") is None

placed_tweets = list(filter(place_filter, relevant_tweets))
placed_tweet_count = len(placed_tweets)

print ("Number of Place Tweets:", placed_tweet_count)
Number of Place Tweets: 68
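
To put these counts in context, the sketch below expresses each one as a share of the relevant tweets. The counts are copied from the printed outputs above; the helper name percent_of is an assumption for illustration.

```python
def percent_of(count, total):
    """Return count as a percentage of total (0.0 when total is zero)."""
    return 100.0 * count / total if total else 0.0

# Counts copied from the cell outputs above
total_relevant = 4687
gps_count = 6
place_count = 68

gps_share = percent_of(gps_count, total_relevant)
place_share = percent_of(place_count, total_relevant)

print("GPS-coded share:   %.2f%%" % gps_share)
print("Place-coded share: %.2f%%" % place_share)
```

In this sample, both shares fall below the roughly 2% figure quoted earlier, and place-coded tweets outnumber GPS-coded ones by about eleven to one.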
In [8]:
# GPS-coded tweets vs. Place-coded tweets
print("GPS-coded Tweet:")
print(json.dumps(geo_tweets[0]["coordinates"], indent=2))
print(json.dumps(geo_tweets[0]["place"], indent=2))
print()

print("Place-coded Tweet:")
print(json.dumps(placed_tweets[0]["place"], indent=2))
GPS-coded Tweet:
{
  "type": "Point",
  "coordinates": [
    4.50716995,
    51.31918574
  ]
}
{
  "id": "ad0818e2fb208dde",
  "url": "https://api.twitter.com/1.1/geo/id/ad0818e2fb208dde.json",
  "place_type": "city",
  "name": "Brasschaat",
  "full_name": "Brasschaat, Belgi\u00eb",
  "country_code": "BE",
  "country": "Belgi\u00eb",
  "bounding_box": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          4.440425,
          51.268637
        ],
        [
          4.440425,
          51.351379
        ],
        [
          4.558809,
          51.351379
        ],
        [
          4.558809,
          51.268637
        ]
      ]
    ]
  },
  "attributes": {}
}

Place-coded Tweet:
{
  "id": "3cdad59a91d99400",
  "url": "https://api.twitter.com/1.1/geo/id/3cdad59a91d99400.json",
  "place_type": "city",
  "name": "Galway",
  "full_name": "Galway, Ireland",
  "country_code": "IE",
  "country": "Ireland",
  "bounding_box": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -10.244525,
          52.967568
        ],
        [
          -10.244525,
          53.719058
        ],
        [
          -7.968484,
          53.719058
        ],
        [
          -7.968484,
          52.967568
        ]
      ]
    ]
  },
  "attributes": {}
}

Plotting GPS Data

Now that we have a list of all the tweets with GPS coordinates, we can plot where in the world these tweets were posted. To make this plot, we use the Basemap package to draw a map of the world and convert GPS coordinates to (x, y) map coordinates that we can then plot.
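
To give a feel for what the conversion from GPS to (x, y) involves, here is a minimal sketch of the spherical Mercator formulas in plain Python. It is not Basemap's implementation and will not reproduce Basemap's exact numbers (Basemap applies its own ellipsoid and map origin); the function name mercator_xy and the mean Earth radius are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # Assumed mean Earth radius in meters

def mercator_xy(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project (lon, lat) in degrees to spherical Mercator (x, y) in meters."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)

    # x grows linearly with longitude
    x = radius * lon
    # y stretches with latitude, which is why high latitudes look enlarged
    y = radius * math.log(math.tan(math.pi / 4.0 + lat / 2.0))
    return (x, y)

# Example: the GPS-coded tweet point used elsewhere in this notebook
print(mercator_xy(4.50716995, 51.31918574))
```

The y term diverges toward the poles, which is one reason the world map in the next cell is clipped to latitudes between -60 and 80.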

In [9]:
# For each geo-coded tweet, extract its GPS coordinates
geoCoord = [x["coordinates"]["coordinates"] for x in geo_tweets]

# Now we build a map of the world using Basemap
land_color = 'lightgray'
water_color = 'lightblue'

# Create a nice, big figure
fig, ax = plt.subplots(figsize=(24,24))

# Build our map, focusing on most of the world and using
#  a Mercator projection (many map projections exist)
worldMap = Basemap(projection='merc', llcrnrlat=-60, urcrnrlat=80,
                   llcrnrlon=-180, urcrnrlon=180, resolution='l')

# Make the map readable
worldMap.fillcontinents(color=land_color, lake_color=water_color, zorder=1)
worldMap.drawcoastlines()
worldMap.drawparallels(np.arange(-90.,120.,30.))
worldMap.drawmeridians(np.arange(0.,420.,60.))
worldMap.drawmapboundary(fill_color=water_color, zorder=0)
ax.set_title('World Tweets')

place_point = worldMap(
    crisisInfo[selectedCrisis]["place"][1], # Longitude
    crisisInfo[selectedCrisis]["place"][0], # Latitude
)
worldMap.scatter(place_point[0], place_point[1], 
                 s=1000, marker="o", color="blue", zorder=2,
                label="Disaster Point")

# Convert points from GPS coordinates to (x,y) coordinates
convPoints = [worldMap(p[0], p[1]) for p in geoCoord]

# Split out points for X,Y lists, which we'll use for our
#  standard Matplotlib plotting
x = [p[0] for p in convPoints]
y = [p[1] for p in convPoints]

# Plot the points on the map
worldMap.scatter(x, y, 
                 s=100, marker='x', color="red", zorder=2,
                label="GPS Tweets")

plt.legend()
plt.show()

Filtering By Location

We can use existing Geographic Information System (GIS) tools to determine from where a tweet was posted. For example, we could ask whether a particular tweet was posted from the United States. This filtering is often performed using shape files. For our purposes though, we established a bounding box along with the crisis data, so we'll use that as our filter for simplicity.
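
For an axis-aligned box like ours, the same filter can be sketched directly on longitude and latitude, without projecting the points first. This is a simplified sketch under stated assumptions: the helper name in_bounding_box is hypothetical, the box does not cross the antimeridian, and points on the boundary count as inside. The two boxes are copied from the event description (the active one and the commented-out one).

```python
def in_bounding_box(lon, lat, box):
    """Return True if (lon, lat) falls inside the lower-left/upper-right box."""
    return (box["lowerLeftLon"] <= lon <= box["upperRightLon"]
            and box["lowerLeftLat"] <= lat <= box["upperRightLat"])

# Active box from the event description
active_box = {
    "lowerLeftLon": -77.163578, "lowerLeftLat": 38.80102,
    "upperRightLon": -76.89304, "upperRightLat": 38.998206,
}

# Commented-out box from the event description
commented_box = {
    "lowerLeftLon": 2.54563, "lowerLeftLat": 49.496899,
    "upperRightLon": 6.40791, "upperRightLat": 51.5050810,
}

# The GPS-coded sample tweet printed earlier, as (lon, lat)
sample_point = (4.50716995, 51.31918574)

print("In active box:", in_bounding_box(sample_point[0], sample_point[1], active_box))
print("In commented-out box:", in_bounding_box(sample_point[0], sample_point[1], commented_box))
```

Irregular regions such as country borders still call for shape files and a real point-in-polygon test.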

In [30]:
# Get the bounding box for our crisis
bBox = crisisInfo[selectedCrisis]["box"]

fig, ax = plt.subplots(figsize=(11,8.5))

# Create a new map zoomed in on our bounding box
targetMap = Basemap(llcrnrlon=bBox["lowerLeftLon"], 
                    llcrnrlat=bBox["lowerLeftLat"], 
                    urcrnrlon=bBox["upperRightLon"], 
                    urcrnrlat=bBox["upperRightLat"], 
                    projection='merc',
                    resolution='h', area_thresh=1)

targetMap.fillcontinents(color=land_color, lake_color=water_color, 
                         zorder=1)
targetMap.drawcoastlines()
targetMap.drawstates()
targetMap.drawparallels(np.arange(-90.,120.,30.))
targetMap.drawmeridians(np.arange(0.,420.,60.))
targetMap.drawmapboundary(fill_color=water_color, zorder=0)
targetMap.drawcountries()

place_point = targetMap(
    crisisInfo[selectedCrisis]["place"][1], # Longitude
    crisisInfo[selectedCrisis]["place"][0], # Latitude
)
targetMap.scatter(place_point[0], place_point[1], 
                 s=100, marker="o", color="blue", zorder=2,
                label="Disaster Point")

# Now we build the polygon for filtering
# Convert from lon, lat of lower-left to x,y coordinates
llcCoord = targetMap(bBox["lowerLeftLon"], bBox["lowerLeftLat"])

# Same for upper-right corner
urcCoord = targetMap(bBox["upperRightLon"], bBox["upperRightLat"])

# Now make the polygon we'll use for filtering
boxPoints = np.array([[llcCoord[0], llcCoord[1]], 
                      [llcCoord[0], urcCoord[1]], 
                      [urcCoord[0], urcCoord[1]], 
                      [urcCoord[0], llcCoord[1]]])
boundingBox = matplotlib.patches.Polygon(boxPoints)

# For each geo-coded tweet, extract coordinates and convert 
# them to the Basemap space
convPoints = [targetMap(p[0], p[1]) for p in geoCoord]

# Track points within our bounding box
plottable = []

# For each point, check if it is within the bounding box or not
for point in convPoints:
    x = point[0]
    y = point[1]

    if ( boundingBox.contains_point((x, y))):
        plottable.append(point)

# Plot points in our target
targetMap.scatter([p[0] for p in plottable], [p[1] for p in plottable], s=100, 
                  marker='x', color="red", zorder=2)
            
print ("Tweets in Target Area:", len(plottable))
print ("Tweets outside:", (geo_tweet_count - len(plottable)))

plt.legend()
plt.show()
Tweets in Target Area: 0
Tweets outside: 6

Few GPS-coded Points Exist

Because so few tweets carry GPS coordinates, it's useful to bring in the place-coded tweets as well. Let's see where they are.

In [11]:
# This function takes a bounding box and finds its center point
#  NOTE: This is a not-so-great hack and can lead to strange behavior
#  (e.g., points in the middle of lakes or at random houses)
def flatten_bbox(tweet):
    lat = 0.0
    lon = 0.0
    
    p_count = 0
    for poly in tweet["place"]["bounding_box"]["coordinates"]:
        for p in poly:
            lat += p[1]
            lon += p[0]
            p_count += 1
        
    # Take the average location
    if ( p_count > 0 ):
        lat = lat / p_count
        lon = lon / p_count
        
    return (lon, lat)

# Extract flattened GPS coordinates
place_geocodes = [flatten_bbox(x) for x in placed_tweets]

# Now we build a map of the world using Basemap
land_color = 'lightgray'
water_color = 'lightblue'

# Create a nice, big figure
fig, ax = plt.subplots(figsize=(24,24))

# Build our map, focusing on most of the world and using
#  a Mercator projection (many map projections exist)
worldMap = Basemap(projection='merc', llcrnrlat=-60, urcrnrlat=80,
                   llcrnrlon=-180, urcrnrlon=180, resolution='l')

# Make the map readable
worldMap.fillcontinents(color=land_color, lake_color=water_color, zorder=1)
worldMap.drawcoastlines()
worldMap.drawparallels(np.arange(-90.,120.,30.))
worldMap.drawmeridians(np.arange(0.,420.,60.))
worldMap.drawmapboundary(fill_color=water_color, zorder=0)
ax.set_title('Place-Coded Tweets')

place_point = worldMap(
    crisisInfo[selectedCrisis]["place"][1], # Longitude
    crisisInfo[selectedCrisis]["place"][0], # Latitude
)
worldMap.scatter(place_point[0], place_point[1], 
                 s=1000, marker="o", color="blue", zorder=2,
                label="Disaster Point")

# Convert points from GPS coordinates to (x,y) coordinates
convPoints = [worldMap(p[0], p[1]) for p in geoCoord]

# Split out points for X,Y lists, which we'll use for our
#  standard Matplotlib plotting
x = [p[0] for p in convPoints]
y = [p[1] for p in convPoints]

# Plot the points on the map
worldMap.scatter(x, y, 
                 s=100, marker='x', color="red", zorder=2,
                label="GPS Tweets")

# Place points in a different color
conv_place_points = [worldMap(p[0], p[1]) for p in place_geocodes]
# Plot the points on the map
worldMap.scatter([p[0] for p in conv_place_points], [p[1] for p in conv_place_points], 
                 s=100, marker='x', color="green", zorder=2,
                label="Placed Tweets")

plt.legend()
plt.show()
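
As a quick check on the bounding-box averaging used above, the sketch below applies the same idea to the Galway bounding box printed earlier. The helper name bbox_center is an assumption; unlike flatten_bbox, it takes the bounding box itself and returns None when the box has no points.

```python
def bbox_center(bounding_box):
    """Average the corner points of a place bounding box into one (lon, lat)."""
    # Flatten every ring of the polygon into a single list of [lon, lat] points
    points = [p for ring in bounding_box["coordinates"] for p in ring]
    if not points:
        return None

    lon = sum(p[0] for p in points) / len(points)
    lat = sum(p[1] for p in points) / len(points)
    return (lon, lat)

# Bounding box copied from the place-coded Galway tweet shown earlier
galway_box = {
    "type": "Polygon",
    "coordinates": [[
        [-10.244525, 52.967568],
        [-10.244525, 53.719058],
        [-7.968484, 53.719058],
        [-7.968484, 52.967568],
    ]],
}

center = bbox_center(galway_box)
print("Galway center (lon, lat):", center)
```

The box spans more than two degrees of longitude, so its center is only a rough stand-in for where the tweet was actually posted.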