Pulling social network data for making graphs
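# First we need to make sure tweepy is installed. pip.main() no longer exists
# (pip 10+), so run pip via the current interpreter. The code below uses
# Tweepy 3.x names (api.me, api.friends, RateLimitError) that 4.0 removed.
import subprocess, sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "tweepy<4"])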
%matplotlib inline
import time
import json
import networkx as nx
Twitter's API is the most useful and flexible way to collect this data, but it takes several steps to configure. To get access to the API, you first need a Twitter account with a mobile phone number (or any number that can receive text messages) attached to it. Then we'll use Twitter's developer portal to create an "app," which gives us the keys and tokens (essentially IDs and passwords) we need to connect to the API.
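So, in summary, the general steps are:
1. Create a Twitter account.
2. Attach a mobile phone number that can receive text messages to that account.
3. Create an app in Twitter's developer portal.
4. Copy the app's four credential strings: the consumer key, consumer secret, access token, and access token secret.
We will then plug these four strings into the code below.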
# For our first piece of code, we need to import the package
# that connects to Twitter. Tweepy is a popular, fully featured
# Python client for the Twitter API.
import tweepy
For more in-depth instructions, the walkthrough below covers creating a Twitter account, configuring it for API access, and generating the keys and tokens the following code needs.
First, we assume you already have a Twitter account. If you don't, create one now by following Step 1. See the attached figures.
Step 1. Create a Twitter account. If you haven't already done this, do it now at Twitter.com.
Step 2. Set your mobile number. Log into Twitter and go to "Settings." From there, click "Mobile" and enter an SMS-enabled phone number. You will be asked to confirm this number once it's set, and you'll need to do so before you can create any apps in the next step.
# Use the strings from your Twitter app webpage to populate these four
# variables. Be sure to put the strings BETWEEN the quotation marks
# to make them valid Python strings.
consumer_key = "xxx"
consumer_secret = "xxx"
access_token = "xxx"
access_secret = "xxx"
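Hardcoding keys in a notebook makes it easy to leak them when the file is shared. A minimal sketch of one alternative, assuming you export the four strings as environment variables beforehand; the variable names (TWITTER_CONSUMER_KEY and so on) and the helper load_twitter_credentials are hypothetical choices, not anything Tweepy or Twitter defines.

```python
import os

# Hypothetical environment variable names; any names work as long as they match.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_credentials(env=None):
    """Return (consumer_key, consumer_secret, access_token, access_secret)
    read from environment variables, failing loudly if any are missing."""
    if env is None:
        env = os.environ
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


# Usage, once the variables are exported in your shell:
# consumer_key, consumer_secret, access_token, access_secret = load_twitter_credentials()
```

Passing a plain dict as env makes the helper easy to try without touching your real environment.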
Once we have the authentication details set, we can connect to Twitter using the Tweepy OAuth handler, as below.
# Now we use the configured authentication information to connect
# to Twitter's API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
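api = tweepy.API(auth)
# Creating the API object sends no request, so check the keys with a real call.
# In Tweepy 3.x, verify_credentials() returns False if Twitter rejects them.
if not api.verify_credentials():
    raise RuntimeError("Twitter rejected the credentials; check the four strings above.")
print("Connected to Twitter!")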
Now that we are connected to Twitter, let's do a brief check that we can read tweets by pulling the first few tweets from the home timeline of the account associated with your Twitter app (the most recent tweets from that account and the accounts it follows) and printing them.
# Get tweets from our home timeline
public_tweets = api.home_timeline()
# Print the author and text of the first five tweets
for tweet in public_tweets[:5]:
    print(tweet.author.screen_name, tweet.author.name, "said:", tweet.text)
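Twitter serves results in pages. To get all results, we can use Tweepy's Cursor, which handles iterating through the pages for us in the background. Each page is a separate request that counts against Twitter's rate limits, so we also wrap the cursor in a small generator that sleeps until the limit resets whenever Tweepy raises a RateLimitError.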
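For intuition about what Cursor does behind the scenes: Twitter's v1.1 endpoints such as friends/list return one page at a time together with a next_cursor value; you start with cursor -1 and stop when next_cursor comes back as 0. A minimal sketch of that loop, assuming a hypothetical fetch_page function in place of a real API request (this is an illustration, not Tweepy's actual implementation):

```python
import itertools


def iterate_pages(fetch_page):
    """Yield every item from a cursored endpoint, one page at a time.

    fetch_page(cursor) is a hypothetical function that requests one page
    and returns (items, next_cursor).
    """
    cursor = -1  # Twitter's v1.1 cursoring starts from -1
    while True:
        items, next_cursor = fetch_page(cursor)
        yield from items
        if next_cursor == 0:  # a next_cursor of 0 means there are no more pages
            return
        cursor = next_cursor


# Offline stand-in for the API: three pages keyed by cursor value.
FAKE_PAGES = {
    -1: (["alice", "bob"], 111),
    111: (["carol", "dave"], 222),
    222: (["erin"], 0),
}

everyone = list(iterate_pages(lambda c: FAKE_PAGES[c]))
# Much like Cursor(...).items(n), islice stops after n items,
# so the third page is never requested here.
first_three = list(itertools.islice(iterate_pages(lambda c: FAKE_PAGES[c]), 3))
print(everyone)     # ['alice', 'bob', 'carol', 'dave', 'erin']
print(first_three)  # ['alice', 'bob', 'carol']
```

Because the generator is lazy, capping the number of items also caps the number of page requests, which matters when every request is rate limited.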
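# The authenticated account's own profile, as a quick sanity check
me = api.me()
print("Authenticated as:", me.screen_name)
# Handler for waiting if we exhaust a rate limit
def limit_handled(cursor):
    while True:
        try:
            yield cursor.next()
        except StopIteration:
            # The cursor is exhausted. Since Python 3.7 (PEP 479), a StopIteration
            # escaping a generator becomes a RuntimeError, so return explicitly.
            return
        except tweepy.RateLimitError:
            # Determine how long until the friends/list rate-limit window resets...
            s = api.rate_limit_status()
            dif = s["resources"]["friends"]["/friends/list"]["reset"] - int(time.time())
            # If we have a wait time, wait for it
            if dif > 0:
                print("Sleeping for %d seconds..." % dif)
                time.sleep(dif)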
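The rate-limit handler ties the retry logic to live Tweepy calls, which makes it hard to try out. A minimal sketch of the same pattern with those pieces passed in so it runs offline: RateLimited, retry_on_rate_limit, and FlakyPages are hypothetical names standing in for tweepy.RateLimitError, limit_handled, and a Tweepy cursor, and sleep is injectable so the example records the wait instead of actually waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for tweepy.RateLimitError."""


def retry_on_rate_limit(iterator, wait_seconds, sleep=time.sleep):
    """Yield items from iterator. When it raises RateLimited, sleep for
    wait_seconds() seconds (if positive) and call next() again."""
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return  # iterator exhausted: finish the generator cleanly
        except RateLimited:
            delay = wait_seconds()
            if delay > 0:
                sleep(delay)
            continue
        yield item


class FlakyPages:
    """Offline stand-in for a cursor: hits the rate limit once, then yields items."""

    def __init__(self, items):
        self._items = iter(items)
        self._limited = True

    def __iter__(self):
        return self

    def __next__(self):
        if self._limited:
            self._limited = False
            raise RateLimited()
        return next(self._items)


waits = []  # record requested sleeps instead of actually sleeping
result = list(retry_on_rate_limit(FlakyPages(["alice", "bob"]), lambda: 5, sleep=waits.append))
print(result, waits)  # ['alice', 'bob'] [5]
```

In the notebook, wait_seconds would play the role of the reset-time arithmetic on api.rate_limit_status().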
g = nx.Graph()
target = "codybuntain"
total_friends = 20
# Get the first few friends of the target account and the first few friends
# of each of them, and add their links to the graph
for friend in limit_handled(tweepy.Cursor(api.friends, id=target).items(total_friends)):
    g.add_node(friend.screen_name)
    g.add_edge(target, friend.screen_name)
    print("Processing:", friend.screen_name)
    try:
        for friend_of_friend in limit_handled(tweepy.Cursor(api.friends, id=friend.screen_name).items(total_friends)):
            g.add_node(friend_of_friend.screen_name)
            g.add_edge(friend.screen_name, friend_of_friend.screen_name)
            print("\t->", friend_of_friend.screen_name)
    except tweepy.TweepError as e:
        # e.g. a protected account whose friend list we are not allowed to read
        print("\tSkipping", friend.screen_name, "-", e)
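The crawl above is a two-hop snowball sample: start from one seed account, add its friends, then add each friend's friends. With total_friends = 20 it looks up the friends of up to 21 accounts (the target plus its 20 friends) and can produce at most 1 + 20 + 400 = 421 nodes, fewer when accounts overlap. A minimal offline sketch of the same crawl, assuming a hypothetical get_friends lookup backed by a dict instead of the Twitter API:

```python
import networkx as nx


def snowball_graph(seed, get_friends, limit):
    """Build an undirected two-hop graph: seed -> friends -> friends of friends,
    taking at most `limit` friends per account.

    get_friends(name) is a hypothetical lookup returning a list of names.
    """
    g = nx.Graph()
    g.add_node(seed)
    for friend in get_friends(seed)[:limit]:
        g.add_edge(seed, friend)  # add_edge also creates missing nodes
        for fof in get_friends(friend)[:limit]:
            g.add_edge(friend, fof)
    return g


# Offline stand-in for the follow data.
FOLLOWS = {
    "me": ["ann", "ben"],
    "ann": ["ben", "cat"],
    "ben": ["dan"],
}
g = snowball_graph("me", lambda name: FOLLOWS.get(name, []), limit=2)
print(g.number_of_nodes(), g.number_of_edges())  # 5 5
```

Note that "ann" and "ben" follow each other's circle, so the edge between them is added once and both appear only once as nodes; overlaps like this are why the real crawl usually ends up well below its upper bound.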
len(g.nodes())
subs = [x[0] for x in g.degree() if x[1] > 0]
nx.draw(nx.subgraph(g, subs))
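Every node in g is added together with at least one edge, so the degree > 0 filter keeps the whole graph. A stricter threshold drops accounts that appear only once (typically friends-of-friends), which can make the drawing easier to read. A minimal sketch on a toy graph; the threshold of 2 is an arbitrary illustrative choice, not something from the original analysis:

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([("me", "ann"), ("me", "ben"), ("ann", "ben"), ("ann", "cat"), ("ben", "dan")])

# Keep only nodes connected to at least two others.
core = [node for node, degree in g.degree() if degree >= 2]
h = g.subgraph(core)  # a read-only view; call .copy() if you need to modify it
print(sorted(h.nodes()))  # ['ann', 'ben', 'me']
```

The leaves "cat" and "dan" each have degree 1 and are dropped, while the triangle between "me", "ann", and "ben" survives.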
nx.write_graphml(g, "twitter_codybuntain.graphml")
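GraphML is an XML format that network tools such as Gephi can open, and NetworkX can read it back in a later session. A minimal round-trip sketch on a small hypothetical graph, written to a temporary directory rather than the notebook's real output file:

```python
import os
import tempfile

import networkx as nx

g = nx.Graph()
g.add_edges_from([("me", "ann"), ("ann", "ben")])

# Write the graph out and read it back, as a later session would.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.graphml")
    nx.write_graphml(g, path)
    loaded = nx.read_graphml(path)  # node IDs come back as strings

print(loaded.number_of_nodes(), loaded.number_of_edges())  # 3 2
```

Since the Twitter screen names are already strings, the reloaded graph has the same node IDs as the one that was saved.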