ENPM809G - Collecting Social Network Data

Pulling social network data for making graphs

In [1]:
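# First we need to make sure we have tweepy installed.
# Note: pip.main() was removed in pip 10, so on newer versions run
# `pip install tweepy` from a shell instead of this cell.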
import pip

pip.main(['install', 'tweepy'])
Requirement already satisfied: tweepy in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages
Requirement already satisfied: six>=1.7.3 in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages (from tweepy)
Requirement already satisfied: requests>=2.4.3 in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages (from tweepy)
Requirement already satisfied: requests-oauthlib>=0.4.1 in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages (from tweepy)
Requirement already satisfied: idna<2.7,>=2.5 in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages (from requests>=2.4.3->tweepy)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages (from requests>=2.4.3->tweepy)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages (from requests>=2.4.3->tweepy)
Requirement already satisfied: certifi>=2017.4.17 in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages (from requests>=2.4.3->tweepy)
Requirement already satisfied: oauthlib>=0.6.2 in /Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages (from requests-oauthlib>=0.4.1->tweepy)
Out[1]:
0
In [2]:
%matplotlib inline

import time
import json
import networkx as nx


Twitter API

Twitter's API is the most useful and flexible way to pull this data, but it takes several steps to configure. To get access to the API, you first need a Twitter account with a mobile phone number (or any number that can receive text messages) attached to it. Then, we'll use Twitter's developer portal to create an "app" that gives us the keys and tokens (essentially IDs and passwords) we need to connect to the API.

So, in summary, the general steps are:

  1. Have a Twitter account,
  2. Configure your Twitter account with your mobile number,
  3. Create an app on Twitter's developer site, and
  4. Generate consumer and access keys and secrets.

We will then plug these four strings into the code below.

In [5]:
# For our first piece of code, we need to import the package 
# that connects to Twitter. Tweepy is a popular and fully featured
# implementation.

import tweepy

Creating Twitter Credentials

For those who want more in-depth instructions, the walkthrough below covers creating a Twitter account, setting it up for use with the following code, and generating the credentials.

First, we assume you already have a Twitter account. If not, either create one now or simply follow along. See the attached figures.

  • Step 1. Create a Twitter account. If you haven't already done this, do it now at Twitter.com.

  • Step 2. Set your mobile number. Log into Twitter and go to "Settings." From there, click "Mobile" and fill in an SMS-enabled phone number. You will be asked to confirm this number once it's set, and you'll need to do so before you can create any apps in the next step.

  • Step 3. Create an app on Twitter's Dev site. Go to (apps.twitter.com), and click the "Create New App" button. Fill in the "Name," "Description," and "Website" fields, leaving the callback field blank (we're not going to use it). Note that the website must be a fully qualified URL, so it should look like: http://test.url.com. Then scroll down and read the developer agreement, check the box to agree, and finally click "Create your Twitter application."

  • Step 4. Generate keys and tokens with this app. After your application has been created, you will see a summary page like the one below. Click "Keys and Access Tokens" to view and manage keys. Scroll down and click "Create my access token." After a moment, your page should refresh and show you four long strings of letters and numbers: a consumer key, a consumer secret, an access token, and an access secret (note these are case-sensitive!). Copy and paste these four strings into the quotes in the code cell below.

In [6]:
# Use the strings from your Twitter app webpage to populate these four
# variables. Be sure to put the strings BETWEEN the quotation marks
# so that each one is a valid Python string.

consumer_key = "xxx"
consumer_secret = "xxx"
access_token = "xxx"
access_secret = "xxx"
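
Hard-coding secrets in a notebook makes them easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the same four values from environment variables. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper load_twitter_credentials are assumptions made for illustration, not names that Twitter or Tweepy define.

```python
import os

# Hypothetical environment variable names; any names would work.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_twitter_credentials(env=None):
    """Return the four credential strings, or raise if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}


# Example with a plain dict standing in for os.environ
fake_env = {name: "xxx" for name in ENV_NAMES.values()}
creds = load_twitter_credentials(fake_env)
print(sorted(creds))
```

The returned values can then be assigned to consumer_key, consumer_secret, access_token, and access_secret in place of the quoted strings.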

Connecting to Twitter

Once we have the authentication details set, we can connect to Twitter using the Tweepy OAuth handler, as below.

In [7]:
# Now we use the configured authentication information to connect
# to Twitter's API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

print("Connected to Twitter!")
Connected to Twitter!

Testing our Connection

Now that we are connected to Twitter, let's do a brief check that we can read tweets by pulling the first few tweets from our home timeline (that is, the timeline of the account associated with your Twitter app) and printing them.

In [8]:
# Get tweets from our timeline
public_tweets = api.home_timeline()

# Print the first five authors and tweet texts
for tweet in public_tweets[:5]:
    print(tweet.author.screen_name, tweet.author.name, "said:", tweet.text)
PARAMS: {}
NASAhistory NASA History Office said: #OTD 1965, Ranger 8 impacted the Moon as planned - providing close up pictures until 1/2 second prior to impact on… https://t.co/zlgokHqjnZ
smod4real Sweet Meteor O'Death said: Only time will tell if the eruption of Indonesia’s Mt. Cinnabon was as delicious as it sounds.
ENERGY Energy Department said: RT @JLab_News: Shoutout to our 125 staff #engineers who are #InspiringWonder everyday  with the work they do to enable nuclear physics rese…
smod4real Sweet Meteor O'Death said: I will indemnify* anyone arrested for wearing my shirts at the polls.

*As used herein, the term “indemnify” means… https://t.co/07uzJk1a57
HUDgov HUDgov said: HUD awards $35 million to public housing authorities to help low-income residents find jobs, become self-sufficient… https://t.co/synl9drQ0c

Dealing with Pages

Twitter serves results in pages rather than all at once. To get all results, we can use Tweepy's Cursor implementation, which iterates through these pages for us in the background.
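
To make the paging idea concrete, here is a minimal sketch of cursor-style iteration that runs without a network connection. The fetch_page function and its fake data are hypothetical stand-ins for an API call. The convention that a cursor of -1 requests the first page and a next cursor of 0 marks the last page follows Twitter's cursored endpoints. This is not Tweepy's actual implementation, only an illustration of the work a cursor does for us.

```python
# Fake paged data: cursor -> (items on this page, cursor of the next page)
FAKE_PAGES = {
    -1: (["a", "b"], 10),
    10: (["c", "d"], 20),
    20: (["e"], 0),  # a next cursor of 0 means there are no more pages
}


def fetch_page(cursor):
    """Hypothetical stand-in for one API request."""
    return FAKE_PAGES[cursor]


def iterate_items(fetch, limit=None):
    """Yield items across pages, stopping after `limit` items if given."""
    cursor = -1
    yielded = 0
    while cursor != 0:
        items, cursor = fetch(cursor)
        for item in items:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1


all_items = list(iterate_items(fetch_page))
first_three = list(iterate_items(fetch_page, limit=3))
print(all_items, first_three)
```

The limit argument plays the same role as the count passed to items() in the cells below: iteration stops once enough items have been produced, even if more pages remain.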

In [9]:
me = api.me()
PARAMS: {}
PARAMS: {'screen_name': b'chitesting17'}
In [10]:
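# Handler for waiting if we exhaust a rate limit
def limit_handled(cursor):
    while True:
        try:
            yield cursor.next()
        except StopIteration:
            # The cursor is exhausted. End the generator cleanly rather than
            # letting StopIteration escape (a RuntimeError in Python 3.7+).
            return
        except tweepy.RateLimitError:
            # Determine how long we need to wait. Note that this checks
            # the /friends/list endpoint specifically.
            s = api.rate_limit_status()
            dif = s["resources"]['friends']['/friends/list']['reset'] - int(time.time())
            
            # If we have a wait time, wait for it
            if dif > 0:
                print("Sleeping for %d seconds..." % dif)
                time.sleep(dif)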
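
The wait-time calculation in the handler can be tried on its own, without calling the API. The sketch below is a minimal version under stated assumptions: seconds_until_reset is a hypothetical helper, and fake_status is a hand-written dictionary trimmed to the one field the handler reads, nested the same way as in the lookup above.

```python
import time


def seconds_until_reset(status, family, endpoint, now=None):
    """Return how many seconds remain until the endpoint's limit resets."""
    now = int(time.time()) if now is None else int(now)
    reset = status["resources"][family][endpoint]["reset"]
    # Never return a negative wait if the reset time has already passed
    return max(0, reset - now)


# Hand-written stand-in for the rate limit status dictionary
fake_status = {
    "resources": {
        "friends": {
            "/friends/list": {"reset": 1000900},
        }
    }
}

wait = seconds_until_reset(fake_status, "friends", "/friends/list", now=1000000)
print("Would sleep for %d seconds" % wait)
```

Passing the resource family and endpoint as arguments would let the same calculation serve calls other than /friends/list.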
In [11]:
g = nx.Graph()

target = "codybuntain"
total_friends = 20

# Get the first few friends of mine and first few of each of them
#  and add their links to the graph
for friend in limit_handled(tweepy.Cursor(api.friends, id=target).items(total_friends)):
    g.add_node(friend.screen_name)
    g.add_edge(target, friend.screen_name)
    print("Processing:", friend.screen_name)
    
    for friend_of_friend in limit_handled(tweepy.Cursor(api.friends, id=friend.screen_name).items(total_friends)):
        g.add_node(friend_of_friend.screen_name)
        g.add_edge(friend.screen_name, friend_of_friend.screen_name)
        print("\t->", friend_of_friend.screen_name)
PARAMS: {'cursor': b'-1', 'id': b'codybuntain'}
Processing: lajello
PARAMS: {'cursor': b'-1', 'id': b'lajello'}
	-> krishna_dubba
	-> moritz_stefaner
	-> tillnm
	-> Elijah_Meeks
	-> DirkBrockmann
	-> HamillHimself
	-> mcatanzaro
	-> TerribleMaps
	-> PreethiLahoti
	-> manovich
	-> jibiel
	-> AlfonsoSemeraro
	-> scorpiommma
	-> pinelopi_tr
	-> _kritts
	-> oboichak
	-> aleenachia
	-> marc_smith
	-> compstorylab
	-> jkbren
Processing: winteram
PARAMS: {'cursor': b'-1', 'id': b'winteram'}
/Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:13: DeprecationWarning: generator 'limit_handled' raised StopIteration
	-> Emma4Change
	-> ClintSmithIII
	-> samswey
	-> JamesSACorey
	-> AaronNagler
	-> PodSaveThePpl
	-> PodSaveAmerica
	-> Spacekatgal
	-> _faezahmed
	-> MKBHD
	-> Go2ndTimeGaijin
	-> kmiyer
	-> ashton1anderson
	-> aparnapkin
	-> BrianJFeldman
	-> arunamiller
	-> azomer
	-> mitgovlab
	-> EthanZ
	-> kumailn
Processing: GreatTeachGreg
PARAMS: {'cursor': b'-1', 'id': b'GreatTeachGreg'}
	-> dannykanell
	-> NoogaTrafficNet
	-> markschlereth
	-> ESPNRadio
	-> ryenarussillo
	-> notthefakeSVP
	-> Espngreeny
	-> SECNetwork
	-> SEC
	-> girlinatincan
	-> kristophernoah
	-> ClaytonESPN
	-> mortreport
	-> finebaum
	-> codybuntain
	-> AdamSchefter
	-> espn
	-> KirkHerbstreit
	-> CollegeGameDay
	-> SportsCenter
Processing: TheWebConf
PARAMS: {'cursor': b'-1', 'id': b'TheWebConf'}
	-> LPNews_AURA
	-> vgcerf
	-> WebComeLyon
	-> SNCF_Digital
	-> Laurent_acti
	-> nicolasantonini
	-> acti
	-> DavidKimelfeld
	-> Plus2sens
	-> www2018
	-> manat69
	-> lessig
	-> laurent_flory
	-> Coexiscience
	-> science_societe
	-> batier
	-> Kriisiis
	-> sirchamallow
	-> JulienSubercaze
	-> bymaddyness
Processing: CPlaisant
PARAMS: {'cursor': b'-1', 'id': b'CPlaisant'}
	-> chi2017
	-> sig_chi
	-> SenSanders
	-> KarlShipps
	-> gxwalsh
	-> codybuntain
	-> hcil_umd
	-> jvitak
	-> ebonsign
	-> PostGraphics
	-> nytgraphics
	-> deansittig
	-> leahfindlater
	-> benbendc
	-> bederson
	-> jonfroehlich
Processing: MohitIyyer
PARAMS: {'cursor': b'-1', 'id': b'MohitIyyer'}
	-> SmithaMilli
	-> KDTrey5
	-> emnlp2018
	-> Comey
	-> SethAbramson
	-> dannydanr
	-> dipanjand
	-> devanbu
	-> GuillaumeLample
	-> dasmiq
	-> rikkoncelkedzio
	-> eaclark07
	-> universeinanegg
	-> dkaushik96
	-> aardvarkhat
	-> jbrenier
	-> officialjaden
	-> NAACLHLT
	-> _beenkim
	-> jennwvaughan
Processing: chrmanning
PARAMS: {'cursor': b'-1', 'id': b'chrmanning'}
	-> eturner303
	-> lousylinguist
	-> stuartrobinson
	-> treeform
	-> JeffDean
	-> McFaul
	-> mark_riedl
	-> milesosborne
	-> theamitsinghal
	-> PaloAltoPolice
	-> JonathanBerant
	-> mmitchell_ai
	-> sebschu
	-> laurenahayes
	-> abigail_e_see
	-> rfpvjr
	-> harryshum
	-> kyosu
	-> rupertmurdoch
	-> sarahkendzior
Processing: SocNetAnalysts
PARAMS: {'cursor': b'-1', 'id': b'SocNetAnalysts'}
	-> MitchellSNA
	-> oliver_kathryn
	-> rongwangnu
	-> GretchenA
	-> BeatrizPatraca
	-> guzmanadrian
	-> nils_gilman
	-> MumbaCloud
	-> alyssabrennan3
	-> Appitive
	-> ereteog
	-> RonishaBrowdy
	-> andrybrew
	-> pypi
	-> artwisanggeni
	-> jcthomasphd
	-> joshuaaclark
	-> mariannesarkis
	-> facpsi
	-> 1_chrispeacock
Processing: thesickish
PARAMS: {'cursor': b'-1', 'id': b'thesickish'}
	-> jgaverrr
	-> metaviv
	-> ua_sociology
	-> SurvStudiesNet
	-> norrande
	-> HKingsmoreS
	-> SheaSerrano
	-> a_baronca
	-> TaylorLorenz
	-> cfiesler
	-> henryfarrell
	-> aaronclauset
	-> markverstraete
	-> brandoncgorman
	-> RollingSloan
	-> qhardy
	-> rodneyabrooks
	-> ce_tucker
	-> geomblog
	-> bulletproofexec
Processing: msaveski
PARAMS: {'cursor': b'-1', 'id': b'msaveski'}
	-> samuelwoolley
	-> afuste
	-> dgleich
	-> JFBonnefon
	-> wwbrannon
	-> ashton1anderson
	-> optiML
	-> NetContagions
	-> jasonbaumgartne
	-> roydanroy
	-> mnick
	-> hyejin_youn
	-> blaiseaguera
	-> McAndrew
	-> davidautor
	-> lessig
	-> Elibietti
	-> chinmayiarun
	-> alexstamos
	-> rgorwa
Processing: christinaacook3
PARAMS: {'cursor': b'-1', 'id': b'christinaacook3'}
	-> ChickfilA
	-> CoachVBell
	-> UTChattanooga
	-> RedBankHoops
	-> BacheloretteABC
	-> cobb_deco
	-> codychasee1
	-> The_Gospels
	-> drewmanning8
	-> MelaA7x92
	-> lil_shu2
	-> chancewillie
	-> _whitneyrenee11
	-> HarrisonLB17
	-> chaseamiller
	-> grace_adele32
	-> tgardner2626
	-> Kalen_skinner
	-> dubbsleezy
	-> typatterson9
Processing: lorengrush
PARAMS: {'cursor': b'-1', 'id': b'lorengrush'}
	-> BlairBigelow
	-> JenLucPiquant
	-> shaka_lulu
	-> samteller
	-> danahull
	-> coryzapatka
	-> EstesLynda
	-> HattonKaitlin
	-> CommanderMLA
	-> astroaddie
	-> Starry_Anna
	-> business
	-> EMSpeck
	-> byrnebeard
	-> ProfMcConville
	-> christianmazza
	-> EmreKelly
	-> jackiewattles
	-> IridiumBoss
	-> julia_bergeron
Processing: Summer_Ash
PARAMS: {'cursor': b'-1', 'id': b'Summer_Ash'}
	-> Sierrasayer
	-> majorlymichelle
	-> MolotovCupcake
	-> saracentury
	-> theblerdgurl
	-> taigooden
	-> KristyPuchko
	-> sarahchad_
	-> TaraandJohnny
	-> JohnnyGWeir
	-> taralipinski
	-> leahmfulmer
	-> eugenegu
	-> clementine_ford
	-> OmanReagan
	-> mayjeong
	-> _strangecharm
	-> easyright
	-> astrokiwi
	-> kat_volk
Processing: smohammed93
PARAMS: {'cursor': b'-1', 'id': b'smohammed93'}
	-> m__dehghani
	-> JeffDean
	-> sedielem
	-> kchonyc
	-> tuzhucheng
	-> totuta
	-> fastdotai
	-> math_rachel
	-> jeremyphoward
	-> NalKalchbrenner
	-> demishassabis
	-> OpenAI
	-> nschucher
	-> fchollet
	-> adampaulcoates
	-> AlecRad
	-> soumithchintala
	-> ch402
	-> negar_rz
	-> NicolasChapados
Processing: rts_coordinator
PARAMS: {'cursor': b'-1', 'id': b'rts_coordinator'}
PARAMS: {}
Sleeping for 892 seconds...
PARAMS: {'cursor': b'-1', 'id': b'rts_coordinator'}
PARAMS: {}
Sleeping for 900 seconds...
PARAMS: {'cursor': b'-1', 'id': b'rts_coordinator'}
	-> oranJess
	-> qu_rts17
	-> Karan_Sabhnani
	-> Rupeshpalwadi
	-> BenCarterette
	-> a25ghosh
	-> saipraneethm
	-> dimazest
	-> zjc1949
	-> DJF_UW
	-> AmiraGhenai
	-> MSurabathuni
	-> tanushree1992
	-> krishnavaidy
	-> AbhinavBommi
	-> y223kim
	-> AmmsA
	-> pranga003
	-> kwarrior8
	-> gowthamsarella7
Processing: jacobmcarthur
PARAMS: {'cursor': b'-1', 'id': b'jacobmcarthur'}
	-> TarynDempsey
	-> NovalisDMT
	-> pijul_org
	-> TerribleMaps
	-> rockstar_buddha
	-> quch3n
	-> codybuntain
	-> cocreature
	-> aisamanra
	-> drb226
	-> thumphriees
	-> jacobstanley
	-> tritlo
	-> Iceland_jack
	-> rufuse
	-> idrislang
	-> joyofhaskell
	-> easiestnameever
	-> noel_matty
	-> dysinger
Processing: GTRI
PARAMS: {'cursor': b'-1', 'id': b'GTRI'}
	-> team9spokes
	-> GT_Diversity
	-> editoratlbiz
	-> TEDxGeorgiaTech
	-> gtcn1
	-> sciam
	-> amerobotics
	-> GtScii
	-> mdjonline
	-> mcknightsltcn
	-> childrenshealth
	-> CPFNYC
	-> davearon
	-> CityParksAll
	-> tweeplers
	-> VAVetBenefits
	-> DeptVetAffairs
	-> stampshealth
	-> FerstCenter
	-> CarolineGWood
Processing: gvucenter
PARAMS: {'cursor': b'-1', 'id': b'gvucenter'}
	-> flerlagekr
	-> GT_Vis
	-> tescafitz
	-> deviparikh
	-> ICatGT
	-> GaTechCyber
	-> gatech_scs
	-> jerome_solomon
	-> AshKGoel
	-> DhruvBatraDB
	-> johntstasko
	-> uwdub
	-> juliekientz
	-> PoloDataClub
	-> mlatgt
	-> TAGthink
	-> CornellInfoSci
	-> HikingHack
	-> GTCUI
	-> SVSIGGRAPH
Processing: cjhutto
PARAMS: {'cursor': b'-1', 'id': b'cjhutto'}
	-> tictoc
	-> marcela
	-> Psych_Studies
	-> WileyPsychology
	-> SocialPsych
	-> PsychNews
	-> pospsych
	-> GirlsWhoCode
	-> jteevan
	-> niloufar_s
	-> cosleydr
	-> TurkerNational
	-> wslasecki
	-> rbmllr
	-> roboticwrestler
	-> jonfroehlich
	-> karenchurch
	-> sjjgo
	-> palen
	-> DrDavidJoyner
Processing: upoverandfrew
PARAMS: {'cursor': b'-1', 'id': b'upoverandfrew'}
	-> tomscocca
	-> CACSoccer
	-> Green__Century
	-> AltDeadspin
	-> _youhadonejob1
	-> ekellyfisch
	-> Orioles
	-> tbonier
	-> marciabe
	-> MoonPie
	-> ppppolls
	-> Env_Am_inah
	-> Ryan_M_Doyle
	-> Kara_B_Cook
	-> jennifereduffy
	-> CharlieCookDC
	-> amyewalter
	-> CookPolitical
	-> cpulisic_10
	-> TheTedAllen
/Users/cbuntain/Development/thirdparty/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:8: DeprecationWarning: generator 'limit_handled' raised StopIteration
In [ ]:
len(g.nodes())
In [ ]:
# Keep only the nodes that have at least one edge, then draw that subgraph
subs = [x[0] for x in g.degree() if x[1] > 0]
nx.draw(nx.subgraph(g, subs))
In [ ]:
nx.write_graphml(g, "twitter_codybuntain.graphml")
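
The graph steps can also be exercised offline, which helps when the rate limit makes each crawl slow. The sketch below is a minimal example under stated assumptions: the friend lists are made up, and the names demo, friends, and example.graphml are illustrative only. It builds a graph, drops isolated nodes, and checks that the GraphML file reads back with the same nodes and edges.

```python
import os
import tempfile

import networkx as nx

# Made-up friend lists standing in for the API results
friends = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["alice"],
}

demo = nx.Graph()
for user, followed in friends.items():
    for other in followed:
        # add_edge creates any missing nodes, so add_node is optional
        demo.add_edge(user, other)
demo.add_node("erin")  # an isolated node with no edges

# Keep only nodes with at least one edge
connected = [node for node, degree in demo.degree() if degree > 0]
sub = demo.subgraph(connected).copy()

# Write the graph to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.graphml")
    nx.write_graphml(sub, path)
    loaded = nx.read_graphml(path)

print(loaded.number_of_nodes(), loaded.number_of_edges())
```

Because the graph is undirected, the reciprocal link between alice and carol is stored as a single edge.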