Fake news and its purveyors do not exist in an information vacuum. Rather, the authors of fake news and the websites that publish it are part of a larger information ecosystem. Authors of legitimate and fake content may be differentiated, however, by how they are placed within the larger network of journalists, authors, and websites.
For this class's final project, you will use datasets of news articles to develop networks that capture connections among articles, links between websites, and/or citations between authors. The goal is to develop a collection of approaches that could recommend more legitimate content based on an article's placement in the network you develop.
You will be provided with a list of potential datasets you can use to develop your network of news articles. From there, you will be expected to adhere to the following milestones:
The following are some simple graph construction rules you could explore. They are non-exhaustive, so feel free to propose an alternate rule set.
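As one illustration, a minimal sketch of two such construction rules, using NetworkX over a handful of made-up article records (the field names `id`, `domain`, and `links_to` are illustrative assumptions, not a prescribed schema):

```python
import networkx as nx

# Hypothetical article records; the field names are illustrative only.
articles = [
    {"id": "a1", "domain": "example-news.com", "links_to": ["a2"]},
    {"id": "a2", "domain": "example-blog.net", "links_to": []},
    {"id": "a3", "domain": "example-news.com", "links_to": ["a2"]},
]

G = nx.Graph()
for art in articles:
    G.add_node(art["id"], domain=art["domain"])
    # Rule 1: an edge for each hyperlink from one article to another.
    for target in art["links_to"]:
        G.add_edge(art["id"], target, kind="hyperlink")
    # Rule 2: an edge between articles published on the same domain.
    for other in articles:
        if other["id"] != art["id"] and other["domain"] == art["domain"]:
            G.add_edge(art["id"], other["id"], kind="same_domain")

print(G.number_of_nodes(), G.number_of_edges())
```

Either rule could be used alone, weighted differently, or replaced entirely; the point is that each rule set yields a different network and therefore different recommendations.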
The following are some example analyses you could leverage to recommend an article.
Leverage modularity maximization or another community-identification method (e.g., label propagation or spectral clustering) to identify whether an article, author, or web domain falls in a community of legitimate or fake content.
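A small sketch of this idea on a toy graph, using NetworkX's greedy modularity maximization (label propagation or spectral clustering could be substituted); the node names and their "legitimate"/"fake" labels are made up for illustration:

```python
import networkx as nx

# Toy network: two tightly knit clusters of articles joined by one cross-link.
G = nx.Graph()
G.add_edges_from([
    ("legit1", "legit2"), ("legit1", "legit3"), ("legit2", "legit3"),
    ("fake1", "fake2"), ("fake1", "fake3"), ("fake2", "fake3"),
    ("legit3", "fake1"),  # single cross-community link
])

# Community detection via greedy modularity maximization.
communities = [set(c) for c in nx.community.greedy_modularity_communities(G)]

def community_of(node):
    """Return the detected community containing `node`."""
    return next(c for c in communities if node in c)

# An article's community membership is then the basis for recommendation.
print(community_of("legit1"))
```

On real data, the communities would be characterized by how many known-legitimate versus known-fake members they contain, and a new article would be judged by the community it joins.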
Given a new article and related metadata (e.g., author, domain, content, site owner), predict whether it will be linked to existing legitimate or fake content.
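One simple way to sketch this link-prediction task is with a neighborhood-similarity score such as the Jaccard coefficient; a trained classifier over the metadata would be the fuller approach. All node names below are hypothetical:

```python
import networkx as nx

# A new article shares an author and a domain with a known-legitimate article.
G = nx.Graph()
G.add_edges_from([
    ("new_article", "author_x"), ("new_article", "domain_y"),
    ("legit_article", "author_x"), ("legit_article", "domain_y"),
    ("fake_article", "author_z"),
    ("author_z", "domain_q"),
])

# Score how likely the new article is to link to each existing article.
pairs = [("new_article", "legit_article"), ("new_article", "fake_article")]
scores = {(u, v): s for u, v, s in nx.jaccard_coefficient(G, pairs)}
print(scores)
```

A higher predicted-link score toward legitimate content than toward fake content would support recommending the new article.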
Using data from Twitter, Reddit, or another online social network platform, recommend an article based on sharing behaviors on that platform. For example, has it been highly shared or upvoted? Do responses show substantial disagreement?
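Such sharing behaviors can be reduced to simple features. A minimal sketch, assuming hypothetical per-platform vote counts for one article (the record fields and the controversy measure are illustrative choices, not platform APIs):

```python
# Hypothetical sharing records for a single article.
shares = [
    {"platform": "reddit", "upvotes": 340, "downvotes": 25},
    {"platform": "twitter", "upvotes": 120, "downvotes": 80},
]

def controversy(up, down):
    """Fraction of votes on the minority side: 0 = unanimous, 0.5 = split."""
    total = up + down
    return min(up, down) / total if total else 0.0

# Two candidate features: overall engagement and average disagreement.
total_engagement = sum(s["upvotes"] + s["downvotes"] for s in shares)
avg_controversy = sum(
    controversy(s["upvotes"], s["downvotes"]) for s in shares
) / len(shares)
print(total_engagement, round(avg_controversy, 3))
```

Features like these could weight a recommendation or feed the link-prediction and community analyses above.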
For articles with associated comment sections, can true and false articles be separated by the content or structure of the comments?
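As a crude content-based sketch of this separation, one could compare a new article's comments against word-frequency profiles built from comments on known true and fake articles. The comment texts below are invented, and a real system would use a proper text classifier:

```python
from collections import Counter

# Invented comment threads standing in for labeled training data.
comments_true = ["good reporting", "well sourced article", "clear and sourced"]
comments_fake = ["fake clickbait", "totally fake", "clickbait garbage"]

def bag(texts):
    """Word-frequency profile of a comment set."""
    return Counter(w for t in texts for w in t.split())

def overlap(query, profile):
    """Sum of profile counts for the query's words: a crude similarity score."""
    return sum(profile[w] for w in query.split())

profile_true, profile_fake = bag(comments_true), bag(comments_fake)

# Classify a new article by which profile its comments resemble more.
query = "another clickbait fake story"
label = "fake" if overlap(query, profile_fake) > overlap(query, profile_true) else "true"
print(label)
```

Structural signals (thread depth, reply rate, burstiness) could be added as features alongside the text.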