Final Project - A Network of News

Fake news and its purveyors do not exist in an information vacuum. Rather, the authors of fake news and the websites that publish it are part of a larger information ecosystem. Authors of legitimate and fake content may nevertheless be distinguished by where they sit within the larger network of journalists, authors, and websites.

For this class's final project, you will use datasets of news articles to build networks that capture connections among articles, links between websites, and/or citations between authors. The goal is to develop a collection of approaches that could be used to recommend more legitimate content based on an article's placement in the network you build.


You will be provided with a list of potential datasets you can use to build your network of news articles. From there, you will be expected to meet the following milestones:

  1. April 3: Dataset selection
  2. April 10: Proposal for network construction rules
  3. April 17: Write-up of sample network analysis
  4. April 24: Proposal for article recommendation/ranking algorithm
  5. May 1: Updated proposal for recommendation
  6. May 8: Final project presentation

Example Graph Constructions

The following are some simple graph construction rules you could explore. They are non-exhaustive, so feel free to propose an alternate rule set.
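
As one illustration of what a construction rule could look like, the sketch below builds a weighted, directed graph of web domains from article hyperlinks. It is only an assumed example, not a required rule: the record layout (`url`, `links`), the `site-*.example` URLs, and the function names are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical article records: each has its own URL and the URLs it links to.
articles = [
    {"url": "https://site-a.example/story-1",
     "links": ["https://site-b.example/report", "https://site-c.example/post"]},
    {"url": "https://site-b.example/report",
     "links": ["https://site-c.example/post"]},
    {"url": "https://site-a.example/story-2",
     "links": ["https://site-b.example/other", "https://site-a.example/story-1"]},
]


def domain(url):
    """Return the host part of a URL, e.g. 'site-a.example'."""
    return urlparse(url).netloc.lower()


def build_domain_graph(records):
    """Count hyperlinks between distinct domains as weighted directed edges."""
    edges = Counter()
    for record in records:
        src = domain(record["url"])
        for link in record["links"]:
            dst = domain(link)
            if dst and dst != src:  # skip malformed URLs and self-links
                edges[(src, dst)] += 1
    return edges


domain_edges = build_domain_graph(articles)
for (src, dst), weight in sorted(domain_edges.items()):
    print(f"{src} -> {dst} (weight {weight})")
```

The same pattern could be applied to authors or individual articles by changing what is used as the node key.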

Example Tracks

The following are some example analyses you could leverage to recommend an article.

Track 1: Community analysis

Use modularity optimization or another community detection method (e.g., label propagation or spectral clustering) to determine whether an article, author, or web domain belongs to a community of predominantly legitimate or fake content.
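
A minimal sketch of this idea, assuming an undirected, unweighted graph and a candidate partition that some community detection method has already produced. It scores the partition with the standard modularity formula and then labels each community by the majority of its known labels. The toy graph, the partition, and the labels are hypothetical.

```python
from collections import Counter

# Hypothetical undirected graph: two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# Hypothetical candidate partition, e.g. the output of a detection method.
communities = [{"a", "b", "c"}, {"d", "e", "f"}]

# Hypothetical ground-truth labels, available for only a few nodes.
known_labels = {"a": "legitimate", "b": "legitimate", "e": "fake"}


def modularity(edges, communities):
    """Modularity Q = sum over communities of L_c / m - (d_c / (2m))^2,
    where L_c is the number of internal edges, d_c the total degree of the
    community, and m the number of edges in the graph."""
    m = len(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    q = 0.0
    for members in communities:
        internal = sum(1 for u, v in edges if u in members and v in members)
        total_degree = sum(degree[n] for n in members)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


def majority_label(members, labels):
    """Label a community by the most common known label among its members."""
    counts = Counter(labels[n] for n in members if n in labels)
    return counts.most_common(1)[0][0] if counts else "unknown"


q = modularity(edges, communities)
community_labels = [majority_label(c, known_labels) for c in communities]
print(round(q, 4), community_labels)
```

A higher modularity suggests the partition separates the graph into denser groups than chance would predict; whether those groups line up with legitimate or fake content still has to be checked against labeled data.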

Track 2: Link prediction

Given a new article and related metadata (e.g., author, domain, content, site owner), predict whether it will be linked to existing legitimate or fake content.
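
One simple baseline for this track, sketched under assumptions: represent each article by a set of metadata tokens and rank existing articles by Jaccard similarity to the new one. The token format (`author:...`, `domain:...`, `owner:...`), the sample articles, and the function names are hypothetical, and Jaccard similarity is only one of many possible link prediction scores.

```python
# Hypothetical existing articles: metadata token set plus a known label.
existing = {
    "article-1": ({"author:smith", "domain:site-a.example", "owner:acme"},
                  "legitimate"),
    "article-2": ({"author:jones", "domain:site-a.example", "owner:acme"},
                  "legitimate"),
    "article-3": ({"author:doe", "domain:site-z.example", "owner:zeta"},
                  "fake"),
}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_links(new_metadata, existing, top_k=2):
    """Return the top_k (score, name, label) candidates, best first.
    Ties are broken alphabetically by article name."""
    scored = [(jaccard(new_metadata, meta), name, label)
              for name, (meta, label) in existing.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_k]


# Hypothetical new article by an unseen author on a known domain.
new_article = {"author:lee", "domain:site-a.example", "owner:acme"}
predictions = predict_links(new_article, existing)
for score, name, label in predictions:
    print(f"{name}: {score:.2f} ({label})")
```

Here the new article shares a domain and owner with two legitimate articles, so the predicted links point toward legitimate content. A real project would likely combine metadata similarity with graph-based scores.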

Track 3: Social network analysis

Using data from Twitter, Reddit, or some other online social network platform, recommend an article based on sharing behaviors on that platform. For example, has the article been widely shared or upvoted, and do the responses show substantial disagreement?
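
The two signals mentioned above can be combined in many ways. The sketch below shows one assumed scoring rule: the upvote share, discounted by the binary entropy of the agree/disagree split among responses. The scoring formula and function names are hypothetical, and the sketch assumes responses have already been labeled as agreeing or disagreeing (by hand or by a classifier).

```python
import math


def disagreement_entropy(agree, disagree):
    """Binary entropy (in bits) of the agree/disagree split.
    0.0 means unanimous responses; 1.0 means an even split."""
    total = agree + disagree
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in (agree, disagree):
        if count:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def recommendation_score(upvotes, downvotes, agree, disagree):
    """Hypothetical score in [0, 1]: upvote share discounted by disagreement."""
    votes = upvotes + downvotes
    upvote_share = upvotes / votes if votes else 0.0
    return upvote_share * (1.0 - disagreement_entropy(agree, disagree))


# A well-received article with unanimous responses keeps its upvote share.
print(recommendation_score(upvotes=90, downvotes=10, agree=10, disagree=0))
# The same votes with evenly split responses are discounted to zero.
print(recommendation_score(upvotes=90, downvotes=10, agree=5, disagree=5))
```

Whether disagreement should lower a recommendation at all is itself a design choice worth justifying in your proposal.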

Track 4: Article comments

For articles with associated comment sections, can true and false articles be separated by the content or structure of their comments?
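
To make "structure" concrete, the sketch below extracts a few simple features from a comment thread treated as a reply tree. It assumes each comment is given as a `(comment_id, parent_id)` pair with `parent_id` set to `None` for top-level comments; that layout, the sample thread, and the feature names are hypothetical.

```python
from collections import defaultdict

# Hypothetical thread: (comment_id, parent_id); None marks a top-level comment.
comments = [(1, None), (2, 1), (3, 1), (4, 2), (5, None), (6, 4)]


def thread_features(comments):
    """Compute simple structural features of a comment reply tree."""
    children = defaultdict(list)
    for comment_id, parent_id in comments:
        children[parent_id].append(comment_id)

    def depth(node):
        # Depth of the subtree rooted at node; a comment with no replies is 1.
        # Recursive, so very deep threads may need an iterative version.
        replies = children.get(node, [])
        return 1 + max((depth(child) for child in replies), default=0)

    top_level = children.get(None, [])
    return {
        "comments": len(comments),
        "top_level": len(top_level),
        "replies": len(comments) - len(top_level),
        "max_depth": max((depth(c) for c in top_level), default=0),
    }


features = thread_features(comments)
print(features)
```

Features like these could be compared across articles with known labels to see whether, for instance, false articles tend to attract deeper reply chains.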

Example Dataset links: