Fake news and its purveyors do not exist in an information vacuum. Rather, the authors of fake news and the websites that publish it are part of a larger information ecosystem. Authors of legitimate and fake content may be differentiated, however, by how they are placed within the larger network of journalists, authors, and websites.
For this class's final project, you will use datasets of news articles to develop networks that capture connections among articles, links between websites, and/or citations between authors. The goal is to develop a collection of approaches that could recommend more legitimate content based on an article's placement in the network you develop.
You will be provided with a list of potential datasets you can use to develop your network of news articles. From there, you will be expected to adhere to the following milestones:
The following are some simple graph construction rules you could explore. They are non-exhaustive, so feel free to propose an alternate rule set.
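As one illustration, a minimal sketch of two such construction rules, using NetworkX over a handful of made-up article records (the field names `id`, `domain`, and `links_to` are illustrative assumptions, not a prescribed schema):

```python
import networkx as nx

# Hypothetical article records; the field names are illustrative only.
articles = [
    {"id": "a1", "domain": "example-news.com", "links_to": ["a2"]},
    {"id": "a2", "domain": "example-blog.net", "links_to": []},
    {"id": "a3", "domain": "example-news.com", "links_to": ["a2"]},
]

G = nx.Graph()
for art in articles:
    G.add_node(art["id"], domain=art["domain"])
    # Rule 1: an edge for each hyperlink from one article to another.
    for target in art["links_to"]:
        G.add_edge(art["id"], target, kind="hyperlink")
    # Rule 2: an edge between articles published on the same domain.
    for other in articles:
        if other["id"] != art["id"] and other["domain"] == art["domain"]:
            G.add_edge(art["id"], other["id"], kind="same_domain")

print(G.number_of_nodes(), G.number_of_edges())
```

Either rule could be used alone, weighted differently, or replaced entirely; the point is that each rule set yields a different network and therefore different recommendations.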
The following are some example analyses you could leverage to recommend an article.
Leverage modularity maximization or another community-identification method (e.g., label propagation or spectral clustering) to identify whether an article, author, or web domain falls in a community of legitimate or fake content.
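A small sketch of this idea on a toy graph, using NetworkX's greedy modularity maximization (label propagation or spectral clustering could be substituted); the node names and their "legitimate"/"fake" labels are made up for illustration:

```python
import networkx as nx

# Toy network: two tightly knit clusters of articles joined by one cross-link.
G = nx.Graph()
G.add_edges_from([
    ("legit1", "legit2"), ("legit1", "legit3"), ("legit2", "legit3"),
    ("fake1", "fake2"), ("fake1", "fake3"), ("fake2", "fake3"),
    ("legit3", "fake1"),  # single cross-community link
])

# Community detection via greedy modularity maximization.
communities = [set(c) for c in nx.community.greedy_modularity_communities(G)]

def community_of(node):
    """Return the detected community containing `node`."""
    return next(c for c in communities if node in c)

# An article's community membership is then the basis for recommendation.
print(community_of("legit1"))
```

On real data, the communities would be characterized by how many known-legitimate versus known-fake members they contain, and a new article would be judged by the community it joins.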
Given a new article and related metadata (e.g., author, domain, content, site owner), predict whether it will be linked to existing legitimate or fake content.
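One simple way to sketch this link-prediction task is with a neighborhood-similarity score such as the Jaccard coefficient; a trained classifier over the metadata would be the fuller approach. All node names below are hypothetical:

```python
import networkx as nx

# A new article shares an author and a domain with a known-legitimate article.
G = nx.Graph()
G.add_edges_from([
    ("new_article", "author_x"), ("new_article", "domain_y"),
    ("legit_article", "author_x"), ("legit_article", "domain_y"),
    ("fake_article", "author_z"),
    ("author_z", "domain_q"),
])

# Score how likely the new article is to link to each existing article.
pairs = [("new_article", "legit_article"), ("new_article", "fake_article")]
scores = {(u, v): s for u, v, s in nx.jaccard_coefficient(G, pairs)}
print(scores)
```

A higher predicted-link score toward legitimate content than toward fake content would support recommending the new article.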
Using data from Twitter, Reddit, or another online social network platform, recommend an article based on sharing behaviors on that platform. For example, has it been highly shared or upvoted? Do responses show substantial disagreement?
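Such sharing behaviors can be reduced to simple features. A minimal sketch, assuming hypothetical per-platform vote counts for one article (the record fields and the controversy measure are illustrative choices, not platform APIs):

```python
# Hypothetical sharing records for a single article.
shares = [
    {"platform": "reddit", "upvotes": 340, "downvotes": 25},
    {"platform": "twitter", "upvotes": 120, "downvotes": 80},
]

def controversy(up, down):
    """Fraction of votes on the minority side: 0 = unanimous, 0.5 = split."""
    total = up + down
    return min(up, down) / total if total else 0.0

# Two candidate features: overall engagement and average disagreement.
total_engagement = sum(s["upvotes"] + s["downvotes"] for s in shares)
avg_controversy = sum(
    controversy(s["upvotes"], s["downvotes"]) for s in shares
) / len(shares)
print(total_engagement, round(avg_controversy, 3))
```

Features like these could weight a recommendation or feed the link-prediction and community analyses above.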
For articles with associated comment sections, can true and false articles be separated by the content or structure of the comments?
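As a crude content-based sketch of this separation, one could compare a new article's comments against word-frequency profiles built from comments on known true and fake articles. The comment texts below are invented, and a real system would use a proper text classifier:

```python
from collections import Counter

# Invented comment threads standing in for labeled training data.
comments_true = ["good reporting", "well sourced article", "clear and sourced"]
comments_fake = ["fake clickbait", "totally fake", "clickbait garbage"]

def bag(texts):
    """Word-frequency profile of a comment set."""
    return Counter(w for t in texts for w in t.split())

def overlap(query, profile):
    """Sum of profile counts for the query's words: a crude similarity score."""
    return sum(profile[w] for w in query.split())

profile_true, profile_fake = bag(comments_true), bag(comments_fake)

# Classify a new article by which profile its comments resemble more.
query = "another clickbait fake story"
label = "fake" if overlap(query, profile_fake) > overlap(query, profile_true) else "true"
print(label)
```

Structural signals (thread depth, reply rate, burstiness) could be added as features alongside the text.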