Topological Data Analysis for detecting consistent patterns of spread for extremist content

This project aims to build a set of tools and methods to model, predict and control the spread of specific types of content, such as extremist content. The main goal is to account for the topological structure of the network and the links between communities. The preferred approach is Topological Data Analysis, a modern mathematical framework for analysing high-dimensional spatial data in a manner that is insensitive to the particular metric chosen and that provides dimensionality reduction and robustness to noise [5]. The main tool is persistent homology, a method for computing topological features over a wide range of spatial scales and for determining which of them are more likely to represent true features of the underlying space rather than artefacts of sampling, noise, or a particular choice of parameters. Persistent homology has previously been applied to differentiate between authorship networks and random networks [5]. In this project, we will identify the persistent features that topologically describe the diffusion of extremist content, and use them to distinguish it from the diffusion of mainstream content.
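To make the idea of persistence concrete, the following is a minimal sketch of 0-dimensional persistent homology (connected components) on a weighted graph, computed with a union-find structure. It is an illustration only, not the project's method: the function names `h0_persistence` and `persistent_features`, the toy network, and the reading of edge weights as dissimilarities between accounts (for example, delay between reshares) are all assumptions made for this example. Higher-dimensional features such as loops would in practice require a dedicated TDA library.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (node_u, node_v, filtration value); hypothetical format


def h0_persistence(nodes: List[str], edges: List[Edge]) -> List[Tuple[float, float]]:
    """Return (birth, death) pairs for connected components (H0).

    Assumption: every node is born at filtration value 0 and each edge
    enters the filtration at its weight. A component dies when an edge
    merges it into another one; components that never merge get death = inf.
    """
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs: List[Tuple[float, float]] = []
    # Process edges in order of increasing filtration value.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v      # two components merge: one of them dies
            pairs.append((0.0, w))
        # Otherwise the edge closes a cycle (an H1 feature), ignored in this sketch.

    # Components still alive at the end persist forever.
    surviving_roots = {find(n) for n in nodes}
    pairs.extend((0.0, float("inf")) for _ in surviving_roots)
    return pairs


def persistent_features(pairs: List[Tuple[float, float]],
                        min_lifetime: float) -> List[Tuple[float, float]]:
    """Keep only features whose lifetime (death - birth) reaches the threshold."""
    return [(b, d) for b, d in pairs if d - b >= min_lifetime]


# Toy network: two tight pairs of accounts joined by a weak (high-dissimilarity) link.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b", 0.1), ("c", "d", 0.2), ("b", "c", 0.9), ("a", "c", 1.0)]

toy_pairs = h0_persistence(toy_nodes, toy_edges)
toy_persistent = persistent_features(toy_pairs, min_lifetime=0.5)
```

In this toy case the short-lived pairs correspond to nodes that merge almost immediately, while the two long-lived pairs reflect the two communities that remain separate until the weak link appears. Comparing such lifetime profiles across diffusion networks is one way the distinction between content types could be approached.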

Pre-requisites:

  • good programming skills
  • background knowledge of algebraic topology
  • experience in running computational experiments and analysing results
  • desirable: Git, R/Python, and motivation to make sense of real data and solve real problems.

Keywords: persistent homology, topological data analysis, modelling information spread, machine learning.

References:

  1. Neumann, P. R. (2013). Options and Strategies for Countering Online Radicalization in the United States. Studies in Conflict & Terrorism, 36(6), 431–459.
  2. Varol, Onur, et al. "Online human-bot interactions: Detection, estimation, and characterization." Eleventh international AAAI conference on web and social media. 2017.
  3. Allcott, Hunt, and Matthew Gentzkow. "Social media and fake news in the 2016 election." Journal of economic perspectives 31.2 (2017): 211-36. https://web.stanford.edu/~gentzkow/research/fakenews.pdf
  4. Kim, D., Graham, T., Wan, Z., & Rizoiu, M.-A. (2019). Analysing user identity via time-sensitive semantic edit distance (t-SED): A case study of Russian trolls on Twitter. Journal of Computational Social Science. http://arxiv.org/abs/1901.05228
  5. Carstens, C. J., & Horadam, K. J. (2013). Persistent Homology of Collaboration Networks. Mathematical Problems in Engineering, 2013, 1–7. https://doi.org/10.1155/2013/815035