In May of this year, with the internet fully in the grip of the UK EU referendum, hashtags used on Instagram showed that discussion was highly polarised between #Leave and #Remain. The high degree of ideological distance between the two camps indicated that each group functioned as a separate ‘echo-chamber’, in which they spoke mainly to their own membership. The Leave campaign had a much more coherent online identity, made better use of hashtags in general, and was simply more active in generating content, all of which may have contributed to their success. In early June 2016, a study of Twitter content found similar biases: Out of 1.5 million individual tweets, 54% were pro-Leave, and only 20% were pro-Remain. Those findings are interesting enough on their own, but what really sparked our interest was that a third of the sampled content was created by only 1% of relevant user accounts.

If you’re not familiar with the inner workings of Twitter: No-one has that kind of time. It is highly unlikely that all of those accounts were directly controlled by people, or even large groups of people, and much more likely that many were staffed by automated software robots, or ‘bots’: Simple computer scripts that simulate highly repetitive human activity. In fact, an independent analysis of the 200 Twitter accounts which most frequently shared pro-Leave or pro-Remain content found that only 10% of those accounts were likely to be human.

The EU referendum is not the first time bots have been observed in democratic discussion. In the 2010 US midterm elections, bots were actively used to support certain candidates and hamper others. In 2012 Lee Jasper admitted to their use in a parliamentary by-election. In the 2012 Mexican elections, Emiliano Treré identified a more effective use of bots, calling it “the algorithmic manufacturing of consent”, and a form of ‘ectivism’ (which includes the creation of large numbers of false followers, a charge levelled at Mitt Romney during the 2012 US Presidential election). A very large ‘bot-net’ was also utilised in 2013 to produce apparent support for a controversial Mexican energy reform. Those bots may have gone entirely unnoticed had they not been operating too rapidly to successfully pose as human agents.

Bot-related tactics have not been confined solely to the generation of apparent support; they have also been used to drown out members of a campaign by rendering their hashtags useless. The challenge presented by bots is not the introduction of false information, but the falsification of endorsement and popularity. Political discussions around the popularity of a single issue are particularly vulnerable, as are financial markets that run on confidence. During 2014, a bot-campaign elevated the value of tech company Cynk from pennies to almost $5 billion USD in a few days. The company’s president, CEO, CFO, chief accounting officer, secretary, treasurer, and director were all the same individual: Marlon Luis Sanchez, Cynk’s sole employee. By the time Cynk’s stock manoeuvre was discovered and its assets frozen, Sanchez had made no additional profit, but for the investors who had been caught in the scheme, the losses were real.

Bot network detection research is being conducted by various defence agencies (including DARPA), but the field is complex, constantly changing, and yet to prove itself effective. Meanwhile, the deployment of bots on social media is within the terms of service of most of the relevant platforms; as long as no additional crime is committed, their use has yet to face prosecution, and even in the case of Cynk no social media platform has assumed any kind of liability for their use.
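The rapid posting that exposed the Mexican bot-net suggests the simplest possible detection heuristic: flag any account whose output is too fast or too repetitive to be plausibly human. The sketch below is illustrative only; the thresholds, account names, and figures are invented, and real detection research relies on far richer behavioural features:

```python
# A deliberately crude heuristic: flag accounts whose posting rate
# or content repetitiveness exceeds plausibly human levels.
# Thresholds and sample data are invented for illustration.

def likely_bot(posts_per_day, unique_text_ratio,
               max_human_rate=150, min_variety=0.2):
    """Flag an account if it posts faster than a human plausibly
    could, or repeats itself almost verbatim."""
    return posts_per_day > max_human_rate or unique_text_ratio < min_variety

accounts = {
    "campaign_staffer": (40, 0.9),    # busy, but human-paced and varied
    "retweet_machine":  (900, 0.05),  # hundreds of near-identical posts
    "casual_user":      (3, 1.0),
}

flags = {name: likely_bot(rate, variety)
         for name, (rate, variety) in accounts.items()}
# flags -> {'campaign_staffer': False, 'retweet_machine': True,
#           'casual_user': False}
```

The weakness is obvious: a bot that throttles itself to human speed and varies its phrasing sails straight past such rules, which is part of why the field has yet to prove itself effective.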

The most active political users of social media are social movement activists, politicians, party workers, and those who are already fully committed to political causes, but recent evidence suggests that “bots” could be added to that list. Given the echo chamber effect, the fact that many online followers of political discourse are often not real users at all, and the steady decline in political participation numbers in many countries, bot use (while cheap to mobilise) may not have much power over the individual voter. Their deployment in the U.S. and Mexico has instead been largely targeted at journalists employed by mainstream media outlets. Politicians, activists, and party-workers may all find democratic scrutiny harder to achieve if the ‘public mood’ or ‘national conversation’ is being mis-reported by journalists with a bot-skewed sense of online discussion. The 2015 Global Social Journalism survey shows that in 51% of cases, reporters from six countries, including the UK and US, “would be unable to do their job without social media”. In 2012, 38% of journalists spent up to two hours a day on various networks; by 2015 that number had climbed to 57%. If unethical actors can unduly influence these avenues of online discourse, an increasingly vulnerable news-media may suffer from, and pass on, the political biases of anonymous others.

If voting is affected by media written by reporters who live on an internet whose shape is determined by innumerable, anonymous, untrackable automated agents, how do we proceed in pursuit of a fair democracy?

Demonology & Data Reduction

Imagine that your job is to describe all the ways things go wrong with the human psyche. And that you’re alive during the 16th century, or even the 11th. You have to describe how people are dangerous to themselves and others for the good of a society that is wracked by injury, disease, and rampant structural unfairness. Your system must be comprehensive, easy to understand and remember, and effective despite the fact that you have essentially no experimental data, very little in the way of diagnostic survey work, and basically you’re running off shaky collective memory and folklore that is, itself, frequently destroyed or distorted by large-scale civil trauma.

There is a system of mental disorder and civil unrest in the world, you have a minimal amount of information about it, and your goal is to represent it in a way that is useful to an ignorant populace.

Welcome to the world of demonology, and Data Reduction 101.


The Lanterne of Light is a 15th-century text that establishes, for the first time, the systematic hierarchy of Christian demons organised by sin. It existed as part of a social movement to translate the Bible into the language of the uneducated (i.e., English), and has been so effective that to this day most of the secular west understands Lucifer as emblematic of Pride. By telling stories about the fall from heaven through this lens we have gained a surprisingly sophisticated view on how an emotional state functions and how it can destroy a life. Almost every westerner knows that there are seven deadly sins in total, even if they don’t comprehend that the movie “Seven” is essentially an interpretation of this anonymously sourced, early 1400s, English Lollard tract. What the Lanterne does is to systematise social threat, and then to represent that system in a way so compelling that it is remembered still: 2010 was its 600th anniversary.

This is, at its core, the exact same process that drives modern psychology. An understanding of the need and mechanism behind the Lanterne directly helps us to understand modern statistical practice. The only thing that has changed is a matter of degree: The amount and reliability of data available, the minimal level of complexity which retains utility, and the size of the community which can utilise the output. The Diagnostic and Statistical Manual of Mental Disorders (DSM), now in its fifth edition, is THE main tool for psychiatric categorisation. And it is, essentially, the Lanterne of Light (1410), Binsfeld’s (1589), Michaelis’ (1613), and Barrett’s (1801) classifications of demons, and “Hierarchical Cluster Analysis: Comparison of Three Linkage Measures and Application to Psychological Data” (2015). These texts are all illustrative of the problems that plague (the demon “Merihem”) data reduction, and can be utilised to illustrate the misconceptions (“Pythius”) and temptations (“Mammon”) of trying to describe complex systems in simple terms:

  • Systematic scope (where does the body end? Pestilence and the hereditary condition)
  • Data resolution, sampling error and measurement bias (what is left unseen)
  • Reliance on surface condition (active psychosis and possession)
  • Validity and mutability of categories (seven sins, three lies)
  • An assumption of normality (psychological utility vs. anomaly)
  • Under-dispersion, over-representation, and other distributional limitations (how wide is hell? How dull are its legions?)
  • Interstitial domain and interpolation (everywhere there are faces, pareidolia)
  • Data reduction for communication (lies, damned lies, and statistics)
  • Cultural bias (the redefinition of gender and sexuality)
  • Motive (witch hunts of the 15th and 19th centuries)
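Several of the items above — the validity and mutability of categories, distributional limits, data reduction for communication — show up concretely in hierarchical clustering, the technique named in the 2015 paper cited earlier. The toy data and minimal pure-Python implementation below are illustrative only, not any published method; the point is that a single analyst choice (the linkage measure) produces different categories from identical data:

```python
# Minimal agglomerative clustering on 1-D toy data, showing that the
# choice of linkage measure (a modelling decision, not a fact about
# the data) changes which categories we report. Data are invented:
# an evenly spaced "chain" with no true gaps, so any partition into
# two groups is an artefact of the method.

def cluster(points, n_clusters, linkage):
    """Repeatedly merge the two closest clusters until n_clusters
    remain. `linkage` maps two clusters to a distance between them."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # find the closest pair of clusters under the chosen linkage
        pairs = [(linkage(a, b), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

def single(a, b):    # distance between the closest members
    return min(abs(x - y) for x in a for y in b)

def complete(a, b):  # distance between the farthest members
    return max(abs(x - y) for x in a for y in b)

data = [0, 1, 2, 3, 4, 5]

print(cluster(data, 2, single))    # chains along the gap-free data
print(cluster(data, 2, complete))  # prefers compact, balanced groups
```

Run on the same six points, single linkage chains one lopsided cluster along the data while complete linkage splits it evenly — two different “hierarchies of sin” from one congregation, neither more real than the other.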

Statistics, particularly when deployed systematically, are often misunderstood, but the manner in which they both succeed and fail is easily described in a huge number of contexts. Demonology is just one of the most fun.