Fake news sites target the filter bubbles of groups most aligned with that news, using the power of social media to do so. Initially, fake news of the social media era was relatively easy to spot. The claims of early social media fake news purveyors were often meant as entertainment, and language, fonts, and links were often indicators that could be used to determine veracity. It took only a short time, however, for fake news to become more insidious, more plentiful, more subtle, and more readily subverted to manipulate information and public opinion. Fake news now has many social media outlets where it can appear, and it can spread quickly via both human and nonhuman actors. During the 2016 presidential election cycle, for example, fake news appeared often.1 Determining what news was to be believed and what was to be ignored became more a matter of party affiliation than of good sense.
Fake news sites and stories are shared for many different reasons. Some readers find the stories amusing. Some find them alarming. Others find them affirming of their beliefs. Many people share fake news without ever having read the content of the article.2 Sharing fake news, whether because it is amusing or because people think it is real, only exacerbates the problem. Did Pope Francis endorse candidate Donald Trump? No, but that didn’t stop the story from appearing on social media and spreading widely.3 Did Hillary Clinton run a child sex ring out of a Washington, DC, pizza shop? No, but that didn’t stop a man with a gun from going there to exact vengeance.4
In the early days of the internet, fake news was not a big problem. There were some websites that sought to spoof, mislead, or hoax, but mostly it was all in good fun. While some websites sought to spread misinformation, their numbers were limited, and the authority to shut down malicious websites seemed to be invoked more often than it is today. Creating a website on the early internet took time, effort, and computer programming skills, which limited the number of people who could create fake news sites.
During the last decade, as an offshoot of the stream of information provided by the internet, social media platforms such as Facebook and MySpace were invented so that individuals could connect with others online, pointing them to websites, sharing comments, describing events, and so on.
Following that came the invention of another type of social media—Twitter—which allows people to send very brief messages, usually about current events, to others who choose to receive those messages. One could choose to “follow” former President Barack Obama’s Twitter postings—to know where he is going, what is on his agenda, or what is happening at an event. This kind of on-site information can be very useful as events happen, and it has proved useful in emergency situations as well. For example, during the Arab Spring uprisings, Twitter communications provided information in real time as events unfolded.5 During Hurricane Sandy, people were able to get localized and specific information about the storm as it happened.6 Twitter is also a convenient means of socializing, getting directions, and keeping up-to-date on the activities of friends and family.
The power of the various tools that harness the internet and the information it supplies is enormous. The spread of the technology required to make use of these tools has been rapid and global. As with most tools, the power of the internet can be used for both good and evil. In the last decade, the use of the internet to manipulate, manage, and mislead has risen sharply.
The collection of massive amounts of data using bots has generated a new field of study known as “big data.”7 Some big data research applies to the activities of people who use the internet and social media. By gathering and analyzing large amounts of data about how people use the internet, how they use social media, what items they like and share, and how many people overall click on a link, advertisers, web developers, and schemers can identify what appear to be big trends. Researchers are concerned that big data can hide biases that are not necessarily evident in the data collected, and the trends identified may or may not be accurate.8 The use of big data about social media and internet use can result in faulty assumptions and create false impressions about what groups or people do or do not like. Manipulators can “nudge” people, influencing their actions on the basis of the big data they have collected.9 They can also use the data collected to create bots designed to influence populations.10
Information-collecting capabilities made possible by harnessing computer power to collect and analyze massive amounts of data are used by institutions, advertisers, pollsters, and politicians. The bots that collect the information are essentially pieces of computer code that can be programmed to respond automatically when given the right stimulus. For example, a bot can be programmed to search the internet for particular words or groups of words. When the bot finds the word or words it is looking for, it notes the location of those words and acts on them, for example by recording or forwarding the page. Using bots speeds up the process of finding and collecting sites that have the required information. The use of bots to collect data and to send data to specific places allows research to progress in many fields. They automate tedious and time-consuming processes, freeing researchers to work on other tasks.
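To make that mechanism concrete, the following minimal sketch shows the kind of logic such a keyword-scanning bot uses: fetch pages, look for target words, and note where they were found. The URLs and keywords are hypothetical placeholders, and a real crawler would also honor robots.txt and rate limits; this is an illustration of the idea, not any particular bot.

```python
# A minimal sketch of a keyword-scanning bot. The URLs and keywords are
# hypothetical; a real crawler would also honor robots.txt and rate limits.
import urllib.request

KEYWORDS = {"election", "candidate", "endorsement"}  # terms the bot is told to find
SEED_URLS = ["https://example.com/news1", "https://example.com/news2"]  # placeholder pages

def scan(urls, keywords):
    """Fetch each page and record where any keyword appears."""
    hits = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                text = resp.read().decode("utf-8", errors="ignore").lower()
        except OSError:
            continue  # skip pages that cannot be fetched
        found = {kw for kw in keywords if kw in text}
        if found:
            hits.append((url, found))  # note the location and the matched words
    return hits

if __name__ == "__main__":
    for url, words in scan(SEED_URLS, KEYWORDS):
        print(url, "contains", ", ".join(sorted(words)))
```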
Automated programming does much good for technology. “Good” bots perform four main jobs: they crawl the web to find website content to send to mobile and web applications and display to users; they gather the information that allows search engines to make ranking decisions; where use of data has been authorized, bot “crawlers” collect it to supply information to marketers; and monitoring bots track website availability and the proper functioning of online features.
This kind of data collection is useful to those who want to know how many people have looked at the information they have provided. “In 1994, a former direct mail marketer called Ken McCarthy came up with the clickthrough as the measure of ad performance on the web. The click’s natural dominance built huge companies like Google and promised a whole new world for advertising where ads could be directly tied to consumer action.”11 Counting clicks is a relatively easy way to assess how many people have visited a website. However, counting clicks has become one of the features of social media that determines how popular or important a topic is. Featuring and repeating those topics based solely on click counts is one reason that bots are able to manipulate what is perceived as popular or important. Bots can disseminate information to large numbers of people. Human interaction with any piece of information is usually very brief before a person passes that information along to others. The number of shares results in large numbers of clicks, which pushes the bot-supplied information into the “trending” category even if the information is untrue or inaccurate. Information that is trending is considered important.
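A toy illustration of that logic follows, with invented story names and click counts: when a ranking is based purely on click volume, whatever is clicked or shared most will surface as “trending,” whether or not it is true.

```python
# A toy illustration of click-count "trending" logic: popularity is decided
# purely by volume, so bot-amplified items trend regardless of accuracy.
from collections import Counter

clicks = Counter()

def record_click(story_id):
    clicks[story_id] += 1

def trending(top_n=3):
    """Return the most-clicked stories; no notion of truth enters the ranking."""
    return [story for story, _ in clicks.most_common(top_n)]

# Simulated traffic: 'story-b' is pushed by automated accounts.
for _ in range(40):
    record_click("story-a")        # organic interest
for _ in range(500):
    record_click("story-b")        # bot-amplified shares
record_click("story-c")

print(trending())  # ['story-b', 'story-a', 'story-c']
```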
Good bots coexist in the technical world with “bad” bots. Bad bots are not used for benign purposes, but rather to spam, to mine users’ data, or to manipulate public opinion, making it possible for bots to harm, misinform, and extort. The Imperva Incapsula “2016 Bot Traffic Report” states that approximately 30 percent of traffic on the internet is from bad bots. Further, of the 100,000 domains that were studied for the report, 94.2 percent experienced at least one bot attack over the ninety-day period of the study.12 Why are bad bots designed, programmed, and set in motion? “There exist entities with both strong motivation and technical means to abuse online social networks—from individuals aiming to artificially boost their popularity, to organizations with an agenda to influence public opinion. It is not difficult to automatically target particular user groups and promote specific content or views. Reliance on social media may therefore make us vulnerable to manipulation.”13
In social media, bots are used to collect information that might be of interest to a user. The bot crawls the internet for information that is similar to what an individual has seen before. That information can then be disseminated to the user who might be interested. By using keywords and hashtags, a website can attract bots searching for specific information. Unfortunately, the bot is not interested in the truth or falsehood of the information itself.
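The following toy sketch, with invented users, hashtags, and items, illustrates that matching step: content is pushed wherever its hashtags overlap with a user’s observed interests, and nothing in the process consults whether the content is true.

```python
# A toy sketch of interest-based dissemination: content tagged with hashtags
# is pushed to users whose observed interests overlap, with no check on
# whether the content is true. Users, tags, and items are hypothetical.
USER_INTERESTS = {
    "alice": {"#election", "#economy"},
    "bob": {"#sports"},
}

ITEMS = [
    {"id": 1, "hashtags": {"#election", "#breaking"}, "verified": False},
    {"id": 2, "hashtags": {"#sports"}, "verified": True},
]

def push_matches(users, items):
    """Pair each item with every user whose interests share a hashtag with it."""
    deliveries = []
    for item in items:
        for user, interests in users.items():
            if interests & item["hashtags"]:
                deliveries.append((user, item["id"]))  # truth value never consulted
    return deliveries

print(push_matches(USER_INTERESTS, ITEMS))  # [('alice', 1), ('bob', 2)]
```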
Some social bots are computer algorithms that “automatically produce content and interact with humans on social media, trying to emulate and possibly alter their behavior. Social bots can use spam, malware, misinformation, slander or even just noise” to influence and annoy.14 Political bots are social bots with political motivations. They have been used to artificially inflate support for a candidate by sending out information that promotes a particular candidate or disparages the candidate of the opposite party. They have been used to spread conspiracy theories, propaganda, and false information. Astroturfing is a practice in which bots create the impression of a grassroots movement supporting or opposing something where none exists. Smoke screening occurs when a bot or botnet sends irrelevant links to a specific hashtag so that followers are inundated with irrelevant information.
When disguised as people, bots propagate negative messages that may seem to come from friends, family or people in your crypto-clan. Bots distort issues or push negative images of political candidates in order to influence public opinion. They go beyond the ethical boundaries of political polling by bombarding voters with distorted or even false statements in an effort to manufacture negative attitudes. By definition, political actors do advocacy and canvassing of some kind or other. But this should not be misrepresented to the public as engagement and conversation. Bots are this century’s version of push polling, and may be even worse for society.15
Social bots have become increasingly sophisticated, to the point that it is difficult to distinguish a bot from a human. In 2014, Twitter revealed in an SEC filing that approximately 8.5 percent of all its users were bots, and that number may have increased to as much as 15 percent in 2017.16 Humans who don’t know that the entity sending them information is a bot can easily be supplied with false information.
Researchers have studied how well humans can detect lies. Bond and DePaulo analyzed the results of more than 200 lie detection experiments and found that humans can detect lies in text only slightly better than by random chance.17 This means that if a bot supplies a social media user with false information, that person has just a little better than a 50 percent chance of identifying the information as false. In addition, because some bots have presented themselves and been accepted by humans as “friends,” they become trusted sources, making the detection of a lie even more difficult.
To improve the odds of identifying false information, computer experts have been working on multiple approaches to the computerized automatic recognition of true and false information.18
Written text presents a unique set of problems for the detection of lies. While structured text like an insurance claim form uses limited and mostly known language, unstructured text like that found on the web has an almost unlimited language domain that can be used in a wide variety of contexts. This presents a challenge when looking for ways to automate lie detection. Two approaches have been used recently to identify fake news in unstructured text: linguistic approaches look at word patterns and word choices, and network approaches look at network information, such as the location from which a message was sent, the speed of response, and so on.19
The following four linguistic approaches are being tested by researchers:
In the Bag of Words approach, each word in a sentence, paragraph, or article is treated as a separate unit of equal importance to every other word. Frequencies of individual words and of identified multiword phrases are counted and analyzed. Parts of speech, location-based words, and counts of pronouns, conjunctions, and negative emotion words are all considered. The analysis can reveal patterns of word use, and certain patterns can reliably indicate that information is untrue. For example, deceptive writers tend to use verbs and personal pronouns more often, while truthful writers tend to use more nouns, adjectives, and prepositions.20
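A minimal bag-of-words sketch might look like the following; the word lists and cue categories here are illustrative stand-ins for the much larger lexicons real systems use.

```python
# A bag-of-words sketch: every word is counted independently, and a few
# cue categories (personal pronouns, negative-emotion words) are tallied.
# The word lists and the cue categories are illustrative only.
from collections import Counter
import re

PRONOUNS = {"i", "we", "you", "he", "she", "they", "me", "us", "them"}
NEGATIVE_EMOTION = {"hate", "terrible", "awful", "disaster", "angry"}

def bag_of_words(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    features = {
        "total_words": len(tokens),
        "pronouns": sum(counts[w] for w in PRONOUNS),
        "negative_emotion": sum(counts[w] for w in NEGATIVE_EMOTION),
    }
    return counts, features

counts, features = bag_of_words(
    "We know they hate us. It was a terrible, awful disaster, and they caused it."
)
print(features)               # raw cue counts a classifier could learn from
print(counts.most_common(3))  # the most frequent individual words
```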
In the Deep Syntax approach, language structure is analyzed by applying a set of rewrite rules that describe the syntactic structure of each sentence; noun and verb phrases, for example, are identified in the rewritten sentences. Comparing the number of syntactic structures of each kind against syntax patterns known to accompany lies can yield a probability rating for veracity.21
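The sketch below illustrates that idea using NLTK’s context-free grammar tools (assuming the nltk package is installed); the tiny grammar and sentence are invented, whereas real systems rely on broad-coverage probabilistic parsers.

```python
# A sketch of the deep-syntax idea using NLTK: a sentence is rewritten with
# context-free grammar rules, and the resulting phrase structures are counted.
# The tiny grammar and the sentence are illustrative only.
import nltk
from collections import Counter

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | PRP
VP -> V NP
Det -> 'the'
N  -> 'report' | 'facts'
PRP -> 'they'
V  -> 'checked'
""")

parser = nltk.ChartParser(grammar)
sentence = "they checked the report".split()

for tree in parser.parse(sentence):
    labels = Counter(sub.label() for sub in tree.subtrees())
    # Counts of NP, VP, and other structures would be compared against
    # syntax profiles derived from known deceptive and truthful text.
    print(labels)
```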
In the Semantic Analysis approach, what is written about a topic is compared with other accounts of the same experience. Comparing written text from a number of authors about an event or experience and creating a compatibility score from the comparison can reveal anomalies that indicate falsehood. If one writer says the room was painted blue while three others say it was painted green, there is a chance that the first writer is providing false information.22
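One way to picture a compatibility score is the sketch below, which assumes attribute claims (for example, the room’s color) have already been extracted from each account; the data and the scoring rule are invented for illustration.

```python
# A sketch of a semantic-compatibility check: attribute claims extracted from
# several accounts of the same event are compared, and disagreement lowers the
# score. The extraction step is assumed to have happened already.
def compatibility(claims):
    """claims: list of dicts mapping attribute -> claimed value."""
    attributes = set().union(*claims)
    agree, total = 0, 0
    for attr in attributes:
        values = [c[attr] for c in claims if attr in c]
        if len(values) < 2:
            continue  # nothing to compare against
        majority = max(set(values), key=values.count)
        agree += values.count(majority)
        total += len(values)
    return agree / total if total else 1.0

accounts = [
    {"wall_color": "green", "attendees": "hundreds"},
    {"wall_color": "green", "attendees": "hundreds"},
    {"wall_color": "green"},
    {"wall_color": "blue", "attendees": "hundreds"},   # the outlier account
]
print(round(compatibility(accounts), 2))  # lower scores flag possible falsehood
```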
In the Rhetorical Structure Theory (RST) approach, an analytic framework identifies the relationships between the linguistic elements of a text. Those relationships can then be plotted with Vector Space Modeling (VSM), showing how far a text falls from known truthful writing.23
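The vector space idea can be sketched as follows: texts become term-count vectors, and cosine similarity measures how close a new text sits to writing already verified as truthful. The example texts are invented, and a real system would use richer features than raw term counts.

```python
# A sketch of vector space modeling (VSM): texts are mapped to term-count
# vectors and compared by cosine similarity, so a new story can be measured
# against text already verified as truthful. Purely illustrative.
import math
import re
from collections import Counter

def vectorize(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(v1, v2):
    common = set(v1) & set(v2)
    dot = sum(v1[t] * v2[t] for t in common)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

verified = vectorize("The council approved the budget after a public vote.")
candidate = vectorize("The council approved the budget following a public vote.")
print(round(cosine(verified, candidate), 2))  # closer to 1.0 = closer to the verified account
```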
In approaches that use network information, human classifiers identify instances of words or phrases that indicate deception. Known instances of deceptive wording are compiled into a database, and databases of known facts are also created from various trusted sources.24 Examples from a constructed database of deceptive words or verified facts can then be compared to new writing. Emotion-laden content can also be measured, helping to separate feeling from fact. By linking these databases, existing knowledge networks can be compared to the information offered in new text, and disagreements between established knowledge and new writing can point to deception.25
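A rough sketch of such a knowledge-base check follows; the subject-predicate-object claims and the small fact base are invented, and real systems use far larger knowledge graphs and more careful matching.

```python
# A sketch of a knowledge-network check: simple subject-predicate-object
# statements extracted from new text are compared against a database of
# verified facts. The triples and the fact base are invented for illustration.
FACT_BASE = {
    ("pope francis", "endorsed", "no 2016 candidate"),
    ("hillary clinton", "served as", "secretary of state"),
}

def check_claims(claims, facts):
    """Label each claim as supported, unsupported but about a known subject, or unknown."""
    known_subjects = {s for s, _, _ in facts}
    report = {}
    for claim in claims:
        subject = claim[0]
        if claim in facts:
            report[claim] = "supported by fact base"
        elif subject in known_subjects:
            report[claim] = "not supported; conflicts may exist, review needed"
        else:
            report[claim] = "unknown subject"
    return report

new_claims = [("pope francis", "endorsed", "donald trump")]
print(check_claims(new_claims, FACT_BASE))
```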
Analysis of social network behavior, using multiple reference points, can also help social media platform owners identify fake news.26 Author identity can be verified from internet metadata.27 Location data attached to messages can indicate personal knowledge of an event. The inclusion or exclusion of hyperlinks can likewise mark a source as trustworthy or untrustworthy. (For example, TweetCred, available as a browser plug-in, is software that assigns a credibility score to tweets in real time, based on characteristics of a tweet such as its content, the characteristics of its author, and any external URLs.28) The presence or absence of images, the total number of images from multiple sources, and their relationship and relevance to the text of a message can also be compared with known norms as indicators of the truth of the message. Ironically, all of this information can be collected by bots.
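The following sketch shows the general shape of feature-based credibility scoring in the spirit of tools like TweetCred; the feature names and weights are invented here, whereas real systems learn them from labeled data.

```python
# A sketch of feature-based credibility scoring: each message gets a score
# from weighted characteristics of its content, author, and links. The
# features and weights are invented; real systems learn them from data.
WEIGHTS = {
    "author_verified": 2.0,
    "has_external_link": 0.5,
    "link_to_known_domain": 1.5,
    "geo_matches_event": 1.0,
    "has_original_image": 0.5,
    "excessive_hashtags": -1.5,
    "account_age_days_low": -2.0,
}

def credibility_score(features):
    """features: dict of feature name -> bool; returns a simple additive score."""
    return sum(WEIGHTS[name] for name, present in features.items() if present)

message = {
    "author_verified": False,
    "has_external_link": True,
    "link_to_known_domain": False,
    "geo_matches_event": False,
    "has_original_image": False,
    "excessive_hashtags": True,
    "account_age_days_low": True,
}
print(credibility_score(message))  # negative scores suggest low credibility
```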
A variety of experiments have been conducted using multiple processes to create a score for information credibility.29 Research groups are prepared to supply researchers with data harvested from social media sites. Indiana University has launched a project called Truthy.30 As part of that project, researchers have developed an “Observatory of Social Media.” They have captured data about millions of Twitter messages and make that information available along with their analytical tools for those who wish to do research. Their system compares Twitter accounts with dozens of known characteristics of bots collected in the Truthy database to help identify bots.
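A rough sketch of that kind of comparison follows; the behavioral traits and thresholds are invented for illustration and are far simpler than the dozens of characteristics the Truthy researchers actually use.

```python
# A rough sketch of comparing an account's behavioral features with a profile
# of bot-like characteristics. The traits and thresholds are invented.
import statistics

def bot_likeness(account):
    """Count how many simple bot-like traits an account exhibits."""
    traits = 0
    intervals = account["seconds_between_posts"]
    # Bots often post at very regular intervals and at high volume.
    if statistics.pstdev(intervals) < 5:
        traits += 1
    if account["posts_per_day"] > 100:
        traits += 1
    if account["followers"] < 0.1 * account["following"]:
        traits += 1
    if not account["profile_photo"]:
        traits += 1
    return traits  # more traits = more bot-like, not proof in itself

suspect = {
    "seconds_between_posts": [60, 61, 60, 59, 60],
    "posts_per_day": 480,
    "followers": 12,
    "following": 2000,
    "profile_photo": False,
}
print(bot_likeness(suspect))  # 4 of 4 traits present
```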
Truthy
http://truthy.indiana.edu/about/
DARPA, the Defense Advanced Research Projects Agency, is part of the US Department of Defense and is responsible for the development of emerging technologies for the US military. In early 2015, DARPA sponsored a competition whose goal was to identify bots known as influence bots. These bots are “realistic, automated identities that illicitly shape discussions on social media sites like Twitter and Facebook, posing a risk to freedom of expression.”31 If a means of identifying these bots could be discovered, it would be possible to disable them. The outcome of the challenge was that a semi-automated process combining inconsistency detection, behavioral modeling, text analysis, network analysis, and machine learning would be the most effective means of identifying influence bots. Human judgment added to the computer processes provided the best results.
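The following sketch illustrates the semi-automated idea in miniature: several automated signals are combined into one score, and uncertain cases are routed to a human analyst. The component scores and thresholds are placeholders, not the actual method used in the challenge.

```python
# A sketch of a semi-automated pipeline: automated signals are averaged into
# one score, and uncertain cases go to a human reviewer. Scores and
# thresholds are placeholders for illustration.
def combined_score(text_score, network_score, behavior_score):
    """Average the component scores (each assumed to be 0.0-1.0, higher = more bot-like)."""
    return (text_score + network_score + behavior_score) / 3

def triage(account_scores):
    decisions = {}
    for account, scores in account_scores.items():
        score = combined_score(*scores)
        if score > 0.8:
            decisions[account] = "flag as influence bot"
        elif score < 0.3:
            decisions[account] = "treat as human"
        else:
            decisions[account] = "send to human reviewer"  # judgment fills the gap
    return decisions

print(triage({"@acct_a": (0.9, 0.85, 0.95), "@acct_b": (0.5, 0.4, 0.6)}))
```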
Many other experiments in the identification of bots have been reported in the computer science literature.32 Bots and botnets often have a specific task to complete. Once that task is completed, their accounts are eliminated. Detecting bots and botnets before they can do harm is critical to shutting them down. Unfortunately, the means for detecting and shutting down bots are in their infancy. There are too many bot-driven accounts and too few means for eliminating them.
What happens to the information that bots collect is one part of the story of fake news. During the 2016 US presidential campaign, the internet was used to advertise for political candidates. Official campaign information was created by members of each politician’s election team. News media reported about candidates’ appearances, rallies, and debates, creating more information. Individuals who attended events used social media to share information with their friends and followers. Some reports were factual and without bias. However, because political campaigns involve many people who prefer one candidate over another, some information was biased in favor of one candidate or against another.
Because it is possible for anyone to launch a website and publish a story, some information about the political candidates was not created by any official of the campaigns. In fact, many stories appeared about candidates that were biased, taken out of context, or outright false. Some stories were meant as spoofs or satire; others were meant to mislead and misinform. One story reported that the pope had endorsed presidential candidate Donald Trump. In any other context, the reader would likely have no trouble realizing that this story was not true.
Enter the bots. There have been some alarming changes in how, where, and for what bots are used in the past ten years. Bots are being programmed to collect information from social media accounts and push information to those accounts that meet certain criteria.
Social networks allow “atoms” of propaganda to be directly targeted at users who are more likely to accept and share a particular message. Once they inadvertently share a misleading or fabricated article, image, video, or meme, the next person who sees it in their social feed probably trusts the original poster, and goes on to share it themselves. These “atoms” then rocket through the information ecosystem at high speed powered by trusted peer-to-peer networks.33
Political bots have been central to the spread of political disinformation. According to Woolley and Guilbeault, the political bots used in the 2016 US elections were primarily used to create manufactured consensus:
Social media bots manufacture consensus by artificially amplifying traffic around a political candidate or issue. Armies of bots built to follow, retweet, or like a candidate’s content make that candidate seem more legitimate, more widely supported, than they actually are. Since bots are indistinguishable from real people to the average Twitter or Facebook user, any number of bots can be counted as supporters of candidates or ideas. This theoretically has the effect of galvanizing political support where this might not previously have happened. To put it simply: the illusion of online support for a candidate can spur actual support through a bandwagon effect.34
The Computational Propaganda Research Project has studied the use of political bots in nine countries around the world. In Woolley and Guilbeault’s report on the United States, the authors state, “Bots infiltrated the core of the political discussion over Twitter, where they were capable of disseminating propaganda at mass-scale. Bots also reached positions of high betweenness centrality, where they played a powerful role in determining the flow of information among users.”35
Social bots can affect the social identity people create for themselves online, persuading and influencing in ways that mold human identity.36 Guilbeault argues that online platforms are the best place to make changes that can help users form and maintain their online identities without input from nonhuman actors. To do that, researchers must identify and modify the platform features that weaken user security; he identifies four such areas where bots infiltrate social media.
While Guilbeault has identified practices on social media platforms where improvements or changes could better protect users, those changes have yet to be made. A groundswell of opinion will be needed to get the attention of the social media platform makers, and the will to remove or change a popular feature such as popularity ratings seems unlikely to materialize in the near future. In fact, while research to combat the automated spread of fake or malicious news is being done in earnest, it is mostly experimental in nature.39 Possible solutions are being tested, but most automatic fake news identification software is in its infancy. The results are promising in some cases, but wide application across social media platforms is nowhere in sight. The research that exists is mostly based on identifying and eliminating accounts that can be shown to be bots. However, by the time that has been accomplished, whatever the bot was programmed to do has already been done. There are very few means of automatically identifying bots and botnets and disabling them before they complete a malicious task.
The social media platforms and search engines themselves have made some efforts to help detect and flag fake news. Facebook created an “immune system” to help protect itself from infection by bots.40 Google announced that it will increase its regulation of advertising and linked-to websites.41 Facebook has turned over the verification of information to five leading fact-checking organizations.42 Facebook has also introduced a feature in parts of Europe called Related Articles, which provides readers with access to the results of fact-checking of original stories.43 The Google Digital News Initiative is supporting programs, such as Factmata, that help users verify information themselves. Overall, these attempts are reactive at best. The sheer volume of potential misinformation and the difficulty of identifying and shutting down bot accounts make these attempts seem feeble.
Factmata
It seems that the battle of the computer programmers will continue indefinitely. When one side develops a new means of manipulating information to mislead, misinform, or unduly influence people, the other side finds a way to counter or at least slow the ability to make use of the new idea. This cycle continues in a seemingly endless loop. Using technology to identify and stop fake news is a defensive game. There does not appear to be a proactive means of eliminating fake news at this time. Money, power, and political influence motivate different groups to create computer-driven means of human control.