Is it a human or a Twitter bot?
Researchers from the University of Mary Washington and the Naval Surface Warfare Center want to know for sure.
UMW computer science majors Bryan Holster and Chris Zimmerman, under the guidance of Professor Stephen Davies, have teamed with scientists at the center’s Dahlgren division to get to the bottom of this sometimes perplexing social media mystery. The partnership is one of several ongoing collaborations between the University and the Naval Surface Warfare Center.
For those new to the Twittersphere, a Twitter bot is an automated software program that posts content, or tweets, to the online social network Twitter. For instance, the bot @everyword spent seven years tweeting every word in the English language, while @EnjoyTheFilm posts spoilers in response to the tweets of unsuspecting movie-goers.
Since the fall, Holster and Zimmerman have collected a vast number of tweets using a Web application that they built. They then developed a classification system and used it to create a second Web application that determines whether a tweet is from a bot or a human.
“Identifying whether a tweet came from a real human or a robot is pretty easy for people to do, just by looking at it and using their intuition,” said Davies, associate professor of computer science. “But spelling out the rules by which this determination could be made automatically is fiendishly difficult. Consider your own thought process: what is it about a tweet that makes you think it was auto-generated? We’re using techniques from a field called ‘machine learning’ to ‘train’ our program on human-classified examples and to be able to perform this determination quickly and automatically.”
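The article does not say which machine learning technique the team uses. As a rough illustration of what "training" a program on human-classified examples can look like, here is a minimal sketch of a bag-of-words naive Bayes classifier, a common textbook approach. The example tweets, the labels and the function names (`tokenize`, `train`, `classify`) are all hypothetical and are not taken from the students' application.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-labeled training tweets (invented for illustration only).
EXAMPLES = [
    ("Click here to win a free prize now", "bot"),
    ("Free followers click here now", "bot"),
    ("Win free tickets click the link", "bot"),
    ("Had a great lunch with my sister today", "human"),
    ("Can't believe my team lost again today", "human"),
    ("Studying for my exam with friends", "human"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def train(examples):
    """Count how often each word appears under each human-assigned label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_examples = sum(label_counts.values())
    scores = {}
    for label, n_examples in label_counts.items():
        # Start from the prior: how common this label is in the training set.
        score = math.log(n_examples / total_examples)
        # Add-one (Laplace) smoothing so unseen words never give log(0).
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen during training
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(EXAMPLES)
print(classify(model, "click here for a free prize"))
print(classify(model, "lunch with my sister today"))
```

With only six training examples this is a toy. A real system would need far more labeled tweets and would likely use features beyond individual words, such as posting frequency or account metadata.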
Working about 15 hours per week, the students meet weekly with Lead Scientist Elizabeth Hohman, Principal Scientist David Marchette and Davies to discuss their work. They share both their challenges and successes.
During one particularly exciting week, Holster reported to the team that their Web application is now processing 2.6 million tweets every 140 seconds.
“This is a window into a different world than I’ve been exposed to,” said Holster, a senior who learned to build Web applications in his classes at UMW. “It’s very nice being constantly exposed to people who know more than me every week…I learn more and I progress faster that way.”
Holster and Zimmerman are just scratching the surface with their Web applications.
“Twitter is an environment in which people all over the world can get their message out easily and instantly. But using this vast amount of data requires tools for processing and analyzing it. Chris and Bryan are contributing to this tool set,” said Hohman, who also teaches part-time at UMW. “They bring a new perspective to the problem, which often gives us new ideas.”
Next semester, the team hopes to begin analyzing its vast collection of Twitter posts. Eventually, the researchers would like to use Twitter for higher purposes such as detecting, assessing and tracking virus outbreaks.
“The information about an outbreak can be very different from bots than it is from individuals affected by or commenting on an outbreak,” said Marchette, whose work includes social network analysis. “By distinguishing bots from the other users we can more accurately assess the event.”
In the meantime, the group as a whole has learned that Twitter posts aren’t always what they seem, as evidenced by their vast collection of data.
“I’ve learned a lot of the lower-level details about how Twitter works. As a programmer, I can appreciate Twitter and how they store information,” said Zimmerman, a UMW junior. “Now I know that this company does something that’s not magical, it’s explainable and you could teach anybody how Twitter works just by doing what we did.”
This project is one of four partnerships between Dahlgren and UMW this semester. Other projects include:
- Professor of Mathematics Keith Mellinger, student Alaina Morello and Scientist Jake Farinholt researching the theory of error-correcting codes;
- Assistant Professor of Mathematics Melody Denhere, students Jonathan Blauvelt and Travis Whitehead and Scientists Jeff Solka, Kristen Ash and Allen Parks researching citation prediction and analysis;
- Professor of Mathematics Debra Hydorn, student Michelle Craft and Scientists David Marchette and Elizabeth Hohman researching eigenvectors.