Even the most mundane online social commentary can have a purpose

Infodemiology is the buzzword for mining social media data for useful information about how we live


“Lovin’ this sandwich.” “Aargh, can’t sleep!” “The toilets in this restaurant are filthy.” This is the kind of dull commentary frequently tweeted by people. And for the cynical, yet oddly still connected, tweets like these are a source of frustration and anger.

With the right eye for opportunity, though, they can be a source of useful information. Not on their own, of course (no one cares if you can’t sleep), but when accumulated and geolocalised, social media posts can be used by big data analysts in practical ways, such as improving urban planning, analysing public sentiment and even tracking the spread of diseases.

Urban planning

Computer science researchers Enrique and Vanessa Frías-Martínez, from Telefonica Research and the University of Maryland (USA) respectively, recently published a study on how random Twitter use could help with urban planning.

In the study, published in the journal Engineering Applications of Artificial Intelligence, the researchers analysed millions of tweets posted by the social media junkies of Manhattan, Madrid and London. Armed with this info, they suggested using geolocalised tweets for planning cities and ascertaining optimum land use. For example, in the three cities studied, they were able to accurately identify the nightlife areas.

We already know that Temple Bar in Dublin is a nightlife area. So why would we need social media research to tell us that?

Well, knowing when and where someone is tweeting, regardless of what they are saying, is useful in terms of mapping out the preferred activities of a city at any given moment. Is an area residential, social or business-oriented? And does that status change at various times of the day? (Temple Bar in Dublin may be a social zone 24/7, but Leeson Street changes its status pretty dramatically from morning to night.)
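As a rough illustration of the idea, and not the method used in the study, the sketch below buckets geolocated tweets into grid cells, builds an hour-of-day profile for each cell and labels the cell from that profile alone. The sample tweets, the grid size, the hour ranges and the 50 per cent thresholds are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample: (latitude, longitude, ISO timestamp). Tweet text is ignored.
TWEETS = [
    (53.3456, -6.2640, "2014-03-01T23:10:00"),
    (53.3457, -6.2642, "2014-03-02T00:45:00"),
    (53.3455, -6.2641, "2014-03-02T01:20:00"),
    (53.3331, -6.2553, "2014-03-03T10:05:00"),
    (53.3332, -6.2551, "2014-03-03T14:30:00"),
    (53.3333, -6.2552, "2014-03-03T23:50:00"),
]

CELL_SIZE = 0.005                                    # assumed grid size in degrees
NIGHT_HOURS = set(range(22, 24)) | set(range(0, 5))  # assumed: 22:00 to 04:59
WORK_HOURS = set(range(9, 18))                       # assumed: 09:00 to 17:59


def cell_of(lat, lon, size=CELL_SIZE):
    """Snap a coordinate to an integer grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


def hourly_profiles(tweets):
    """Count tweets per grid cell and per hour of the day."""
    profiles = defaultdict(Counter)
    for lat, lon, stamp in tweets:
        hour = datetime.fromisoformat(stamp).hour
        profiles[cell_of(lat, lon)][hour] += 1
    return profiles


def label(profile):
    """Label a cell from its hourly profile using simple, assumed thresholds."""
    total = sum(profile.values())
    night = sum(n for hour, n in profile.items() if hour in NIGHT_HOURS)
    work = sum(n for hour, n in profile.items() if hour in WORK_HOURS)
    if night / total > 0.5:
        return "nightlife"
    if work / total > 0.5:
        return "business"
    return "mixed"


LABELS = {cell: label(profile) for cell, profile in hourly_profiles(TWEETS).items()}
```

With this sample, the cell whose tweets all fall late at night comes out as "nightlife" and the cell that is mostly active during office hours comes out as "business". A real analysis would need far more data and a less arbitrary rule than a fixed threshold.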

Equipped with this kind of info, planners and businesses can more accurately determine what other amenities might be needed in a locality – a gym, a café or another pub? “What makes this work interesting and valuable is that it hits the sweet spot in extracting value from Twitter in terms of signal-to-noise ratio,” explains Prof Alan Smeaton, director of the Insight Centre for Data Analytics at DCU. “By that I mean it’s not trying to over-analyse and extract more information than can reliably be extracted.

“Often we see sentiment analysis applied to social media in order to determine the positive or negative sentiment of political issues, political parties or individuals. We have done some of this work ourselves in Ireland for elections. But the problem here is that this system can be ‘gamed’ by parties or individuals or their supporters by flooding the social media stream with tweets, thus inflating and distorting the sampling method used. For cases where there are very large samples available, like US presidential elections, then if there is what you could loosely call a form of ‘digital ballot stuffing’ going on, it tends to even itself out. In other words, statistics once more become our friend, and the data generated is reliable.

“At the other end of the spectrum, I’ve come across work from a few years back which tried to use geolocation of tweets to monitor the spread of viruses across the globe, particularly around the time of the H1N1 virus spread. How many times have you landed at an airport and tweeted ‘just landed’, or ‘just landed at X’? The work mined Twitter for this in real time, and guesstimated that whoever just landed at city X had travelled from their home location, Y, and thus used this as an estimate that somebody had travelled from Y to X. The automatically tagged geolocation of the tweet, or the mention of the destination, ‘just landed in Orlando’, is enough to give this information.”
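The “just landed” trick Smeaton describes can be sketched in a few lines. This is only an illustration under stated assumptions: the sample tweets are invented, the home location is assumed to come from a user’s profile, and the pattern is a deliberately crude one that only catches a capitalised place name after “just landed in” or “just landed at”. It is not the code used in the work he mentions.

```python
import re
from collections import Counter

# Crude, assumed pattern: "just landed in/at" followed by one or more capitalised words.
LANDED = re.compile(
    r"\b(?i:just landed (?:in|at)) ([A-Z][A-Za-z'-]*(?: [A-Z][A-Za-z'-]*)*)"
)

# Hypothetical sample: (user, home location from profile, tweet text).
TWEETS = [
    ("ann", "Dublin", "Just landed in Orlando!"),
    ("bob", "Dublin", "just landed at Orlando, finally"),
    ("cat", "Madrid", "Just landed in New York. Tired."),
    ("dan", "London", "Lovin' this sandwich."),
]


def destination(text):
    """Return the place named after 'just landed in/at', or None if there is none."""
    match = LANDED.search(text)
    return match.group(1) if match else None


def travel_flows(tweets):
    """Count estimated trips as (home, destination) pairs."""
    flows = Counter()
    for _user, home, text in tweets:
        place = destination(text)
        if place and place != home:
            flows[(home, place)] += 1
    return flows


FLOWS = travel_flows(TWEETS)
```

Each matching tweet is read as one estimated trip from Y to X, so the sample yields two Dublin to Orlando trips and one from Madrid to New York. The sandwich tweet is ignored.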

Weren’t we talking about planning?

Yes we were. In reality, we can’t be one hundred per cent certain what activity is going on in a locality, regardless of its perceived raison d’être. Does every person who wants to “socialise” in Dublin go to Temple Bar? For many, it is considered to be a wholly anti-social place. “So we often don’t know where nightlife is,” says Dr Aonghus Lawlor from UCD’s branch of the Insight Centre for Data Analytics. “What about local bars in residential areas? Where people undertake activities in cities is not really known in a precise way. So this type of research helps us understand why people choose to go to a particular part of a city for a particular activity.”

Research like this may have limited applicability in existing urban centres but new cities are being planned all the time. “Some of it happens organically,” says Lawlor. “But planning in Ireland always seems to be catching up with everybody else. Info like this could be very useful in the future.”

Twitter as a medium is particularly useful for data analysts as it is not hampered by privacy or data protection issues. All info on the feed is volunteered and open to all. “The other great thing about it is that it can tell you not only where people are and at what time, but also how people actually use a city in real time rather than doing a survey after the fact,” says Lawlor.

Limits of Twitter usage analysis

The elephant in this digital space is of course all the people out there who don’t use Twitter. Like your dad. “Nobody really knows how many Twitter users there are,” says Lawlor. Then you have the issue of usage rates. “In Manhattan you could have about 80 tweets per square kilometre versus 40 in London and 10 in Madrid. In Dublin it would be even less. So how might you apply this in even more remote areas?

“Plus no one knows who is tweeting. This may eliminate some groups entirely, perhaps older people. If you could correct for these variables, then you could make a good estimate.”

Outside urban planning, efforts are under way to make Twitter analysis more consistent. “We’re doing work on creating a cohort of reliable, dependable and balanced Twitter users from whom we can mine sentiment on topics,” says Smeaton.

“It’s like conducting a conventional poll but based on much larger numbers of users rather than the usual 1,000 or so. We build a corpus of users on a given topic by taking all those who have tweeted about it, and then mine the rest of their profile and activities to, for example, filter out those who are just occasional users, or those whose past tweets demonstrate radical views across a range of topics. It’s an emerging area called infodemiology and points at how social media data can be leveraged for determining what is public opinion, or in the case of the Telefonica work, public behaviour, on a range of topics.”
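The cohort-building step Smeaton outlines might look something like the sketch below. It is an assumption-laden illustration rather than the Insight Centre’s method: the users are invented, the per-topic stance scores (from -1 to 1) are hypothetical, and the cut-offs for “occasional” and “radical” are arbitrary.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class User:
    name: str
    tweet_count: int
    stances: dict  # hypothetical: topic -> stance score between -1 and 1


# Hypothetical sample of users.
USERS = [
    User("ann", 400, {"election": 0.3, "transport": -0.2, "housing": 0.1}),
    User("bob", 5, {"election": 0.6}),
    User("cat", 900, {"election": -0.9, "transport": 1.0, "housing": -0.95}),
    User("dan", 120, {"transport": 0.1}),
]


def build_cohort(users, topic, min_tweets=50, max_extremity=0.8):
    """Keep users who tweeted on the topic, are not occasional and are not extreme."""
    cohort = []
    for user in users:
        if topic not in user.stances:
            continue  # never tweeted about this topic
        if user.tweet_count < min_tweets:
            continue  # occasional user (assumed cut-off)
        others = [abs(s) for t, s in user.stances.items() if t != topic]
        if others and mean(others) > max_extremity:
            continue  # consistently extreme on other topics (assumed cut-off)
        cohort.append(user)
    return cohort


COHORT = build_cohort(USERS, "election")
```

Here only one of the four sample users survives the filters, which hints at why such a cohort has to start from a very large pool of users.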