All the data that’s fit to print – and archive for posterity

If data journalism is indeed the new punk, then a crew at NUI Galway are charting a fast, can-do new world order, says Bahareh Heravi, data scientist


“Data is now transforming the media industry,” says Bahareh Heravi, head of the Digital Humanities and Journalism group at the Insight Centre for Data Analytics at NUI Galway, where she leads a number of projects on how data and analytics techniques can help news organisations.

Dr Heravi recently joined The Irish Times as lead data scientist with an aim to boost the paper's data analytics and data journalism.

“Data is everywhere and is growing at a phenomenal rate, in various formats and from various sources,” she says. “For example, data could come from traffic cameras, social media accounts, weather sensors, hospitals, research centres – even from our schools or universities. They can also come from official government sources or statistics offices. There is so much data out there that no supercomputer can process them all.

“At the same time, it gives us a great opportunity to find new things and stories based on existing facts. If you’re a student, are you curious to know whether or not weather affects exam performance? I bet you can find some data on weather and also some other data from schools around you, and, voilà, you can now merge and analyse them and see if your hypothesis was a valid excuse for your – low? – mark on that rainy day!”
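Dr Heravi's weather-and-exams idea can be tried in a few lines of Python. The sketch below is only an illustration: the rainfall and exam figures are invented, and the names `rainfall_mm`, `exam_marks`, `merge_by_date` and `pearson` are hypothetical. It joins two small datasets on their shared dates and measures how strongly they move together.

```python
from math import sqrt

# Invented sample data, for illustration only (not real measurements).
rainfall_mm = {"2015-06-03": 0.0, "2015-06-04": 12.5,
               "2015-06-05": 3.0, "2015-06-08": 20.0}
exam_marks = {"2015-06-03": 72, "2015-06-04": 61, "2015-06-05": 68,
              "2015-06-08": 55, "2015-06-09": 70}


def merge_by_date(weather, marks):
    """Keep only the dates present in both datasets (an inner join)."""
    shared = sorted(weather.keys() & marks.keys())
    return [(day, weather[day], marks[day]) for day in shared]


def pearson(xs, ys):
    """Pearson correlation: -1 (opposite) through 0 (none) to +1 (together)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


merged = merge_by_date(rainfall_mm, exam_marks)
r = pearson([row[1] for row in merged], [row[2] for row in merged])
print(f"{len(merged)} matching days, correlation r = {r:.2f}")
```

A negative value of `r` would mean marks tended to be lower on wetter days in this sample. A correlation on a handful of days proves nothing on its own, but it is a starting point for a story.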


Simon Rogers, former Guardian data blog writer and current data editor at Twitter, has called data journalism “the new punk” for its use of data analytics techniques in news reporting and storytelling.

“This means anyone can do it,” Dr Heravi says. “The punk movement in the 1970s was to think out of the box, away from predefined rules for music and to encourage young people to pick up an instrument and just start playing it, with little or no musical training.”

The punk movement created a subculture of its own, including fashion, visual art, dance and literature.

“Punk was about using the available ‘tools’ to create ‘something new’,” Dr Heravi says, “in a way that was doable for ordinary young people, and not necessarily the musical experts. So is data analytics.

“There are a multitude of data sources that we can use to come up with some great stories and great analytics. We may be able to change the world. We may be able to change the way the traffic flows, we may be able to change the way the news is created, we may be able to change the way we study, learn and teach. We may be able to come up with a new cure for cancer, and we may be able to change the meaning of democracy and transparency.

“We just need to start playing with the data and make use of these great – in many cases free – tools. Maybe we can even be the creator of the new future tool and technique. Pick up your computer and start doing some interesting data analytics.

“Don’t forget: anyone can do it.”

Overview

The Insight Centre for Data Analytics is a joint initiative between researchers at Dublin City University, NUI Galway, University College Cork and University College Dublin, as well as other partner institutions.

Insight brings together more than 200 researchers from these institutions with more than 30 industry partners. The intent is to position Ireland at the heart of global data analytics research.

The vision of Oliver Daniels, the centre’s new chief executive, is to make Insight an international brand with recognition similar to that of Google, Facebook or Hewlett-Packard.

Supported by Science Foundation Ireland and industry partners, Insight focuses on new ways to capture and understand data from the world around us, to make better decisions for people, communities and industry, and to create a more informed society in a healthier, more productive world.

Insight’s research focus encompasses a broad range of data analytics technologies and challenges, from machine learning, decision analytics and social network analysis to linked data, recommender systems and the sensor web.

With more than 30 partner companies, Insight researchers are solving critical challenges in the areas of Connected Health and the Discovery Economy.

Bianca Pereira: Semantic Web

“Imagine a computer that can learn from knowledge published on the web and teach. This is my ambition,” writes Bianca Pereira, a PhD student with Insight.

“Born in Rio de Janeiro, Brazil, I moved to Ireland in 2013 to conduct studies in the area of the Semantic Web. My research aims to enable computers to understand concepts appearing in text as humans do. Computers are very good at spotting a sequence of characters, but there is still a long way to go to make them understand the meaning of those characters. The outcome of my research may enable a series of intelligent applications that can learn from text. This can have both social and business benefits by enabling users to access and learn from the knowledge already available on the web.

“There is still a lot of work to do to make it a reality, so I encourage young people interested in working with computers to join this ongoing journey.”

Ihab Salawdeh: Social media

“Social media is ever-present in society – 76 per cent of Irish adults use it,” says Ihab Salawdeh, from Nablus, Palestine, a research assistant in Insight’s Digital Humanities and Journalism group, where he is part of a team building the Social Repository of Ireland.

"More than 270,000 tweets, 680,000 Facebook statuses and 100 hours of YouTube videos are uploaded every minute," he says. "History is written as it happens, and through social media people are reporting, filming and writing down their own history and the history of their nation and culture. But one has to ask: where does this data end up, and how do we preserve this history and get access to it for future research?

“The huge amount of data uploaded per minute makes it almost impossible for researchers to go through all the media content,” he adds. “Moreover, searching for media and information of interest is a complex and time-consuming task, and this gets worse when looking for past events. It is like searching for a needle in a haystack.

"For that reason, in collaboration with the Digital Repository of Ireland, which is part of the Royal Irish Academy, our group is building the 'Social Repository of Ireland', which archives social media content and provides easy access for journalists and researchers.

“As a first step, we started with Twitter and have developed a suite of tools that collects tweets automatically and annotates them, using natural language processing and machine-learning techniques to detect Irish-related entities such as people, locations and organisations in the tweet text and user profile.

"These tweets are classified into multiple topics, for example sport, politics and culture, as well as events such as the 2014 elections, the Galway Arts Festival or the Dublin Horse Show.

“They are then preserved and archived within the DRI with all their corresponding media content attached, such as images, videos or documents. An easy-to-use interface lets journalists and researchers search and explore the repository.

“We firmly believe that social media contains part of the history and the culture of each nation and country as reported by its own people, and in this project we will preserve part of Ireland’s heritage and culture.”
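The steps Salawdeh describes (collect, annotate, classify) can be sketched in Python. This is not the team's implementation: it swaps the natural language processing and machine-learning techniques for simple keyword lookups, and the tables `IRISH_ENTITIES` and `TOPIC_KEYWORDS`, the function names and the sample tweets are all hypothetical.

```python
import re

# Hypothetical lookup tables; a real system would use trained NLP/ML models.
IRISH_ENTITIES = {
    "galway": "location",
    "dublin": "location",
    "nui galway": "organisation",
}
TOPIC_KEYWORDS = {
    "sport": {"match", "hurling", "final"},
    "politics": {"election", "vote", "minister"},
    "culture": {"festival", "arts", "theatre"},
}


def annotate(text):
    """Return the known Irish entities mentioned in the text."""
    lowered = text.lower()
    return {name: kind for name, kind in IRISH_ENTITIES.items()
            if re.search(r"\b" + re.escape(name) + r"\b", lowered)}


def classify(text):
    """Return every topic whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(topic for topic, keys in TOPIC_KEYWORDS.items()
                  if words & keys)


def process(tweets):
    """Keep Irish-related tweets and attach entities and topics to each."""
    records = []
    for text in tweets:
        entities = annotate(text)
        if entities:  # skip tweets with no Irish-related entity
            records.append({"text": text, "entities": entities,
                            "topics": classify(text)})
    return records


# Made-up sample tweets standing in for an automatic collector.
records = process([
    "Great night at the Galway Arts Festival",
    "Lovely weather in Paris today",
])
for record in records:
    print(record["topics"], record["entities"], record["text"])
```

Each record could then be stored alongside its media attachments; the archiving step is left out of this sketch.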