Getting computers to analyse opinions in blogs

Andrew Salway and his colleagues have collected 1.4 million blog posts about climate change. They aim to create a tool which will allow computers to analyse opinions expressed in large volumes of text.

By Camilla Aadland

Andrew Salway of Uni Research Computing is researching the automated analysis of blog texts and blogging communities. Photo: Helge Skodvin

“There is an enormous number of blogs about climate change. If you google climate change, you get a lot of hits, but if you want to know more, it can be overwhelming. We’re trying to develop technology which will help people to understand big, complicated debates in a variety of different subject areas. This will be useful for both lay people and social scientists who want to analyse debates,” says Salway, a researcher at Uni Research Computing.

Analytical tools

Salway and his colleagues have so far collected 3,000 blogs with 1.4 million blog posts published from 2005 onwards. They aim to develop technology which will allow computers to automatically capture the essence of blogs on various subjects in different languages.

They are doing this as part of the NTAP project (Networks of Texts and People), a collaboration between Uni Research and the University of Bergen. The project is supported by the Research Council of Norway and will continue until the summer of 2015.

“The technology will attempt to automatically capture the main point of what is being said, but it will also show who said it, when they said it and how much influence they have on other people’s blog posts and networks,” explains Salway.

The researchers are also interested in observing changes in blog networks over time – how opinions are formed and changed.

“We will be able to see how polarised the debate is, whether there is a network of people who only talk to one another and provide links to one another’s posts. This is how people reinforce their opinions without being open to other points of view. We can trace whether or not the debate is becoming more polarised,” says Salway.
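One simple way to picture the kind of measurement Salway describes is to count how many links stay inside a single camp. The sketch below is only an illustration and not the NTAP project's method: the blog names, the group labels, the link data and the function `within_group_share` are all hypothetical.

```python
def within_group_share(links, group_of):
    """Return the fraction of links whose source and target share a group."""
    if not links:
        return 0.0
    same = sum(1 for src, dst in links if group_of[src] == group_of[dst])
    return same / len(links)


# Hypothetical blogs and the camp each one belongs to.
group_of = {"a": "sceptic", "b": "sceptic", "c": "acceptor", "d": "acceptor"}

# Hypothetical (source, target) links between blogs, grouped by year.
links_by_year = {
    2012: [("a", "b"), ("a", "c"), ("c", "d"), ("d", "b")],
    2013: [("a", "b"), ("b", "a"), ("c", "d"), ("a", "c")],
}

# A rising share would suggest the camps increasingly link only to themselves.
share_by_year = {
    year: within_group_share(links, group_of)
    for year, links in links_by_year.items()
}
```

In this toy data the share of within-camp links rises from one year to the next, which is the sort of trend that would indicate a debate becoming more polarised.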

Polarised debate

Halfway through the project, the researchers’ impression is that the blogosphere climate debate is very polarised. Participants in the debate can be split broadly into two groups: sceptics and acceptors. Their blogs are characterised by different word types. Sceptics’ blogs look more at science, and words such as theory, IPCC, cooling, absurdity and Gore appear more often than in acceptors’ blogs.

“Acceptors talk more about significance and consequences. They have accepted the science and are looking to the future,” says Salway.

In this group, words such as changed, fundamental, background, conditions, grandchildren and oil are mentioned more often than in sceptics’ blogs.
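The article does not say how the researchers decide which words characterise each group. As a rough sketch, assuming a smoothed ratio of relative word frequencies is an acceptable stand-in, the comparison could look like this. The sample posts and the names `word_counts` and `distinctive_words` are invented for illustration.

```python
import re
from collections import Counter


def word_counts(posts):
    """Count lower-cased alphabetic tokens across a list of posts."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z]+", post.lower()))
    return counts


def distinctive_words(group_a, group_b, top_n=2):
    """Return the words most over-represented in group_a relative to group_b."""
    a, b = word_counts(group_a), word_counts(group_b)
    total_a, total_b = sum(a.values()), sum(b.values())
    vocab = set(a) | set(b)

    def ratio(word):
        # Add-one smoothing so words missing from one group do not divide by zero.
        rate_a = (a[word] + 1) / (total_a + len(vocab))
        rate_b = (b[word] + 1) / (total_b + len(vocab))
        return rate_a / rate_b

    # Highest ratio first; ties are broken alphabetically for stable output.
    return sorted(vocab, key=lambda w: (-ratio(w), w))[:top_n]


# Invented sample posts, loosely echoing the words mentioned in the article.
sceptic_posts = [
    "The IPCC theory ignores the cooling.",
    "The IPCC theory is an absurdity.",
]
acceptor_posts = [
    "Our grandchildren will live with the consequences.",
    "Oil has changed the conditions for our grandchildren.",
]

sceptic_words = distinctive_words(sceptic_posts, acceptor_posts)
acceptor_words = distinctive_words(acceptor_posts, sceptic_posts)
```

With real data, common function words would usually be filtered out first, and a statistical test would be a more robust choice than a raw ratio.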

The researchers’ hypothesis is that more sceptics find their way into the blogosphere than into traditional media.

“It’s best when you get an overview of all the different points of view. If you choose the wrong blog, which then redirects you to others with the same opinion, you’re forming your own opinion on the basis of biased information.”

The researchers have chosen to look more closely at climate blogs because climate change is an important and relevant topic.

“It is also challenging because of its scope; it’s a topic which encompasses technology, science, politics and people. It’s a challenge for the language technology to work with such a broad spectrum of concepts,” continues Salway.

Instead of searching for individual key words, the researchers are developing a program which will be able to relate key concepts so that it is possible to find the opinion in what is being said.

“We are taking an inductive approach. This means that rather than trying to code grammar and opinions into the computer so that it can use them to understand text, we give it a large volume of text and ask it to look for patterns in that text which may provide an opinion,” explains Salway.

Recognising patterns

The patterns arise through how words appear in the text. One example of a pattern like this is:

(to (fight|slow|minimise|curb|tackle) climate change).
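To make the pattern above concrete, here is a minimal sketch of how the words in the middle slot could be collected from text rather than listed by hand. It is not the project's actual software: the regular expression, the sample sentences and the names `slot_fillers` and `as_pattern` are assumptions made for this example.

```python
import re
from collections import Counter

# Any single word is allowed in the slot between "to" and "climate change".
SLOT = re.compile(r"\bto (\w+) climate change\b", re.IGNORECASE)


def slot_fillers(texts):
    """Count which words fill the slot in 'to ___ climate change'."""
    counts = Counter()
    for text in texts:
        for match in SLOT.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


def as_pattern(counts, min_count=1):
    """Write the observed fillers in the notation used in the article."""
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return "(to (" + "|".join(words) + ") climate change)"


# Invented sample sentences standing in for blog posts.
sample_posts = [
    "Governments must act to fight climate change.",
    "Cities are trying to curb climate change, and to fight climate change at home.",
    "Is it too late to slow climate change?",
]

fillers = slot_fillers(sample_posts)
pattern = as_pattern(fillers)
```

Here the computer is given only the frame and finds the verbs itself, which loosely mirrors the inductive idea of letting patterns emerge from a large volume of text.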

Most of the texts the researchers have collected are in English, but they are gathering comparable Norwegian and French texts. The technology should work regardless of language or topic.

“When we’re finished I hope we’ll have a tool which social scientists can use in their work, as well as a deeper understanding of how opinions can be extracted from a text just by looking at patterns,” says Salway.

Eventually he hopes to be able to compare what is happening in the blogosphere with what is happening in the news. One of the researchers, Knut Hofland, has collected news articles from selected media every day for 15 years.

“We hope to be able to extract political statements about climate change, for example, and compare these with opinions in the blogosphere,” says Salway.

Fact box

  • NTAP – Networks of Texts and People
  • A research project which will develop methods of detecting, analysing and visualising the development of knowledge and opinions via various social networks.
  • The project is a collaboration between the University of Bergen and Uni Research. International collaboration partners are the University of Sheffield in the UK and Ontario College of Art and Design in Canada.
  • Funding comes from the Norwegian Research Council’s VERDIKT programme.
  • The project started in January 2012 and will continue until July 2015.

June 13, 2014, 9:46 a.m.