DCU researchers create World Cup Twitter translation tool

Brazilator translates tweets from 12 languages, including Irish, into English and back

Twitter’s Brazilator service instantly converts World Cup tweets in 12 different languages into English. André Schürrle scores Germany’s seventh goal during the 2014 Fifa World Cup semi-final between Brazil and Germany. Photograph: Michael Steele/Getty Images

Twitter is using systems developed by DCU researchers in collaboration with Microsoft to translate tweets for users.

The social networking and microblogging service began using the university’s systems at the start of the World Cup to translate tweets in Spanish, Portuguese and Croatian into English and vice versa.

Dr Lamia Tounsi, who works out of the Centre for Global Intelligent Content (CNGL) at Dublin City University, said: “Twitter is by default using our systems to translate. When you ask for automatic translation on Twitter, it uses our system to do so.”

She said the systems were being used to translate all types of tweets, not just World Cup related ones. Nine more languages have since been added to the service, including Irish.

Brazilator

The DCU team have also announced a separate translation service exclusively for following Fifa World Cup tweets. The Brazilator service, launched in time for the World Cup’s final stages, enables football fans to follow what supporters in 24 of the 32 original competing countries are saying.

It instantly converts tweets into English from 12 languages: Irish, German, French, Spanish, Italian, Portuguese, Croatian, Greek, Japanese, Korean, Chinese and Farsi. Tweets in English can also be translated into these languages.

Dr Tounsi, the co-leader of the Brazilator project team, said translating tweets presented a significant technical challenge. “Tweets typically contain noisy, diverse and unstructured language, such as incomplete sentences, misspellings, abbreviations, web links, emoticons and hashtags – these are just some of the issues that have to be addressed.”
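To give a rough sense of the kind of clean-up Dr Tounsi describes, the sketch below strips web links and user mentions, collects hashtags, and expands a few abbreviations before a tweet would be handed to a translation system. It is only an illustration and not the Brazilator team's method: the function name `normalize_tweet`, the regular expressions and the tiny abbreviation table are all assumptions, and emoticons and misspellings are left untouched.

```python
import re

# Illustrative patterns only; real tweet tokenisation is considerably messier.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")

# Hypothetical abbreviation table, made up for this example.
ABBREVIATIONS = {"gr8": "great", "u": "you", "2nite": "tonight"}


def normalize_tweet(text):
    """Return (cleaned_text, hashtags) for a single tweet."""
    text = URL_RE.sub(" ", text)        # drop web links
    hashtags = HASHTAG_RE.findall(text)  # remember hashtags for later analysis
    text = MENTION_RE.sub(" ", text)    # drop @mentions
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, lose the '#'
    # Expand known abbreviations; unknown tokens pass through unchanged.
    tokens = [ABBREVIATIONS.get(tok.lower(), tok) for tok in text.split()]
    return " ".join(tokens), hashtags
```

Keeping the hashtags separately means they remain available for counting even after the text itself has been cleaned for translation.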

She said the service evaluated machine translation systems and helped to identify the most effective translation options for this type of content.

The service also gathers information about user behaviour across languages and cultures, thus providing greater insights into social media usage across the world.

It can provide sentiment analysis of each match, for fans to look back on previous games and see how the Twittersphere reacted to a team’s performance. “You can see for each match how many positive and negative tweets each team received. You can see the most popular hashtags during that game and the sentiment attached to those hashtags.”
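The per-match summary described above amounts to a handful of counts. The sketch below shows one way such figures could be tallied, assuming each tweet has already been labelled with a team, a sentiment and its hashtags. The function name `summarize_match` and the field names `team`, `sentiment` and `hashtags` are hypothetical, and how the real service assigns sentiment is not stated in the article.

```python
from collections import Counter, defaultdict


def summarize_match(tweets):
    """Tally sentiment per team and per hashtag for one match.

    `tweets` is an iterable of dicts with the assumed keys
    'team', 'sentiment' ('positive' or 'negative') and 'hashtags'.
    """
    team_sentiment = defaultdict(Counter)     # team -> sentiment counts
    hashtag_counts = Counter()                # hashtag -> total uses
    hashtag_sentiment = defaultdict(Counter)  # hashtag -> sentiment counts

    for tweet in tweets:
        team_sentiment[tweet["team"]][tweet["sentiment"]] += 1
        for tag in tweet["hashtags"]:
            tag = tag.lower()  # treat #GER and #ger as the same hashtag
            hashtag_counts[tag] += 1
            hashtag_sentiment[tag][tweet["sentiment"]] += 1

    return team_sentiment, hashtag_counts, hashtag_sentiment
```

Calling `hashtag_counts.most_common(5)` on the result would list the five most used hashtags for that match, and `hashtag_sentiment` gives the positive and negative split for each of them.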

Funded by Science Foundation Ireland and industry partners, CNGL is co-led by DCU and Trinity College Dublin. Its 130 researchers are developing technologies to adapt digital content and services to the needs of global users.