University of Groningen PhD student Danilo Barbosa Coimbra has designed a new algorithm to analyse big datasets. It is faster and more accurate than currently available programs, and could be used to analyse pictures of suspected skin cancer lesions or search through data provided by municipal governments. Coimbra also used it to build a tool to help football fans enjoy quick playback of match highlights. Coimbra will defend his PhD thesis on 18 March 2016 at the University of Groningen.
The world produces ever more data, and analytical techniques have to keep up with this growth in big data. Danilo Coimbra of the University of Groningen’s Johann Bernoulli Institute has developed an algorithm that produces multidimensional projections of large datasets.
‘These datasets typically have a large number of items, and for each item a large number of variables or dimensions’, explains Coimbra. For example, all Dutch municipalities produce information on a range of subjects such as average income, the level of local taxes and so on. It is easy to compare all municipalities on one subject – the dog licence fee rate, for example – but much more difficult to mine all these data for unexpected outcomes.
‘The algorithm calculates the difference between items over all the different dimensions. Then the items can be plotted, and similar items are plotted closer to each other.’ The resulting projection can then be used to search for patterns. ‘The next step would be to analyse which dimensions are responsible for this pattern.’
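The idea in this quote can be illustrated with a small sketch. It is not Coimbra's algorithm, which the article does not specify. It swaps in classical multidimensional scaling (MDS), a standard textbook projection, and assumes Euclidean distance as the measure of difference. The data, the function names `pairwise_distances` and `classical_mds`, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(items):
    """Euclidean distance between every pair of items, over all dimensions."""
    diff = items[:, None, :] - items[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, n_components=2):
    """Project items to n_components dimensions with classical MDS.

    This is a stand-in technique, not the algorithm from the thesis.
    """
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the eigenvectors that belong to the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical data: six items with four dimensions each, in two loose groups.
items = np.array([
    [1.0, 1.1, 0.9, 1.0],
    [1.1, 0.9, 1.0, 1.2],
    [0.9, 1.0, 1.1, 0.8],
    [5.0, 5.2, 4.9, 5.1],
    [5.1, 4.8, 5.0, 5.0],
    [4.9, 5.0, 5.2, 4.9],
])

dist = pairwise_distances(items)
points_2d = classical_mds(dist)  # one (x, y) position per item
print(points_2d)
```

Plotting `points_2d` as a scatter plot would place the first three items close together and well away from the last three. With real data, such as municipal statistics, the dimensions would normally be rescaled first so that no single variable dominates the distances.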
Coimbra’s algorithm is much faster than other algorithms and produces fewer errors in the projections. The projections can be made in 2D or in 3D. The latter is more difficult to analyse, but Coimbra has also designed tools that show which dimensions are important for each viewing direction.
The algorithm can process all sorts of information. The group of Coimbra’s supervisor, Prof. Alex Telea, can use it to analyse pictures of suspicious spots on the skin to see whether they are benign or malignant.
Brazilian-born Coimbra has also used his algorithm for a more entertaining application. He built a tool to analyse data from video recordings of the 2014 football World Cup in Brazil. ‘It uses information like the noise of the crowd, the commentary and timestamps for highlights like goals or yellow cards.’ Based on this information, the tool calculates the relative importance of 10-second sections. ‘All sections are presented as stills, and importance is expressed in size and colour saturation. Things like goals are marked by icons.’
This allows football fans to quickly navigate through the highlights of a match, and to play particular clips by clicking on a still. ‘But you can also compare matches, or look at the differences that appear when you use commentary in different languages for the same match.’ After all, commentators tend to react more enthusiastically when their own side scores. The tool is not yet available for fans, says Telea. ‘But it is ready to use, should an interested party call.’
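How a tool might score 10-second sections from crowd noise, commentary and highlight timestamps can be sketched as follows. The article does not say how the actual tool combines these signals, so the weighted sum, the weights, the function names and the sample values below are all hypothetical.

```python
SECTION_SECONDS = 10  # section length mentioned in the article


def normalise(values):
    """Rescale values to the range 0..1 (all zeros if they are constant)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def section_importance(crowd_noise, commentary, event_times,
                       w_crowd=0.4, w_commentary=0.3, w_event=0.3):
    """Score each section as a weighted sum of three signals.

    The weights are illustrative assumptions, not values from the thesis.
    """
    crowd = normalise(crowd_noise)
    talk = normalise(commentary)
    # Map each highlight timestamp (in seconds) to the section containing it.
    event_sections = {int(t // SECTION_SECONDS) for t in event_times}
    scores = []
    for index, (c, t) in enumerate(zip(crowd, talk)):
        event = 1.0 if index in event_sections else 0.0
        scores.append(w_crowd * c + w_commentary * t + w_event * event)
    return scores


# Hypothetical input: one loudness value per 10-second section.
crowd_noise = [0.2, 0.3, 0.9, 0.4]
commentary = [0.1, 0.2, 0.8, 0.3]
event_times = [25.0]  # e.g. a goal 25 seconds in, so it falls in section 2

scores = section_importance(crowd_noise, commentary, event_times)
most_important = scores.index(max(scores))
print(most_important, scores)
```

In a viewer like the one described, each score could then set the size and colour saturation of the still for that section.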
Danilo Barbosa Coimbra (1985) studied Computer Science and Video Analysis at the COC College and University of São Paulo (Brazil). His PhD research took place at the University of São Paulo and the University of Groningen, as part of a ‘double degree’ program. He is interested in building software tools to make the handling of Big Data easier. His thesis is titled ‘Multidimensional projections for the visual exploration of multimedia data’. His promotor is Prof. Alexandru Telea.