Recording, interpreting and eventually communicating, that is how software should develop
Piek Vossen was actually a science student, but he chose the humanities, where he rediscovered his fascination for science in the field of ‘computational lexicology’. This is not a usual combination, so Vossen regularly acts as a connector between computational scientists and linguists. “It often does take a while before those parties speak each other’s language. We are both a technical and a conceptual intermediary in this.”
Fathoming language is a process in which Big Data plays a large part. Vossen not only wants to fathom language in its entirety, he is also indexing financial news for the EU in Spanish, Italian, English, and Dutch simultaneously in order to develop a ‘history recorder’. “We would like to discover systematic patterns in how sources write about certain things: what opinions there are in texts, and what is or is not consciously left out of them.”
“We work together with the LexisNexis news database. According to them, about two million new messages are added per working day,” Vossen explains. Vossen and his team therefore require substantial computing power to process these messages. “We have currently submitted an application to SURFsara and the Netherlands eScience Center. We would like to do ‘what it takes’ to build such an infrastructure. As far as we are concerned, we will use all possible hours, mental power, and speed available to achieve this.”
Vossen learned to view his work in a different way when working for a startup company in the field of search engines. “In academia artificial experimental applications are followed very meticulously. Now I worked with real data and I suddenly did not need to solve certain things; I could just postpone them. It is just fun to be able to invent new applications.”
A history recorder is one part, but eventually Vossen would like “a computer model that examines texts meticulously and observes texts in an interpretation-controlled way to determine from what point of view those texts are written.” Coupled with more perceptual interpretation, this eventually has to lead to a much more “holistic approach,” says Vossen. “Recording, interpreting, and eventually communicating, that is how software should develop. It is difficult to record that ambiguity. We are now at 60 to 70%. That should be brought to a minimum of 80%.” Vossen, who himself worked in industry for years optimizing search engines, knows the science part of his work only too well. “I really learned to program there.” Vossen had left academia after a lot of fuss regarding contracts and insecurities. “I often had to hire myself and fire myself at the end of a project. At a certain point I had had enough of that.”
The words we use most often have the most meanings.
Piek Vossen returned to VU University Amsterdam as Professor in 2006. “With the Spinoza* bonus I now have a period of five years. It is important to decide how to proceed.” That look ahead, and the reflective look back, are often lacking in academia, Vossen observes. At the same time, Vossen emphasizes that there should be room for a broader outlook at universities. “Actually there should also be subsidies for generating theory. There is nothing wrong with getting people out of their pigeonholes, but you also have to stimulate them in other ways.”
“The reflection that we have been working on for the last five years is just so important. It forces you to dwell upon a question like ‘are we making any progress this way?’ Nowadays that is a difficult point. Young researchers have to publish a lot and in the meantime look for a new job after their PhD in which they can also use their ideas.”
Vossen himself does get that space, now that he has received the highest Dutch award in science. “Now I have time to expand more on the contemporary information in language. The linguistic usages we have are much more diverse than we actually think. Real language recognition by computers: I hope we can achieve this in five years’ time.”