Climbing data mountains with text mining

The age of big data presents us with a wealth of digital information. The human mind can no longer reliably take in the sheer volume of text, and the contexts, that it encounters. Text mining makes large numbers of unstructured documents and texts accessible, so that relevant information and precise analyses can be derived quickly from enormous quantities of data. With the help of software, documents can be selected, relevant content and contexts extracted, and patterns and trends identified.

In commerce and management, too, text mining has become highly significant for companies. It can help reveal potential at several levels of a business, for example when specifying or expanding company objectives, or in reputation management. In our conversation with Maria, the computer linguist at EDAG PS, she gave us some insights into text mining in her field of work.

EDAG PS computer linguist Maria Bartels

EDAG-PS editor: How did you come up with the idea of becoming a computer linguist?

Maria: I studied German, but more than anything I wanted to do a technical Master's degree, because I wanted to be a programmer or do something in tune with today's zeitgeist. And I already knew during my first degree that I did not want to become a teacher.

EDAG-PS editor: What made a computer linguist decide to work for an engineering service provider?

Maria: With my particular kind of knowledge, I can certainly shake things up a bit. Also, of course, most of the work we do today is intercultural and interdisciplinary, so as a language specialist I can make a significant contribution to the world of engineering. Language is versatile, ambiguous and sometimes a little puzzling.

EDAG-PS editor: As a linguist, how do you rate the linguistic abilities of engineers?

Maria: Better than is generally assumed. A great many technicians and engineers are talented writers and speakers.

EDAG-PS editor: Let's talk about the core activity of computer linguists: language technology. What is text mining?

Maria: I just said that language is ambiguous. To put it simply, though, language also has structures and rules, and these can be analysed using computer programs. There are a great many such programs, by the way. Personally, I prefer to work with R and RStudio: both are free software, so you can see what is going on under the surface. The R community provides packages for an extremely wide range of problems, and these can be used free of charge.

EDAG-PS editor: Can you give us some examples of language technology applications that will make people in a technical environment happy?

Maria: It has often struck me that, in a technical setting, different people use different terms for the same concept and then have to discuss the matter to work out exactly what they mean. One very obvious kind of linguistic analysis here is lexical analysis. Working with a number of different sources, such as the Internet, the intranet and documents on the servers, we extract text and examine the vocabulary used.
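Maria works in R; purely as an illustration, here is a minimal Python sketch of such a lexical analysis, assuming the text has already been extracted from each source into plain strings. It builds one word-frequency list per source and lists the words that occur in only one source. The sample sentences, the source names and the helper functions (`tokenize`, `word_lists`, `source_specific`) are hypothetical; a real analysis would also need lemmatisation, stop-word filtering and detection of multi-word terms.

```python
import re
from collections import Counter

# Hypothetical sample texts standing in for text already extracted from
# different sources (Internet, intranet, documents on servers).
SOURCES = {
    "intranet": "The wiring harness is routed along the sill. Check the wiring harness.",
    "server_docs": "The cable harness is fixed to the sill with clips.",
}

# Words are runs of letters (including German umlauts and ß),
# optionally joined by inner hyphens, e.g. "cable-harness".
TOKEN_PATTERN = re.compile(r"[a-zäöüß]+(?:-[a-zäöüß]+)*")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def word_lists(sources: dict[str, str]) -> dict[str, Counter]:
    """Build one frequency list (word -> count) per source."""
    return {name: Counter(tokenize(text)) for name, text in sources.items()}


def source_specific(lists: dict[str, Counter]) -> dict[str, list[str]]:
    """For each source, return the words that occur in no other source."""
    result = {}
    for name, counts in lists.items():
        elsewhere = set()
        for other, other_counts in lists.items():
            if other != name:
                elsewhere.update(other_counts)  # iterating a Counter yields its words
        result[name] = sorted(set(counts) - elsewhere)
    return result


if __name__ == "__main__":
    lists = word_lists(SOURCES)
    for name, counts in lists.items():
        print(name, counts.most_common(3))
    print(source_specific(lists))
```

Words that appear in only one source, such as *wiring* versus *cable* in this toy data, are candidates for the competing terms Maria describes; deciding whether they really name the same concept still needs a human expert.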

EDAG-PS editor: Very interesting! Can anyone do that?

Maria: You need someone with genuine technical expertise, someone who can interpret the results correctly and take a closer look when "strange" results turn up.

EDAG-PS editor: What do you do with these word lists?

Maria: This is where the real work starts. A standard corporate language can save a great deal of time in in-house communication and significantly improve the company's image. First, though, everyone has to agree exactly what form the corporate language should take, that is, which terms are to be used for which concepts. The results are stored in terminology databases and then made available to the relevant groups of authors.
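The interview does not describe how the terminology databases are structured or queried. As a rough illustration only, here is a minimal Python sketch of applying agreed terminology: a small, hypothetical table maps each preferred term to deprecated synonyms, and a checker reports every deprecated term found in a text together with the term that should replace it. The entries and the names `TERMINOLOGY`, `build_lookup` and `check_text` are assumptions for this example, not part of any real terminology tool.

```python
import re

# Hypothetical terminology entries: preferred term -> deprecated synonyms.
TERMINOLOGY = {
    "wiring harness": ["cable harness", "cable loom"],
    "sill": ["rocker panel"],
}


def build_lookup(terminology: dict[str, list[str]]) -> dict[str, str]:
    """Map every deprecated variant (lower-case) to its preferred term."""
    return {
        variant.lower(): preferred
        for preferred, variants in terminology.items()
        for variant in variants
    }


def check_text(text: str, lookup: dict[str, str]) -> list[tuple[str, str]]:
    """Return (deprecated term found, preferred term) pairs in order of occurrence."""
    lowered = text.lower()
    findings = []
    for variant, preferred in lookup.items():
        # \b restricts matches to whole words, so "sill" would not match "sills".
        pattern = r"\b" + re.escape(variant) + r"\b"
        for match in re.finditer(pattern, lowered):
            findings.append((match.start(), match.group(), preferred))
    findings.sort()  # order by position in the text
    return [(found, preferred) for _, found, preferred in findings]


if __name__ == "__main__":
    lookup = build_lookup(TERMINOLOGY)
    print(check_text("Fix the cable loom to the rocker panel.", lookup))
```

A plain dictionary keeps the sketch short; a production system would store richer entries (definitions, languages, approval status) and handle inflected forms.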

EDAG-PS editor: That sounds like a lot of work.

Maria: Yes, but definitely worthwhile.

EDAG-PS editor: Thank you for an interesting conversation!

Do you have further questions for Maria, or would you like to discuss what terminology management with the help of text mining can offer? Then get in touch with Maria.