Comprehensibility of Wikipedia articles on the subject of disease

As part of a student research project, the first comprehensive study of the readability of Wikipedia articles on diseases in German, English and Russian was carried out and published in an international journal. Its results show that Wikipedia articles need linguistic improvement to be readily understood by readers.

Nowadays, information on illness and health can be obtained quickly and easily via the Internet, for example from a Wikipedia article. But how comprehensible are the texts on diseases? Does the information help us, or do we perhaps understand so little that we are deterred from continuing our search? Master's student Jelizaveta Gordejeva asked herself these questions. As part of a research project conducted for the first time in the master's program in Medical Informatics, she carried out a study on the comprehensibility of Wikipedia articles on diseases, which has now been published.

The study, carried out in cooperation with Heilbronn University (HHN) research associates Richard Zowalla, Dr. Monika Pobiruchin and Martin Wiesner, examined Wikipedia articles on diseases in German, English and Russian on the basis of Wikipedia metadata. Such a detailed study of the readability of disease-related Wikipedia texts is so far unique and has attracted corresponding attention in the scientific community. "The successful publication of this topic in an international journal shows that the concept of combining research and teaching in the form of a research project during the course of study is not only well received by students," says Prof. Dr. Alexandra Reichenbach, Head of the Master's program in Medical Informatics.

The research was based on a snapshot of all Wikipedia data on the topic of disease, taken at the beginning of July 2021. The Flesch Reading Ease (FRE) score or the fourth Wiener Sachtextformel (Vienna factual text formula) was used as a measure of a text's readability. Among other things, texts with longer medical terms and long or convoluted sentences are rated as more difficult to read. These readability measures evaluate a text either as a point value (score) or in terms of school years. "If a text receives a score of 11, it means that readers should have completed 11th grade to grasp it clearly in terms of language. In the case of health communication, however, we would recommend that texts should already be understood by students in middle school," explains co-author Martin Wiesner. However, the study shows that in all three languages studied the level is significantly higher: 13 to 14 years of schooling, which corresponds to university-level education. Hard-to-read texts on health- or disease-related topics thus create a barrier when dealing with information from the Internet.
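To illustrate how such readability measures work, here is a minimal Python sketch of the two formulas as they are commonly cited. It is not the tooling used in the study, which the press release does not describe. The function names, the vowel-group syllable heuristic (which is only a rough estimate for English) and the sample sentences are assumptions made for this illustration. The FRE score is a point value where higher means easier to read, while the fourth Wiener Sachtextformel gives an approximate school grade.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count groups of consecutive vowels.

    This heuristic is an assumption for illustration; real tools use
    language-specific hyphenation or pronunciation dictionaries.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def wiener_sachtextformel_4(avg_sentence_length: float,
                            pct_words_3plus_syllables: float) -> float:
    """Fourth Wiener Sachtextformel: result approximates a school grade.

    avg_sentence_length: mean number of words per sentence.
    pct_words_3plus_syllables: percentage (0-100) of words with three
    or more syllables.
    """
    return 0.2656 * avg_sentence_length + 0.2744 * pct_words_3plus_syllables - 1.693


if __name__ == "__main__":
    easy = "The cat sat on the mat."
    hard = ("Idiopathic pulmonary fibrosis is characterized by "
            "progressive interstitial inflammation.")
    print("FRE easy:", round(flesch_reading_ease(easy), 1))
    print("FRE hard:", round(flesch_reading_ease(hard), 1))
    # Hypothetical text statistics: 20 words per sentence, 30 % long words.
    print("WSTF4 grade:", round(wiener_sachtextformel_4(20, 30), 1))
```

Both formulas reward short sentences and short words, which is why long medical terms and convoluted sentences push a text toward the harder end of the scale.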

A central recommendation of the study's authors is therefore that Wikipedia readers, and authors in particular, should be provided with tools that allow Wikipedia articles to be better indexed and made more readable. "For example, built-in readability indicators for Wikipedia authors that are automatically calculated in the background are conceivable," says co-author Richard Zowalla from the informatics perspective. "This could also be extremely important for Russian- and English-language Wikipedia articles on the topic," adds master's student Gordejeva. There is still a need for action and research here, which could be addressed in future projects at HHN's computer science faculty.

Contact person:
Andreas Wiesner, Research Associate
Phone: 07131-504-6947

Research Communication Heilbronn University:
Vera Winkler
Phone: 07131-504-1156

Source: HHN press release, June 28, 2022.