Large-scale machine learning of media outlets for understanding public reactions to nation-wide viral infection outbreaks

Sungwoon Choi, Jangho Lee, Min Gyu Kang, Hyeyoung Min, Yoon Seok Chang, Sungroh Yoon

Research output: Contribution to journal (Article)



From May to July 2015, Korea experienced a nation-wide outbreak of Middle East respiratory syndrome (MERS). MERS is caused by MERS-CoV, an enveloped, positive-sense, single-stranded RNA virus of the family Coronaviridae. Although experts suggested that the danger of MERS might be exaggerated, the Korean mass media reported that the public overreacted, which led to a noticeable reduction in social and economic activity during the outbreak. To explain this phenomenon, we hypothesized that a machine-learning-based analysis of media outlets would be informative, and we collected Korean mass-media articles and short-text comments produced during the 10-week outbreak. To process and analyze the collected data (over 86 million words in total) effectively, we developed a methodology that combines machine-learning and information-theoretic approaches. The methodology includes techniques for extracting emotions from emoticons and Internet slang, which increased the number of emotion-bearing texts available for robust sentiment analysis of social media by approximately 73%. As a result, we identified a plausible explanation for the public overreaction to MERS in terms of the interplay among the disease, the mass media, and public emotions.
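The abstract does not say how emotions are extracted from emoticons and Internet slang. As a rough illustration of why such cues raise the count of emotion-bearing texts, here is a minimal sketch that assumes a plain dictionary lookup rather than the authors' machine-learning pipeline. The lexicons (`WORD_LEXICON`, `EMOTICON_LEXICON`, `SLANG_PATTERNS`) and all function names are hypothetical toy examples. The sketch counts texts that carry an emotion label under word cues alone and then under word plus emoticon/slang cues, and reports the relative increase as (after − before) / before.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicons for illustration only; the paper's actual
# emoticon and slang dictionaries are not given in the abstract.
WORD_LEXICON: Dict[str, str] = {"scared": "fear", "angry": "anger", "happy": "joy"}
EMOTICON_LEXICON: Dict[str, str] = {"^^": "joy", ":)": "joy", "T_T": "sadness", ":(": "sadness"}
SLANG_PATTERNS: Dict[str, str] = {
    r"ㅋ{2,}": "joy",             # repeated ㅋ: laughter
    r"ㅎ{2,}": "joy",             # repeated ㅎ: soft laughter
    r"ㅠ{2,}|ㅜ{2,}": "sadness",  # repeated ㅠ/ㅜ: crying eyes
}


def word_emotions(text: str) -> Set[str]:
    """Emotion labels found by a plain word-lexicon lookup."""
    tokens = re.findall(r"\w+", text.lower())
    return {WORD_LEXICON[t] for t in tokens if t in WORD_LEXICON}


def emoticon_slang_emotions(text: str) -> Set[str]:
    """Emotion labels found through emoticons and Internet slang."""
    labels = {label for emo, label in EMOTICON_LEXICON.items() if emo in text}
    labels |= {label for pat, label in SLANG_PATTERNS.items() if re.search(pat, text)}
    return labels


def emotion_coverage(texts: List[str]) -> Tuple[int, int]:
    """Count emotion-bearing texts without and with emoticon/slang cues."""
    words_only = sum(1 for t in texts if word_emotions(t))
    combined = sum(1 for t in texts if word_emotions(t) | emoticon_slang_emotions(t))
    return words_only, combined


def relative_increase(before: int, after: int) -> float:
    """Relative growth in the number of emotion-bearing texts."""
    return (after - before) / before


if __name__ == "__main__":
    comments = [
        "I am scared of MERS",
        "hospital closed again ㅠㅠ",
        "finally over ^^",
        "ㅋㅋㅋ masks everywhere",
        "the ministry held a briefing",
    ]
    before, after = emotion_coverage(comments)
    print(before, after, relative_increase(before, after))  # 1 4 3.0
```

On this made-up sample, only one comment is emotion-bearing by word cues, but four are once emoticons and slang are counted. The 300% figure reflects the toy data only and says nothing about the paper's reported increase of approximately 73%.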

Original language: English
Pages (from-to): 50-59
Number of pages: 10
State: Published, 1 Jan 2017


  • Machine learning
  • Middle East respiratory syndrome (MERS)
  • Natural language processing
  • Sentiment analysis
