ECMiner™ Text Mining Module

ECMiner’s Text Mining uses morpheme analysis and part-of-speech tagging models tailored to the characteristics of Korean. Its mining algorithms also support a range of operational tasks.



ECMiner’s Text Mining stays flexible as language evolves by combining a basic dictionary with learned probabilistic models of morpheme analysis and part-of-speech tagging. It analyzes text at three unit levels (syntax, morpheme and syllable) to deliver efficiency, accuracy and speed.

Basic Theory on Morpheme Analysis

  • Syntax : Unit delimited by word spacing
  • Morpheme : Smallest meaningful unit of a language
  • Syllable : Sequence of speech sounds (here, each letter)
  • Agglutinative Language : Language in which one or more morphemes combine to form a syntax unit
  • Morpheme Analysis :
    – Extracting morphemes from a syntax unit
    – Generating all possible analyses that are grammatically correct
  • Part-of-Speech Tagging :
    – Assigning a part of speech to each unit (morpheme or word) within a given sentence
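
As an illustration of "generating all possible analyses", the following minimal sketch enumerates every dictionary-based segmentation of one syntax unit and keeps those whose adjacent tags are allowed to connect. The dictionary entries, the connection table and the tag names (borrowed here from the Sejong tag set) are assumptions made for the example; the source does not describe ECMiner's internal data structures or tag set.

```python
# Toy dictionary: surface form -> candidate part-of-speech tags (hypothetical entries).
DICTIONARY = {
    "나": ["NP", "VV"],   # pronoun "I" / verb stem
    "는": ["JX", "ETM"],  # topic particle / adnominal ending
}

# Hypothetical connection table: which tag may directly follow which.
ALLOWED = {("NP", "JX"), ("VV", "ETM")}


def analyze(unit):
    """Return every segmentation of `unit` into dictionary morphemes
    whose adjacent tag pairs all appear in ALLOWED."""
    results = []

    def extend(start, path):
        if start == len(unit):
            results.append(path)
            return
        for end in range(start + 1, len(unit) + 1):
            piece = unit[start:end]
            for tag in DICTIONARY.get(piece, []):
                # Reject candidates whose tag cannot follow the previous tag.
                if path and (path[-1][1], tag) not in ALLOWED:
                    continue
                extend(end, path + [(piece, tag)])

    extend(0, [])
    return results


for candidate in analyze("나는"):
    print(candidate)
```

With this toy data the unit "나는" yields two candidate analyses, which a tagger would then have to choose between.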

ECMiner Text Mining Characteristics

  • Morpheme Analysis : Automatically acquires the linguistic knowledge required for morpheme analysis from a part-of-speech tagged corpus
    – No manual work required to build or maintain the linguistic knowledge database
    – Flexible, since analysis is not limited to a fixed set of rules
    – Validity supported by the values that the probabilistic model calculates for each analysis result
  • Part-of-speech Tagging : Applies a model that accounts for phonological phenomena unique to Korean in order to analyze uncommon words. Using the morpheme analysis results, it adopts the best-matching candidate given the surrounding syntax and context.
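
One way to picture how a probabilistic model can both choose among candidate analyses and report a value for the result is the sketch below. It scores each candidate with a simple product of tag-transition and morpheme-emission probabilities, in the style of a hidden Markov model. The probability tables, tag names and function names are all hypothetical; the source does not specify which probabilistic model ECMiner uses.

```python
import math

# Hypothetical probabilities, as might be estimated from a tagged corpus.
TRANSITION = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "NP"): 0.6, ("<s>", "VV"): 0.4,
    ("NP", "JX"): 0.7, ("VV", "ETM"): 0.5,
}
EMISSION = {  # P(morpheme | tag)
    ("나", "NP"): 0.3, ("나", "VV"): 0.05,
    ("는", "JX"): 0.4, ("는", "ETM"): 0.3,
}
UNSEEN = 1e-6  # small floor for pairs missing from the tables


def log_score(analysis, prev_tag="<s>"):
    """Sum of log P(tag | prev) + log P(morpheme | tag) over the analysis."""
    total = 0.0
    for morpheme, tag in analysis:
        p = TRANSITION.get((prev_tag, tag), UNSEEN) * EMISSION.get((morpheme, tag), UNSEEN)
        total += math.log(p)
        prev_tag = tag
    return total


def best_analysis(candidates, prev_tag="<s>"):
    """Return the highest-scoring candidate together with its log score."""
    best = max(candidates, key=lambda a: log_score(a, prev_tag))
    return best, log_score(best, prev_tag)


candidates = [
    [("나", "NP"), ("는", "JX")],
    [("나", "VV"), ("는", "ETM")],
]
print(best_analysis(candidates))
```

Passing the tag of the preceding unit as `prev_tag` is a simplified stand-in for the "surrounding syntax and context" mentioned above; a full tagger would search over the whole sentence rather than one unit at a time.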


Analysis Process

ECMiner’s Text Mining improves the relevance of its analysis by handling stop words according to the operational characteristics identified through issue analysis and emotion analysis over synonym patterns.
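
A minimal sketch of this kind of preprocessing, assuming a task-specific synonym map and stop-word list, might look as follows. English tokens are used for readability, and the word lists and the `preprocess` function are hypothetical rather than part of ECMiner.

```python
# Hypothetical resources; in practice these would be chosen per operational task.
SYNONYMS = {"awesome": "good", "great": "good", "terrible": "bad"}
STOP_WORDS = {"the", "is", "a", "was"}


def preprocess(tokens, synonyms=SYNONYMS, stop_words=STOP_WORDS):
    """Map each token to its canonical synonym, then drop stop words."""
    normalized = [synonyms.get(token, token) for token in tokens]
    return [token for token in normalized if token not in stop_words]


print(preprocess(["the", "service", "was", "awesome"]))
```

Normalizing synonyms before filtering means that variants of the same opinion word are counted together in later issue or emotion analysis.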



ECMiner’s Text Mining offers a user dictionary registration feature, which enables analysis of words not registered in the standard dictionaries. It also provides rule-based relevance analysis, ontology hierarchy construction, and document similarity analysis using the cosine similarity method.
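
Cosine similarity between two documents can be sketched as below, assuming each document has already been reduced to a list of tokens and is represented by raw term frequencies. The weighting scheme and the function name are assumptions; the source only names the cosine similarity method.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the term-frequency vectors of two documents."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


doc1 = ["text", "mining", "korean"]
doc2 = ["text", "mining", "english"]
print(cosine_similarity(doc1, doc2))
```

The result ranges from 0 (no shared terms) to 1 (identical term distributions), so it can be used directly to rank documents by similarity.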