ECMiner™ Text Mining Module

ECMiner’s Text Mining module uses morpheme analysis and part-of-speech tagging models designed for the characteristics of Korean. Its mining algorithms also support a wide range of operational tasks.


Features

ECMiner’s Text Mining keeps the system flexible as language evolves by combining a basic dictionary with probabilistic models for morpheme analysis and part-of-speech tagging that are learned from data. It uses three units as the basis of analysis (syntax, morpheme, and syllable), which makes analysis efficient, accurate, and fast.

Basic Theory of Morpheme Analysis

  • Syntax: Word-spacing unit, i.e. a space-delimited chunk of text (called an eojeol in Korean linguistics)
  • Morpheme: Smallest meaningful unit of a language
  • Syllable: Unit of speech sound; in written Korean, each character (Hangul syllable block) is one syllable
  • Agglutinative Language: Language in which one or more morphemes combine to form a syntax unit; Korean is agglutinative
  • Morpheme Analysis:
    – Extracting the morphemes that make up a syntax unit
    – Generating every possible grammatically correct analysis
  • Part-of-Speech Tagging:
    – Assigning a part of speech to each word unit (morpheme or word) in a given sentence
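To make "generating every possible grammatically correct analysis" concrete, the sketch below enumerates every way to split one syntax unit into morphemes found in a dictionary, then discards analyses that break a simple grammatical constraint. It is a minimal illustration, not ECMiner's implementation: the toy dictionary, the illustrative tag names (NNG for common noun, JKB for adverbial particle), and the single rule that a particle cannot begin a syntax unit are all hypothetical. Real Korean analyzers must also restore base forms altered by conjugation and contraction, which this sketch ignores.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: surface form -> possible part-of-speech tags.
# NNG (common noun) and JKB (adverbial particle) are illustrative tag names.
DICTIONARY: Dict[str, List[str]] = {
    "학교": ["NNG"],  # "school"
    "학": ["NNG"],
    "교": ["NNG"],
    "에": ["JKB"],    # locative particle "at/to"
}

Analysis = List[Tuple[str, str]]


def is_grammatical(analysis: Analysis) -> bool:
    """Hypothetical rule: a particle (tag starting with 'J') cannot begin a syntax unit."""
    return bool(analysis) and not analysis[0][1].startswith("J")


def analyze(syntax_unit: str) -> List[Analysis]:
    """Enumerate every dictionary-based split of one syntax unit, keeping grammatical ones."""
    candidates: List[Analysis] = []

    def search(start: int, path: Analysis) -> None:
        if start == len(syntax_unit):
            candidates.append(list(path))  # copy the completed analysis
            return
        for end in range(start + 1, len(syntax_unit) + 1):
            piece = syntax_unit[start:end]
            for tag in DICTIONARY.get(piece, []):
                path.append((piece, tag))
                search(end, path)
                path.pop()  # backtrack and try the next split

    search(0, [])
    return [a for a in candidates if is_grammatical(a)]


for analysis in analyze("학교에"):
    print(" + ".join(f"{m}/{t}" for m, t in analysis))
```

For 학교에 ("at school") the sketch returns two candidates, 학/교/에 and 학교/에; picking the better one is the job of a scoring model such as the probabilistic one the module uses.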

ECMiner Text Mining Characteristics

  • Morpheme Analysis: Automatically acquires the linguistic knowledge needed for morpheme analysis, using a part-of-speech-tagged corpus as the data source
    – No manual work is required to build and maintain the linguistic knowledge database
    – Flexible, because analysis is not restricted to a fixed set of rules
    – Each analysis result comes with a probability computed by the probabilistic model, so its validity can be assessed
  • Part-of-Speech Tagging: Applies a model that accounts for phonological phenomena specific to Korean in order to analyze uncommon words. Using the morpheme analysis results, it selects the most likely tag by considering the surrounding syntax units and context.
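The page does not say which probabilistic model ECMiner uses. As a generic illustration of the two ideas above (learning from a tagged corpus without hand-written rules, and choosing the most likely tag from context), the sketch below trains a bigram hidden Markov model by counting and decodes it with the Viterbi algorithm. The toy English corpus, the tag names, and add-one smoothing for unknown words are assumptions for illustration only; the sketch does not model the Korean-specific phonological phenomena.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

TaggedSentence = List[Tuple[str, str]]
Counts = DefaultDict[str, Counter]


def train(corpus: List[TaggedSentence]) -> Tuple[Counts, Counts]:
    """Count tag-to-tag transitions and tag-to-word emissions in a tagged corpus."""
    trans: Counts = defaultdict(Counter)  # trans[previous_tag][tag]
    emit: Counts = defaultdict(Counter)   # emit[tag][word]
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counts: Counter, key: str, outcomes: int) -> float:
    """Add-one (Laplace) smoothed log probability, so unseen events are not zero."""
    return math.log((counts[key] + 1) / (sum(counts.values()) + outcomes))


def viterbi(words: List[str], trans: Counts, emit: Counts) -> List[str]:
    """Return the most probable tag sequence under a bigram HMM."""
    if not words:
        return []
    tags = list(emit)
    n_tags = len(tags)
    n_vocab = len({w for c in emit.values() for w in c}) + 1  # +1 for unknown words
    # best[i][tag] = (best log score ending in tag at position i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_vocab), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][p][0] + log_prob(trans[p], t, n_tags), p) for p in tags)
            column[t] = (score + log_prob(emit[t], words[i], n_vocab), prev)
        best.append(column)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


corpus = [
    [("the", "DET"), ("can", "NOUN"), ("rusts", "VERB")],
    [("we", "PRON"), ("can", "AUX"), ("swim", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("swims", "VERB")],
    [("we", "PRON"), ("swim", "VERB")],
]
trans, emit = train(corpus)
print(viterbi(["the", "can"], trans, emit))  # ['DET', 'NOUN']
print(viterbi(["we", "can"], trans, emit))   # ['PRON', 'AUX']
```

The ambiguous word "can" is tagged NOUN after "the" and AUX after "we" because the transition counts learned from the corpus favor those contexts. Scores are kept in log space to avoid numerical underflow on long sentences.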


Analysis Process

ECMiner’s Text Mining makes analysis results more relevant by handling stop words according to the characteristics of each operational task, and by running issue analysis and emotion (sentiment) analysis over synonym patterns.
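A minimal sketch of stop-word and synonym handling before issue counting: each token is first mapped to a canonical synonym, then dropped if it is a stop word for the task, and the remaining terms are counted as candidate issues. The stop-word list, synonym map, and sample documents are hypothetical; in practice both lists would be tuned to each task.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical task-specific resources (e.g. for product-review analysis).
STOP_WORDS: Set[str] = {"the", "is", "very"}
SYNONYMS: Dict[str, str] = {"cell": "battery", "dies": "drains"}  # variant -> canonical term


def normalize(tokens: Iterable[str], stop_words: Set[str], synonyms: Dict[str, str]) -> List[str]:
    """Map each token to its canonical synonym, then drop stop words."""
    result: List[str] = []
    for token in tokens:
        canonical = synonyms.get(token, token)
        if canonical not in stop_words:
            result.append(canonical)
    return result


documents = [
    ["the", "battery", "drains", "very", "fast"],
    ["cell", "dies", "fast"],
    ["the", "screen", "is", "dim"],
]

issue_counts: Counter = Counter()
for doc in documents:
    issue_counts.update(normalize(doc, STOP_WORDS, SYNONYMS))

print(issue_counts)
```

Without the synonym map, "cell" and "battery" would be counted as separate issues, and each would look less significant than the underlying complaint really is.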


Features

ECMiner’s Text Mining lets users register words in a user dictionary, so words missing from the standard dictionary can still be analyzed. It also supports rule-based relevance analysis, construction of ontology hierarchies, and document-to-document similarity analysis using cosine similarity.
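Cosine similarity compares documents as term vectors: it is the dot product of the two vectors divided by the product of their lengths, so for non-negative term counts it ranges from 0 (no shared terms) to 1 (identical term proportions). The sketch below uses raw term counts over tokens that would come from morpheme analysis; ECMiner may weight terms differently (for example with TF-IDF), so the weighting here is an assumption.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(doc_a: List[str], doc_b: List[str]) -> float:
    """Cosine similarity between two documents represented as term-count vectors."""
    a, b = Counter(doc_a), Counter(doc_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_a * norm_b)


doc1 = ["battery", "drains", "fast"]
doc2 = ["battery", "drains", "slowly"]
doc3 = ["screen", "dim"]
print(round(cosine_similarity(doc1, doc2), 3))  # 0.667
print(cosine_similarity(doc1, doc3))            # 0.0
```

Because the score depends only on the angle between vectors, a long document and a short one about the same topic can still score close to 1.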
