An unsupervised approach to recognizing new words in power domain

CLC Number: TM930.9; TP391

Abstract:

Recognizing terminology words in the power domain lays the foundation for in-depth language understanding of power documents and for intelligent knowledge graph construction. By incorporating the morphology of power-domain vocabulary, we propose an unsupervised approach to recognizing new terminology words in documents. First, a general-purpose dictionary is used to segment the corpus. Then, the segmented words are combined using terminology-feature-based sliding windows of different sizes to form candidate words. Next, four statistics are computed for each candidate: accessor variety, information entropy, pointwise mutual information, and word frequency. Finally, the candidates are screened using these statistics together with three types of word-formation grammatical rules, yielding the final set of new power-domain words. Experimental results on a public dataset demonstrate the effectiveness of the proposed method.
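The candidate-generation and statistics-based screening steps described in the abstract can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the corpus is assumed to be already segmented with a general dictionary (each sentence is a list of tokens); candidates are plain contiguous token windows of length 2 to `max_n`, a simplification of the paper's terminology-feature-based windows; PMI is taken as the minimum over all binary splits of a candidate; entropy and accessor variety (AV) are the smaller of the left and right values. All function names, thresholds, and the `rules` hook (a stand-in for the three word-formation rules, which are not reproduced here) are hypothetical.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers, treated as neighbours


def ngram_stats(sentences, max_n=4):
    """Count every contiguous token window of length 1..max_n and record
    the tokens immediately to its left and right."""
    counts = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for tokens in sentences:
        padded = [BOS] + tokens + [EOS]
        for i in range(1, len(padded) - 1):
            for n in range(1, max_n + 1):
                j = i + n
                if j > len(padded) - 1:  # window would run past the last real token
                    break
                gram = tuple(padded[i:j])
                counts[gram] += 1
                left[gram][padded[i - 1]] += 1
                right[gram][padded[j]] += 1
    return counts, left, right


def entropy(counter):
    """Shannon entropy (natural log) of a neighbour-count distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values())


def min_pmi(gram, counts, total):
    """Smallest PMI over all binary splits, i.e. the weakest internal bond."""
    p_gram = counts[gram] / total
    return min(
        math.log(p_gram / ((counts[gram[:k]] / total) * (counts[gram[k:]] / total)))
        for k in range(1, len(gram))
    )


def find_new_words(sentences, dictionary, max_n=4, min_freq=2,
                   min_pmi_value=1.0, min_entropy=0.5, min_av=2, rules=()):
    """Screen multi-token candidates by frequency, PMI, entropy and AV.
    `rules` is a hypothetical list of callables(gram) -> bool."""
    counts, left, right = ngram_stats(sentences, max_n)
    total = sum(len(s) for s in sentences)
    results = []
    for gram, freq in counts.items():
        if len(gram) < 2 or freq < min_freq:
            continue
        word = "".join(gram)
        if word in dictionary or not all(rule(gram) for rule in rules):
            continue  # already known, or rejected by a word-formation rule
        pmi = min_pmi(gram, counts, total)
        ent = min(entropy(left[gram]), entropy(right[gram]))
        av = min(len(left[gram]), len(right[gram]))
        if pmi >= min_pmi_value and ent >= min_entropy and av >= min_av:
            results.append((word, freq, round(pmi, 3), round(ent, 3), av))
    return sorted(results, key=lambda r: -r[1])


# Toy corpus: a general dictionary splits "智能电网" (smart grid) into two tokens.
sentences = [
    ["建设", "智能", "电网", "工程"],
    ["发展", "智能", "电网", "技术"],
    ["推进", "智能", "电网", "建设"],
    ["电网", "调度", "系统"],
]
dictionary = {"建设", "智能", "电网", "工程", "发展", "技术", "推进", "调度", "系统"}

print(find_new_words(sentences, dictionary))
# [('智能电网', 3, 1.322, 1.099, 3)]
```

In this toy run, "智能电网" survives because its parts co-occur far more often than chance (high PMI) while its left and right contexts vary (high entropy and AV). Taking the minimum over splits and over left/right sides is a common conservative choice, since a candidate is only as strong as its weakest boundary.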

History
  • Received: June 05, 2020
  • Revised: July 18, 2020
  • Adopted: March 01, 2020
  • Online: December 01, 2020
  • Published: November 28, 2020