Abstract: Terminology recognition in the power domain lays the foundation for deep language understanding of power documents and for the construction of intelligent knowledge graphs. We propose an unsupervised method for recognizing new terminology in documents that incorporates the morphology of power-domain vocabulary. First, a general-purpose dictionary is used to segment the corpus. Next, the segmented words are combined, using terminology-feature-based sliding windows of different sizes, to form candidate words. Four statistics are then computed for each candidate: accessor variety, information entropy, point-wise mutual information, and word frequency. Finally, the candidates are screened using these statistics together with three types of word-formation grammatical rules, yielding the final set of new power-domain terms. Experimental results on a public dataset demonstrate the effectiveness of the proposed method.
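The candidate-generation and statistics steps can be illustrated with a minimal sketch. It assumes the corpus has already been segmented into a flat list of tokens, and it uses plain sliding windows rather than the terminology-feature-based windows described above. The function name `candidate_stats`, the parameter `max_window`, and the choice to take the minimum of the left and right values for accessor variety and entropy are assumptions made for illustration, not details taken from the method itself. The word-formation rules and any screening thresholds are not modeled.

```python
import math
from collections import Counter, defaultdict


def candidate_stats(tokens, max_window=3):
    """Score multi-token candidates built by sliding windows over `tokens`.

    Returns a dict mapping each candidate (a tuple of tokens) to its
    frequency, accessor variety, boundary entropy, and PMI.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    freq = Counter()
    left = defaultdict(Counter)   # tokens seen immediately before a candidate
    right = defaultdict(Counter)  # tokens seen immediately after a candidate

    # Slide windows of size 2..max_window over the segmented corpus.
    for size in range(2, max_window + 1):
        for i in range(n - size + 1):
            cand = tuple(tokens[i:i + size])
            freq[cand] += 1
            if i > 0:
                left[cand][tokens[i - 1]] += 1
            if i + size < n:
                right[cand][tokens[i + size]] += 1

    def entropy(counter):
        # Shannon entropy (in bits) of the neighbor distribution.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return sum(-(c / total) * math.log2(c / total) for c in counter.values())

    stats = {}
    for cand, f in freq.items():
        # PMI compares the observed candidate probability with the product
        # of its parts' probabilities (relative-frequency estimates).
        p_cand = f / n
        p_indep = math.prod(unigram[w] / n for w in cand)
        stats[cand] = {
            "freq": f,
            # Assumption: combine left/right values by taking the minimum.
            "av": min(len(left[cand]), len(right[cand])),
            "entropy": min(entropy(left[cand]), entropy(right[cand])),
            "pmi": math.log2(p_cand / p_indep),
        }
    return stats
```

Calling `candidate_stats(tokens)` on a segmented corpus returns one score dictionary per candidate. A screening step would then keep candidates whose scores exceed chosen thresholds and that satisfy the word-formation rules.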