Abstract: The operation and maintenance of transformers has produced a large volume of unstructured defect records in text form. However, the lack of effective mining methods has left this data largely unused. This paper proposes a text mining method for transformer defect records based on an ensemble model that integrates character-level and word-level classifiers. First, the defect record texts are preprocessed by text segmentation, stop word removal, text augmentation, and text feature representation, which converts them into numerical vectors for model input. Then, multiple word-level and character-level classification models are combined as base learners, and a meta-learner exploits their complementary strengths to identify and classify transformer defect types accurately. Compared with individual text classification algorithms, the method captures the semantic features of the text more comprehensively, achieving a classification precision of 91% and an F1 score of 0.9, where the F1 score is a combined measure of model precision and recall. Applying natural language processing to the classification of power equipment defect records and to efficient fault recognition puts dormant data resources to use and raises the level of intelligent management of power transformers.
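As an illustration of the F1 score mentioned above, the following minimal sketch computes precision, recall, and their harmonic mean from classification counts. The function name `precision_recall_f1` and the counts in the example are hypothetical and are not taken from the paper; the paper's own averaging scheme across defect classes is not stated here.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive,
    false-positive, and false-negative counts for one class."""
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a single defect class (illustrative only).
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays high only when both precision and recall are high, which is why it is commonly reported alongside precision.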