Abstract: With the deepening reform of electric power enterprises and the development of big data technology, traditional power supply companies and integrated energy service enterprises must move away from their current extensive marketing model in order to respond rapidly to customer requirements. To identify potential customers of integrated energy services more accurately, this paper labels potential customers with tags and proposes an improved parallel K-means clustering algorithm based on the Spark in-memory computing platform. First, the selection of the initial cluster centers and the evaluation of sample influencing factors are improved. Second, based on the optimized factor weights, cluster analysis is carried out on the data set to identify potential customers of integrated energy services. Finally, recent transaction data from integrated energy service enterprises are collected, and experiments are carried out on a multi-node cluster of physical machines. The results show that the improved K-means clustering model achieves higher accuracy. In terms of execution efficiency, the highly concurrent algorithm shows better parallel performance than single-threaded execution.
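As a rough illustration of the two ingredients named above (a better choice of initial centers and per-factor weights), the following single-machine NumPy sketch may help. It is not the paper's algorithm: the abstract does not specify the improved seeding rule or how the weights are optimized, so this sketch assumes k-means++-style seeding and takes the weights as a given input. The function name `weighted_kmeans` and all parameters are hypothetical, and the Spark parallelization is not reproduced.

```python
import numpy as np

def weighted_kmeans(X, k, weights, n_iter=50, seed=0):
    """Feature-weighted K-means (illustrative sketch, not the paper's method).

    X       : (n_samples, n_features) array
    weights : positive per-feature weights (assumed given; the paper optimizes them)
    """
    rng = np.random.default_rng(seed)
    scale = np.sqrt(np.asarray(weights, dtype=float))
    # Scaling each feature by sqrt(w) makes plain squared Euclidean distance
    # equal to the weighted distance sum_j w_j * (x_j - c_j)^2.
    Xw = np.asarray(X, dtype=float) * scale

    # Assumed seeding: k-means++ style. Each new center is sampled with
    # probability proportional to squared distance from the nearest chosen center.
    centers = [Xw[rng.integers(len(Xw))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((Xw - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Xw[rng.choice(len(Xw), p=d2 / d2.sum())])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: nearest center in the weighted space.
        d2 = ((Xw[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mean of assigned points; keep the old center if a cluster is empty.
        new_centers = np.array([
            Xw[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment against the final centers.
    d2 = ((Xw[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Return centers in the original (unscaled) feature units.
    return labels, centers / scale

# Toy data: two customer groups described by two factors.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = weighted_kmeans(X, k=2, weights=[2.0, 0.5])
```

In a Spark setting, the assignment step would typically run as a map over data partitions and the update step as a reduce by cluster index; that mapping is a general pattern for parallel K-means and is an assumption here rather than a detail taken from the paper.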