Abstract:
SinglePass is a common text clustering algorithm that is widely used in topic detection and tracking. However, its clustering results are often unsatisfactory, and its similarity matching step slows processing. To address these two defects, the idea of reconstructing text is proposed: forum or web information is reorganized so that the typical features of a topic form a theme part and the content forms a details part. On this basis, the SinglePass clustering algorithm is improved. Experimental results show that the method effectively alleviates the problem of sparse sample features, detects hot topics on the network in a timely and accurate manner, and at the same time displays their hierarchical structure.
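
For orientation, the baseline SinglePass procedure that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard algorithm only, not the improved method described here. The whitespace tokenizer, raw term-frequency vectors, cosine similarity, the 0.3 threshold, and the names `cosine` and `single_pass` are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each document, in arrival order, to a cluster label.

    Each document is compared against every existing cluster; it joins
    the most similar one if the similarity reaches the threshold,
    otherwise it starts a new cluster. The threshold value is an
    assumed default, not taken from the paper.
    """
    centroids = []  # one summed term-frequency vector per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())  # naive whitespace tokenizer
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # open a new cluster
            labels.append(len(centroids) - 1)
    return labels
```

The inner loop over all existing clusters is the similarity matching step whose cost the abstract identifies as a defect: each incoming document is compared against every cluster seen so far.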