By joining segmented words under the control of a customized noise library and multi-level filtering strategies, HID is able to detect hotspot information in large volumes of web pages.
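Segmented words are joined under multi-level noise filtering: a dedicated noise library and multi-level filtering strategies strictly control the joining process, a suitable inclusion strategy is selected, and the hotspot strings that accurately reflect popular events in massive web data are extracted.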
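The pipeline described above can be illustrated with a minimal Python sketch. It is not the original implementation: the noise list, the two filter levels, the document-frequency inclusion rule, and every name in it (`NOISE_WORDS`, `join_candidates`, `passes_filters`, `detect_hotspots`, `min_docs`, `max_len`) are assumptions chosen for illustration. Input documents are assumed to be already segmented into words, and English tokens joined with spaces stand in for segmented Chinese words.

```python
from collections import Counter

# Hypothetical noise library; the source does not list its contents.
NOISE_WORDS = {"the", "a", "of", "in", "and", "to"}


def join_candidates(tokens, max_len=3):
    """Join adjacent segmented words into candidates of 2..max_len words."""
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 2, last_end + 1):
            yield tuple(tokens[start:end])


def passes_filters(candidate):
    """Assumed two-level noise filter applied to one joined candidate."""
    # Level 1: a candidate must not start or end with a noise word.
    if candidate[0] in NOISE_WORDS or candidate[-1] in NOISE_WORDS:
        return False
    # Level 2: reject candidates in which noise words make up half or more.
    noise_count = sum(1 for word in candidate if word in NOISE_WORDS)
    return noise_count * 2 < len(candidate)


def detect_hotspots(docs, min_docs=2, max_len=3):
    """Return (string, document frequency) pairs, most frequent first."""
    doc_freq = Counter()
    for tokens in docs:
        # Count each surviving candidate once per document.
        seen = {c for c in join_candidates(tokens, max_len) if passes_filters(c)}
        doc_freq.update(seen)
    # Assumed inclusion strategy: keep strings found in at least min_docs documents.
    hot = [(" ".join(c), n) for c, n in doc_freq.items() if n >= min_docs]
    return sorted(hot, key=lambda item: (-item[1], item[0]))


docs = [
    ["earthquake", "in", "sichuan", "today"],
    ["sichuan", "earthquake", "relief"],
    ["news", "earthquake", "in", "sichuan"],
]
print(detect_hotspots(docs))
```

Counting each candidate once per document, rather than once per occurrence, is a design choice in this sketch: it keeps a single page that repeats a phrase many times from looking like a widely reported event.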