A Large Probabilistic Semantic Network based Approach to Compute Term Similarity
Measuring semantic similarity between two terms is essential for a variety of text analytics and understanding applications. Currently, there are two main approaches to this task: knowledge-based and corpus-based. However, existing approaches are better suited to measuring similarity between single words than between the more general multi-word expressions (MWEs), and they do not scale well. In contrast to these existing techniques, we propose an efficient and effective approach to semantic similarity using a large-scale semantic network. This semantic network is automatically acquired from billions of web documents. It consists of millions of concepts, which explicitly model the context of semantic relationships. We first show how to map two terms into the concept space and compare their similarity there. We then introduce a clustering approach that orthogonalizes the concept space in order to improve the accuracy of the similarity measure.
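The core idea in the abstract — mapping each term to a weighted vector of concepts and comparing the vectors, then grouping related concepts into clusters before comparison — can be sketched as follows. This is a minimal illustration, not the paper's actual method: the concept distributions, the term/concept names, and the cluster assignments below are all invented toy data standing in for what a probabilistic semantic network would supply.

```python
from math import sqrt

# Hypothetical concept distributions, e.g. P(concept | term) as a real
# probabilistic semantic network might provide (toy values, for illustration).
term_concepts = {
    "apple":  {"fruit": 0.6, "company": 0.3, "tree": 0.1},
    "pear":   {"fruit": 0.7, "tree": 0.3},
    "google": {"company": 0.8, "search engine": 0.2},
}

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity(t1, t2):
    """Map both terms into the concept space and compare them there."""
    return cosine(term_concepts[t1], term_concepts[t2])

# A crude stand-in for the clustering step: collapse correlated concepts
# into clusters so that the resulting dimensions are closer to orthogonal.
concept_cluster = {"fruit": "plant", "tree": "plant",
                   "company": "organization", "search engine": "organization"}

def to_cluster_vector(concepts):
    """Aggregate concept weights within each cluster."""
    vec = {}
    for c, w in concepts.items():
        k = concept_cluster[c]
        vec[k] = vec.get(k, 0.0) + w
    return vec

def cluster_similarity(t1, t2):
    return cosine(to_cluster_vector(term_concepts[t1]),
                  to_cluster_vector(term_concepts[t2]))
```

Under this toy data, `similarity("apple", "pear")` exceeds `similarity("apple", "google")`, since apple and pear share the high-weight "fruit" concept while apple and google overlap only on "company".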