by Thad Hughes, Daniel Ramage

Abstract:

Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information from every explicit or implicit path connecting the two words in the entire graph. Our model uses a random walk over nodes and edges derived from WordNet links and corpus statistics. We treat the graph as a Markov chain and compute a word-specific stationary distribution via a generalized PageRank algorithm. Semantic relatedness of a word pair is scored by a novel divergence measure, ZKL, that outperforms existing measures on certain classes of distributions. In our experiments, the resulting relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments by rank ordering at ρ = .90.
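
Illustrative sketch (not from the paper): the pipeline the abstract describes, a word-specific stationary distribution followed by a divergence between two such distributions, can be roughed out on a toy graph. Everything below is an assumption made for illustration: the six-word graph, the uniform edge weights, the restart probability of 0.15, and the function names. The divergence is a generic KL-style stand-in with a fixed penalty for zero entries; it is not claimed to be the paper's ZKL definition.

```python
import math

# Hypothetical toy graph: node -> {neighbor: edge weight}. Not WordNet data.
GRAPH = {
    "car": {"automobile": 1.0, "vehicle": 1.0},
    "automobile": {"car": 1.0, "vehicle": 1.0},
    "vehicle": {"car": 1.0, "automobile": 1.0, "object": 1.0},
    "object": {"vehicle": 1.0, "fruit": 1.0},
    "fruit": {"object": 1.0, "banana": 1.0},
    "banana": {"fruit": 1.0},
}


def word_distribution(graph, seed, restart=0.15, iterations=200):
    """Power iteration for a random walk that restarts at `seed`.

    Each step follows an outgoing edge with probability (1 - restart),
    chosen in proportion to edge weight, and jumps back to `seed` otherwise.
    """
    dist = {node: 0.0 for node in graph}
    dist[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[seed] += restart  # restart mass (dist always sums to 1)
        for node, mass in dist.items():
            out = graph[node]
            total = sum(out.values())
            if total == 0.0:
                # Dangling node: send its walking mass back to the seed.
                nxt[seed] += (1.0 - restart) * mass
                continue
            for neighbor, weight in out.items():
                nxt[neighbor] += (1.0 - restart) * mass * weight / total
        dist = nxt
    return dist


def zero_aware_kl(p, q, penalty=2.0):
    """KL-style divergence that charges a fixed penalty where q is zero.

    Stand-in for illustration only; the penalty value is an assumption.
    """
    total = 0.0
    for node, p_i in p.items():
        if p_i == 0.0:
            continue
        q_i = q.get(node, 0.0)
        total += p_i * (math.log(p_i / q_i) if q_i > 0.0 else penalty)
    return total


car = word_distribution(GRAPH, "car")
automobile = word_distribution(GRAPH, "automobile")
banana = word_distribution(GRAPH, "banana")

# Lower divergence is read here as higher relatedness.
near = zero_aware_kl(car, automobile)
far = zero_aware_kl(car, banana)
```

On this toy graph, `near` comes out smaller than `far`, because the walks seeded at "car" and "automobile" put most of their mass on the same neighborhood, while the walk seeded at "banana" does not.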

Reference:

Lexical Semantic Relatedness with Random Graph Walks (Thad Hughes, Daniel Ramage), In Computational Linguistics, Association for Computational Linguistics, volume 7, 2007.

Bibtex Entry:

@article{Hughes2007,
  abstract = {Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information from every explicit or implicit path connecting the two words in the entire graph. Our model uses a random walk over nodes and edges derived from WordNet links and corpus statistics. We treat the graph as a Markov chain and compute a word-specific stationary distribution via a generalized PageRank algorithm. Semantic relatedness of a word pair is scored by a novel divergence measure, ZKL, that outperforms existing measures on certain classes of distributions. In our experiments, the resulting relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments by rank ordering at $\rho$ = .90.},
  author = {Hughes, Thad and Ramage, Daniel},
  journal = {Computational Linguistics},
  keywords = {SML-LIB-BIBLIO,lang:ENG},
  mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
  number = {June},
  pages = {581--589},
  publisher = {Association for Computational Linguistics},
  title = {{Lexical Semantic Relatedness with Random Graph Walks}},
  url = {http://acl.ldc.upenn.edu/D/D07/D07-1061.pdf},
  volume = {7},
  year = {2007}
}
