by Francois Fouss, Alain Pirotte, Jean-michel Renders, Marco Saerens

Abstract:

This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the "Fiedler vector," widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database

Reference:

Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation (Francois Fouss, Alain Pirotte, Jean-michel Renders, Marco Saerens), In IEEE Transactions on Knowledge and Data Engineering, IEEE Educational Activities Department, volume 19, 2007.

Bibtex Entry:

@article{Fouss2007, abstract = {This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the "Fiedler vector," widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database}, author = {Fouss, Francois and Pirotte, Alain and Renders, Jean-michel and Saerens, Marco}, doi = {10.1109/TKDE.2007.46}, issn = {1041-4347}, journal = {IEEE Transactions on Knowledge and Data Engineering}, keywords = {Fiedler vector,Graph analysis,SML-LIB-BIBLIO,collaborative recommendation,graph and database mining,graph kernels,lang:ENG,proximity measures,spectral clustering,statistical relational learning.}, mendeley-tags = {SML-LIB-BIBLIO,lang:ENG}, month = mar, number = {3}, pages = {355--369}, publisher = {IEEE Educational Activities Department}, title = {{Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation}}, url = {http://dl.acm.org/citation.cfm?id=1263132.1263335}, volume = {19}, year = {2007} }

Powered by bibtexbrowser