Journal article

Employing document dependency in blog search

  • Keikha, Mostafa Facoltà di scienze informatiche, Università della Svizzera italiana, Switzerland
  • Carman, Mark James Faculty of IT, Monash University, Australia
  • Crestani, Fabio Facoltà di scienze informatiche, Università della Svizzera italiana, Switzerland
    2012
Published in:
  • Journal of the American Society for Information Science and Technology. - Wiley. - 2012, vol. 63, no. 2, p. 354–365
The goal in blog search is to rank blogs according to their recurrent relevance to the topic of the query. State-of-the-art approaches view it as an expert search or resource selection problem. We investigate the effect of content-based similarity between posts on the performance of the retrieval system. We test two different approaches for smoothing (regularizing) relevance scores of posts based on their dependencies. In the first approach, we smooth term distributions describing posts by performing a random walk over a document-term graph in which similar posts are highly connected. In the second, we directly smooth scores for posts using a regularization framework that aims to minimize the discrepancy between scores for similar documents. We then extend these approaches to consider the time interval between the posts in smoothing the scores. The idea is that if two posts are temporally close, then they are good sources for smoothing each other's relevance scores. We compare these methods with the state-of-the-art approaches in blog search that employ language modeling-based resource selection algorithms and fusion-based methods for aggregating post relevance scores. We show performance gains over the baseline techniques, which do not take advantage of the relation between posts for smoothing relevance estimates.
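The second approach in the abstract (smoothing post scores so that similar posts receive similar scores) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a generic graph-regularization objective, the sum over posts of (f_i - y_i)^2 plus mu times the sum over post pairs of W_ij (f_i - f_j)^2, solved by simple Jacobi iteration. The function names, the parameters `mu` and `tau`, and the exponential time-decay kernel are all hypothetical choices made for illustration.

```python
import math


def smooth_scores(scores, weights, mu=1.0, iterations=200):
    """Smooth initial post scores over a similarity graph.

    Assumed objective (not taken from the article):
        sum_i (f_i - y_i)^2 + mu * sum_{i<j} W_ij * (f_i - f_j)^2
    Its minimizer solves (I + mu * L) f = y with L = D - W, which is
    approximated here by Jacobi iteration.
    """
    n = len(scores)
    f = list(scores)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Weighted sum of neighbour scores and total edge weight of post i.
            neighbour_sum = sum(weights[i][j] * f[j] for j in range(n) if j != i)
            degree = sum(weights[i][j] for j in range(n) if j != i)
            updated.append((scores[i] + mu * neighbour_sum) / (1.0 + mu * degree))
        f = updated
    return f


def temporal_weights(similarity, timestamps, tau=7.0):
    """Down-weight similarity between posts that are far apart in time.

    The exponential kernel and the scale tau (in the same unit as the
    timestamps) are illustrative assumptions.
    """
    n = len(timestamps)
    return [
        [
            similarity[i][j] * math.exp(-abs(timestamps[i] - timestamps[j]) / tau)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three posts: 0 and 1 are similar to each other, 2 is unrelated to both.
initial = [1.0, 0.0, 0.5]
sim = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
smoothed = smooth_scores(initial, sim, mu=1.0)
decayed = temporal_weights(sim, timestamps=[0.0, 14.0, 3.0], tau=7.0)
```

In this toy example the scores of posts 0 and 1 move toward each other, while post 2, which has no neighbours, keeps its initial score. Passing `decayed` instead of `sim` weakens the link between posts 0 and 1 because they are two time scales apart.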
Language
  • English
Classification
Computer science and technology
License
License undefined
Identifiers
Persistent URL
https://n2t.net/ark:/12658/srd1318478