Methods for ranking user-generated text streams : a case study in blog feed retrieval
      
        120 p
        
      Doctoral thesis: Università della Svizzera italiana, 2012
      
        English
        
          User-generated content is one of the main sources of information on the Web today. Given the huge amount of such data generated every day, an efficient and effective retrieval system is essential. The goal of such a system is to enable users to search this data and retrieve documents relevant to their information needs. Among the retrieval tasks for user-generated content, retrieving and ranking streams is an important one with various applications. The goal of this task is to rank streams, that is, chronologically ordered collections of documents, in response to a user query. This differs from traditional retrieval tasks, where the goal is to rank single documents and temporal properties matter less in the ranking. In this thesis we investigate the problem of ranking user-generated streams with a case study in blog feed retrieval. Blogs, like all other user-generated streams, have specific properties and require new considerations in retrieval methods. Blog feed retrieval can be defined as retrieving blogs with a recurrent interest in the topic of the given query. We define three properties of blog feed retrieval, each of which introduces new challenges in the ranking task: 1) term mismatch in blog retrieval, 2) evolution of topics in blogs, and 3) diversity of blog posts. For each property, we investigate the corresponding challenges and propose solutions to overcome them. We further analyze the effect of our solutions on the performance of a retrieval system. We show that taking these properties into account when developing the retrieval system can improve on state-of-the-art retrieval methods. In all the proposed methods, we pay specific attention to temporal properties, which we believe carry important information in any type of stream. We show that, when combined with content-based information, temporal information can be useful in different situations. Although we apply our methods to blog feed retrieval, they are mostly general methods applicable to similar stream ranking problems, such as ranking experts or ranking Twitter users.
        
        - Classification: Computer science and technology
        - Persistent URL: https://n2t.net/ark:/12658/srd1318452