Book chapter

Investigating the statistical properties of user-generated documents

  • Inches, Giacomo Facoltà di scienze informatiche, Università della Svizzera italiana, Svizzera
  • Carman, Mark J. Facoltà di scienze informatiche, Università della Svizzera italiana, Svizzera
  • Crestani, Fabio Facoltà di scienze informatiche, Università della Svizzera italiana, Svizzera
2011
Published in:
  • Lecture Notes in Computer Science. - Springer. - 2011, vol. 7022, p. 198-209
English
The importance of the Internet as a communication medium is reflected in the large number of documents generated every day by users of the many services available online. In this work we aim to analyze the properties of these online user-generated documents for some established Internet services (Kongregate, Twitter, Myspace and Slashdot) and to compare them with a consolidated collection of standard information retrieval documents (from the Wall Street Journal, Associated Press and Financial Times, part of the TREC ad-hoc collection). We investigate features such as document similarity, term burstiness, emoticons and Part-Of-Speech analysis, highlighting the applicability and limits of traditional content analysis and indexing techniques from information retrieval when applied to these new online user-generated documents.
Language
  • English
Classification
Computer science and technology
License
License undefined
Identifiers
Persistent URL
https://n2t.net/ark:/12658/srd1318278