RT Dissertation/Thesis
T1 Improving search effectiveness in sentence retrieval and novelty detection
A1 Teijeira Fernández, Ronald
K1 Physics
K1 Electronics and computing
AB In this thesis we thoroughly study sentence retrieval and novelty detection. We analyze the strengths and weaknesses of current state-of-the-art methods and then propose new mechanisms to address sentence retrieval and novelty detection. Retrieval and novelty detection are related tasks: usually, we first apply a retrieval model that estimates the relevance of passages (e.g. sentences) and generates a ranking of passages sorted by their relevance. Next, this ranking is used as the input of a novelty detection module, which tries to filter out redundant passages in the ranking. The estimation of relevance at sentence level is difficult. Standard methods used to estimate relevance are simply based on matching query and sentence terms. However, queries usually contain two or three terms and sentences are also short. Therefore, the matching between query and sentences is poor. In order to address this problem, we study how to enrich this process with additional information: the context. The context refers to the information provided by the surrounding sentences or the document where the sentence is located. Such context reduces ambiguity and supplies additional information not included in the sentence itself. Additionally, it is important to estimate how important (central) a sentence is within the document. These two components are studied following a formal framework based on Statistical Language Models. In this respect, we demonstrate that these components yield improvements over current sentence retrieval methods. In this thesis we work with collections of sentences that were extracted from news. News stories not only explain facts but also express opinions that people have about a particular event or topic.
Therefore, properly estimating which passages are opinionated may help to further improve the estimation of relevance for sentences. We apply a formal methodology that helps us to incorporate opinions into standard sentence retrieval methods. Additionally, we propose simple empirical alternatives to incorporate query-independent features into sentence retrieval models. We demonstrate that the incorporation of opinions to estimate relevance is an important factor that makes sentence retrieval methods more effective. Throughout this study, we also analyze query-independent features based on sentence length and named entities. The combination of the context-based approach with the incorporation of opinion-based features is straightforward. We study how to combine these two approaches and the impact of doing so. We demonstrate that context-based models implicitly promote sentences with opinions and, therefore, opinion-based features do not help to further improve context-based methods. The second part of this thesis is dedicated to novelty detection at sentence level. Because novelty is actually dependent on a retrieval ranking, we consider here two approaches: a) the perfect-relevance approach, which consists of using a ranking where all sentences are relevant; and b) the non-perfect relevance approach, which consists of first applying a sentence retrieval method. We first study which baseline performs best and, next, we propose a number of variations. One of the mechanisms proposed is based on vocabulary pruning. We demonstrate that considering terms from the top-ranked sentences in the original ranking helps to guide the estimation of novelty. The application of Language Models to support novelty detection is another challenge that we face in this thesis. We apply different smoothing methods in the context of alternative mechanisms to detect novelty.
Additionally, we test a mechanism based on mixture models that uses the Expectation-Maximization algorithm to automatically obtain the novelty score of a sentence. In the last part of this work we demonstrate that most novelty methods lead to a strong re-ordering of the initial ranking. However, we show that the top-ranked sentences in the initial list are usually novel and re-ordering them is often harmful. Therefore, we propose different mechanisms that determine the position threshold where novelty detection should be initiated. In this respect, we consider query-independent and query-dependent approaches. Summing up, we identify important limitations of current sentence retrieval and novelty methods, and propose novel and effective methods.
PB Universidade de Santiago de Compostela. Servizo de Publicacións e Intercambio Científico
SN 978-84-9887-624-6
YR 2011
FD 2011-06-23
LK http://hdl.handle.net/10347/3089
UL http://hdl.handle.net/10347/3089
LA eng
NO TEIJEIRA FERNÁNDEZ, Ronald: «Improving search effectiveness in sentence retrieval and novelty detection». Santiago de Compostela: Universidade. Servizo de Publicacións e Intercambio Científico, 2011. ISBN 978-84-9887-624-6
DS Minerva
RD 23 Apr 2026