Tutoring Case: CE306-6

CE306-6-SP

UNIVERSITY OF ESSEX

Undergraduate Examinations 2019

INFORMATION RETRIEVAL

Time allowed: TWO hours

Candidates are permitted to bring into the examination room: Calculator – Casio FX-83GT Plus or Casio FX-85GT Plus ONLY

Candidates must answer ALL questions.

The paper consists of THREE questions.

The percentages shown in brackets provide an indication of the proportion of the total marks for the PAPER which will be allocated.

Please do not leave your seat unless you are given permission by an invigilator.
Do not communicate in any way with any other candidate in the examination room.
Do not open the question paper until told to do so.
All answers must be written in the answer book(s) provided.
All rough work must be written in the answer book(s) provided. A line should be drawn through any rough work to indicate to the examiner that it is not part of the work to be marked.
At the end of the examination, remain seated until your answer book(s) have been collected and you have been told you may leave.

Question 1 Basics

(a) Briefly explain the motivation for using the inverse document frequency (idf) in the weighting formula tf.idf. [5%]

(b) Discuss the implications of Zipf’s law on the distribution of words in a document collection and in the queries submitted to search this collection. Briefly discuss how increasing the corpus size might or might not address these implications. [10%]

(c) Briefly discuss three different reasons that might explain the popularity of Elasticsearch over alternative search engines when applying it to a local Web site. [10%]

(d) Briefly discuss the problems a tokenizer might encounter when processing texts which contain the period character (‘.’). [5%]

Question 2 Applications and Evaluation

(a) Outline the typical steps that need to be performed by an enterprise search engine to match a user request against the documents stored in the system’s database. Discuss how enterprise search might differ from Web search. [10%]

(b) Several evaluation metrics have been developed to assess the quality of results returned by search engines. Two such measures are precision and recall. What can you say about precision and recall for queries for which no relevant documents exist in the collection? Discuss whether discounted cumulative gain or mean reciprocal rank might or might not be suitable alternative measures for the given scenario. [15%]

(c) Discuss the applicability of the PageRank algorithm in an enterprise search setting. [10%]

(d) Outline a search scenario in which you would apply A/B testing to evaluate a search system within an enterprise search setting. [5%]

Question 3 Advanced Concepts

(a) Separating fake news from real news is one of the major search engine challenges that have emerged in recent years. One step in that direction is automated fact-checking. Imagine you are tasked with developing a system for automated fact-checking. Assume that your system will be incorporated in a Web search engine and is called whenever a user submits a query that is classified as a claim. Outline a possible processing pipeline that could confirm or reject a claim. Discuss important design decisions. [10%]

(b) Contextual information is frequently being used in modern Web search engines. Discuss the difficulties in contextualising a query in the result ranking stage of the retrieval process. [10%]

(c) Log analysis can be used to personalize a search engine. Present three possible motivations for this approach. Discuss how one might integrate log analysis in the query submission stage of an information retrieval system. [10%]

END OF PAPER

Author admin