- May 15, 2020
Task: implement information retrieval techniques based on query reduction, and compare them with the methods implemented in part 1. Analysis:
Title: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries
Query:
1. Exp Malaria/
2. Exp Plasmodium/
3. Malaria.ti,ab
4. 1 or 2 or 3
5. Exp Reagent kits, diagnostic/
6. rapid diagnos* test*.ti,ab
7. RDT.ti,ab
8. Dipstick*.ti,ab
The query keywords are: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries
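Using BM25 as the baseline, compare the gains and losses of the 2017 and 2018 results topic by topic; retrieval effectiveness is evaluated with trec_eval. Topics covered: information retrieval, IDF-r, KLI.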
INFS7410 Project – Part 2

Preamble

The due date for this assignment is 19 September 2019 17:00 Eastern Australia Standard Time, together with part 1.

This part of the project is worth 10% of the overall mark for INFS7410 (part 1 is worth 5%, and thus the whole submission of part 1 + 2 is worth 15%). A detailed marking sheet for this assignment is provided at the end of this document.

Aim

Project aim: The aim of this project is to implement a state-of-the-art information retrieval method, evaluate it and compare it to the baseline and rank fusion methods obtained in part 1 in the context of a real use-case.

Project Part 2 aim

The aim of part 2 is to:
- use the evaluation infrastructure set up for part 1
- implement state-of-the-art information retrieval methods, based on query reduction
- evaluate, compare and analyse the developed state-of-the-art methods against baseline and ranking fusion methods

The Information Retrieval Task: Ranking of studies for Systematic Reviews

Part 2 of the project considers the same problem described in part 1: re-rank a set of documents retrieved for the compilation of a systematic review. A description of the wider task is provided in part 1.

What we provide you with (same as part 1)

We provide:
- for each dataset, a list of topics to be used for training. Each topic is organised into a file. Each topic contains a title and a Boolean query.
- for each dataset, a list of topics to be used for testing. Each topic is organised into a file. Each topic contains a title and a Boolean query.
- each topic file (both those for training and those for testing) includes a list of retrieved documents in the form of their PMIDs: these are the documents that you have to rank. Take note: you do not need to perform the retrieval from scratch (i.e. execute the query against the whole index); instead you need to rank (order) the provided documents.
- for each dataset, and for each train and test partition, a qrels file, containing relevance assessments for the documents to be ranked.
  This is to be used for evaluation.
- for each dataset, and for test partitions, a set of runs from retrieval systems that participated in CLEF 2017/2018 to be considered for fusion.
- a Terrier index of the entire Pubmed collection. This index has been produced using the Terrier stopword list and Porter stemmer.
- a Java Maven project that contains the Terrier dependencies and a skeleton code to give you a start. NOTE: Tip #1 provides you with a restructured skeleton code to make the processing of queries more efficient.
- a template for your project report.

What you need to produce

You need to produce:
- correct implementations of the state-of-the-art methods required by this project specification
- correct evaluation, analysis and comparison of the state-of-the-art method, including comparison with the methods implemented in part 1. This should be written up into a report following the provided template.
- a project report that, following the provided template, details: an explanation of the state-of-the-art retrieval method used (in your own words), an explanation of the evaluation settings followed, the evaluation of results (as described above), inclusive of analysis, and a discussion of the findings. Note that you will need to provide a single report that encompasses both part 1 and part 2.

Required methods to implement

In part 2 of the project you are required to implement the following query reduction retrieval methods:
- Query reduction using IDF-r. We have discussed this method in the week 6 lecture (online video) and in the week 6 tutorial. This method is described in Koopman, Bevan, Liam Cripwell, and Guido Zuccon, “Generating clinical queries from patient narratives: A comparison between machines and humans.” Proceedings of the 40th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2017. (See the first paragraph of section 3.1 if you want a description from the literature; ignore the settings described in that publication.)
  You may have already implemented this for part 1 for reducing the Boolean queries (tip 4), and in the relevant tutorial.
- Query reduction using Kullback-Leibler informativeness (KLI). This reduction method is partially described in Daniel Locke, Guido Zuccon, and Harrisen Scells, “Automatic Query Generation from Legal Texts for Case Law Retrieval.” Asia Information Retrieval Symposium. Springer, Cham, 2017. (top of page 187)

For IDF-r, we ask you to explore reduction on the query formed by the title query. Queries will be reduced at a reduction of $r$, where $r$ is the retention rate, i.e. $r = 0.85$ means retaining 85% of the original terms. We ask you to explore three retention rates on the training set: 85%, 50% and 30%. When rounding the number of query terms to retain to an integer number, use the ceiling function.

For implementing KLI, consider the following, revised definition of this method. The KLI of a term $t$ is formally defined by

$$\mathrm{KLI}(t) = P(t \mid D) \, \log \frac{P(t \mid D)}{P(t \mid C)}$$

where $D$ is the set of documents provided to rank (i.e. the documents initially retrieved by the Boolean query), and $C$ is the entire collection as indexed in the provided index. Thus, you need to compute, for each query term, the probability of the term appearing in the provided retrieved set (i.e. term frequency in the set; note that here $D$ does not represent one document, but the set of initially retrieved documents): use MLE to compute this. Similarly, use MLE to compute the probability of the term appearing in the collection. Query reduction is then performed by ranking query terms in decreasing value of $\mathrm{KLI}(t)$, and applying the retention rate $r$. For KLI, perform a similar exploration of retention rates as for IDF-r.

For both methods, rank documents according to the reduced queries using BM25 with the best parameters found from part 1 for the dataset you are experimenting in.

When tuning, tune with respect to MAP.

We strongly recommend you use and extend the Maven project provided for part 1 to implement these methods.
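The two reduction methods described above can be sketched in a few lines. This is a minimal illustration in Python, independent of the Terrier skeleton, and not the reference implementation. It assumes the statistics have already been read from the index into plain dictionaries. The function names, the toy numbers, the plain log(N / df) IDF variant, and the form KLI(t) = P(t|D) log(P(t|D) / P(t|C)) with MLE probabilities are all assumptions made for the example.

```python
import math


def terms_to_retain(num_terms, r):
    """Number of query terms kept at retention rate r, rounded up (ceiling)."""
    # round() guards against floating-point noise, so that a product such as
    # 3.0000000000000004 is not pushed up to 4 by the ceiling.
    return math.ceil(round(num_terms * r, 9))


def reduce_by_idf(query_terms, doc_freq, num_docs, r):
    """IDF-r sketch: keep the ceil(r * |Q|) query terms with the highest IDF."""
    def idf(term):
        # Assumed IDF variant: log(N / df). Terms unseen in the index are
        # treated as df = 1 here, which is a choice made for this sketch.
        return math.log(num_docs / max(doc_freq.get(term, 0), 1))

    k = terms_to_retain(len(query_terms), r)
    return sorted(query_terms, key=idf, reverse=True)[:k]


def reduce_by_kli(query_terms, set_tf, set_len, coll_tf, coll_len, r):
    """KLI sketch: keep the ceil(r * |Q|) query terms with the highest KLI.

    set_tf / set_len: term counts and total token count of the retrieved set.
    coll_tf / coll_len: term counts and total token count of the collection.
    """
    def kli(term):
        p_set = set_tf.get(term, 0) / set_len      # MLE of P(t | retrieved set)
        p_coll = coll_tf.get(term, 0) / coll_len   # MLE of P(t | collection)
        if p_set == 0 or p_coll == 0:
            return 0.0  # 0 * log(0) is taken as 0 by convention
        return p_set * math.log(p_set / p_coll)

    k = terms_to_retain(len(query_terms), r)
    return sorted(query_terms, key=kli, reverse=True)[:k]


# Toy statistics, invented purely for illustration (stemmed title terms).
query = ["rapid", "diagnost", "test", "malaria", "endem", "countri"]
doc_freq = {"rapid": 200, "diagnost": 150, "test": 400,
            "malaria": 20, "endem": 30, "countri": 300}
set_tf = {"rapid": 50, "diagnost": 80, "test": 120,
          "malaria": 200, "endem": 10, "countri": 15}
coll_tf = {"rapid": 4000, "diagnost": 3000, "test": 9000,
           "malaria": 500, "endem": 600, "countri": 7000}

for rate in (0.85, 0.5, 0.3):
    print(rate, reduce_by_idf(query, doc_freq, 1000, rate))
    print(rate, reduce_by_kli(query, set_tf, 10_000, coll_tf, 1_000_000, rate))
```

Note that a term which is rarer in the retrieved set than in the collection receives a negative KLI under this form, so it ranks below a term that never occurs in the retrieved set. Whichever formula is actually implemented should be stated in the report.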
You should have already attempted the implementation of IDF-r as part of the relevant tutorial exercise.

In the report, detail how the methods were implemented, including which formula you implemented.

What queries to use

For part 2, we ask you to consider the queries for each topic created from the title field of each topic. For example, consider the example (partial) topic listed below: the query will be Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries (you may consider performing text processing). This is the same query type used in part 1.

Required evaluation to perform

In part 2 of the project you are required to perform the following evaluation:
1. For all methods, train on the training set for the 2017 topics with respect to the retention rate and test on the testing set for the 2017 topics (using the parameter value you selected from the training set). Report the results of every method on the training (the best selected) and on the testing set, separately, into one table. Perform statistical significance analysis across the results of the methods.
2. Comment on the results reported in the previous table by comparing the methods on the 2017 dataset.
3. For all methods, train on the training set for the 2018 topics with respect to the retention rate and test on the testing set for the 2018 topics (using the parameter value you selected from the training set). Report the results of every method on the training (the best selected) and on the testing set, separately, into one table. Perform statistical significance analysis across the results of the methods.
4. Comment on the results reported in the previous table by comparing the methods on the 2018 dataset.
5. Perform a topic-by-topic gains/losses analysis for both 2017 and 2018 results on the testing datasets, by considering as baseline (tuned) BM25.
6. Comment on trends and differences observed when comparing the findings from 2017 and 2018 results.
   Is there a query reduction method that consistently outperforms the others?

In terms of evaluation measures, evaluate the retrieval methods with respect to mean average precision (MAP) using trec_eval. Remember to set the cut-off value (-M, i.e. the maximum number of documents per topic to use in evaluation) to the number of documents to be re-ranked for each of the queries. Using trec_eval, also compute Rprecision (Rprec), which is the precision after R documents have been retrieved (by default, R is the total number of relevant docs for the topic).

For all statistical significance analysis, use paired t-test; distinguish between p<0.05 and p<0.01.

Example topic file:

Topic: CD008122
Title: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries
Query:
1. Exp Malaria/
2. Exp Plasmodium/
3. Malaria.ti,ab
4. 1 or 2 or 3
5. Exp Reagent kits, diagnostic/
6. rapid diagnos* test*.ti,ab
7. RDT.ti,ab
8. Dipstick*.ti,ab

How to submit

You will have to submit 3 files:
1. the report, formatted according to the provided template, saved as PDF or MS Word document. Note: write the report by combining part 1 (the previous assignment) and part 2 (this assignment) results and methods. Make sure you clearly label methods and results that belong to the different assignments.
2. a zip file containing a folder called runs-part2, which itself contains the runs (result files) you have created for the implemented methods.
3. a zip file containing a folder called code-part2, which itself contains all the code to re-run your experiments. You do not need to include in this zip file the runs we have given to you. You may need to include additional files, e.g. if you manually process the topic files into an intermediate format (rather than automatically process them from the files we provide you), so that we can re-run your experiments to confirm your results and implementation.

If your set of runs is too big, please do the following:
- include in the zip the test run
- include in the zip the best train run you used to decide upon the parameter tuning
- create a separate zip file with all the runs; upload it to a file sharing service like Dropbox or Google Drive (or similar), then make sure it is visible without login and add the link to it to your report. Please ensure that the link to the resources is available for at least 6 days after the submission of the assignment.

All items need to be submitted via the relevant Turnitin link in the INFS7410 Blackboard site, by 19 September 2019 17:00 Eastern Australia Standard Time, together with part 1, unless you have been given an extension (according to UQ policy), before the due date of the assignment. Note: appropriate, separate links are provided in the Assignment 2 folder in Blackboard for submission of the report, or runs-part1, runs-part2, code-part1, and code-part2.

INFS 7410 Project Part 2 – Marking Sheet

Columns: Criterion; %; 7 (100%); 4 (50%); FAIL 1 (0%)

IMPLEMENTATION
%: 4
The ability to:
• Understand, implement and execute common IR baseline
• Understand, implement and execute rank fusion methods
• Perform text processing
7 (100%):
• Correctly implements both query reduction methods
4 (50%):
• Correctly implements only one of the specified query reduction methods
FAIL 1 (0%):
• No implementation

EVALUATION
%: 5
The ability to:
• Empirically evaluate and compare IR methods
• Analyse the results of empirical IR evaluation
• Analyse the statistical significance difference between IR methods’ effectiveness
7 (100%):
• Correct empirical evaluation has been performed
• Uses all required evaluation measures
• Correct handling of the tuning regime (train/test)
• Reports all results for the provided query sets into appropriate tables
• Provides graphical analysis of results on a query-by-query basis using appropriate gain-loss plots
• Provides correct statistical significance analysis within the result table; and correctly describes the statistical analysis performed
• Provides a written understanding and discussion of the results with respect to the methods
• Provides examples of where query reduction works, and where it does not, and why, e.g., discussion with respect to queries, runs.
4 (50%):
• Correct empirical evaluation has been performed
• Uses all required evaluation measures
• Correct handling of the tuning regime (train/test)
• Reports all results for the provided query sets into appropriate tables
• Provides graphical analysis of results on a query-by-query basis using appropriate gain-loss plots
• Does not perform statistical significance analysis, or errors are present in the analysis
FAIL 1 (0%):
• No or only partial empirical evaluation has been conducted, e.g. only on a topic set, or a subset of topics
• Only reports a partial set of evaluation measures
• Fails to correctly handle training and testing partitions, e.g. train on test, reports only overall results

WRITE UP
Binary score: 0/2
%: 1
The ability to:
• use fluent language with correct grammar, spelling and punctuation
• use appropriate paragraph, sentence structure
• use appropriate style and tone of writing
• produce a professionally presented document, according to the provided template
7 (100%):
• Structure of the document is appropriate and meets expectations
• Clarity promoted by consistent use of standard grammar, spelling and punctuation
• Sentences are coherent
• Paragraph structure effectively developed
• Fluent, professional style and tone of writing.
• No proof reading errors
• Polished professional appearance
FAIL 1 (0%):
• Written expression and presentation are incoherent, with little or no structure, well below required standard
• Structure of the document is not appropriate and does not meet expectations
• Meaning unclear as grammar and/or spelling contain frequent errors.
• Disorganised or incoherent writing.
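
The topic-by-topic gains/losses analysis and the paired t-test asked for in the evaluation criteria can be sketched as follows. This is a minimal illustration, not a prescribed procedure. It assumes per-topic average precision values have already been collected into dictionaries keyed by topic (for example from trec_eval's per-query output). The function names, topic labels and scores are hypothetical and invented for the example. Only the t statistic and degrees of freedom are computed here; the p-value would then come from a t table or a statistics library such as SciPy's `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def per_topic_gains(baseline, system):
    """Difference (system - baseline) per shared topic, largest gain first.

    The sorted differences are what a gain-loss bar plot would display.
    """
    shared = sorted(set(baseline) & set(system))
    diffs = {topic: system[topic] - baseline[topic] for topic in shared}
    return dict(sorted(diffs.items(), key=lambda item: item[1], reverse=True))


def paired_t_statistic(baseline, system):
    """Paired t statistic and degrees of freedom over the shared topics."""
    shared = sorted(set(baseline) & set(system))
    diffs = [system[topic] - baseline[topic] for topic in shared]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample std / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-topic average precision, invented for illustration only.
bm25 = {"topicA": 0.10, "topicB": 0.20, "topicC": 0.30, "topicD": 0.40}
reduced = {"topicA": 0.15, "topicB": 0.18, "topicC": 0.40, "topicD": 0.43}

for topic, gain in per_topic_gains(bm25, reduced).items():
    print(f"{topic}: {gain:+.3f}")

t_value, dof = paired_t_statistic(bm25, reduced)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

Positive differences are topics where the reduced query beats tuned BM25 and negative ones are losses; the same dictionaries feed both the plot and the significance test, which keeps the two analyses consistent.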