This repository contains a UIMA-based modular question answering (QA) pipeline that automatically answers multiple-choice questions from English-language entrance exams on world history. It provides an end-to-end baseline system for the NTCIR QALab-1 challenge (stage 1). The repository includes:
- Pipeline source code and UIMA descriptors for each component:
  - XML collection readers
  - question analysis annotator (also called hypothesis generator)
  - document-retrieval-based evidence collector
  - rule-based answer selection and evaluation CAS consumers
- A specification document that describes the overall system architecture, the UIMA type systems, and each pipeline phase.
- Example intermediate data (in UIMA XMI format) from every step of the baseline pipeline.
- UIMA-AS client descriptors that call the baseline annotator UIMA-AS services hosted on CMU servers.
### Folder Structure
The overall folder structure:
ntcir-qalab-cmu-baseline/
├── data/
│ └── baseline_xmi/ /* intermediate xmi files of baseline steps */
├── doc/ /* brief document describing pipeline phases */
├── goldstandard/ /* gold standard xml files for each year and combined */
│ ├── 1997/
│ ├── ...
│ └── 97-01-05-09/
├── input/ /* input xml files for each year and combined */
│ ├── 1997/
│ ├── ...
│ └── 97-01-05-09/
├── solr/ /* (optional) solr configuration files to create wikipedia index */
└── src/ /* source code and UIMA descriptors */
The input XML documents are in sub-directories of the input directory. The name of the current input directory is specified in the configuration file of the collection reader. It is currently input/97-01-05-09/, which contains questions for four exam years. The gold standard data is in one of the sub-directories of the goldstandard directory. The name of this sub-directory is also specified in the configuration file of the collection reader.
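To see which files the collection reader would find, it can help to list the XML files under the current input folder. The snippet below is a minimal sketch: the function name `list_input_xml` is hypothetical, and it assumes the input files use the `.xml` extension and that the command is run from the repository root.

```shell
# List the XML files under a directory, one path per line, sorted.
# (Hypothetical helper; assumes input files end in .xml.)
list_input_xml() {
    find "$1" -type f -name '*.xml' | sort
}

input_dir="input/97-01-05-09"   # current input folder named above
if [ -d "$input_dir" ]; then
    list_input_xml "$input_dir"
else
    echo "Directory not found: $input_dir (run from the repository root)"
fi
```

The same function can be pointed at a sub-directory of goldstandard/ to check which gold standard files are present.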
The CMU baseline system requires a JDK, Git, and Maven. Please refer to the prerequisites document for instructions on installing and setting them up. The system has been tested on Linux and Mac OS. Other platforms should work but have not been extensively tested.
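Before building, a quick check that the required tools are on the PATH can save a failed build. The following is a rough sketch rather than part of the project: the function name `missing_tools` is hypothetical, and it assumes that the presence of `javac` indicates an installed JDK and that Maven is invoked as `mvn`.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# (Hypothetical helper; always returns 0 so it is safe under `set -e`.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# javac is part of the JDK; mvn is the Maven launcher.
missing=$(missing_tools javac git mvn)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found."
fi
```

This only confirms that the commands exist. It does not check their versions, so the prerequisites document remains the reference for supported versions.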
To download and build the baseline:
git clone git@github.com:oaqa/ntcir-qalab-cmu-baseline.git
cd ntcir-qalab-cmu-baseline
mvn install
To run the baseline pipeline from the command line:
mvn compile exec:java -Dexec.mainClass=edu.cmu.lti.ntcir.qalab.runner.SimpleRunCPE
For more details, please see our technical report:
Di Wang, Leonid Boytsov, Jun Araki, Alkesh Patel, Jeff Gee, Zhengzhong Liu, Eric Nyberg, and Teruko Mitamura. 2014. "CMU Multiple-choice Question Answering System at NTCIR-11 QA-Lab." In Proceedings of the 11th NTCIR Conference, Tokyo, Japan.
and the overview paper:
Hideyuki Shibuki, Kotaro Sakamoto, Yoshinobu Kano, Teruko Mitamura, Madoka Ishioroshi, Kelly Y. Itakura, Di Wang, Tatsunori Mori, and Noriko Kando. 2014. "Overview of the NTCIR-11 QA-Lab Task." In Proceedings of the 11th NTCIR Conference, Tokyo, Japan.
1.0
Apache License, Version 2.0