What is HotpotQA?
HotpotQA is a question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems. It was collected by a team of NLP researchers at Carnegie Mellon University, Stanford University, and Université de Montréal.
For more details about HotpotQA, please refer to our EMNLP 2018 paper, "HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering."
If you work on open-domain multi-hop question answering, you might also be interested in BeerQA, a more recent dataset published by one of our authors (Peng Qi). BeerQA features open-domain questions that may require a varying number of reasoning hops to answer, and it includes HotpotQA as one of its parts.
Getting started
HotpotQA is distributed under a CC BY-SA 4.0 License. The training and development sets can be downloaded below.
A more comprehensive guide to data download, preprocessing, baseline model training, and evaluation is included in our GitHub repository, linked below.
Once you have built your model, you can evaluate its performance with the evaluation script we provide below by running:
python hotpot_evaluate_v1.py <path_to_prediction> <path_to_gold>
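As a rough illustration of the answer (Ans), supporting-fact (Sup), and joint metrics reported on the leaderboard, the sketch below scores a single made-up example. It assumes SQuAD-style answer normalization, supporting facts given as (title, sentence index) pairs, and joint scores formed by multiplying the answer and supporting-fact precision and recall. The function names and sample data are hypothetical, and special cases the official script may handle (such as yes/no answers) are left out; hotpot_evaluate_v1.py remains the reference implementation.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_from(precision, recall):
    """Harmonic mean of precision and recall, 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def answer_scores(prediction, gold):
    """Return (em, precision, recall, f1) for one answer string (assumed scheme)."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    em = float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return em, 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return em, precision, recall, f1_from(precision, recall)


def supporting_fact_scores(prediction, gold):
    """Return (em, precision, recall, f1) over sets of (title, sentence index)."""
    pred_set = {tuple(fact) for fact in prediction}
    gold_set = {tuple(fact) for fact in gold}
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    return float(pred_set == gold_set), precision, recall, f1_from(precision, recall)


def joint_scores(ans, sup):
    """Return (joint_em, joint_f1) by multiplying the two sets of scores."""
    joint_em = ans[0] * sup[0]
    joint_precision = ans[1] * sup[1]
    joint_recall = ans[2] * sup[2]
    return joint_em, f1_from(joint_precision, joint_recall)


# Hypothetical example: correct answer, one of two supporting facts found.
ans = answer_scores("the Eiffel Tower", "Eiffel Tower")
sup = supporting_fact_scores(
    [["Paris", 0], ["France", 2]],
    [["Paris", 0], ["Eiffel Tower", 1]],
)
print("Ans (EM, P, R, F1):", ans)
print("Sup (EM, P, R, F1):", sup)
print("Joint (EM, F1):", joint_scores(ans, sup))
```

Under this scheme a system only gets joint credit when it is right on both the answer and its supporting facts, which is why the Joint columns are consistently lower than the Ans and Sup columns.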
To submit your models and evaluate them on the official test sets, please read our submission guide hosted on Codalab.
We also release the processed Wikipedia used in creating HotpotQA (also under a CC BY-SA 4.0 License). It serves as the corpus for the fullwiki setting in our evaluation and, we hope, as a standalone resource for future research involving processed Wikipedia text. The documentation for this corpus is linked below.
Stay connected!
Join our Google group to receive updates or initiate discussions about HotpotQA!
If you use HotpotQA in your research, please cite our paper with the following BibTeX entry:
@inproceedings{yang2018hotpotqa,
  title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering},
  author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D.},
  booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
  year={2018}
}
Leaderboard (Distractor Setting)
| Rank | Date | Model | Ans EM | Ans F1 | Sup EM | Sup F1 | Joint EM | Joint F1 |
|---|---|---|---|---|---|---|---|---|
| 1 | Aug 7, 2023 | Beam Retrieval (single model) BUPT & Tencent (Zhang, Zhang, Zhang, et al. 2023) | 72.69 | 85.04 | 66.25 | 90.09 | 50.53 | 77.54 |
| 2 | Jul 7, 2022 | PipNet (single model) Tencent Cloud Xiaowei | 72.26 | 84.86 | 63.71 | 89.41 | 48.76 | 76.95 |
| 3 | Jun 27, 2022 | Smoothing R3 (single model) Fudan University & Huawei Poisson Lab Rethinking Label Smoothing on Multi-hop Question Answering | 72.07 | 84.34 | 65.44 | 89.55 | 49.73 | 76.69 |
| 4 | Jan 28, 2022 | FE2H on ALBERT (single model) Nanjing University From Easy to Hard: Two-stage Selector and Reader for Multi-hop Question Answering | 71.89 | 84.44 | 64.98 | 89.14 | 50.04 | 76.54 |
| 5 | May 16, 2022 | R3 (single model) Fudan University & Huawei Poisson Lab Rethinking Label Smoothing on Multi-hop Question Answering | 71.27 | 83.57 | 65.25 | 88.98 | 49.81 | 76.02 |
| 6 | May 28, 2021 | SAE+ (single model) JD AI Research | 70.74 | 83.61 | 63.70 | 88.95 | 48.15 | 75.72 |
| 7 | Jul 12, 2021 | S2G+EGA (single model) Shanghai Jiao Tong University | 70.92 | 83.44 | 63.86 | 88.68 | 48.76 | 75.47 |
| 8 | Feb 27, 2021 | S2G+ (single model) Shanghai Jiao Tong University | 70.72 | 83.53 | 64.30 | 88.72 | 48.60 | 75.45 |
| 9 | Jan 11, 2021 | AMGN+ (single model) Anonymous | 70.53 | 83.37 | 63.57 | 88.83 | 47.77 | 75.24 |
| 10 | Mar 23, 2022 | RD Model (single model) | 70.35 | 82.86 | 63.57 | 88.81 | 47.96 | 75.17 |
| 11 | Feb 14, 2022 | FE2H on ELECTRA (single model) Anonymous | 69.54 | 82.69 | 64.78 | 88.71 | 48.46 | 74.90 |
| 12 | Sep 6, 2020 | SpiderNet-large (single model) Kingsoft AI Lab | 70.15 | 83.02 | 63.82 | 88.85 | 47.54 | 74.88 |
| 13 | Feb 25, 2023 | GIT (single model) KAIST | 70.07 | 82.86 | 62.59 | 88.53 | 47.22 | 74.84 |
| 14 | Feb 20, 2021 | S2G+ (single model) Anonymous | 69.38 | 82.17 | 64.30 | 88.72 | 48.00 | 74.36 |
| 15 | Dec 30, 2021 | AnonymousS (single model) Anonymous | 69.66 | 82.42 | 62.99 | 87.85 | 47.84 | 74.27 |
| 16 | Nov 23, 2020 | Anonymous (single model) Anonymous | 70.24 | 82.36 | 62.26 | 88.46 | 46.81 | 74.27 |
| 17 | Dec 1, 2019 | HGN-large (single model) Anonymous | 69.22 | 82.19 | 62.76 | 88.47 | 47.11 | 74.21 |
| 18 | Nov 15, 2020 | AMGN (single model) Anonymous | 69.89 | 82.79 | 62.67 | 88.12 | 46.59 | 74.20 |
| 19 | Dec 15, 2021 | BoSe (single model) Anonymous | 69.66 | 82.43 | 62.52 | 87.73 | 47.52 | 74.18 |
| 20 | Jun 10, 2020 | BFR-Graph (single model) Anonymous | 70.06 | 82.20 | 61.33 | 88.41 | 45.92 | 74.13 |
| 21 | Apr 9, 2021 | KIFGraph (single model) LAB | 69.53 | 82.42 | 61.79 | 87.98 | 46.49 | 74.12 |
| 22 | Dec 14, 2021 | Anonymous (single model) Anonymous | 69.43 | 82.47 | 61.85 | 87.59 | 46.57 | 73.93 |
| 23 | May 11, 2020 | GSAN-large (single model) Anonymous | 68.57 | 81.62 | 62.36 | 88.73 | 46.06 | 73.89 |
| 24 | Sep 14, 2021 | GIT (single model) KAIST | 69.12 | 82.01 | 62.05 | 88.19 | 46.50 | 73.87 |
| 25 | Oct 6, 2020 | FFReader-large (single model) Kyoto University (Alkhaldi et al., 2021) | 68.89 | 82.16 | 62.10 | 88.42 | 45.61 | 73.78 |
| 26 | May 28, 2020 | ETC-large (single model) Anonymous | 68.12 | 81.18 | 63.25 | 89.09 | 46.40 | 73.62 |
| 27 | May 28, 2020 | Longformer (single model) Anonymous | 68.00 | 81.25 | 63.09 | 88.34 | 45.91 | 73.16 |
| 28 | May 24, 2021 | RealFormer (single model) Anonymous | 67.41 | 80.59 | 63.38 | 89.00 | 46.14 | 73.13 |
| 29 | Apr 15, 2022 | EGF Reader-large (single model) Anonymous | 68.10 | 80.96 | 62.60 | 88.20 | 46.15 | 72.96 |
| 30 | Oct 18, 2019 | C2F Reader (single model) Joint Laboratory of HIT and iFLYTEK Research (Shao, Cui et al. 2020) | 67.98 | 81.24 | 60.81 | 87.63 | 44.67 | 72.73 |
| 31 | Feb 11, 2021 | Text-CAN large (single model) Usyd NLP | 67.53 | 80.80 | 61.62 | 86.95 | 45.75 | 72.52 |
| 32 | Jun 15, 2020 | SEGraph (single model) Anonymous | 68.03 | 81.17 | 61.70 | 87.43 | 44.86 | 72.40 |
| 33 | Jan 24, 2021 | S2G-large (single model) Anonymous | 67.34 | 80.24 | 62.66 | 87.61 | 45.80 | 72.26 |
| 34 | Jun 29, 2021 | () | 67.44 | 80.27 | 60.08 | 86.16 | 44.69 | 71.46 |
|  | Jun 30, 2021 | () (single model) Anonymous | 67.44 | 80.27 | 60.08 | 86.16 | 44.69 | 71.46 |
| 36 | Nov 19, 2019 | SAE-large (single model) JD AI Research Tu, Huang et al., AAAI 2020 | 66.92 | 79.62 | 61.53 | 86.86 | 45.36 | 71.45 |
| 37 | Sep 27, 2019 | HGN (single model) Microsoft Dynamics 365 AI Research Fang et al., 2019 | 66.07 | 79.36 | 60.33 | 87.33 | 43.57 | 71.03 |
| 38 | Aug 19, 2020 | SpiderNet-Base (single model) Anonymous | 66.38 | 79.53 | 60.35 | 86.90 | 43.83 | 70.90 |
| 39 | Jul 29, 2019 | TAP 2 (ensemble) IBM Research AI & IISc | 66.64 | 79.82 | 57.21 | 86.69 | 41.21 | 70.65 |
| 40 | Oct 1, 2019 | EPS + BERT(wwm) (single model) Anonymous | 65.79 | 79.05 | 58.50 | 86.26 | 42.47 | 70.48 |
| 41 | Mar 2, 2021 | S2G-base (single model) Anonymous | 63.72 | 77.02 | 61.33 | 87.19 | 43.74 | 69.51 |
| 42 | Feb 24, 2021 | BDR+JNM (single model) Anonymous | 65.13 | 77.96 | 56.85 | 85.03 | 41.91 | 69.12 |
| 43 | Jul 29, 2019 | TAP 2 (single model) IBM Research AI & IISc | 64.99 | 78.59 | 55.47 | 85.57 | 39.77 | 69.12 |
| 44 | Dec 3, 2020 | AnonymousK (single model) Anonymous | 63.63 | 77.15 | 57.00 | 86.17 | 40.04 | 68.75 |
| 45 | May 5, 2021 | GAR-BERT (single model) York University | 62.67 | 76.35 | 59.50 | 87.98 | 40.64 | 68.74 |
| 46 | May 31, 2019 | EPS + BERT(large) (single model) Anonymous | 63.29 | 76.36 | 58.25 | 85.60 | 41.39 | 67.92 |
| 47 | Jul 30, 2020 | () | 60.66 | 74.67 | 57.05 | 87.02 | 37.85 | 66.65 |
| 48 | May 11, 2020 | GSAN-base (single model) Anonymous | 61.25 | 74.74 | 57.74 | 86.28 | 39.56 | 66.62 |
| 49 | Feb 12, 2021 | Text-CAN (single model) Usyd NLP | 60.17 | 73.99 | 58.33 | 85.75 | 39.31 | 65.95 |
| 50 | Aug 31, 2019 | SAE (single model) JD AI Research Tu, Huang et al., AAAI 2020 | 60.36 | 73.58 | 56.93 | 84.63 | 38.81 | 64.96 |
| 51 | Mar 13, 2021 | GAR (single model) York University | 56.61 | 71.40 | 58.36 | 87.27 | 36.79 | 64.01 |
|  | Mar 15, 2021 | () | 56.61 | 71.40 | 58.36 | 87.27 | 36.79 | 64.01 |
| 53 | Jun 13, 2019 | P-BERT (single model) Anonymous | 61.18 | 74.16 | 51.38 | 82.76 | 35.42 | 63.79 |
| 54 | Sep 16, 2019 | LQR-net 2 + BERT-Base (single model) Anonymous | 60.20 | 73.78 | 56.21 | 84.09 | 36.56 | 63.68 |
| 55 | Apr 11, 2019 | EPS + BERT (single model) Anonymous | 60.13 | 73.31 | 52.55 | 83.20 | 35.40 | 63.41 |
| 56 | May 16, 2019 | PIPE (single model) Anonymous | 59.77 | 72.77 | 52.53 | 82.82 | 35.54 | 62.92 |
| 57 | Dec 1, 2019 | SEval (single model) Anonymous | 61.87 | 74.37 | 45.73 | 80.50 | 33.32 | 62.73 |
| 58 | Jun 8, 2019 | TAP (single model) | 58.63 | 71.48 | 46.84 | 82.98 | 32.03 | 61.90 |
| 59 | Aug 14, 2019 | SAQA (single model) Anonymous | 55.07 | 70.22 | 57.62 | 84.19 | 35.94 | 61.72 |
| 60 | Sep 2, 2019 | MKGN (single model) Anonymous | 57.09 | 70.69 | 54.26 | 83.54 | 35.59 | 61.69 |
| 61 | Apr 19, 2019 | GRN + BERT (single model) Anonymous | 55.12 | 68.98 | 52.55 | 84.06 | 32.88 | 60.31 |
| 62 | Jun 19, 2019 | LQR-net + BERT-Base (single model) Anonymous | 57.20 | 70.66 | 50.20 | 82.42 | 31.18 | 59.99 |
| 63 | Apr 22, 2019 | DFGN (single model) Shanghai Jiao Tong University & ByteDance AI Lab (Xiao, Qu, Qiu et al. ACL19) | 56.31 | 69.69 | 51.50 | 81.62 | 33.62 | 59.82 |
| 64 | Nov 21, 2018 | QFE (single model) NTT Media Intelligence Laboratories (Nishida et al., ACL'19) | 53.86 | 68.06 | 57.75 | 84.49 | 34.63 | 59.61 |
| 65 | Jun 3, 2020 | IRC (single model) NTT Media Intelligence Laboratories (Nishida et al., 2021) | 58.54 | 72.67 | 36.56 | 79.53 | 23.57 | 59.43 |
| 66 | Apr 17, 2019 | LQR-net (ensemble) Anonymous | 55.19 | 69.55 | 47.15 | 82.42 | 28.42 | 58.86 |
| 67 | Mar 4, 2019 | GRN (single model) Anonymous | 52.92 | 66.71 | 52.37 | 84.11 | 31.77 | 58.47 |
| 68 | Mar 1, 2019 | DFGN + BERT (single model) Anonymous | 55.17 | 68.49 | 49.85 | 81.06 | 31.87 | 58.23 |
| 69 | Mar 4, 2019 | BERT Plus (single model) CIS Lab | 55.84 | 69.76 | 42.88 | 80.74 | 27.13 | 58.23 |
| 70 | May 18, 2019 | KGNN (single model) Tsinghua University (Ye et al., 2019) | 50.81 | 65.75 | 38.74 | 76.79 | 22.40 | 52.82 |
| 71 | Jul 14, 2021 | RoBERTa-L Two-step Model (single model) Anonymous | 67.61 | 80.36 | 1.10 | 64.01 | 0.76 | 52.50 |
| 72 | Mar 13, 2021 | GAR-NOSF (single model) York University | 56.20 | 71.17 | 9.37 | 54.76 | 6.25 | 41.42 |
|  | Mar 15, 2021 | () | 56.20 | 71.17 | 9.37 | 54.76 | 6.25 | 41.42 |
| 74 | Aug 24, 2020 | () | 56.78 | 70.93 | 8.35 | 53.77 | 5.23 | 40.89 |
| 75 | Oct 10, 2018 | Baseline Model (single model) Carnegie Mellon University, Stanford University, & Université de Montréal (Yang, Qi, Zhang, et al. 2018) | 45.60 | 59.02 | 20.32 | 64.49 | 10.83 | 40.16 |
| 76 | Aug 24, 2020 | () | 52.61 | 68.17 | 9.00 | 53.62 | 5.76 | 39.25 |
| - | Feb 3, 2020 | Unsupervised Decomposition (single model) Facebook AI Research, New York University & University College London Perez et al. EMNLP 2020 | 66.33 | 79.34 | N/A | N/A | N/A | N/A |
| - | Sep 24, 2019 | ChainEx (single model) UT Austin (Chen et al., 2019) | 61.20 | 74.11 | N/A | N/A | N/A | N/A |
| - | Feb 27, 2019 | DecompRC (single model) University of Washington (Min et al., ACL'18) | 55.20 | 69.63 | N/A | N/A | N/A | N/A |
Leaderboard (Fullwiki Setting)
| Rank | Date | Model | Ans EM | Ans F1 | Sup EM | Sup F1 | Joint EM | Joint F1 |
|---|---|---|---|---|---|---|---|---|
| 1 | May 10, 2021 | AISO (single model) Institute of Computing Technology, Chinese Academy of Sciences (Zhu, Pang et al., EMNLP 2021) | 67.46 | 80.52 | 61.17 | 86.02 | 44.87 | 72.00 |
| 2 | Jan 31, 2023 | Chain-of-Skills (single model) Carnegie Mellon University, Microsoft Research and UIUC Ma et al. ACL 2023 | 67.38 | 80.14 | 61.25 | 85.31 | 45.65 | 71.65 |
| 3 | Feb 1, 2021 | TPRR (single model) Huawei Poisson Lab & Parallel Distributed Computing Lab | 66.95 | 79.50 | 59.43 | 84.25 | 44.37 | 70.83 |
| 4 | Jan 15, 2021 | HopRetriever + Sp-search (single model) Huawei Noah's Ark Lab & Huawei Cloud (Li, Li, Shang, et al. 2020) | 67.13 | 79.91 | 57.38 | 83.52 | 43.20 | 70.61 |
| 5 | Dec 1, 2020 | EBS-Large (single model) Samsung SDS AI Research | 66.18 | 79.32 | 57.29 | 83.98 | 41.95 | 70.04 |
| 6 | Dec 18, 2020 | HopRetriever (single model) Huawei Noah's Ark Lab | 67.13 | 79.91 | 57.23 | 82.59 | 43.10 | 69.84 |
| 7 | Nov 30, 2020 | IRRR+ (single model) Stanford University & Samsung Research (Qi, Lee, Sido, and Manning. 2020) | 66.33 | 79.10 | 56.92 | 83.24 | 42.75 | 69.60 |
| 8 | Dec 31, 2020 | Anonymous (single model) Anonymous | 65.68 | 78.49 | 58.24 | 83.31 | 43.44 | 69.54 |
| 9 | Sep 7, 2020 | EBS-SH (single model) Samsung SDS AI Research | 65.53 | 78.61 | 55.90 | 83.13 | 40.91 | 68.94 |
| 10 | Aug 3, 2020 | IRRR (single model) Stanford University & Samsung Research (Qi, Lee, Sido, and Manning. 2020) | 65.71 | 78.19 | 55.93 | 82.05 | 42.14 | 68.59 |
| 11 | Oct 27, 2020 | Anonymous (single model) Anonymous | 65.21 | 78.02 | 56.61 | 82.44 | 42.26 | 68.54 |
| 12 | Sep 10, 2020 | Anonymous (single model) Anonymous | 65.05 | 78.02 | 55.35 | 82.69 | 40.51 | 68.37 |
| 13 | Aug 6, 2020 | Anonymous (single model) Anonymous | 64.94 | 78.18 | 54.49 | 82.48 | 39.44 | 68.10 |
| 14 | Aug 28, 2020 | Anonymous (ensemble) Anonymous | 65.26 | 78.27 | 54.22 | 82.21 | 40.02 | 68.08 |
| 15 | Oct 29, 2020 | HopRetriever-V2 (single model) anonymous | 64.83 | 77.81 | 56.08 | 81.79 | 40.95 | 67.75 |
| 16 | May 13, 2021 | Anonymous (single model) Anonymous | 62.90 | 75.82 | 57.71 | 81.26 | 42.18 | 67.08 |
| 17 | Dec 4, 2021 | AFSGraph-retriever (single model) Anonymous | 64.55 | 77.79 | 55.65 | 81.23 | 41.05 | 66.98 |
| 18 | May 19, 2021 | Anonymous (single model) Anonymous | 62.67 | 75.51 | 57.54 | 80.93 | 42.03 | 66.87 |
| 19 | Aug 26, 2020 | Recursive Dense Retriever (single model) Facebook AI & UCSB & UMass Xiong, Li et al., ICLR 2021 | 62.28 | 75.29 | 57.46 | 80.86 | 41.78 | 66.55 |
| 20 | May 21, 2020 | Step-by-Step Retriever (single model) Joint Laboratory of HIT and iFLYTEK Research | 62.95 | 75.43 | 54.61 | 80.00 | 40.36 | 66.22 |
| 21 | Nov 28, 2020 | Anonymous (single model) Anonymous | 61.79 | 74.71 | 53.51 | 80.05 | 38.43 | 64.45 |
| 22 | Jun 9, 2020 | HopRetriever-V1 (single model) anonymous | 60.83 | 73.93 | 53.07 | 79.26 | 38.00 | 63.91 |
| 23 | May 21, 2020 | DDRQA (single model) Georgia Institute of Technology & Peking University (Yuyu, Ping et al. 2020) | 62.53 | 75.91 | 51.01 | 78.86 | 36.04 | 63.88 |
| 24 | Jul 6, 2020 | Anonymous (single model) Anonymous | 64.29 | 77.23 | 51.12 | 78.57 | 36.29 | 63.75 |
| 25 | Mar 6, 2020 | DR model large (single model) Anonymous | 62.01 | 75.32 | 49.88 | 77.77 | 35.44 | 62.95 |
| 26 | Nov 24, 2021 | () | 61.71 | 74.57 | 50.04 | 77.16 | 36.77 | 62.92 |
|  | Nov 24, 2021 | HopAns (single model) ptf | 61.71 | 74.57 | 50.04 | 77.16 | 36.77 | 62.92 |
| 28 | Nov 21, 2020 | Anonymous (single model) Anonymous | 60.44 | 73.22 | 52.01 | 77.05 | 37.98 | 62.86 |
| 29 | Nov 15, 2021 | Multi-dimensional-AFSGraph (single model) Anonymous | 61.53 | 74.61 | 50.33 | 77.24 | 36.21 | 62.44 |
| 30 | Feb 11, 2020 | HGN-albert + SemanticRetrievalMRS IR (single model) Anonymous | 59.74 | 71.41 | 51.03 | 77.37 | 37.92 | 62.26 |
| 31 | Aug 19, 2021 | Tree-shaped-cluster (single model) Anonymous | 60.31 | 73.14 | 49.87 | 76.83 | 35.85 | 61.73 |
| 32 | Feb 6, 2021 | AFSgraph (single model) Anonymous | 60.08 | 72.97 | 49.96 | 76.85 | 35.89 | 61.66 |
| 33 | Nov 6, 2019 | Robustly Fine-tuned Graph-based Recurrent Retriever (single model) Salesforce Research & University of Washington (Asai et al., ICLR 2020) | 60.04 | 72.96 | 49.08 | 76.41 | 35.35 | 61.18 |
| 34 | Oct 4, 2020 | AFSgraph model (single model) Anonymous | 60.06 | 72.97 | 48.49 | 75.94 | 35.03 | 60.90 |
| 35 | Dec 1, 2019 | HGN-large + SemanticRetrievalMRS IR (single model) Anonymous | 57.85 | 69.93 | 51.01 | 76.82 | 37.17 | 60.74 |
| 36 | Jan 24, 2021 | DPR-recurrent (single model) Anonymous | 59.79 | 72.65 | 47.95 | 74.89 | 34.54 | 60.23 |
| 37 | Jan 19, 2021 | RoBERTa-DenseRetriever (single model) Anonymous | 59.60 | 72.43 | 47.87 | 74.79 | 34.53 | 60.05 |
| 38 | Oct 7, 2019 | HGN + SemanticRetrievalMRS IR (single model) Microsoft Dynamics 365 AI Research Fang et al., 2019 | 56.71 | 69.16 | 49.97 | 76.39 | 35.63 | 59.86 |
| 39 | Jul 27, 2020 | () | 58.89 | 71.60 | 48.03 | 75.69 | 34.46 | 59.84 |
| 40 | Jan 21, 2021 | GraphRR-Fast (single model) Anonymous | 58.21 | 70.86 | 42.91 | 71.30 | 30.95 | 56.85 |
| 41 | Feb 13, 2020 | DR model (single model) Anonymous | 58.82 | 71.68 | 41.55 | 72.54 | 29.34 | 56.82 |
| 42 | Dec 8, 2019 | Quark + SemanticRetrievalMRS IR (single model) Allen Institute for AI and Indian Institute of Technology A Simple Yet Strong Pipeline for HotpotQA | 55.50 | 67.51 | 45.64 | 72.95 | 32.89 | 56.23 |
| 43 | May 6, 2021 | GAR-BERT (single model) York University | 52.28 | 64.84 | 49.00 | 74.73 | 33.00 | 56.10 |
| 44 | Sep 20, 2019 | Graph-based Recurrent Retriever (single model) Anonymous | 56.04 | 68.87 | 44.14 | 73.03 | 29.18 | 55.31 |
| 45 | Sep 28, 2019 | MIR+EPS+BERT (single model) Anonymous | 52.86 | 64.79 | 42.75 | 72.00 | 31.19 | 54.75 |
| 46 | Mar 14, 2021 | GAR (single model) York University | 48.22 | 61.33 | 48.34 | 73.89 | 30.61 | 52.95 |
| 47 | Feb 4, 2020 | Transformer-XH-final(BERT-base) (single model) University of Maryland, Microsoft AI & Research (Zhao et al. ICLR 2020) | 51.60 | 64.07 | 40.91 | 71.42 | 26.14 | 51.29 |
| 48 | Sep 21, 2019 | Transformer-XH (single model) Anonymous | 48.95 | 60.75 | 41.66 | 70.01 | 27.13 | 49.57 |
| 49 | May 15, 2019 | SemanticRetrievalMRS (single model) UNC-NLP (Nie et al., EMNLP'2019) | 45.32 | 57.34 | 38.67 | 70.83 | 25.14 | 47.60 |
| 50 | Nov 28, 2020 | () | 43.22 | 54.35 | 38.62 | 63.61 | 25.37 | 44.88 |
| 51 | Feb 21, 2020 | DrKIT (single model) Carnegie Mellon University, Google Research (Dhingra et al, ICLR 2020) | 42.13 | 51.72 | 37.05 | 59.84 | 24.69 | 42.88 |
| 52 | Nov 28, 2020 | () | 38.94 | 50.72 | 38.29 | 62.19 | 23.33 | 41.77 |
| 53 | Jul 31, 2019 | Entity-centric BERT Pipeline (single model) Anonymous | 41.82 | 53.09 | 26.26 | 57.29 | 17.01 | 39.18 |
| 54 | May 21, 2019 | GoldEn Retriever (single model) Stanford University (Qi et al., EMNLP-IJCNLP 2019) | 37.92 | 48.58 | 30.69 | 64.24 | 18.04 | 39.13 |
| 55 | Aug 14, 2019 | PR-Bert (single model) KingSoft AI Lab | 43.33 | 53.79 | 21.90 | 59.63 | 14.50 | 39.11 |
| 56 | Dec 4, 2019 | SAFSr-Bert (single model) Anonymous | 39.35 | 51.40 | 24.21 | 58.54 | 13.34 | 37.00 |
| 57 | Feb 21, 2019 | Cognitive Graph QA (single model) Tsinghua KEG & Alibaba DAMO Academy (Ding et al., ACL'19) | 37.12 | 48.87 | 22.82 | 57.69 | 12.42 | 34.92 |
| 58 | Mar 14, 2021 | GAR-NOSF (single model) York University | 47.50 | 60.62 | 7.62 | 44.79 | 4.88 | 33.36 |
| 59 | Apr 12, 2021 | IKFGraph (single model) anonymous | 35.82 | 45.33 | 15.97 | 51.20 | 11.46 | 30.38 |
| 60 | Jul 8, 2022 | AnonymousQ (single model) Anonymous | 36.85 | 45.95 | 15.25 | 46.76 | 11.54 | 29.07 |
|  | Feb 12, 2024 | () | 36.85 | 45.95 | 15.25 | 46.76 | 11.54 | 29.07 |
| 62 | May 15, 2023 | HGN Model-reproduce (single model) Peking University | 33.51 | 42.69 | 15.59 | 49.32 | 10.95 | 28.40 |
| 63 | Mar 5, 2019 | MUPPET (single model) Technion (Feldman and El-Yaniv, ACL'19) | 30.61 | 40.26 | 16.65 | 47.33 | 10.85 | 27.01 |
| 64 | Apr 7, 2019 | GRN + BERT (single model) Anonymous | 29.87 | 39.14 | 13.16 | 49.67 | 8.26 | 25.84 |
| 65 | May 20, 2019 | Entity-centric IR (single model) Anonymous | 35.36 | 46.26 | 0.06 | 43.16 | 0.02 | 25.47 |
| 66 | May 19, 2019 | KGNN (single model) Tsinghua University (Ye et al., 2019) | 27.65 | 37.19 | 12.65 | 47.19 | 7.03 | 24.66 |
| 67 | Aug 16, 2019 | SAQA (single model) Anonymous | 28.44 | 38.62 | 14.69 | 47.17 | 8.62 | 24.49 |
| 68 | Mar 4, 2019 | GRN (single model) Anonymous | 27.34 | 36.48 | 12.23 | 48.75 | 7.40 | 23.55 |
| 69 | Nov 25, 2018 | QFE (single model) NTT Media Intelligence Laboratories (Nishida et al., ACL'19) | 28.66 | 38.06 | 14.20 | 44.35 | 8.69 | 23.10 |
| 70 | Nov 29, 2019 | SAFSr_model (single model) Anonymous | 28.91 | 39.14 | 8.03 | 40.55 | 4.06 | 20.90 |
| 71 | Oct 12, 2018 | Baseline Model (single model) Carnegie Mellon University, Stanford University, & Université de Montréal (Yang, Qi, Zhang, et al. 2018) | 23.95 | 32.89 | 3.86 | 37.71 | 1.85 | 16.15 |
| 72 | Nov 26, 2023 | () | 7.35 | 12.14 | 0.00 | 7.84 | 0.00 | 1.11 |
| 73 | Jan 30, 2021 | graph-recurrent-retriever+roberta-base w. S/R-pretraining (single model) Anonymous | 58.13 | 70.96 | 0.00 | 0.00 | 0.00 | 0.00 |
| 74 | Mar 1, 2019 | () | 30.00 | 40.65 | 0.00 | 0.00 | 0.00 | 0.00 |
| 75 | Jun 25, 2024 | Mistral multi hop with very large sources (single model) Anonymous | 7.98 | 22.14 | 0.00 | 0.00 | 0.00 | 0.00 |
| - | Dec 13, 2022 | () | 58.05 | 71.08 | N/A | N/A | N/A | N/A |
| - | May 19, 2019 | TPReasoner w/o BERT (single model) Anonymous | 36.04 | 47.43 | N/A | N/A | N/A | N/A |
| - | Mar 3, 2019 | MultiQA (single model) Anonymous | 30.73 | 40.23 | N/A | N/A | N/A | N/A |