Data for NLP Research

The notes on this page are fragmentary and immature thoughts of the author. Please read them with your own judgement!

The Multi-Genre NLI Corpus (MultiNLI)

General Language Understanding Evaluation (GLUE)

Reading Comprehension Dataset (RACE)

Heuristic Analysis for NLI Systems Dataset (HANS)

The Cross-Lingual NLI Corpus (XNLI)

Stanford NLP - Sentiment Treebank


CoLA: The Corpus of Linguistic Acceptability


Data Sources

SQuAD: 100,000+ Questions for Machine Comprehension of Text

SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference