Ben Chuanlong Du's Blog


Data for NLP Research

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The Multi-Genre NLI Corpus (MultiNLI)

General Language Understanding Evaluation (GLUE)

The Stanford Question Answering Dataset (SQuAD)

SWAG (Situations With Adversarial Generations)

Reading Comprehension Dataset (RACE)

Heuristic Analysis for NLI Systems Dataset (HANS)

The Cross-Lingual NLI Corpus (XNLI)

Stanford NLP - Sentiment Treebank


CoLA: The Corpus of Linguistic Acceptability


Data Sources

SQuAD: 100,000+ Questions for Machine Comprehension of Text

SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference