Data for NLP Research

Mar 06, 2020

The notes on this page are fragmentary and immature thoughts of the author. Please read with your own judgement!

The Multi-Genre NLI Corpus (MultiNLI)

General Language Understanding Evaluation (GLUE)

The Stanford Question Answering Dataset (SQuAD)

SWAG (Situations With Adversarial Generations)

ReAding Comprehension Dataset From Examinations (RACE)

Heuristic Analysis for NLI Systems (HANS)

The Cross-Lingual NLI Corpus (XNLI)

Stanford NLP - Sentiment Treebank

WordNet

CoLA: The Corpus of Linguistic Acceptability

References

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Swag: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
