Paper Summary: CLUES Few-Shot Learning Evaluation in Natural Language Understanding

Paper URL | Code and dataset URL

Most current benchmarks for natural language understanding (NLU), such as GLUE and SuperGLUE, contain tasks that are easily cast as classification and provide models with large amounts of task-specific training data. Under these circumstances, when a model exceeds "human-level" performance on a benchmark, the comparison with humans is not entirely fair, since humans perform these tasks after seeing only a few demonstrations. To address this limitation, the authors provide a standardized evaluation of different few-shot learning approaches and demonstrate a significant gap between human and machine few-shot performance on NLU tasks....
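To make the few-shot setting concrete, here is a minimal sketch of how a K-shot training split can be drawn from a labeled classification dataset. This is an illustrative assumption, not the authors' toolkit or the official CLUES splitting code; the `sample_k_shot` helper and the toy data are hypothetical.

```python
import random


def sample_k_shot(examples, k, seed=0):
    """Draw k examples per class from a list of (text, label) pairs.

    Returns a small training set of roughly k * num_classes items,
    mimicking the kind of K-shot split used in few-shot evaluation.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))

    few_shot_set = []
    for items in by_label.values():
        few_shot_set.extend(rng.sample(items, min(k, len(items))))
    rng.shuffle(few_shot_set)
    return few_shot_set


if __name__ == "__main__":
    toy_data = [
        ("great movie", "positive"),
        ("loved it", "positive"),
        ("terrible plot", "negative"),
        ("waste of time", "negative"),
        ("fine, I guess", "positive"),
    ]
    # 2-shot split: two examples per class, shuffled.
    print(sample_k_shot(toy_data, k=2, seed=42))
```

Because results in such a low-data regime can vary a lot with the particular examples drawn, few-shot benchmarks typically report scores averaged over several splits sampled with different seeds.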

November 19, 2021 · Evgeny Orlov