Welcome to the EBM-NLP corpus for PICO Extraction

We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level; e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary.
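To make the two annotation layers concrete, here is a minimal sketch of how the starting spans and the granular labels nested inside them might be represented as token ranges. The field layout and label names below are invented for illustration and are not the corpus's actual file format:

```python
# Hypothetical illustration of the two annotation layers. Token ranges are
# (start, end) with an exclusive end; label strings are illustrative only.
tokens = ["Adults", "with", "type", "2", "diabetes", "received",
          "metformin", "or", "placebo", "."]

# Layer 1: starting spans, one PICO label per token range.
spans = [(0, 5, "Participants"), (6, 9, "Interventions")]

# Layer 2: granular labels nested inside the starting spans, e.g. individual
# interventions that would be mapped onto a structured vocabulary.
granular = [(6, 7, "intervention-item"), (8, 9, "intervention-item")]

def nested_within(inner, outer_spans):
    """Return True if a granular span lies inside some starting span."""
    s, e, _ = inner
    return any(os <= s and e <= oe for os, oe, _ in outer_spans)

# Every granular span should sit inside a starting span (the nested NER setup).
assert all(nested_within(g, spans) for g in granular)
```

This nesting is what makes the aligned test sets useful for end-to-end and nested NER systems: a model can first predict the starting spans, then assign granular labels within them.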

The complete details are described in our ACL 2018 publication. Since this publication, we have made improvements to the dataset:

  • Aligned the test set for the granular labels with the test set for the starting span labels to better support end-to-end systems and nested NER tasks.
  • Recollected granular labels for documents with low confidence to increase average quality of the training set.
  • Expanded the training set for the granular labels to improve coverage of documents in the starting span training set.

Current Results

The test set of 200 documents was annotated by medical professionals hired on Upwork. To be added to the leaderboard, submit your labels for the test set and a description of your model via the submission page.
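The official scoring is defined by the submission page; as a sketch, the leaderboard columns suggest token-level precision, recall, and F1, which for a single binary label sequence could be computed as:

```python
def token_prf(gold, pred):
    """Token-level precision, recall, and F1 for one label sequence.

    gold, pred: equal-length sequences of 0/1 token labels for one
    PICO element (1 = token falls inside an annotated span).
    This is an illustrative metric, not the official scoring script.
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, `token_prf([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` gives precision, recall, and F1 of 2/3 each (two true positives, one false positive, one false negative).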

Leaderboard
| model | f1 | precision | recall |
| --- | --- | --- | --- |
| logreg | 0.45 | 0.31 | 0.82 |
| lstm-crf | 0.68 | 0.70 | 0.66 |
| lstm-crf-bert | 0.68 | 0.69 | 0.66 |

| model | f1 | precision | recall |
| --- | --- | --- | --- |
| logreg | 0.41 | 0.28 | 0.83 |
| lstm-crf | 0.78 | 0.74 | 0.82 |
| lstm-crf-bert | 0.68 | 0.78 | 0.57 |

| model | f1 | precision | recall |
| --- | --- | --- | --- |
| logreg | 0.35 | 0.22 | 0.85 |
| lstm-crf | 0.57 | 0.55 | 0.60 |
| lstm-crf-bert | 0.57 | 0.53 | 0.62 |

| model | f1 | precision | recall |
| --- | --- | --- | --- |
| logreg | 0.56 | 0.44 | 0.79 |
| lstm-crf | 0.65 | 0.80 | 0.55 |
| lstm-crf-bert | 0.78 | 0.72 | 0.85 |

| model | f1 | precision | recall |
| --- | --- | --- | --- |
| logreg | 0.36 | 0.30 | 0.47 |
| lstm-crf | 0.46 | 0.64 | 0.38 |

| model | f1 | precision | recall |
| --- | --- | --- | --- |
| logreg | 0.45 | 0.44 | 0.45 |
| lstm-crf | 0.40 | 0.83 | 0.26 |

| model | f1 | precision | recall |
| --- | --- | --- | --- |
| logreg | 0.25 | 0.17 | 0.44 |
| lstm-crf | 0.50 | 0.58 | 0.44 |

| model | f1 | precision | recall |
| --- | --- | --- | --- |
| logreg | 0.38 | 0.30 | 0.52 |
| lstm-crf | 0.48 | 0.52 | 0.45 |