NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE1.0)

Adverse drug events (ADEs) are common and occur in approximately 2-5% of hospitalized adult patients. Each ADE is estimated to increase healthcare cost by more than $3,200. Severe ADEs rank among the top 5 or 6 leading causes of death in the United States. Prevention, early detection and mitigation of ADEs could save both lives and dollars. Employing natural language processing (NLP) techniques on electronic health records (EHRs) provides an effective way of real-time pharmacovigilance and drug safety surveillance.

We’ve annotated 1092 EHR notes with medications, as well as relations to their corresponding attributes, indications and adverse events. It provides valuable resources to develop NLP systems to automatically detect those clinically important entities. Therefore we are happy to announce a public NLP challenge, MADE1.0, aiming to promote deep innovations in related research tasks, and bring researchers and professionals together exchanging research ideas and sharing expertise. The ultimate goal is to further advance ADE detection techniques to improve patient safety and health care quality.

Tentative Timelines

  • Registration: begins August 1st, 2017
  • Training data release: November 1st, 2017, changed to November 15th, 2017, due to change in Data Recipient Agreement.
  • System submission: Feb 1st, 2018
  • Workshop: in conjunction with AMIA summit 2018, March 2018

Annotated Data

The entire dataset contains 1092 de-identified EHR notes from 21 cancer patients. Each EHR note was annotated with medication information (medication name, dosage, route, frequency, duration), ADEs, indications, other signs and symptoms, and relations among those entities. We split the data into a training set consisting of ~900 notes and a test set consisting of ~180 notes. Both will be released in BioC format.

Participanting groups must sign a Data Recipient Agreement and provide their own institute IRB approval. Registered users can download the Data Recipient Agreement here. Each participating team only needs to submit one Data Recepient Agreement and one IRB approval notice. All team members listed on the IRB are allowed to use the data. Click here to upload completed Date Recepient Agreement. Click here to upload IRB approval notice.

Task Definitions

MADE1.0 challenge consists of three tasks defined as follows.

  • Named entity recognition (NER): develop systems to automatically detect mentions of medication name and its attributes (dosage, frequency, route, duration), as well as mentions of ADEs, indications, other signs & symptoms.
  • Relation identification (RI): given the truth entity annotations, build system to identify relations between medication name entities and its attribute entities, as well as relations between medication name entities and ADE, indications and other sign & symptoms.
  • Integrated task (IT): design and develop a integrative system to conduct the above two tasks together.


All the participated teams will not get the test data itself, instead we will have a platform available to accept system submissions where all the submitted systems will be run on the same withheld test data, in the same computing environment and evaluated by the same evaluation scripts with the same metrics on different tasks. Each team is allowed to submit up to three versions of systems for each task, and each team can choose to participate either one or more tasks in this challenge.

Below are the publications of two baseline systems. In this competition, we used a different approach to partition the training and testing data. Therefore, the performance of the two baseline systems can be used as an approximation only.

1. Structured prediction models for RNN based sequence labeling in clinical text. (Abhyuday Jagannatha, Hong Yu; EMNLP 2016)

2. Bidirectional Recurrent Neural Networks for Medical Event Detection in Electronic Health Records. (Abhyuday Jagannatha, Hong Yu; NAACL HLT 2016)


Participants are asked to submit a short paper describing their methodologies. It can contain a graphical summary of the proposed architecture. The document should not exceed 4 pages, 1.5 line spacing, 12 font size. The authors of either top performing systems or particularly novel approaches will be invited to present or demonstrate their systems at the workshop. A special issue of a journal will be organized following the workshop.


Об авторе Лидия Пивоварова

СПбГУ - старший преподаватель, University of Helsinki - PhD student
Запись опубликована в рубрике Ресурсы/Софт. Добавьте в закладки постоянную ссылку.

Добавить комментарий

Ваш e-mail не будет опубликован. Обязательные поля помечены *