Machine Learning to Code Work-Related Injury Narratives - 2020

The National Institute for Occupational Safety and Health (NIOSH) collects free-text injury narratives from a variety of data sources. These narratives, often written by the injured workers themselves, describe the circumstances leading to a workplace injury. To produce basic surveillance reports that support future prevention strategies, the narratives must be categorized using a standardized coding system that identifies the cause of the injury. Traditionally this was done manually, but recent efforts have sought to automate the process. About 8 years ago, NIOSH developed an auto-coder that reads a narrative and assigns it to the appropriate causation category. This auto-coder was estimated to be about 82% accurate. It relieved the manual burden of reading thousands of claims and was often more consistent in its designations than manual coding. The purpose of this project was to host a public competition to improve upon NIOSH's current algorithm.

Project URL: https://github.com/NASA-Tournament-Lab/CDC-NLP-Occ-Injury-Coding

Geographic Scope: remotely, world-wide

Project Status: Complete - not recruiting volunteers

Participation Tasks: Classification or tagging, Data analysis, Problem solving

Start Date: 10/17/2019

Project Contact: inh4@cdc.gov

Federal Government Sponsor: CDC

Other Federal Government Sponsor: National Aeronautics and Space Administration (NASA); Bureau of Labor Statistics

Fields of Science: Computers and technology, Health and medicine

Intended Outcomes: Automatically classify free-text injury narratives into standardized coding systems using state-of-the-art machine learning and natural language processing algorithms, easing the burden on manual coders.
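
Illustrative sketch: the classification task described above can be pictured as a function that maps a narrative to a cause code. The following is a minimal bag-of-words naive Bayes classifier written with only the Python standard library. It is not the NIOSH auto-coder or any competition entry. The class name NaiveBayesCoder, the three category labels, and the example narratives are all hypothetical and exist only to show the shape of the problem.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a narrative and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Hypothetical multinomial naive Bayes coder with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # code -> token counts
        self.doc_counts = Counter()              # code -> number of narratives
        self.vocab = set()

    def fit(self, narratives, codes):
        for text, code in zip(narratives, codes):
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.doc_counts[code] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_code, best_score = None, -math.inf
        for code, n_docs in self.doc_counts.items():
            counts = self.word_counts[code]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Made-up narratives and made-up category labels, for illustration only.
TRAIN = [
    ("slipped on wet floor and fell", "FALL"),
    ("fell from ladder while painting", "FALL"),
    ("tripped over cable and fell down stairs", "FALL"),
    ("cut finger with box knife", "CONTACT"),
    ("hand struck by falling pallet", "CONTACT"),
    ("finger caught in machine", "CONTACT"),
    ("strained back lifting heavy box", "OVEREXERTION"),
    ("pulled shoulder lifting patient", "OVEREXERTION"),
]

coder = NaiveBayesCoder().fit(
    [text for text, _ in TRAIN],
    [code for _, code in TRAIN],
)

if __name__ == "__main__":
    print(coder.predict("worker slipped on ice and fell"))
    print(coder.predict("strained back lifting heavy crate"))
```

A real system would be trained on many thousands of coded narratives and evaluated against held-out manual codes, which is how an accuracy figure such as the one quoted in the description would be measured.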