Pittsburgh Partnership in Translational and Clinical Research Clarifies Ambiguity in Clinical Text

Clinical abbreviation ambiguity: a barrier to effective care and clinical automation

Time is a critical resource in any clinical setting, and every second saved is a moment reinvested into improving the lives of patients. A common way hospital staff save time is by using acronyms and abbreviations when writing clinical records, which is especially useful in a field as jargon-heavy as medicine. You can imagine the lost time and wrist pain that accompany dutifully writing phrases like “percutaneous transluminal coronary angioplasty” or “magnetic resonance cholangiopancreatography” all day. Acronyms and abbreviations can, however, have ambiguous or overlapping definitions. For example, while a cardiologist may read “LA” as “left atrium”, an infectious disease specialist might incorrectly read the acronym as “lymphadenopathy” or “local anesthetic”.

This ambiguity has a negative impact on hospitals, propagating clinical miscommunication and disrupting patient care. In fact, a national review of medication errors found that nearly 5% of errors could be attributed to ambiguous acronym use, resulting in improper prescribing, dosing, and medication preparation¹. For example, in a prescription for “10U of insulin”, a hastily scrawled “U” may be mistaken for the number “0”. This extraordinarily dangerous mistake is reported to be the second most common abbreviation error, and can transform a prescribed “10 unit” dose of insulin into a “100 unit” dose.

Physicians from unrelated disciplines are particularly susceptible to misinterpreting clinical abbreviations: one study showed that physicians from outside pediatric care correctly identified abbreviations in pediatric records only 31-63% of the time². These errors of interpretation could result in patient harm in today’s highly transdisciplinary practice of medicine.

Not only does this ambiguity confound clinicians reading clinical text, it also serves as a barrier to the analysis of clinical text by natural language processing (NLP) systems³.

NLP is a branch of artificial intelligence concerned with analyzing and interpreting human language. NLP systems are a part of our everyday lives, powering the listening ears of “Siri” and “Alexa” while they translate our instructions, or autocompleting the words we text. You can also blame NLP for every time you accidentally summoned Clippy (the defunct and much-maligned Microsoft Word helper) to tell you “it looks like you’re writing a letter”.

While these flashy tech applications of NLP are the most apparent examples, the clinical applications of NLP are among the most groundbreaking. NLP systems have the potential to rapidly sort through millions of medical files within hospital systems, and retrieve meaningful data that could improve medical research and clinical decision support. For example, a key bottleneck in clinical research is identifying a large population of eligible patients to enroll in a clinical trial. This is currently done manually, at a large cost in time and money. NLP systems could quickly comb through medical records to identify and “match” patients to promising clinical trials.

However, just like our hypothetical physician who confuses “left atrium” with “lymphadenopathy”, computers operating these systems are similarly vexed by ambiguous acronyms in medical records.

UPMC Enterprises, the innovation and commercialization arm of UPMC, has recognized the problem of clinical abbreviation ambiguity and has been investing in the development of NLP pipelines to improve data analytics at UPMC. UPMC Enterprises is part of the Pittsburgh Health Data Alliance (PHDA), a collaboration between UPMC, the University of Pittsburgh, and Carnegie Mellon University that brings together large amounts of clinical data and expertise in biomedical and machine learning research to develop innovative healthcare solutions. In March 2018, through the PHDA’s Center for Commercial Applications of Healthcare Data (CCA), housed at the University of Pittsburgh, Daqing He, PhD, a Professor at the School of Computing and Information of the University of Pittsburgh, was awarded a translational research grant to address the unmet need in clinical abbreviation disambiguation. In collaboration with a team led by UPMC Enterprises’ Vice President of Analytics Rebecca Jacobson, PhD, Dr. He has created a promising new technology called Clinical Abbreviation Resolution Engine (CARE) to solve this major clinical problem.

Taking CARE to Demystify Clinical Reports

The CARE technology identifies acronyms within clinical records, analyzes the words and context surrounding that acronym, and uses machine learning models to predict the correct interpretation for that acronym.
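The three steps described above (find the abbreviation, look at the surrounding words, predict an expansion) can be illustrated with a toy sketch. This is not CARE's actual model, which the article does not describe in detail; it assumes a tiny hand-written training set, a fixed context window, and a simple naive Bayes-style word-count score with add-one smoothing. All function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled sentences: (text, abbreviation, correct expansion).
TRAINING = [
    ("echo shows dilated LA and mitral regurgitation", "LA", "left atrium"),
    ("LA pressure elevated on cardiac catheterization", "LA", "left atrium"),
    ("cervical LA noted with fever and sore throat", "LA", "lymphadenopathy"),
    ("tender LA in the neck with fever", "LA", "lymphadenopathy"),
    ("wound closed under LA with lidocaine", "LA", "local anesthetic"),
    ("lidocaine LA injected before the biopsy", "LA", "local anesthetic"),
]


def tokenize(text):
    """Split text into alphabetic tokens, preserving case."""
    return re.findall(r"[A-Za-z]+", text)


def context_words(tokens, index, window=4):
    """Lowercased words within `window` tokens on either side of `index`."""
    start = max(0, index - window)
    nearby = tokens[start:index] + tokens[index + 1:index + 1 + window]
    return [w.lower() for w in nearby]


def train(examples):
    """Count context words for each (abbreviation, expansion) pair."""
    counts = defaultdict(Counter)
    for text, abbr, expansion in examples:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == abbr:
                counts[(abbr, expansion)].update(context_words(tokens, i))
    return counts


def disambiguate(text, counts, window=4):
    """Return (token_index, abbreviation, predicted expansion) triples."""
    tokens = tokenize(text)
    vocab_size = len({w for c in counts.values() for w in c})
    known = {abbr for abbr, _ in counts}
    results = []
    for i, tok in enumerate(tokens):
        if tok not in known:  # case-sensitive, so "la" is not treated as "LA"
            continue
        ctx = context_words(tokens, i, window)
        best, best_score = None, -math.inf
        for (abbr, expansion), c in counts.items():
            if abbr != tok:
                continue
            total = sum(c.values())
            # Smoothed log-likelihood of the context under this expansion
            # (equal priors are assumed for every expansion).
            score = sum(
                math.log((c[w] + 1) / (total + vocab_size)) for w in ctx
            )
            if score > best_score:
                best, best_score = expansion, score
        results.append((i, tok, best))
    return results


if __name__ == "__main__":
    model = train(TRAINING)
    print(disambiguate("echo shows enlarged LA with mitral stenosis", model))
```

The same abbreviation resolves differently depending on its neighbors: words like "echo" and "mitral" pull "LA" toward "left atrium", while "fever" and "neck" pull it toward "lymphadenopathy". A production system would need far more training data and a stronger model than this word-count score.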

While other groups have attempted to perform the task of clinical abbreviation and acronym disambiguation, they have relied on small clinical datasets of several thousand documents, and a few hundred abbreviations. Dr. He and his team are aiming much higher, harnessing the power of a UPMC-provided dataset with over 7 million clinical records, which have been appropriately stripped of patient identifiers, to train the abbreviation prediction models.

Although impressive, the use of such a vast amount of de-identified data created a new and important challenge for the CARE team. Accurate machine learning currently requires some degree of human guidance, often through the careful labeling and annotation of provided data. These annotations help AI identify recognizable patterns, so that NLP systems can learn, for example, how to tell the difference between an unambiguous everyday word (all) and an acronym/abbreviation (ALL, acute lymphoblastic leukemia).

While having a human annotator perform this on a smaller data set is manageable, it is nearly impossible to have a team of annotators comb through 7 million documents.

“Initially we were focused on building up the machine learning tools, but very quickly we found that the big data aspect of annotation is a very interesting and an important stage of this project. We’ve spent a lot of time on this, because we need to annotate a large quantity of data, but at the same time we needed to figure out an effective and efficient way of doing that.”

Daqing He, PhD, Professor at the School of Computing and Information of the University of Pittsburgh and the Principal Investigator of CARE

In response to this technological roadblock, Dr. He’s team created ClusterAnnot, a tool that can annotate hundreds of sentences at a time by carefully grouping similar sentences into specific groups or clusters. While this process still requires human annotation, each annotation can accurately be applied to hundreds or thousands of similar sentences within a cluster, saving an incredible amount of time. Using this ClusterAnnot tool, Drs. He and Jacobson have been able to annotate over 2.7 million sentences and over 1,100 abbreviations.
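The idea of labeling one sentence and applying that label to a whole cluster can be sketched as follows. The article does not say how ClusterAnnot groups sentences, so this sketch substitutes a simple stand-in: greedy clustering on Jaccard similarity of word sets, with an assumed threshold of 0.5. The function names, the example sentences, and the `ask_annotator` callback (standing in for the human annotator) are all hypothetical.

```python
import re


def word_set(sentence, abbr):
    """Lowercased words of a sentence, excluding the abbreviation itself."""
    return {w.lower() for w in re.findall(r"[A-Za-z]+", sentence) if w != abbr}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sentences(sentences, abbr, threshold=0.5):
    """Greedy clustering: each sentence joins the first cluster whose
    representative (its first member) is at least `threshold` similar."""
    clusters = []  # lists of sentence indices; index 0 is the representative
    for i, sentence in enumerate(sentences):
        words = word_set(sentence, abbr)
        for cluster in clusters:
            representative = word_set(sentences[cluster[0]], abbr)
            if jaccard(words, representative) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def annotate_by_cluster(sentences, abbr, ask_annotator, threshold=0.5):
    """Ask for one label per cluster and copy it to every member.

    Returns the per-sentence labels and the number of questions asked.
    """
    labels = [None] * len(sentences)
    questions = 0
    for cluster in cluster_sentences(sentences, abbr, threshold):
        label = ask_annotator(sentences[cluster[0]])
        questions += 1
        for i in cluster:
            labels[i] = label
    return labels, questions


if __name__ == "__main__":
    sentences = [
        "echo shows dilated LA",
        "echo shows enlarged LA",
        "cervical LA with fever",
        "axillary LA with fever",
        "echo shows normal LA",
    ]

    # Stand-in for a human annotator looking at one representative sentence.
    def fake_annotator(sentence):
        return "left atrium" if "echo" in sentence else "lymphadenopathy"

    labels, questions = annotate_by_cluster(sentences, "LA", fake_annotator)
    print(questions, labels)
```

In this toy run, five sentences need only two annotation decisions. The saving grows with cluster size, but propagated labels are only as reliable as the clusters are homogeneous, which is why the grouping step has to be done carefully.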

Although work on fully annotating the UPMC dataset is ongoing, the CARE team has made remarkable progress on developing their disambiguation engine using annotated data from both UPMC and publicly available sources. Currently, the resolution engine is effective at predicting correct abbreviation definitions within tested clinical records, with an accuracy rate of 99%.

However, new challenges must be overcome when moving to clinical abbreviations used by different specialties or at different hospitals. Dr. He’s team hopes to be able to perform this disambiguation on a broad range of medical documents, which include diverse materials like discharge reports and patient education reports. To address this, the research team will be hard at work over the next few months developing even more robust machine learning models that can handle unseen abbreviations and their interpretations.

While work is still ongoing to refine CARE and ensure generalizability and compatibility with UPMC NLP systems, the research team has achieved an extraordinary amount of progress in 18 months. But developing a viable commercial product also requires constructing and validating a commercial translation path, which has been provided by sciVelo, part of the Innovation Institute.

Driving Commercialization of an Exciting Pittsburgh Technology

CARE is one of twelve University of Pittsburgh research projects funded by PHDA through the CCA, which seeks to turn basic scientific, translational, and clinical research into new products and companies by supporting both technical development and commercial translation planning.

However, according to Andrew Brown, PhD, the Assistant Director of Commercial Translation Programs at sciVelo, CARE is a unique project due to its very direct path towards clinical impact. Most CCA-funded projects will require a spinout company to be formed in order to obtain venture investment, apply for FDA clearance or approval, and ultimately sell their products to hospitals, patients or other companies. In CARE’s case, the likely commercial endpoints are more immediate, including the possibility of incorporation with UPMC Enterprises’ existing NLP toolbox.

In addition to the clear commercial translation path, Brown emphasized that CARE represents an impressive technical collaboration between a CCA-funded Pitt research team and a research team within UPMC Enterprises.

“There had been a lot of strong collaboration between researchers at Pitt and clinicians at UPMC providing clinical input, and between researchers at Pitt and market experts at UPMC Enterprises to provide perspectives, but CARE is an example where you have a product team at UPMC Enterprises working side-by-side with a research team at Pitt to build tools in a way that ensures it is readily deployable.”

Andrew Brown, PhD, Assistant Director of Commercial Translation Programs at sciVelo

Like all other investigators funded by the PHDA through the CCA, Dr. He had to write a successful proposal and deliver two rounds of successful pitches to the UPMC Enterprises team to obtain funding.

Support for clearing this hurdle was provided by a sciVelo team previously led by Clinical and Translational Science Institute (CTSI) Program Manager Aneesh Ramaswamy, PhD, and currently led by Andrew Brown, PhD.

“Before I was awarded this project, sciVelo invited me to visit and give a presentation. The sciVelo team (led by Aneesh Ramaswamy) told me what I should pay attention to, what needs to be emphasized, what needs to be clarified, and how I can improve my presentation. I found that extremely useful.”

Daqing He, PhD

Dr. He admitted that, since his focus is on building up the capabilities of the technology, navigating the intellectual property landscape and the process of commercial translation “is not a strength” of his. “sciVelo has helped a lot to tell us what we need to do,” said Dr. He.

In addition to supporting the initial CCA proposal for CARE, the Innovation Institute’s sciVelo team has also helped to differentiate Dr. He’s technology from similar technologies, incorporate UPMC’s feedback into clear validation goals, and support the production and submission of invention disclosures to the university.

Though it has existed for fewer than two years, CARE seems remarkably well poised to make an effective transition into real-world clinical use. More than anything, the success of CARE can be attributed to the expertise and effectiveness of the technical minds within Dr. He’s academic research team and Dr. Jacobson’s UPMC Enterprises group. CARE is a clear example of the power of collaboration when it comes to accelerating important technologies, solving complex scientific problems, and transforming healthcare.

“The CARE project is an excellent example of one way academic research can be translated to solve meaningful problems in healthcare when the right experts work together and the support infrastructure is in place, which is exactly what the PHDA is set up to encourage.”

– Zariel Johnson, PhD, PHDA Program Manager, UPMC Enterprises

While the targeted acronyms of CARE may be ambiguous, its utility and potential for the future of medicine are unmistakable. Today, a single acronym could derail your own access to effective and clearly communicated healthcare, ambiguously clouding your correct prescribed dosage or masking your eligibility for a life-saving clinical trial. But the CARE team, through their hard work and boundless expertise, are poised to bring us a future free from this problem.

Author/Photographer: Ryan Staudt
