Recent enhancements in the accuracy of the CLiDE tool for extracting chemical structure data from patents and other documents

248th ACS National Meeting
CINF, Hunting for Hidden Treasures: Chemistry Text Mining in Patents and Other Documents
Philadelphia, PA

Aniko T. Valko, A. Peter Johnson

We present an enhanced version of CLiDE, a long-term project aimed at detecting chemical structure diagrams rendered in images and converting them into chemical connection tables. The enhancement was achieved by introducing a feedback mechanism into CLiDE’s interpretation process. This mechanism uses a series of domain-specific and spatial rules to identify drawing features whose meaning is complex or ambiguous. Once such a feature is found, CLiDE automatically corrects the structural information that is being compiled and passed on to subsequent interpretation steps.
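
The feedback idea described above can be illustrated with a minimal sketch. This is not CLiDE's implementation, which the abstract does not detail; every name here (`Structure`, `Bond`, `LABEL_FIXES`, `fix_misread_labels`, `flag_overlong_bonds`, `apply_feedback`), the two example rules, and the 1.8 length threshold are hypothetical and chosen only for illustration. It assumes a partly interpreted structure is held as a list of atom labels and bonds, that one domain rule corrects atom labels that are likely misreadings, and that one spatial rule flags bonds drawn much longer than is typical for the diagram. The rules are re-applied until none of them changes anything.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List


@dataclass
class Bond:
    begin: int      # index of the first atom
    end: int        # index of the second atom
    order: int      # 1 = single, 2 = double, ...
    length: float   # drawn length in pixels


@dataclass
class Structure:
    atoms: List[str]
    bonds: List[Bond]
    flags: List[str] = field(default_factory=list)


# Hypothetical table of label misreadings and their likely corrections.
LABEL_FIXES = {"C1": "Cl", "0": "O", "8r": "Br"}


def fix_misread_labels(s: Structure) -> bool:
    """Domain rule (illustrative): replace labels that are not valid elements."""
    changed = False
    for i, label in enumerate(s.atoms):
        if label in LABEL_FIXES:
            s.atoms[i] = LABEL_FIXES[label]
            changed = True
    return changed


def flag_overlong_bonds(s: Structure) -> bool:
    """Spatial rule (illustrative): flag bonds far longer than the median."""
    if not s.bonds:
        return False
    typical = median(b.length for b in s.bonds)
    changed = False
    for b in s.bonds:
        message = f"bond {b.begin}-{b.end} is unusually long"
        if b.length > 1.8 * typical and message not in s.flags:
            s.flags.append(message)
            changed = True
    return changed


RULES = [fix_misread_labels, flag_overlong_bonds]


def apply_feedback(s: Structure, max_passes: int = 5) -> Structure:
    """Re-run all rules until a full pass makes no further change."""
    for _ in range(max_passes):
        # Build a list first so every rule runs on each pass.
        if not any([rule(s) for rule in RULES]):
            break
    return s


draft = Structure(
    atoms=["C", "C1", "0"],
    bonds=[Bond(0, 1, 1, 10.0), Bond(0, 2, 2, 10.0), Bond(1, 2, 1, 30.0)],
)
result = apply_feedback(draft)
print(result.atoms)
print(result.flags)
```

In this sketch the corrected labels and the collected flags stay on the same `Structure` object, so any later interpretation step would see them; the flags stand in for the auto-detected interpretation errors mentioned in the abstract.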

This enhancement considerably improves CLiDE’s accuracy in reconstructing chemical structures and in auto-detecting interpretation errors. A detailed study of CLiDE’s performance on a large validation corpus will be presented. The validation corpus will include benchmark sets created by other projects and a set of non-Markush structures collected from patent documents.
