CLiDE Pro

OVERVIEW

Depictions of 2D chemical structures published in the literature are stored as bitmap images in most electronic sources of chemical information, such as patents, journals and reports. Although the original structures are usually created with chemical drawing programs that generate complete structural information, this information is lost during the publication process. When it is needed again, it is normally regenerated by redrawing the structure in a drawing program, which is time-consuming and error-prone.

CLiDE Pro is a chemical OCR software tool for the automatic extraction of chemical information from either the printed chemistry literature or its electronic PDF equivalent. It is the latest incarnation of the software to emerge from the long-term CLiDE (Chemical Literature Data Extraction) project.

FEATURES

  • Converts 2D structure diagrams into connection tables
  • Interprets generic structures
  • Supports document-oriented processing as opposed to page-oriented processing
    The whole document is loaded and processed at once, rather than page by page.
  • Handles various difficult features used in structure diagrams, such as
    • Crossing bonds often appearing in bridged structures
    • Circles representing aromaticity in benzene rings
    • Easily-misinterpretable bond formations (e.g. a single bond joined to a triple bond)
    • Ambiguous objects (e.g. vertical lines that may be part of atom labels such as I or Cl, or may form bonds)
  • Loads PDF documents, as well as TIFF and BMP image files
  • Exports chemical information into MDL SDF and RG files
    RG is the generic extension of SDF defining the root molecule and its associated R-groups in one query.
  • Generic structures can be exported in either RG or SDF file format. In the latter case, each R-group is automatically replaced by its substitution values.
  • Operates in interactive or batch mode
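
To make two of the features above more concrete (connection tables, and the substitution of R-groups by their values on SDF export), here is a minimal illustrative sketch in Python. It is not CLiDE Pro code and does not reflect its internal data model or file output. The ConnectionTable class, the expand_r_groups function and the example structure are all hypothetical, and the sketch assumes the simplest possible case, in which every substitution value is a single atom.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ConnectionTable:
    """Hypothetical minimal connection table.

    atoms: element symbols, or R-group labels such as "R1"
    bonds: (atom_index_a, atom_index_b, bond_order), 0-based indices
    """
    atoms: tuple
    bonds: tuple


def expand_r_groups(generic, r_values):
    """Yield one specific ConnectionTable per combination of R-group values.

    Simplifying assumption: each substitution value is a single atom symbol,
    so substitution only relabels an atom and leaves the bonds unchanged.
    """
    labels = sorted(r_values)
    for combo in product(*(r_values[label] for label in labels)):
        mapping = dict(zip(labels, combo))
        atoms = tuple(mapping.get(atom, atom) for atom in generic.atoms)
        yield ConnectionTable(atoms, generic.bonds)


# Hypothetical generic structure R1-CH2-R2 (hydrogens on carbon are implicit).
generic = ConnectionTable(
    atoms=("R1", "C", "R2"),
    bonds=((0, 1, 1), (1, 2, 1)),
)
r_values = {"R1": ["F", "Cl"], "R2": ["Br", "I"]}

specific = list(expand_r_groups(generic, r_values))
for table in specific:
    print(table.atoms, table.bonds)
```

With two values for each of the two R-groups, the sketch yields four specific structures. A real tool would also have to handle multi-atom substituents, which means merging the substituent's own atoms and bonds into the root structure rather than relabelling a single atom.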