Processing images to chemical structures

Posted January 15, 2020 in Blog

A lot of scientific information is captured in images, and we use machine learning techniques such as deep neural networks to classify them. For example, we applied transfer learning to train a deep convolutional neural network as a classifier that detects whether an image contains a chemical structure. If it does, the image is processed with optical structure recognition (OSR) software such as OSRA (https://sourceforge.net/p/osra/wiki/Home/).
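As a rough illustration of the OSR step, the sketch below calls the OSRA command-line tool from Python and collects the SMILES strings it prints. It assumes an `osra` binary is installed and on the PATH, and that the `-f smi` output option is available in your OSRA version. The helper names are hypothetical and this is not our production code.

```python
import subprocess


def build_osra_command(image_path, output_format="smi"):
    """Build the argument list for an OSRA call (assumes `osra` is on PATH)."""
    return ["osra", "-f", output_format, image_path]


def parse_smiles_output(stdout_text):
    """Return one entry per non-empty output line, keeping only the first token."""
    results = []
    for line in stdout_text.splitlines():
        tokens = line.split()
        if tokens:
            results.append(tokens[0])
    return results


def image_to_smiles(image_path):
    """Run OSRA on one image and return the recognised SMILES strings."""
    completed = subprocess.run(
        build_osra_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if OSRA exits with an error
    )
    return parse_smiles_output(completed.stdout)
```

Keeping command construction and output parsing in separate functions makes both easy to check without OSRA installed.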

Using this approach, we have processed all images from United States, European Patent Office (EPO) and World Intellectual Property Organization (WIPO) patents, and converted every image containing chemical structures to compound structures, including their SMILES string, InChI and a unique ontology concept ID (OCID, https://registry.identifiers.org/registry/ocid). For example, the following information is extracted from an image of patent US-08754081-B2.

If an unknown compound is found in an image, it is registered with our registration system, which makes those compounds openly available in Google BigQuery as part of the SciWalker Open Data project (https://console.cloud.google.com/bigquery?project=sciwalker-open-data). This table currently contains more than 130 million compounds with unique InChIKeys.
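The idea of registering only previously unseen compounds can be sketched with a small in-memory registry keyed by InChIKey. This is an illustration only: the `CompoundRegistry` class and its local counter IDs are hypothetical and do not reflect how the actual registration system or OCID assignment works.

```python
class CompoundRegistry:
    """Toy in-memory registry that stores each InChIKey at most once."""

    def __init__(self):
        self._by_inchikey = {}

    def register(self, inchikey, smiles):
        """Add a compound if its InChIKey is new; return (local_id, is_new)."""
        if inchikey in self._by_inchikey:
            return self._by_inchikey[inchikey]["local_id"], False
        local_id = len(self._by_inchikey) + 1  # hypothetical ID, not an OCID
        self._by_inchikey[inchikey] = {"local_id": local_id, "smiles": smiles}
        return local_id, True

    def __len__(self):
        return len(self._by_inchikey)


registry = CompoundRegistry()
first = registry.register("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "CCO")  # ethanol
again = registry.register("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "OCC")  # same compound
other = registry.register("UHOVQNZJYSORNB-UHFFFAOYSA-N", "c1ccccc1")  # benzene
```

Because the key is the InChIKey rather than the SMILES string, two different SMILES spellings of the same compound map to a single entry.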

Our software OC|image2structure provides a RESTful service for extracting chemical structures from images. More specifically, it encompasses a pipeline for

classifying images (i.e. deciding if an image depicts chemical structures) and
extracting chemical structures (via OSR, i.e. optical structure recognition) from them.
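The two stages above can be sketched as a small function that runs the OSR step only on images the classifier accepts. The classifier and recogniser are passed in as plain callables. Both are placeholders for whatever model and OSR tool are actually used, and all names are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List


def images_to_structures(
    image_paths: Iterable[str],
    contains_structure: Callable[[str], bool],
    recognise: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Classify each image, then extract structures from the positive ones."""
    results: Dict[str, List[str]] = {}
    for path in image_paths:
        if not contains_structure(path):
            continue  # skip images without chemical structures
        results[path] = recognise(path)
    return results


# Stand-in stages for demonstration; a real setup would plug in a trained
# classifier and an OSR tool here.
def demo_classifier(path: str) -> bool:
    return path.endswith("_structure.png")


def demo_recogniser(path: str) -> List[str]:
    return ["CCO"]


demo = images_to_structures(
    ["fig1_structure.png", "fig2_chart.png"], demo_classifier, demo_recogniser
)
```

Running the classifier first means the comparatively expensive recognition step is applied only to images that are likely to contain a structure.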

OC|image2structure is designed as a client-server solution. It can run either locally on a single machine or on a server that is accessible within a network.

If this is of interest to you, please let us send you more information on the OC|image2structure processing pipeline.
