Processing images to chemical structures

15 January 2020

A lot of scientific information is captured in images, and we use machine learning techniques such as deep neural networks to classify them. For example, we have applied transfer learning to train a deep convolutional neural network as a classifier that detects whether an image contains a chemical structure. If it does, the image is processed with optical structure recognition (OSR) software such as OSRA (https://sourceforge.net/p/osra/wiki/Home/).

Using this approach, we have processed all images from US, European Patent Office (EPO) and World Intellectual Property Organization (WIPO) patents and converted every image containing chemical structures into compound structures, each with its SMILES string, InChI and a unique ontology concept ID (OCID, https://registry.identifiers.org/registry/ocid). For example, the following information is extracted from an image of patent US-08754081-B2.

If an unknown compound is found in an image, it is registered in our registration system, which makes these compounds openly available through Google BigQuery in the SciWalker Open Data project (https://console.cloud.google.com/bigquery?project=sciwalker-open-data). The table currently contains more than 130 million compounds with unique InChIKeys.
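The registration step described above can be illustrated with a minimal sketch, assuming that a compound counts as "unknown" when its InChIKey has not been seen before. The class name `CompoundRegistry` and its methods are hypothetical and are not part of our registration system; an in-memory dictionary stands in for the real store.

```python
from typing import Dict


class CompoundRegistry:
    """Hypothetical in-memory registry keyed by InChIKey (illustration only)."""

    def __init__(self) -> None:
        # Maps InChIKey -> SMILES string of the first registration.
        self._by_key: Dict[str, str] = {}

    def register(self, inchi_key: str, smiles: str) -> bool:
        """Store the compound if its InChIKey is new; return True when added."""
        if inchi_key in self._by_key:
            return False  # already known, nothing to register
        self._by_key[inchi_key] = smiles
        return True

    def __len__(self) -> int:
        # Number of compounds with unique InChIKeys.
        return len(self._by_key)


registry = CompoundRegistry()
# Ethanol, used here only as a well-known illustrative compound.
first = registry.register("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "CCO")
again = registry.register("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "OCC")
```

Keying on the InChIKey means that two different SMILES spellings of the same compound are registered only once.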

Our software OC|image2structure provides a RESTful service for extracting chemical structures from images. More specifically, it encompasses a pipeline for

classifying images (i.e. deciding if an image depicts chemical structures) and
extracting chemical structures (via OSR, i.e. optical structure recognition) from them.
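The two pipeline steps listed above can be sketched as follows. This is only an illustration of the control flow, not the OC|image2structure implementation: `image_to_structure`, `ExtractionResult` and the two callables are hypothetical names, and the classifier and the OSR step are passed in as plain functions so that any model or OSR tool could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExtractionResult:
    is_chemical: bool          # outcome of the classification step
    structure: Optional[str]   # e.g. a SMILES string; None if nothing was extracted


def image_to_structure(
    image: bytes,
    classify: Callable[[bytes], bool],
    extract: Callable[[bytes], Optional[str]],
) -> ExtractionResult:
    """Run OSR only on images that the classifier accepts."""
    # Step 1: decide whether the image depicts a chemical structure.
    if not classify(image):
        return ExtractionResult(is_chemical=False, structure=None)
    # Step 2: hand the image to the OSR step (assumed to return a structure string).
    return ExtractionResult(is_chemical=True, structure=extract(image))
```

Running the classifier first means the OSR step is only invoked on images that are likely to contain a structure.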

OC|image2structure is designed as a client-server solution. It can run either locally on a single machine or on a server that is accessible within a network.
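From the client side, the only difference between the local and the networked setup is the base URL of the service. The sketch below builds (but does not send) an HTTP request with Python's standard library. The `/extract` path, the port, the content type and the function name `build_extraction_request` are all assumptions made for illustration; they are not the documented OC|image2structure API.

```python
import urllib.request


def build_extraction_request(
    image_bytes: bytes,
    base_url: str = "http://localhost:8080",  # assumed address of a local instance
) -> urllib.request.Request:
    """Prepare a POST request that carries the raw image bytes."""
    # Hypothetical endpoint path; the real service may use a different one.
    url = base_url.rstrip("/") + "/extract"
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Same client code, pointed at a local machine or at a server in the network.
local_req = build_extraction_request(b"fake-image-bytes")
remote_req = build_extraction_request(b"fake-image-bytes", "http://server.example:9000/")
```

The prepared request could then be sent with `urllib.request.urlopen`, once the actual endpoint and payload format of the service are known.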

If this is of interest to you, please let us know and we will send you more information on the OC|image2structure processing pipeline.

Tags: chemical compounds, image2structure, machine learning, ML, OSR, patents

AUTHOR

Lutz Weber

Founder and CEO of OntoChem GmbH
