OntoChem extracts U.S. Food and Drug Administration SPL files

The “Structured Product Labeling” (SPL) files of the United States FDA are a valuable public resource on marketed drugs. The FDA registration system assigns a UNII (Unique Ingredient Identifier) number to each drug and its chemical structure information. UNII numbers are also used in several databases such as drug labels in…


Processing tables in documents and images

Probably most scientific information is captured in tables. For example, from US patents of 2001 to 2017 we have extracted more than 10 million tables containing interesting properties of materials and compounds. Over the last 5 years, we at OntoChem have developed several technologies to extract this knowledge. These software modules may read different…


Processing images to chemical structures

A lot of scientific information is captured in images, and we use machine learning techniques such as deep neural networks to classify them. For example, we have applied transfer learning to train a deep convolutional neural network as an ML classifier that detects whether an image contains a chemical structure. If so, this…
