Comparing Software Tools for Optical Chemical Structure Recognition
The extraction of chemical information from images, known as Optical Chemical Structure Recognition (OCSR), has become increasingly relevant with the introduction of advanced machine learning methods.
This whitepaper compares eight open-access OCSR tools (DECIMER, ReactionDataExtractor, MolScribe, RxnScribe, SwinOCSR, OCMR, MolVec, and OSRA) using a comprehensive test set derived from patents and patent applications. The evaluation focuses on precision and recall, two performance metrics that are crucial for analyzing intellectual property in chemistry patents. Each tool showed strengths in different chemical and image categories, but the results indicate a general need for further advancement in OCSR technology. In addition, a machine learning image classifier was developed to automatically select the best-performing OCSR method based on image type. Both the classifier and the datasets are publicly available under open access, to support further research and development in chemical structure recognition.
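To make the two metrics concrete, the following is a minimal sketch of how precision and recall could be computed for a single image. It assumes that both the tool output and the ground truth are reduced to sets of canonical structure identifiers (for example canonical SMILES or InChIKeys) and that a prediction counts as correct only on an exact match. The function name `precision_recall`, the example identifiers, and the zero-denominator convention are illustrative assumptions, not the matching procedure used in the whitepaper.

```python
from typing import Iterable, Tuple


def precision_recall(predicted: Iterable[str], expected: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one image, comparing identifiers as sets."""
    pred, truth = set(predicted), set(expected)
    true_positives = len(pred & truth)
    # Assumed convention: an empty denominator yields 0.0 instead of raising.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical identifiers standing in for canonicalized structures.
expected = ["CCO", "c1ccccc1", "CC(=O)O"]
predicted = ["CCO", "c1ccccc1", "CCN", "CCC"]

p, r = precision_recall(predicted, expected)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

In this example two of the four predicted structures are correct (precision 0.50) and two of the three expected structures are recovered (recall 0.67). A tool that returns many spurious structures loses precision, while a tool that misses structures loses recall, which is why both values are reported.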