Adapting the OCMiner text processing system to the CTD controlled vocabulary
Adapting OCMiner for Enhanced CTD Document Annotation
OntoChem’s OCMiner text processing system was adapted to the controlled vocabulary of the Comparative Toxicogenomics Database (CTD), enabling high-speed annotation of large document collections with entity types such as genes, chemicals, and diseases. Built on a modular UIMA-based framework, OCMiner converts CTD data into a standardized format, applies blacklists and whitelists to improve term accuracy, and annotates various document types quickly and precisely. Strong results in the BioCreative challenge demonstrate its effectiveness in chemical annotation and offer insights into optimizing named entity recognition for complex biomedical data.
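To make the role of blacklists and whitelists concrete, the following is a minimal Python sketch of dictionary-based annotation with term filtering. It is not OCMiner's implementation (which is UIMA-based); the function names `build_dictionary` and `annotate`, the placeholder concept identifiers, and the sample terms are all assumptions chosen for illustration, and real CTD identifiers would replace them in practice.

```python
import re


def build_dictionary(vocabulary, blacklist=(), whitelist=None):
    """Return a lowercase term -> (concept_id, entity_type) lookup.

    Hypothetical helper: blacklisted terms are dropped from the
    vocabulary, then whitelisted terms are added on top.
    """
    blocked = {term.lower() for term in blacklist}
    lookup = {
        term.lower(): concept
        for term, concept in vocabulary.items()
        if term.lower() not in blocked
    }
    # Whitelist entries are added last, so they take precedence.
    for term, concept in (whitelist or {}).items():
        lookup[term.lower()] = concept
    return lookup


def annotate(text, lookup):
    """Return (start, end, surface, concept_id, entity_type) tuples."""
    if not lookup:
        return []
    # Longest terms first, so multi-word names win over their substrings.
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), *lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


# Placeholder identifiers, not real CTD accessions.
vocabulary = {
    "lead": ("CHEM:0002", "Chemical"),
    "breast cancer": ("DIS:0001", "Disease"),
    "cancer": ("DIS:0002", "Disease"),
}
lookup = build_dictionary(
    vocabulary,
    blacklist=["lead"],  # ambiguous with the common English verb
    whitelist={"ASA": ("CHEM:0001", "Chemical")},  # synonym missing from the vocabulary
)
for annotation in annotate("ASA may lead to fewer breast cancer cases.", lookup):
    print(annotation)
```

In this sketch the blacklist suppresses a vocabulary term that is too ambiguous in running text, while the whitelist supplies a synonym the vocabulary lacks; a production system would also need tokenization, normalization, and disambiguation steps that are omitted here.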