OC|miner, OC|processor and OC|manager

Our mighty OC|miner text analysis and data mining toolbox is the culmination of many years of groundbreaking work regarding text analysis and data extraction. It can help you to answer your most difficult questions. While the OC|miner gives you easy access to the output of the heavy data-crunching done by the OC|processor, the OC|manager does all the heavy lifting when you want to add new sources or new ontologies to your workflow.



What we need from you:
We need to know the types of data sources you normally rely on for your research. You can choose from our vast array of repositories, such as patents, life science databases, PubMed Central full-text documents, Medline, and even social media and web content, or you can add your own internal documents to the mix. Everything is possible.

Then we need to talk about the topics you are interested in. We have over 80 different ontological dictionaries that we use to enrich and understand the text. If you work in a very specialized field, we will gladly develop a new ontology just for you, so that you can reliably find relevant information and extract meaningful relationships.

Last, we need to know what data you need extracted from documents. We can extract knowledge from unstructured text and tables as well as from structured sources like databases. While not all customers need this feature, it can be of enormous help if you do. Please ask us about it.


What we do:
First we collect and normalize raw data (content) according to your wishes. That can mean indexing your entire pool of internal documents (on-site if you wish), or normalizing all available EU and US patents, or all PubMed Central full-text documents as well as all Medline abstracts, or all of those examples at the same time.
Normalizing means converting the documents into a uniform, machine-readable format so that they can be processed in the following steps.

Semantic search relies on an understanding of the text. This is achieved by using enormous dictionaries of curated terms. So in this second step we use ontological dictionaries to tag each and every recognizable concept in the text. All proteins are tagged as “protein”, all animals, plants and bacteria as “species”, all chemical compounds as “chemical compound” and so forth. We call these dictionaries OC|Ontologies and they come in many flavors: chemistry, species, cell lines, general anatomy, plant and fungal anatomy, diseases, pharmacological and physiological effects, cosmetology, proteins, genes, but also company information for business intelligence. If needed, we even build, update and validate customer-owned ontologies using our own ontology tools and algorithms. One of our unique selling points is the scalability of our patented methods, which enables our ontologies to contain up to a billion terms for annotation.
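To make the idea of dictionary-based tagging more concrete, here is a minimal sketch in Python. It is not how the OC|processor works internally; the tiny ONTOLOGY dictionary, the function names and the matching strategy (case-insensitive, longest term first) are assumptions chosen purely for illustration.

```python
import re

# Hypothetical miniature dictionaries; real ontologies are vastly larger.
ONTOLOGY = {
    "protein": ["insulin", "hemoglobin"],
    "species": ["mouse", "escherichia coli"],
    "chemical compound": ["aspirin", "glucose"],
}


def build_matcher(ontology):
    """Compile one regex for all terms and map each term to its label."""
    term_to_label = {}
    for label, terms in ontology.items():
        for term in terms:
            term_to_label[term.lower()] = label
    # Try longer terms first so multi-word terms win over shorter ones.
    alternatives = sorted(term_to_label, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, term_to_label


def tag(text, ontology=ONTOLOGY):
    """Return (start, end, matched_text, label) for every dictionary hit."""
    pattern, term_to_label = build_matcher(ontology)
    return [
        (m.start(), m.end(), m.group(0), term_to_label[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag("Aspirin lowered glucose levels in the mouse."):
        print(hit)
```

A plain lookup like this ignores ambiguity (the same word can belong to several dictionaries), which is exactly the problem the rule-based algorithms described next have to solve.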

While annotating might sound simple, deciding which term to annotate with what is enormously difficult. Many rule-based algorithms govern this process in an effort to increase the number of terms we recognize and to minimize falsely annotated concepts.

In the last and most important step we then use the annotated texts for the extraction of implicit, unknown but useful information from databases and document collections. That can be achieved by looking at the distance between two concepts, such as a drug and a disease, or by using refined methods of relation extraction based on hand-curated lists of grammatical patterns.
The goal is always to use the hidden information buried in millions of documents to create new knowledge and further the cause of our customers.
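The distance-based approach mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: annotations are taken to be (token_position, term, label) tuples, the labels "drug" and "disease" and the max_distance threshold are hypothetical, and the function is not part of any OC|miner API.

```python
def cooccurrences(annotations, label_a, label_b, max_distance):
    """Find pairs of annotated concepts that appear close to each other.

    annotations: list of (token_position, term, label) tuples (assumed format).
    Returns (term_a, term_b, distance) tuples, closest pairs first.
    """
    concepts_a = [a for a in annotations if a[2] == label_a]
    concepts_b = [b for b in annotations if b[2] == label_b]
    pairs = []
    for pos_a, term_a, _ in concepts_a:
        for pos_b, term_b, _ in concepts_b:
            distance = abs(pos_a - pos_b)
            if distance <= max_distance:
                pairs.append((term_a, term_b, distance))
    # A smaller distance is treated as a stronger hint of a relationship.
    return sorted(pairs, key=lambda pair: pair[2])


if __name__ == "__main__":
    annotated = [
        (0, "metformin", "drug"),
        (5, "diabetes", "disease"),
        (40, "asthma", "disease"),
    ]
    print(cooccurrences(annotated, "drug", "disease", max_distance=10))
```

Proximity alone only suggests that two concepts may be related; it says nothing about the kind of relationship, which is why the grammar-based relation extraction is the more refined of the two methods.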


What you can expect from us:
If you opt for access to our easy-to-use data mining toolbox OC|miner, you can expect us to customize the product to your liking. We can optimize not only the toolset, but also the data sources, the ontologies used to annotate the documents and even the user interface. We stand ready to give our users a structured introduction with on-site presentations, online Q&A sessions, help files and fast email and phone support. We love your feedback and are happy to stay in contact with you and support you for as long as you use our services.


