Organizations create and store a lot of content but often know too little about what this content represents. Giving meaning to content requires considerable effort up front, and classification is time-consuming and expensive when done by humans. With Artificial Intelligence (AI) and Machine Learning (ML) capabilities, this can be done much faster and often more accurately. In addition, the added meaning improves content discovery and analysis. Doculayer Cognitive Content Management leverages AI and ML technologies to uncover the true value of content. This article gives more insight into the unique cognitive capabilities of Doculayer.
Doculayer uses Optical Character Recognition (OCR) technology to extract text from documents. This makes documents searchable, and the extracted text can be used for further processing. Using Natural Language Processing, meaning can be given to words or combinations of words extracted from documents. If the OCR service is not able to recognize words correctly, e.g. due to bad image quality or stains, they are corrected by the Doculayer Post-OCR Correction service. By using both a content-based correction model and a dictionary to correct the identified mistakes, Doculayer Post-OCR Correction boosts the quality of your OCR data.
Illustration 1: Post-OCR processing solution steps
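To make the dictionary part of post-OCR correction more concrete, here is a minimal sketch. It is not Doculayer's implementation: the word list, the `correct_token` and `correct_text` helpers and the similarity cutoff are all assumptions chosen for illustration, and the content-based correction model is left out entirely.

```python
import difflib

# Hypothetical domain dictionary; a real one would be far larger.
DICTIONARY = ["invoice", "payment", "amount", "customer", "total", "number"]


def correct_token(token, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or the token unchanged."""
    lowered = token.lower()
    if lowered in dictionary:
        return token
    # get_close_matches ranks candidates by character-level similarity.
    matches = difflib.get_close_matches(lowered, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text):
    """Correct every whitespace-separated token independently."""
    return " ".join(correct_token(tok) for tok in text.split())


print(correct_text("invoce paymemt due"))
```

Tokens that are not close to any dictionary entry are left untouched, so numbers and unknown names pass through unchanged.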
Machine Learning by itself is not the holy grail, since it only works if a good training set (the gold standard) is available. Creating the training set involves substantial human effort when working with high volumes of many different types of documents. Doculayer Clustering uses clustering techniques to automatically identify similar documents. It helps to speed up the preparation of a training set and provides better and more accurate insight into the relations between documents. The Doculayer visualization engine enables business users to present (complex) clusters as user-friendly graphs and diagrams.
Proper Machine Learning requires a training set and a test set, implementing several classifiers, training them on the training data and finally evaluating them on the test set. To improve results, a data scientist must manually test classifiers and compare the results.
With Doculayer Adaptive Machine Learning, the system can dynamically determine the best-performing classifier to achieve the best results instead of using fixed classifiers with a fixed training set.
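As a rough illustration of how similar documents can be grouped without labels, the sketch below clusters texts by word overlap. The article does not say which clustering technique Doculayer uses, so the Jaccard similarity measure, the greedy grouping strategy and the threshold value are all assumptions.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedy clustering: a document joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        doc_tokens = tokens(doc)
        for cluster in clusters:
            seed_tokens = tokens(docs[cluster[0]])
            if jaccard(doc_tokens, seed_tokens) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


DOCS = [
    "invoice number 1001 total amount due",
    "invoice number 1002 total amount paid",
    "employment contract between employer and employee",
    "contract between employer and contractor",
]
print(cluster_documents(DOCS))
```

A person preparing a training set could then label one cluster at a time instead of one document at a time.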
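The idea of choosing a classifier automatically can be sketched as follows: train every candidate on the same data, score each on held-out samples and keep the winner. The two toy classifiers, the sample data and the `select_best` helper are assumptions made for this example and do not describe Doculayer's internals.

```python
from collections import Counter


def train_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda text: label


def train_keyword(train):
    """Predict the label whose training vocabulary overlaps most with the text."""
    vocab = {}
    for text, lbl in train:
        vocab.setdefault(lbl, set()).update(text.lower().split())

    def predict(text):
        words = set(text.lower().split())
        return max(vocab, key=lambda lbl: len(words & vocab[lbl]))

    return predict


def accuracy(predict, samples):
    """Fraction of samples for which the prediction matches the label."""
    return sum(predict(text) == lbl for text, lbl in samples) / len(samples)


def select_best(trainers, train, validation):
    """Train every candidate and return the best name plus all scores."""
    scores = {name: accuracy(fit(train), validation) for name, fit in trainers.items()}
    return max(scores, key=scores.get), scores


# Hypothetical sample data for illustration only.
TRAIN = [
    ("invoice total amount", "invoice"),
    ("invoice payment due", "invoice"),
    ("contract between parties", "contract"),
]
VALIDATION = [
    ("amount due on invoice", "invoice"),
    ("contract signed by parties", "contract"),
]
TRAINERS = {"majority": train_majority, "keyword": train_keyword}

best, scores = select_best(TRAINERS, TRAIN, VALIDATION)
print(best, scores)
```

Rerunning the selection whenever the training set changes is what makes the choice adaptive rather than fixed.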
When the Machine Learner detects that the outcome is below a predefined threshold value, the business user will be notified. Doculayer will request human intervention via an automatically generated task, so that the prefilled data fields can be corrected.
Doculayer (Inter-) Active Learning feeds the human corrections back into the learning process, which uses them for training and to extend the training set.
Most documents contain Named Entities. A Named Entity is a word or phrase that clearly identifies one item from a set of other items that have similar attributes. These can be company names, dates, geographical locations or more specific items such as a serial number or social security number. The detection of these entities further allows Doculayer to discover Named Entity Relations to improve the performance and accuracy of information retrieval.
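The threshold check and the feedback of human corrections described above can be sketched as a small review queue. The `ReviewQueue` class, its field names and the threshold of 0.8 are hypothetical and only illustrate the flow; the way Doculayer generates and stores tasks is not described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    threshold: float = 0.8  # hypothetical confidence threshold
    tasks: list = field(default_factory=list)
    training_set: list = field(default_factory=list)

    def handle_prediction(self, doc_id, text, label, confidence):
        """Accept confident predictions; create a review task for the rest."""
        if confidence >= self.threshold:
            return label
        self.tasks.append({"doc_id": doc_id, "text": text, "suggested": label})
        return None

    def submit_correction(self, doc_id, corrected_label):
        """Close the task and add the corrected example to the training set."""
        for task in self.tasks:
            if task["doc_id"] == doc_id:
                self.tasks.remove(task)
                self.training_set.append((task["text"], corrected_label))
                return True
        return False


queue = ReviewQueue()
queue.handle_prediction("doc-1", "invoice total amount", "invoice", 0.95)
queue.handle_prediction("doc-2", "agreement between parties", "invoice", 0.40)
queue.submit_correction("doc-2", "contract")
print(queue.training_set)
```

Each correction becomes a new labeled example, so the next training run sees exactly the cases the model was unsure about.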
Doculayer provides automatic Named Entity detection, and for specific needs the built-in annotation tool can be used to produce relevant training material.
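For entities with a fixed shape, such as dates or social security numbers, even a rule-based detector shows the idea. The sketch below is an assumption for illustration: it uses two hand-written regular expressions (ISO-style dates and a US-style social security number layout), whereas a trained Named Entity model would also cover company names and locations.

```python
import re

# Hypothetical patterns; real documents need many more formats.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_entities(text):
    """Return (label, matched text, start offset) tuples in reading order."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group(), match.start()))
    return sorted(entities, key=lambda entity: entity[2])


for entity in find_entities("Signed on 2019-03-14 by holder 123-45-6789."):
    print(entity)
```

The start offsets make it possible to highlight entities in an annotation tool or to relate entities that appear close together.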
Powered by an Image Recognition algorithm, Doculayer is able to add context to images fully automatically. The algorithm uses Deep Learning techniques like Convolutional Neural Networks and is trained with a large set of images. Once trained, Doculayer understands what is in an image and enriches it with relevant tags describing its contents. Unlike most recognition tools, which use cloud-based services, Doculayer's image recognition technology runs on-premise. This keeps images within the organization's own environment, which supports privacy and data control.
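The last step of such a recognizer, turning the network's raw class scores into tags, can be sketched without the network itself. The scores, the `select_tags` helper and the minimum probability below are assumptions; the Convolutional Neural Network that would produce the scores is out of scope here.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def select_tags(scores, min_probability=0.1):
    """Keep labels whose probability reaches the minimum, most likely first."""
    probabilities = softmax(scores)
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [label for label, p in ranked if p >= min_probability]


# Hypothetical raw scores for one image.
print(select_tags({"cat": 4.0, "dog": 1.0, "car": 0.0}))
```

Raising `min_probability` yields fewer but more reliable tags, while lowering it favors recall.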
A typical Machine Learning task consists of several processors coupled together. Doculayer provides extensive features that allow you to create such processes using a visual configuration interface. The visual configuration of the processors gives the user a detailed view of the steps within a task, and the flow of content through the entire configuration.
Illustration 3: Visual Configuration of Workflow
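In code, coupling processors together amounts to passing each document through a list of steps in order. The processor names and the `run_pipeline` helper below are hypothetical; they only mirror the idea of a configurable flow and say nothing about how the visual interface stores its configuration.

```python
def normalize(doc):
    """Collapse whitespace and lowercase the text."""
    return {**doc, "text": " ".join(doc["text"].split()).lower()}


def tokenize(doc):
    """Split the normalized text into tokens."""
    return {**doc, "tokens": doc["text"].split()}


def count_tokens(doc):
    """Record how many tokens the document contains."""
    return {**doc, "token_count": len(doc["tokens"])}


def run_pipeline(doc, processors):
    """Apply each processor in order and record the steps taken."""
    trace = []
    for processor in processors:
        doc = processor(doc)
        trace.append(processor.__name__)
    return doc, trace


result, trace = run_pipeline(
    {"text": "  Invoice   Number 42 "},
    [normalize, tokenize, count_tokens],
)
print(trace, result["tokens"])
```

Because every processor takes a document and returns a new one, steps can be reordered or swapped without touching the others, which is what a visual configuration relies on.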
Doculayer's cognitive capabilities rely entirely on our own set of AI and Machine Learning technology. Our dedication to providing on-premise solutions ensures the privacy and confidentiality of our clients' content while delivering maximum results.
Doculayer offers multilingual tools to support our clients' operations both locally and internationally.
About the co-authors:
Tim is a Software Developer in the Machine Learning team at Onior. He is a computer technology enthusiast with a strong passion for theoretical mathematics. When he is not writing code, chances are that he is learning a new language. He is an aspiring polyglot who speaks 6 languages (Dutch, English, French, German, Japanese, Russian) at different levels of proficiency.
Frederic is a Computational Linguist with an interest in Machine Translation. Now, he is part of the Machine Learning team at Onior, where he helps to implement ML in Content Management. For a long time, he has been fascinated by creating humanity through technology.