Document AI
Data is valuable and is a big part of what makes a company competitive. A recent study claims that about 3.5 quintillion bytes of data are being created every single day in 2023. However, most of this data is in the form of unstructured documents and it is no easy task for organizations to extract meaningful insights and make the data work for them. It is also a challenge to store and process the high volume of data coming from various channels effectively. With Rappit Undoc, organizations can meet these challenges head-on.
An overview of Rappit Undoc
Rappit Undoc is an AI-powered platform for document automation that enables commercial and private developers to build document processing and integration workflows by stitching together machine learning models and software components through point-and-click configuration.
Rappit Undoc can be used to automate the processing of many kinds of documents, such as invoices, receipts, purchase orders, contracts, transportation documents, export documents, product quality specifications and handwritten lab reports, which arrive in a variety of formats such as PDFs or images.
With Rappit Undoc, medium and large enterprises can:
- automate document collection and storage
- extract entities and meaningful insights from unstructured and semi-structured documents
- use the information and insights extracted to execute downstream workflows and integrate them into their back-end systems
- use the extracted data to index and search the documents
Take purchase invoice automation as an example of this end-to-end processing. We start with document ingestion by reading the configured mailboxes and downloading the purchase invoice documents sent by various suppliers. The next step is to extract the text from these PDFs and images and process the text to find invoice entities (things like invoice ID, invoice date, amounts and tax, supplier information, invoice line details). Then the invoice lines are automatically matched with the purchase order details in the system. Next, the non-matched lines are sent through a dispute management workflow and the matched invoices are posted to the back-end finance system for payment. These invoices are also stored and indexed with the extracted information for easy search and retrieval.
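The matching step in this example can be sketched in a few lines of Python. This is only an illustration and not Rappit Undoc's actual implementation: the PoLine and InvoiceLine types, their field names, and the price tolerance rule are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoLine:
    """A purchase order line (hypothetical shape)."""
    item_code: str
    quantity: int
    unit_price: float


@dataclass(frozen=True)
class InvoiceLine:
    """An invoice line extracted from a document (hypothetical shape)."""
    item_code: str
    quantity: int
    unit_price: float


def match_invoice_lines(invoice_lines, po_lines, price_tolerance=0.01):
    """Split invoice lines into (matched, disputed) against purchase order lines.

    Assumed rule: a line matches when the PO has the same item code, the same
    quantity, and a unit price within `price_tolerance`.
    """
    po_by_item = {line.item_code: line for line in po_lines}
    matched, disputed = [], []
    for line in invoice_lines:
        po = po_by_item.get(line.item_code)
        if (
            po is not None
            and po.quantity == line.quantity
            and abs(po.unit_price - line.unit_price) <= price_tolerance
        ):
            matched.append(line)
        else:
            disputed.append(line)
    return matched, disputed


po = [PoLine("BOLT-10", 100, 0.25), PoLine("NUT-10", 100, 0.10)]
invoice = [InvoiceLine("BOLT-10", 100, 0.25), InvoiceLine("NUT-10", 120, 0.10)]
matched, disputed = match_invoice_lines(invoice, po)
```

Here the bolt line would go on to posting, while the nut line, whose quantity differs from the purchase order, would be routed to the dispute management workflow.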
Leveraging Google’s Document AI solution in Rappit Undoc
At the heart of Rappit Undoc is the integration with Google’s Document AI for text recognition, text analysis and entity extraction. Google Cloud draws on its relationship with the Google Research organization to bring cutting-edge technologies into its solutions, making Document AI one of the most powerful text analytics platforms in the industry. Forrester named Google Cloud as a leader in The Forrester Wave™: Document-Oriented Text Analytics Platforms, Q2 2022 report, saying: “Google Cloud’s strengths include document capture, image analytics, full ModelOps cycle capabilities, unstructured data security, and integration of their research with Google Cloud’s augmented BI platform Looker.”
Document AI features a general processor for Document OCR (Optical Character Recognition), which enables us to identify and extract text from documents in over 200 languages for printed text and 50 languages for handwritten text. Rappit Undoc uses the text and layout information extracted from documents using the general Document OCR parser to train new models, extract custom fields, improve extraction quality and provide an intuitive verification app. Document AI also offers specialized parsers for procurement, contracts, lending and, most recently, identity. These parsers are integrated into Rappit Undoc as out-of-the-box document types.
Leveraging Document AI makes Rappit Undoc a capable document automation platform.
How Rappit Undoc complements Google’s Document AI
While Document AI provides text and entity extraction APIs for developers to build their own document processing solution, Rappit Undoc provides a platform for business users to configure the end-to-end document automation solution for a wide range of document types, and start using them in minutes! Further, Rappit Undoc provides ways to enhance data accuracy, add custom fields to Google Cloud’s specialized parsers, and improve the self-learning capability of the system.
Some examples of how Rappit Undoc complements Google’s Document AI are given below:
Low-code platform
Rappit Undoc abstracts the Document AI setup and integration activities for a wide range of parsers without the need for coding or deployments. In Rappit Undoc, a document parser can be added with a few clicks and the user can start processing documents in minutes.
Improve accuracy with custom models and rules
With Rappit Undoc, you can plug in additional models and business rules to improve the accuracy of entity recognition. For example, you can plug in a vendor recognition model trained with templates from all your vendors to identify vendors with very high accuracy, and then add vendor-specific or template-specific rules to validate the other extracted entities.
Enrich data with knowledge stores and lookups
You can set up syncing between your master data from other systems and the document automation system to pull additional information to enrich the data or use it in downstream processes. For example, you can have lookups defined for your vendor master data to read permitted tax percentages to validate the tax details extracted from the documents.
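A lookup-based validation of this kind might look like the following sketch. The vendor IDs, the shape of the lookup table, and the tolerance are hypothetical; in Rappit Undoc these would come from configuration rather than code.

```python
# Hypothetical vendor master data synced from another system:
# vendor ID -> set of permitted tax percentages.
VENDOR_TAX_LOOKUP = {
    "V-1001": {0.0, 5.0, 18.0},
    "V-1002": {20.0},
}


def validate_tax(vendor_id, net_amount, tax_amount,
                 lookup=VENDOR_TAX_LOOKUP, tolerance=0.05):
    """Return the permitted tax percentage that explains the extracted
    tax amount, or None if no permitted rate fits.

    `tolerance` is in percentage points and absorbs rounding in the document.
    """
    if net_amount <= 0:
        return None
    permitted = lookup.get(vendor_id, set())
    actual_rate = tax_amount / net_amount * 100
    for rate in sorted(permitted):
        if abs(actual_rate - rate) <= tolerance:
            return rate
    return None
```

For instance, a net amount of 200.0 with a tax amount of 36.0 is accepted for vendor V-1001 (18%), but the same figures for V-1002 return None and could be flagged for human verification.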
Extract custom entities
Rappit Undoc enables you to add new custom entities to the specialized parsers and extract them from the document. For example, you can extract equipment numbers from equipment service invoices sent by your vendors.
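As a rough illustration of a custom entity, the sketch below pulls equipment numbers out of OCR text with a regular expression. The "EQ-" plus six digits format is an assumption for the example, and a pattern match stands in here for whatever model or rule the platform actually applies.

```python
import re

# Assumed format for the illustration: "EQ-" followed by exactly six digits.
EQUIPMENT_NUMBER = re.compile(r"\bEQ-\d{6}\b")


def extract_equipment_numbers(ocr_text):
    """Return unique equipment numbers in order of first appearance."""
    seen = []
    for match in EQUIPMENT_NUMBER.findall(ocr_text):
        if match not in seen:
            seen.append(match)
    return seen


text = "Service for EQ-004512 and EQ-004777. Repeat visit for EQ-004512."
equipment = extract_equipment_numbers(text)
```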
Intuitive (human-in-the-loop) verification app
The Rappit Undoc verification app enables business users to review and confirm the processed information. The text and position information from OCR is overlaid onto the document image, so the user can simply click or drag on the image to fill in the fields. There are also options to resend documents for processing, reject them, or split/merge documents before re-processing.
Continuous feedback and training of the system
Rappit Undoc enables users to configure a feedback loop with the human-verified data to continuously improve the performance of the system. For example, template-specific features are extracted from the inputs provided for entities such as Invoice ID, Issue Date and Reference Order fields and are used in new predictions, which helps increase automation and accuracy. Similarly, new supplier templates that arrive for verification are automatically used to retrain the ML models.
Ingestion automation
Documents can be configured to be picked up from mailboxes, Google Drive or cloud storage buckets. Based on the ingestion configuration specified for a document type, the documents will automatically be picked up and submitted for processing.
Integration with other systems
With Rappit Undoc, you can configure and integrate your delivery endpoints using different push or pull protocols, for instance a webhook update, hot-folder uploads, or status APIs. For example, a fully parsed invoice document can be pushed to an accounts payable system using an API call.
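To make the push case concrete, here is a minimal sketch of preparing a webhook call with Python's standard library. The endpoint URL and the payload fields are placeholders, not a documented Rappit Undoc or accounts payable API, and the request is only built, not sent.

```python
import json
import urllib.request


def build_webhook_request(endpoint_url, document):
    """Build (but do not send) a JSON POST request carrying the parsed document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_webhook_request(
    "https://ap.example.com/invoices",  # placeholder URL
    {"invoice_id": "INV-001", "total": 236.0, "status": "matched"},
)
# Sending it would be: urllib.request.urlopen(request, timeout=10)
```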
Workflow
Rappit Undoc allows you to automate document-centric workflows that come as a natural step after data is extracted from documents. This could be an approval workflow or dispute management process flow for an invoice document.
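One way to picture such a workflow is as a small state machine. The state names and allowed transitions below are assumptions chosen for an invoice example; an actual workflow would be configured in the platform rather than hard-coded.

```python
# Assumed states and transitions for an invoice workflow.
TRANSITIONS = {
    "extracted": {"matched", "disputed"},
    "disputed": {"matched", "rejected"},
    "matched": {"approved", "rejected"},
    "approved": {"posted"},
}


def advance(state, new_state):
    """Return new_state if the transition is allowed, otherwise raise ValueError."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"cannot move from {state!r} to {new_state!r}")
    return new_state


# A disputed invoice that is resolved, approved and posted for payment.
state = "extracted"
for step in ("disputed", "matched", "approved", "posted"):
    state = advance(state, step)
```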
Analytics
Rappit Undoc provides out-of-the-box document processing analytics, showing dashboards and graphs related to automation metrics, documents with errors, and documents in various stages of the configured workflow.
Document storage and search
While Rappit Undoc can integrate with Google’s Document AI Warehouse to search, store and manage documents, it also provides a cost-effective alternative for more localized setups: a built-in document management solution that uses cloud storage and a distributed search engine.
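The idea of indexing documents by their extracted fields can be illustrated with a toy in-memory inverted index. This is a teaching sketch only: the DocumentIndex class and the field names are hypothetical, and a real deployment would rely on a distributed search engine rather than a Python dictionary.

```python
from collections import defaultdict


class DocumentIndex:
    """Toy inverted index: token -> set of document IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index every whitespace-separated token of every extracted field value."""
        for value in fields.values():
            for token in str(value).lower().split():
                self._postings[token].add(doc_id)

    def search(self, query):
        """Return IDs of documents that contain all tokens in the query."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        results = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self._postings.get(token, set())
        return results


index = DocumentIndex()
index.add("doc-1", {"supplier": "Acme Tools", "invoice_id": "INV-001"})
index.add("doc-2", {"supplier": "Acme Paints", "invoice_id": "INV-002"})
hits = index.search("acme tools")
```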