Client

Our client is a company that develops minibus taxi businesses in South Africa. Being a one-stop shop for taxi owners, they sell minibuses, provide insurance services, finance taxipreneurs to enable direct or indirect employment, and offer loyalty programs for drivers. With more than 36K minibus taxis, they cover 15 million commuter trips, providing convenient, cost-efficient, and fast public transport.

Challenge

The client dealt with a huge flow of printed, scanned, and photographed organizational documents, including invoices, contracts, bills, CVs, insurance declarations, gas station receipts, driving licenses, and financial calculations. The team had to enter, process, and search data from these documents manually, which was time-consuming and error-prone.

Our team was asked to develop an AI-powered OCR (optical character recognition) system that would recognize values on printed, scanned, and photographed documents, automate data input, and allow the data to be extracted from the document database on demand for calculations.

Project Description

Phase 1: API development

The first phase of OCR development involved building an API that handles document upload, recognition, and storage in a database. We used a microservices architecture and Google AI for this purpose.

The client worked with various types of documents, and each type required a unique set of data to be recognized. To meet these requirements, we developed an algorithm for each document type that defined which fields should be recognized and what data types they should use. The algorithms were based on so-called anchors: unique fields by which the type of document (invoice, driving license, bill, etc.) can be identified.
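The per-type rules described above could be expressed roughly as follows. This is a minimal sketch, assuming a simple keyword match on anchor phrases; the names `DocumentSchema`, `SCHEMAS`, `detect_type`, and `coerce_fields`, as well as the anchor phrases and field lists, are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DocumentSchema:
    """Rules for one document type: how to spot it and what to extract."""
    doc_type: str
    anchors: Tuple[str, ...]   # phrases that identify the type (hypothetical)
    fields: Dict[str, type]    # field name -> expected data type (hypothetical)


# Illustrative schemas only; the real project covered 50+ document types.
SCHEMAS = [
    DocumentSchema("invoice", ("invoice number", "vat"),
                   {"invoice_number": str, "total": float}),
    DocumentSchema("driving_license", ("driving licence", "licence number"),
                   {"licence_number": str, "holder_name": str}),
]


def detect_type(text: str) -> Optional[DocumentSchema]:
    """Return the first schema whose anchors all occur in the text, else None."""
    lowered = text.lower()
    for schema in SCHEMAS:
        if all(anchor in lowered for anchor in schema.anchors):
            return schema
    return None


def coerce_fields(schema: DocumentSchema, raw: Dict[str, str]) -> dict:
    """Cast raw recognized strings to the data types the schema prescribes."""
    return {name: cast(raw[name])
            for name, cast in schema.fields.items() if name in raw}
```

Keeping the anchors and field types in data rather than in code means that, in a design like this, a new document type is added by appending one more schema entry.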

The final workflow with the API looked as follows: a user uploaded a document to the system and chose the appropriate document type. Google AI applied the recognition algorithm for the chosen type. Once the recognition was completed, the API recorded two items in the database:

Initial document (before recognition).
JSON file with recognized document text.

In the end, the user could enter any piece of information from the document into the internal system and extract it at any time for further processing without manual effort. This accelerated data processing and operations and streamlined data management.
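The upload, recognition, and storage flow can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: an in-memory SQLite database stands in for the production database, `recognize` is a placeholder for the OCR service call, and the table layout and function names are hypothetical.

```python
import json
import sqlite3


def recognize(image_bytes: bytes, doc_type: str) -> dict:
    """Placeholder for the OCR step; a real system would call the OCR service here."""
    return {"doc_type": doc_type, "text": "<recognized text>"}


def upload_document(conn, image_bytes, doc_type, recognizer=recognize):
    """Run recognition, then store the original file and the JSON result together."""
    result = recognizer(image_bytes, doc_type)
    cur = conn.execute(
        "INSERT INTO documents (doc_type, original, recognized_json) VALUES (?, ?, ?)",
        (doc_type, image_bytes, json.dumps(result)),
    )
    conn.commit()
    return cur.lastrowid


# Hypothetical table: one row per upload, holding both recorded items.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, doc_type TEXT, original BLOB, recognized_json TEXT)"
)

doc_id = upload_document(conn, b"fake image bytes", "invoice")
row = conn.execute(
    "SELECT original, recognized_json FROM documents WHERE id = ?", (doc_id,)
).fetchone()
```

Storing the original next to the recognized JSON keeps the source available for later verification or re-processing.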

Phase 2: UI client development

As the client derived significant benefits from the optical character recognition system described in the previous part, they decided to enable its cross-department usage and include more document types in the pipeline. To address this request, we extended the functionality and flexibility of the initial solution.

We developed a UI client to provide employees with centralized access to OCR features and enabled messaging with colleagues to share the recognition outputs.

In addition, we implemented functionality that allowed employees to select which values of each document should be recognized, independently of the predefined algorithms. This saved recognition time and ensured that only relevant, task-specific information was extracted.

Some challenges we faced and resolved

Some documents were of poor quality, handwritten, had overlapping text, or were physically damaged (e.g., by liquid).

To overcome this, we introduced a human control stage. After document recognition was completed, an employee had to verify its accuracy and either click "Approve" to save the results or correct the misinterpreted values.
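A review step of this kind can be modeled as a small state machine. The sketch below is an assumption about how such a stage might look; the `ReviewItem` class, its status names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReviewItem:
    """Recognition result awaiting human verification."""
    recognized: Dict[str, str]
    status: str = "pending"  # pending -> approved | corrected
    corrections: Dict[str, str] = field(default_factory=dict)

    def approve(self) -> None:
        """Reviewer confirms the recognized values as they are."""
        self.status = "approved"

    def correct(self, field_name: str, value: str) -> None:
        """Reviewer overrides one misread value."""
        if field_name not in self.recognized:
            raise KeyError(field_name)
        self.corrections[field_name] = value
        self.status = "corrected"

    def final_values(self) -> Dict[str, str]:
        """Values to save: recognized data with any corrections applied on top."""
        if self.status == "pending":
            raise ValueError("document has not been reviewed yet")
        return {**self.recognized, **self.corrections}


# Example: the OCR misread a zero as the letter O in the total.
item = ReviewItem({"total": "1O0.00", "date": "2023-01-05"})
item.correct("total", "100.00")
```

Refusing to return values while the item is still pending ensures nothing unreviewed reaches the saved results.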

The volume of stored documents was growing exponentially, which increased database expenses.

To optimize database costs and ensure stable performance under high load, we applied caching (storing frequently accessed data for faster retrieval) and hashing (using algorithms to map data for efficient storage and retrieval).

As a by-product of these measures, the cost of Google AI was also optimized: if a document had been processed before, the system detected this and matched it against the previously obtained results instead of running the recognition again (which would incur additional costs).
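One common way to detect an already processed document is to key results by a hash of the file content. The sketch below assumes SHA-256 over the raw bytes and an in-memory dictionary as the cache; the source does not state which hash function or cache store was used, and `RecognitionCache` and `fake_ocr` are hypothetical names.

```python
import hashlib


class RecognitionCache:
    """Skips the paid OCR call when the same file content was seen before."""

    def __init__(self, recognizer):
        self._recognizer = recognizer  # callable: bytes -> dict
        self._results = {}             # content hash -> recognition result
        self.ocr_calls = 0             # how many times the recognizer was invoked

    def recognize(self, file_bytes: bytes) -> dict:
        key = hashlib.sha256(file_bytes).hexdigest()
        if key not in self._results:
            self.ocr_calls += 1
            self._results[key] = self._recognizer(file_bytes)
        return self._results[key]


def fake_ocr(file_bytes: bytes) -> dict:
    """Stand-in for the real OCR service."""
    return {"length": len(file_bytes)}


cache = RecognitionCache(fake_ocr)
first = cache.recognize(b"same scan")
second = cache.recognize(b"same scan")   # served from the cache
third = cache.recognize(b"another scan")
```

Note that a content hash only matches byte-identical files; a re-scan or new photo of the same paper document would hash differently and be recognized again.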

Key Features

Optical character recognition of 50+ types of organizational documents (invoices, CVs, contracts, bills, driving licenses, and more).

Automated input and extraction of data from printed, scanned, and photographed documents, including damaged ones.

User-friendly interface for centralized usage of optical character recognition software and sharing document recognition outcomes between employees.

Human-control stage to prevent inaccurate recognition results.

Prevention of repeated recognition of the same document, which would incur additional costs.

Team Composition & Project Duration

One full-stack developer worked on the project, which took 1 year.

Major Tech Stack

Google Cloud Vision API (OCR), Angular Material Design 16, Taiga UI, NGX-CHARTS, Blazor, C#/.NET 6, Entity Framework 6, YARP API Gateway, Modular Monolith, Docker, ELK + Serilog, Seq, Azure Cloud Services (ACR, AAS, AKS, Azure Key Vault, Azure SQL Database), Azure DevOps CI/CD/CD + Test + Artifacts

Results

Automation of document processing, freeing employees' time and effort for more complex, strategic tasks.
Orchestration of document and data management.
Increased accuracy of outcomes based on document data, and faster related operations.

OCR System Development for a Taxi Business Provider