In today's AI and document processing landscape, Large Language Models (LLMs) are in the spotlight. These models, designed to handle a wide range of tasks by leveraging extensive training on massive datasets, have become famous for their impressive capabilities. From writing business plans to composing love letters, they seem equipped to do anything, and if they fall short, we simply make them larger.
However, despite their computing superpowers and the significant attention and investment they’ve drawn to the AI space, LLMs often struggle to produce predictable, accurate, and repeatable results. This raises a critical question: what are the risks of implementing large models for important operational tasks?
People sometimes refer to the way these models work as "AI magic", because once a model is built, we often have little to no understanding of how it arrives at its answers. An AI model is essentially a massive calculator that assigns weights to certain inputs and calculates the probability of a specific output. The sheer size of these models, however, creates a sort of black box: the more parameters a model has, the more factors can (unknowingly) influence the generated output, making it difficult to trace back why the model produced a specific result.
This can be particularly problematic when building document processing solutions around one all-knowing model for tasks like document classification, entity extraction, and output verification. While an LLM might quickly produce some good results, it will likely get stuck at around 70% accuracy. To achieve higher accuracy, you have two main options:
The first is retraining: for a long time, the trend was to retrain models and make them bigger and bigger. The second, and currently the more popular option, is fine-tuning an existing model.
The problem is that as a model grows, its performance becomes less predictable, which makes these adjustments more complex and less reliable. This unpredictability poses significant risks, especially when the model is used for critical operational tasks. Implementing these models in large-scale document processes can lead to inaccuracy, unpredictability, high costs, and long processing times. But should we then use AI for document processing at all?
Luckily, we can always return to the UNIX principles. The nature of LLMs does not align well with the first principle of UNIX: "Do one thing and do it well." The "large" in LLM indicates a vast scope of knowledge, but as the saying goes, “a jack of all trades is a master of none.” LLMs aim to cover a wide range of tasks, which can compromise their effectiveness in specific areas. So, how do we work around this? The answer lies in Domain-Specific (Language) Models (DSMs).
DSM = a computational model tailored to optimize tasks within a specific field, using domain-specific data and knowledge.
As people discover the limitations of LLMs, they realize that higher accuracy is achieved by training specific models for specific tasks. There is more power in finding the right model to perform a particular task proficiently than in one almighty, all-knowing model. Instead of creating one all-encompassing document processing model, we can develop a document processing pipeline with multiple small models, each excelling in a single task. What does that look like?
Let’s take a document processing example. Say you want to extract patient data, prices, and dates from an image of three receipts. Only two are pharmacy receipts; the third is a snack receipt from the hospital cafeteria that we must ignore.
The LLM approach:
Write a good prompt that covers all of these requirements and send the data to the LLM
The benefit is that the whole solution is provided by one model, so you immediately get an answer to every question you ask. The downside is that when the outcome isn’t what you want, you have only two choices: engineer better prompts, which can be a hard guessing game, or fine-tune/retrain the model, which is very money- and time-consuming.
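In code, the LLM approach boils down to a single call. The sketch below is only an illustration: `call_llm` and `PROMPT` are hypothetical placeholders for whichever model, client, and prompt you end up using, not a specific vendor API.

```python
def call_llm(prompt: str, image: bytes) -> str:
    """Hypothetical placeholder for your LLM client (hosted API or local model)."""
    raise NotImplementedError("wire this up to an actual LLM client")

PROMPT = (
    "The image contains three receipts. Ignore the cafeteria snack receipt. "
    "For each pharmacy receipt, return the patient data, prices, and dates as JSON."
)

def extract_with_llm(receipt_image: bytes) -> str:
    # One prompt, one model: the entire task (and all of its failure modes)
    # rides on how well this single prompt is engineered.
    return call_llm(PROMPT, receipt_image)
```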
The DSM approach (sketched in code after the list below):
Model 1 cuts the image into the individual receipts
Model 2 filters out the snack receipt
Model 3 extracts the relevant data
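As a rough sketch and nothing more, the same task as a chain of small models could look like this; the three functions are hypothetical stand-ins for a segmentation model, a classifier, and an extractor you would train or select for your own documents.

```python
def segment_receipts(page_image: bytes) -> list[bytes]:
    """Model 1 (hypothetical): cut the page image into individual receipt images."""
    raise NotImplementedError

def is_pharmacy_receipt(receipt_image: bytes) -> bool:
    """Model 2 (hypothetical): classify a receipt so the snack receipt can be filtered out."""
    raise NotImplementedError

def extract_fields(receipt_image: bytes) -> dict:
    """Model 3 (hypothetical): extract patient data, prices, and dates from one receipt."""
    raise NotImplementedError

def process_page(page_image: bytes) -> list[dict]:
    receipts = segment_receipts(page_image)
    pharmacy_receipts = [r for r in receipts if is_pharmacy_receipt(r)]
    return [extract_fields(r) for r in pharmacy_receipts]
```

Because each stage is its own small model, you can evaluate, retrain, or replace a single step without touching the rest of the chain.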
The benefits of the DSM approach are higher accuracy, less prompt engineering, and less computing power required. It does, however, create a new problem: because each DSM solves only part of the problem, we now have a range of partial solutions. So the main question is: how can we make these models communicate with each other so that the parts can become a whole?
This is where we apply the third UNIX principle: "use a unified document model for data interchange." When we make sure that the same document structure goes into a model as comes out of it, we enable the different models to communicate with each other and tasks to be handed from one model to the next through the pipeline.
In a document processing workflow, each stage should integrate smoothly to maintain consistency and reliability. This means using a single interface or data type for communication between programs, ensuring interoperability. Because input always equals output, we can go back and forth between models and tasks. By honoring the UNIX principles, we can design an AI document processing infrastructure that is both effective and reliable.
To implement effective AI solutions for document processing, we must ensure that various models can collaborate on tasks without relying on one model to do everything well. The key to building a super-accurate document processing infrastructure lies in this pipeline principle. By ensuring that the input equals the output, we can move seamlessly from one step to the next in the document processing pipeline. This allows us to employ DSMs that excel at specific tasks and create a chain of programs that work well together.
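To make that concrete, here is a minimal sketch of what such a unified document model could look like in practice; the `Document` fields are assumptions made for the receipt example above, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Document:
    image: bytes                                   # the raw input (or a crop of it)
    doc_type: Optional[str] = None                 # e.g. "pharmacy_receipt" or "snack_receipt"
    entities: dict = field(default_factory=dict)   # fields extracted so far

# Every stage takes a Document and returns a Document: input equals output.
Stage = Callable[[Document], Document]

def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    # Because every stage speaks the same document model, stages can be
    # chained, reordered, or swapped without breaking the pipeline.
    for stage in stages:
        doc = stage(doc)
    return doc
```

Any model that honors this contract, whether it classifies, extracts, or verifies, can be dropped into the chain.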
Using a unified document model for data interchange and creating modular, specialized tools for individual tasks ensures accuracy and consistency in processing. This approach not only adheres to the UNIX philosophy but also enhances the robustness and scalability of the document processing system. Let's quickly name some other benefits of DSMs.
Benefits of Using DSMs
By adopting these principles, we can overcome the limitations of LLMs when developing an AI-based document processing system and create a more efficient and reliable document processing infrastructure, which in turn guarantees a more sustainable approach to implementing AI in organizations.
Interested in learning more about the models we use and how we achieve this? Stay tuned for our upcoming blog "The Holy Grail - of AI Model Orchestration." Subscribe now to stay informed!