
The Holy Grail of AI Model Orchestration

Daan Vermunt
Aug 2024

In our previous blog, we explored how small AI models can outperform larger ones in terms of accuracy and control. Today, we’ll dive deeper into how these smaller models can be orchestrated into a full AI solution, powering document processing infrastructure and model pipelines.

The industry is increasingly shifting away from relying on one large, monolithic AI model to do everything. Instead, companies are turning to smaller, specialized models designed for specific tasks. The future of AI lies in the collaboration of these models—each performing a distinct role, yet working together to accomplish a larger goal. As this trend grows, so too will the need for a robust system to define, manage, and orchestrate these models efficiently.

This vision aligns with current AI development trends, where text-based interactions are increasingly managed by specialized models. Rather than using a single large model to generate a response, companies are parsing messages to identify the nature of the task—whether it’s mathematical, linguistic, or otherwise. For instance, when a message is identified as a mathematical query, it's routed to a specialized mathematical model designed for high accuracy in such tasks. The result is then processed and communicated back through a language model.
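
As a rough illustration of this routing pattern, the sketch below classifies a message and dispatches it to a specialized handler. The `classify_task`, `handle_math`, and `handle_language` names are hypothetical stand-ins for real models, not references to any specific product or API.

```python
# Minimal sketch of task-type routing (hypothetical handlers, not a real API).
import re

def classify_task(message: str) -> str:
    """Very naive classifier: treat messages containing an arithmetic expression as math."""
    return "math" if re.search(r"\d+\s*[-+*/^=]\s*\d+", message) else "language"

def handle_math(message: str) -> str:
    # Placeholder for a call to a specialized mathematical model.
    return f"[math model] solving: {message}"

def handle_language(message: str) -> str:
    # Placeholder for a call to a general language model.
    return f"[language model] answering: {message}"

ROUTES = {"math": handle_math, "language": handle_language}

def respond(message: str) -> str:
    task = classify_task(message)
    return ROUTES[task](message)

print(respond("What is 12 * 7?"))           # routed to the math handler
print(respond("Summarize this contract."))  # routed to the language handler
```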

But what models should we use? How do they work together, and what tasks can they perform? Let’s explore.

The Instruments of Orchestration

To create a robust AI-driven document processing pipeline, we need to understand the types of models and the steps involved. This is called a pipeline because of the sequential steps documents undergo, with the ultimate goal being to keep the documents flowing smoothly and efficiently through each stage.

Types of Steps

  • Transformers: These models modify the properties of a document. They are crucial for tasks such as extracting relevant information and classifying data, and they can also handle visual adjustments like contrast enhancement or image rotation.
  • Splitters: Since models always operate on a specific context, e.g. a single receipt or a single mathematical formula, splitters are used to narrow the input down to that unit of work. They take a single input and produce one or more outputs, making them crucial for separating data into chunks of relevant context.
  • Routers: Routers introduce flexibility by directing documents to different parts of the pipeline. They are vital for dynamically sending each document to the steps that are relevant to it.

By combining these models in a way that suits your process, you can create a tailor-made solution for your document processing needs. These models serve as the building blocks for your infrastructure.
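
To make these building blocks concrete, here is a minimal sketch of how the three step types might look in code, assuming a simple `Document` container; it illustrates the pattern only and is not Send AI's actual interface.

```python
# Illustrative sketch of the three step types (not Send AI's actual interfaces).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Document:
    pages: List[str]
    data: dict = field(default_factory=dict)

# Transformer: Document -> Document (modifies properties or attaches data)
def extract_total(doc: Document) -> Document:
    doc.data["total"] = next((p for p in doc.pages if "Total" in p), None)
    return doc

# Splitter: Document -> list of Documents (moves to the unit of work, e.g. one page)
def split_pages(doc: Document) -> List[Document]:
    return [Document(pages=[page]) for page in doc.pages]

# Router: Document -> route name (directs the document to the next part of the pipeline)
def route_by_keyword(doc: Document) -> str:
    return "invoices" if any("Invoice" in p for p in doc.pages) else "other"

doc = Document(pages=["Invoice 001", "Total: 128.50"])
parts = split_pages(doc)          # splitter: one document per page
route = route_by_keyword(doc)     # router: "invoices"
enriched = extract_total(doc)     # transformer: attaches extracted data
```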

Example Steps in the Pipeline:

Transformers

  • Entity Extraction
  • Contrast Enhancement
  • RAG-Based Entity Extraction
  • Document Summarization
  • Text Vectorization
  • Multi-Modal Entity Extraction
  • Sentiment Analysis
  • Document Anonymization
  • Document Redaction

Splitters

  • Model-Based Segmentation
  • Page Splitting
  • Page Grouping

Routers

  • Classification-Based Routing
  • Rule-Based Routing
  • Vision-Based Classification
  • Sentiment-Based Routing
  • Metadata-Based Filtering

Documents move through these steps, each one contributing to the final output. However, what happens after each step? The models can produce different outcomes or 'side effects.'

Side Effects

  • Data Extraction: As documents flow through the pipeline, you can attach packages of extracted data to them.
  • External Effects: The pipeline can interact with external systems, such as making API calls to retrieve additional data or trigger external actions.
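
As a rough sketch of these side effects, the example below attaches a package of extracted data to a document and defines an outbound API call; the document shape and the ERP endpoint are hypothetical assumptions made for illustration.

```python
# Sketch of side effects: attaching extracted data and triggering an external call.
import json
import urllib.request

def attach_extraction(doc: dict, extracted: dict) -> dict:
    # Side effect 1: package extracted data and attach it to the document.
    doc.setdefault("extractions", []).append(extracted)
    return doc

def notify_erp(doc: dict, url: str = "https://erp.example.com/documents") -> None:
    # Side effect 2: interact with an external system via an API call.
    # The endpoint is fictional, so calling this would fail outside the sketch.
    payload = json.dumps(doc).encode("utf-8")
    request = urllib.request.Request(url, data=payload,
                                     headers={"Content-Type": "application/json"})
    urllib.request.urlopen(request)

doc = attach_extraction({"id": "doc-42"},
                        {"invoice_total": "128.50", "currency": "EUR"})
```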

Managing the Complexity

The complexity of these pipelines can be managed by following the principles of UNIX, which emphasize simplicity, modularity, and interoperability.

Six Key Principles:

  1. Do One Thing and Do It Well: Each block (or model) focuses on a single task and optimizes for it.
  2. Write Programs That Work Together: By defining strict rules on what each step can do, you ensure that the order of steps doesn’t matter, enabling seamless collaboration between steps.
  3. Use a Unified Document Model for Data Interchange: Employing a single model (format/schema) for passing data between steps allows for easy chaining of tasks (see the sketch after this list).
  4. Design for Flexibility and Extensibility: By setting up the system as a pipeline of steps, it becomes flexible and easy to extend with new functionalities in the form of additional steps.
  5. Avoid Unnecessary Complexity: Each step’s model is straightforward, with complexity abstracted away into the blocks themselves. While the block might be complex, connecting them is simple.
  6. Strive for Model Interoperability: It should be easy to upgrade models, change providers, or switch clouds within a single step without disrupting the pipeline.
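
To illustrate principles 2 and 3, here is a minimal sketch of steps chained through one shared document schema; the schema keys and step names are assumptions made for the example, not a fixed specification.

```python
# Sketch of chaining steps through one shared document schema (a plain dict here).
from typing import Callable, List

DocumentModel = dict  # e.g. {"pages": [...], "metadata": {...}, "extractions": {...}}
Step = Callable[[DocumentModel], DocumentModel]

def enhance_contrast(doc: DocumentModel) -> DocumentModel:
    doc.setdefault("metadata", {})["contrast_enhanced"] = True
    return doc

def extract_entities(doc: DocumentModel) -> DocumentModel:
    # Stand-in for a model call; any extraction step writes to the same place.
    doc.setdefault("extractions", {})["sender"] = "ACME B.V."
    return doc

def run_pipeline(doc: DocumentModel, steps: List[Step]) -> DocumentModel:
    for step in steps:
        doc = step(doc)  # every step consumes and produces the same document model
    return doc

print(run_pipeline({"pages": ["Invoice from ACME B.V."]},
                   [enhance_contrast, extract_entities]))
```

Because every step speaks the same document model, steps can be reordered, added, or swapped out without touching the rest of the pipeline.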

By taking all six UNIX principles into account in our approach, processing highly complex, high-volume document streams becomes possible: every step is an expert at its own task, and every expert can always work together as part of a larger solution. The result is a flexible yet robust workflow, tailored to your business operation and able to assure the accuracy you require.

The Conductor: Orchestrating the Models

Understanding the building blocks of a document processing pipeline is just the beginning. The true power lies in effectively orchestrating these models. While current model orchestration is often managed manually, the future points toward AI-driven orchestration, where specialized models work together to accomplish complex tasks with greater efficiency.

At Send AI, we’ve developed a robust document processing infrastructure that empowers customers to orchestrate their own customized pipelines. By creating a network of steps and allowing users to control them, we ensure that models collaborate seamlessly while users retain ultimate control.
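
As a hypothetical illustration of such a network of steps (not Send AI's actual configuration format), a pipeline definition might be expressed declaratively along these lines; the step and model names are placeholders.

```python
# Hypothetical declarative pipeline definition: a network of named steps with routing.
PIPELINE = {
    "start": "split_pages",
    "steps": {
        "split_pages":     {"type": "splitter",    "next": "classify"},
        "classify":        {"type": "router",      "routes": {"invoice": "extract_invoice",
                                                              "receipt": "extract_receipt"}},
        "extract_invoice": {"type": "transformer", "model": "invoice-extractor-v2", "next": None},
        "extract_receipt": {"type": "transformer", "model": "receipt-extractor-v1", "next": None},
    },
}
```

A lightweight runner could walk this graph, executing each step and following the router's decisions, so users adjust the definition rather than the underlying models.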

I envision a future where AI systems are composed of a network of specialized models, each responsible for a specific task. As AI evolves, the emphasis will shift from relying on a single, all-encompassing model to a more modular approach. This approach not only enhances accuracy but will ultimately also simplify the process of achieving the desired outcome.

As this trend continues, I am confident that there will be a growing need for a system that can define and orchestrate these tasks, ensuring that all models work in harmony. The future of AI will not be ‘the bigger the better’ in terms of models. The future of AI lies in the seamless integration and orchestration of these specialized models, with a system in place to manage and coordinate them effectively. This shift will not only improve efficiency but also pave the way for more sophisticated and reliable AI-driven solutions.

Orchestration is the key—and the future of orchestration will be AI.

Ready to start automating your Document Processing flow?

At Send AI, we empower you to fine-tune your own language models. Are you eager to start speeding up your document processing flow while keeping error rates low?