The Six Commandments of UNIX

Daan Vermunt
Jun 2024

At Send AI, our goal is to automate document processing using a variety of AI models. We believe that applications should be as flexible as the variety in human communication, and therefore our goal is always to build simple yet intelligent systems. One of the ways we do this is by applying the UNIX principles to our process. In the following blog, our Head of Engineering, Daan, will walk you through why looking at how UNIX was built can help create flexible yet robust AI systems.

Why UNIX?

For all tech newbies, we will start with a brief introduction to UNIX. Every computer runs on an operating system, which is at the core of everything a computer can do. One of its most important tasks is file management: a file management system manages files and directories on storage devices, providing a way to store, retrieve, and organize data. One of the most widely used operating systems (the backbone of the internet) is Linux, and Linux is built on the principles of UNIX.

So your operating system is Linux, and the way Linux operates comes from UNIX. But what exactly is UNIX? And why should you care about it?

UNIX’s founders did not just produce an operating system; they also produced a philosophy. Built on the principles of simplicity and modularity, UNIX consists of a set of small, efficient programs that work harmoniously together. This design philosophy is particularly beneficial for AI document processing, where the ability to handle large volumes of data with precision and reliability is crucial.

UNIX is known for its robust file system management, which organizes files in a hierarchical structure starting from the root directory. This design makes data management efficient and results in flexible, reliable, and scalable programs. Understanding how UNIX handles file management can significantly enhance the efficiency and effectiveness of your AI workflows.

The UNIX philosophy

UNIX dates back to 1969, when Ken Thompson and Dennis Ritchie (OGs in the computing world) created both the operating system and the philosophy behind it. That philosophy is based on six principles.

In software engineering, there is always a tension between simplicity and capability. Finding the right abstractions between these two forces is what makes a good program. From the perspective of ‘it will only ever have to perform this specific task’, it is tempting to write programs that are not extensible. These programs are simple, which is good, but when tasks evolve or parameters are added, they hit a wall. On the other hand, adding unnecessary complexity to programs to cover hypothetical future parameters makes them harder to use, maintain, and extend.

Applying the UNIX principles here means striving for abstractions that integrate well with the existing system. This approach ensures that changes or expansions in requirements can be accommodated without introducing unnecessary complexity. These principles guide developers to create efficient, adaptable, and maintainable software. But then what about AI?

UNIX in AI

To apply the UNIX principles to AI development, in our case document processing infrastructure, we have to tweak two of them a little.

For document processing with AI, there are many different models in play. This is due to the large amounts of data and the wide variety of documents that pass through our system. The infrastructure we have built should operate as a whole, but with self-sufficient, interchangeable parts. Because so many different document types come into our system, we don’t use plain text for data interchange. Instead, we use a unified document model, so that the format of a program’s input always equals the format of its output. The principle is that you always use one interface, or one data type, to communicate between programs, in order to make them interoperable (a minimal sketch of this idea follows below). More about this in my next blog.
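As an illustration, a unified document model boils down to every processing step consuming and producing the same data type. The sketch below is a minimal, hypothetical Python example; the names UnifiedDocument, Block, ocr_step, and classify_step are illustrative and not Send AI’s actual model.

    from dataclasses import dataclass, field

    @dataclass
    class Block:
        # One logical unit of content, e.g. a paragraph, a table cell, or an extracted field.
        text: str
        confidence: float = 1.0
        metadata: dict = field(default_factory=dict)

    @dataclass
    class UnifiedDocument:
        # The single data type every processing step consumes and produces.
        source_id: str
        blocks: list[Block] = field(default_factory=list)
        metadata: dict = field(default_factory=dict)

    def ocr_step(doc: UnifiedDocument) -> UnifiedDocument:
        # Hypothetical OCR stage: a real system would call an OCR model here.
        doc.blocks.append(Block(text="...extracted text..."))
        return doc

    def classify_step(doc: UnifiedDocument) -> UnifiedDocument:
        # Hypothetical classification stage: a real system would call a classifier here.
        doc.metadata["doc_type"] = "invoice"
        return doc

    # Because input always equals output, stages compose like UNIX pipes:
    result = classify_step(ocr_step(UnifiedDocument(source_id="doc-001")))

Because every stage has the signature UnifiedDocument in, UnifiedDocument out, steps can be reordered, added, or removed without touching the others, which is exactly the interoperability the principle aims for.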

Because of the rapidly changing AI environment, it is important not to build your entire infrastructure around one single model. Chances are high that new and better models will come along during the lifecycle of your product (or at least, let’s hope your product has such a lifecycle). Within development, you should therefore always strive for model interoperability: the system should keep its full functionality even when an AI model changes. Or, at the very least, you should strive to make swapping models as easy as possible, without adding complexity. Creating a system in which models can be plugged in and out could be UNIX’s answer to the tension between simplicity and capability; a sketch of this plug-in pattern follows below.
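Model interoperability can be made concrete with a small plug-in contract. The following is a hypothetical Python sketch (the names DocumentModel, OpenSourceLLMModel, HostedAPIModel, and Pipeline are illustrative, not Send AI’s actual architecture): the pipeline depends only on an interface, so concrete models can be swapped without touching the rest of the system.

    from typing import Protocol

    class DocumentModel(Protocol):
        # The only contract the pipeline knows about: document text in, structured fields out.
        def extract(self, document_text: str) -> dict: ...

    class OpenSourceLLMModel:
        def extract(self, document_text: str) -> dict:
            # Hypothetical: call a locally hosted open-source model here.
            return {"backend": "local-llm", "fields": {}}

    class HostedAPIModel:
        def extract(self, document_text: str) -> dict:
            # Hypothetical: call an external model API here.
            return {"backend": "hosted-api", "fields": {}}

    class Pipeline:
        # The pipeline depends on the DocumentModel contract, never on a concrete model.
        def __init__(self, model: DocumentModel):
            self.model = model

        def process(self, document_text: str) -> dict:
            return self.model.extract(document_text)

    # Swapping in a newer, better model is a one-line change; the rest of the system is untouched:
    pipeline = Pipeline(model=OpenSourceLLMModel())
    result = pipeline.process("Invoice #123 ...")
    pipeline = Pipeline(model=HostedAPIModel())

This is the plug-in-and-out idea in miniature: the moment a better model appears, only the constructor argument changes.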

Conclusion

Although 1969 was a long time ago, the UNIX principles still prove themselves ever valuable. Within the practice of software development, UNIX’s philosophy can lend a hand.

Document processing companies using AI have to operate in a rapidly evolving field while still delivering robust and reliable systems. Document processing is such a crucial part of business operations that the companies building these applications should take into account how large the operational dependency on their product is and will be. They should therefore always be able to innovate and upgrade their systems without too much operational disruption for their customers. Falling back on the (tweaked) UNIX principles can provide guidance here.

You should be able to upgrade to a new model and still keep the functionality of the whole unit/program, so that newer, better models can be plugged into the system without changing how, or whether, the system operates.

In the next blog, we will talk all about the different models behind these infrastructures and the need for using unified document models. Stay tuned for “Model All-Mighty”.

Ready to start automating your Document Processing flow?

At Send AI, we empower you to fine-tune your own language models. Are you eager to start speeding up your document processing flow while keeping error rates low?