Distributed Computing

Overview
Two important attributes of a good cloud application are resilience and scalability. The NTENT Platform achieves these objectives through a variety of strategies: homogeneous servers, redundant spine-leaf networks, virtualization and parallelized solutions.

The first element of resiliency is the hardware platform. We deploy our services on homogeneous servers with ample memory and storage. The machines are connected to one another via a fast, redundant spine-leaf network that provides two independent paths between any pair of machines. This ensures that every machine can run the most demanding process, and that data can travel efficiently to any part of the platform even if we lose a network card, a cable or a switch.
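The failover behavior that redundant paths enable can be sketched in a few lines: if transmission over one path fails, traffic falls back to the other independent path. This is an illustrative sketch only; the path records and the `send_over` helper are hypothetical, not part of any real networking API.

```python
def send_over(path, payload):
    """Pretend to transmit payload over a network path; may raise OSError."""
    if path.get("failed"):
        raise OSError(f"link down on {path['name']}")
    return f"delivered {len(payload)} bytes via {path['name']}"

def resilient_send(paths, payload):
    """Try each independent path in turn; succeed if any one is healthy."""
    last_error = None
    for path in paths:
        try:
            return send_over(path, payload)
        except OSError as err:
            last_error = err  # this path is down; try the next one
    raise ConnectionError("all redundant paths failed") from last_error

paths = [
    {"name": "spine-A", "failed": True},   # simulate a dead switch
    {"name": "spine-B", "failed": False},
]
print(resilient_send(paths, b"document batch"))
```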

The second element of resiliency is a reliable file system deployed on top of the hardware. This is especially important when dealing with billions of documents that need to be processed and made searchable. The key to storage reliability is keeping copies of the content on multiple machines. The file system transparently writes multiple copies of the data, checks their integrity and automatically re-replicates if a machine or hard drive is no longer available.
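The replicate-verify-repair cycle described above can be sketched as follows. This is a minimal illustration of the pattern, not NTENT's actual storage code; the `ReplicatedStore` class and its methods are hypothetical.

```python
import hashlib

class ReplicatedStore:
    """Toy model of replicated storage: N copies per block, checksummed."""

    def __init__(self, machines, copies=3):
        self.machines = {m: {} for m in machines}  # machine -> {key: (data, checksum)}
        self.copies = copies

    @staticmethod
    def _checksum(data):
        return hashlib.sha256(data).hexdigest()

    def write(self, key, data):
        # Write the same block, with its checksum, to `copies` machines.
        for machine in list(self.machines)[: self.copies]:
            self.machines[machine][key] = (data, self._checksum(data))

    def read(self, key):
        # Return the first replica whose checksum still matches.
        for store in self.machines.values():
            if key in store:
                data, checksum = store[key]
                if self._checksum(data) == checksum:
                    return data
        raise IOError(f"no intact replica for {key!r}")

    def repair(self, key):
        # Re-replicate from a good copy until `copies` replicas exist again.
        data = self.read(key)
        holders = [m for m, s in self.machines.items() if key in s]
        for machine in self.machines:
            if len(holders) >= self.copies:
                break
            if machine not in holders:
                self.machines[machine][key] = (data, self._checksum(data))
                holders.append(machine)

store = ReplicatedStore(["m1", "m2", "m3", "m4"], copies=3)
store.write("doc-42", b"crawled page content")
del store.machines["m1"]["doc-42"]   # simulate a lost drive
store.repair("doc-42")               # transparently re-replicate
print(store.read("doc-42"))
```

Production file systems such as HDFS apply the same idea at scale, with block-level checksums and a replication factor configured per file.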

From the moment a document is crawled and enters our system to the time it becomes searchable, it is stored on our redundant file system. Every service (a worker in the diagram below) that processes, transforms, extracts data from or indexes a document reads data from the reliable file system, performs its function, then writes data back to it. All services report to a central monitoring system that distributes tasks and coordinates their activities. If a service (or a server) disappears, the central monitor detects it and assigns the task to another available service. Better yet, if a machine disappears, the system can bring in a machine from the reserve to take its place.
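The coordinator pattern above can be sketched with heartbeats and task reassignment: workers check in periodically, and any task held by a worker that misses its deadline is handed to a live one. The `CentralMonitor` class and its methods are illustrative, not the platform's real interfaces.

```python
import time

class CentralMonitor:
    """Toy coordinator: tracks worker heartbeats and reassigns tasks."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.last_seen = {}      # worker -> last heartbeat timestamp
        self.assignments = {}    # task -> worker

    def heartbeat(self, worker, now=None):
        self.last_seen[worker] = now if now is not None else time.time()

    def assign(self, task, worker):
        self.assignments[task] = worker

    def reassign_dead(self, now=None):
        """Detect workers past the heartbeat deadline and move their tasks."""
        now = now if now is not None else time.time()
        alive = [w for w, t in self.last_seen.items() if now - t <= self.timeout]
        for task, worker in list(self.assignments.items()):
            if worker not in alive and alive:
                self.assignments[task] = alive[0]  # hand off to a live worker

monitor = CentralMonitor(timeout=5.0)
monitor.heartbeat("worker-1", now=0.0)
monitor.heartbeat("worker-2", now=0.0)
monitor.assign("index-shard-7", "worker-1")
# worker-2 keeps heartbeating; worker-1 goes silent
monitor.heartbeat("worker-2", now=10.0)
monitor.reassign_dead(now=10.0)
print(monitor.assignments["index-shard-7"])  # the task moved to worker-2
```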

We use virtualization in various places, but in our experience, Search is an application that benefits from running on physical servers. It is not uncommon to lose 20%-40% of raw server performance when Search services run in virtual machines (VMs), because those services are CPU- and I/O-heavy applications.

To address scale, we design our solutions so that processing can be parallelized. This means that most tasks can be distributed among many services (and servers) and completed independently of one another.
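Because each document is processed independently, the work distributes cleanly across a pool of workers. The sketch below uses a thread pool within a single process to keep the example self-contained; in a cluster the same pattern spans separate processes and servers. `process_document` is a hypothetical stand-in for a real pipeline stage.

```python
from concurrent.futures import ThreadPoolExecutor

def process_document(doc_id):
    """Stand-in for a pipeline stage (parse, extract, index, ...)."""
    return doc_id, f"processed-{doc_id}"

def run_pipeline(doc_ids, max_workers=4):
    # Each document is handled independently, so tasks can complete in
    # any order across the worker pool with no coordination between them.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_document, doc_ids))

results = run_pipeline(range(8))
print(results[3])  # -> processed-3
```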

Conclusion
This section presented some of the principles integrated into the NTENT Search Platform. These principles are not unique to our platform and can be found in several open-source middleware frameworks such as Hadoop, Kafka and Cassandra. In each case, the power of the platform rests on a set of core properties. First, the system is distributed across multiple services: activity is coordinated by a central service, but the actual work is carried out independently, in parallel. Second, the system scales linearly (add machines to support more activity or more content). Third, the system is self-healing, meaning it can detect problems and retry failed operations.

At NTENT, we built a platform that integrates these principles from the ground up, from the hardware and network all the way up to the back-end and front-end services, so that the service is always available and responsive.