Ballerina Concurrency Model and Non-Blocking I/O

Anjana Fernando
Ballerina Swan Lake Tech Blog
9 min read · Feb 22, 2021

The Ballerina programming language has a unique concurrency model that promotes efficient resource usage and provides an intuitive programming model for users. This model also underpins the non-blocking I/O support in its communication protocols. In this article, we will take an in-depth look at Ballerina’s concurrency support and see how non-blocking I/O operations are implemented on top of it.

Let’s first look at the general concurrency constructs provided by an operating system and how they work, and then move on to the concurrency primitives provided by Ballerina.

Multitasking with Processes and Threads

The operating system (OS) provides us with the general constructs of processes and threads. A single program, or process, can contain one or more threads. Each thread of execution is scheduled on a specific CPU core by the OS scheduler, so from the OS’s point of view, a thread is the most primitive execution construct for our programs. There can of course be more threads than CPU cores in our machine. These threads are executed preemptively: the OS scheduler interrupts the threads currently running on the CPU cores at regular time intervals so that they timeshare with the other threads in the system. This gives the illusion that all processes and threads are executing concurrently. We humans do not notice this pausing and resuming; especially in interactive applications, everything appears to execute at the same time.

Pausing one thread on a CPU core and resuming another thread in its place is called a context switch. This is a relatively expensive operation, since the current thread’s full state needs to be stored in memory, and the other thread’s previously stored state has to be read back and restored before it can start executing. A high number of unnecessary context switches hurts performance, so we generally try to keep context switches low. A good way to do this is to keep the number of active CPU-bound threads equal to the number of CPU cores. That way, the busiest threads will mostly stay scheduled on a CPU core and will rarely be preempted for other low-priority threads in the system.

So if the ideal scenario is to limit the number of threads to the CPU core count, how can we scale to a larger number of concurrent executions? This can be done with a user-space scheduler that switches between concurrent executions on top of those threads. A user-space scheduler generally has a lower overhead than the OS scheduler’s context switching. Ballerina takes this approach when implementing its concurrency primitives.

Ballerina Concurrency with Strands

Ballerina implements a user-space scheduler and uses an execution construct known as a strand. A strand is a lightweight thread of execution, which is cooperatively multitasked. Cooperative multitasking is different from preemptive multitasking done in the OS scheduler as it relies on the running execution itself to yield its CPU time to another execution. So rather than it being forcefully stopped by the scheduler, the execution itself signals to the scheduler that it’s ready to give up its execution time and willingly hands over its OS-level thread to a different concurrent execution in the application. In Ballerina, there are certain implicit yield points that trigger this, such as communication between concurrent executions, sleep operations, and other library function calls that may signal a yield. This concurrency pattern is also known as coroutines.

A group of strands belongs to an OS thread. At any given moment, a single strand in a group is actively executing on the thread the group belongs to. The strands in the same group cooperatively multitask by sharing that execution thread. In Ballerina code, a worker represents a single strand of execution within a function. Every function in Ballerina has one or more workers: it always has a default worker and, optionally, additional named workers. If there are no explicitly defined workers, all the code in the function belongs to the default worker. All the workers in a function are executed concurrently, and each worker is assigned its own strand. The default worker of a function is executed on the same strand as the worker of the function it was called from.

Let’s take a look at some sample Ballerina code and see how its execution is modeled using workers and how strands are mapped in the runtime.

Listing 01

The Ballerina program above starts executing in the main function. A new strand is created, and the main function’s default worker starts executing on it. The default worker of the main function is simply its full function body. At line 5, it invokes the function printNumber. For this invocation, the current worker in the main function is suspended, and its strand moves on to executing the default worker of the printNumber function. After the printNumber function has finished its instructions, its worker terminates, the main function’s default worker becomes active again, and the strand continues executing from line 6 in the main function.
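As a minimal sketch of this single-strand flow, a program of the same shape could look like the following. It is not the original listing, so the line numbers cited above do not apply to it, and the printed messages and the argument value are assumptions made for illustration.

```ballerina
import ballerina/io;

public function main() {
    io:println("main: before the call");
    // The strand running main's default worker moves into printNumber here.
    printNumber(42);
    // Once printNumber returns, the same strand resumes main's default worker.
    io:println("main: after the call");
}

// printNumber declares no named workers, so its whole body is its default worker.
function printNumber(int n) {
    io:println("printNumber: ", n);
}
```

No named workers are declared anywhere in this sketch, so under the model described above a single strand carries the whole execution.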

So this is a simple case where only a single strand is executing the workers in our Ballerina functions. Let’s see a scenario of having explicitly named workers in a function.

Listing 02

In this scenario, the function process has three workers. The default worker is defined by code lines 11 and 20–23. There are two additional named workers, w1 and w2. A worker in Ballerina optionally returns a value; if no return type is given, it implicitly returns nil, the () value. The return type of a function is the return type of its default worker. In this execution, the strand of the default worker of the function process creates new strands for the execution of workers w1 and w2. Under the default policy, a newly created strand is added to the same group as its parent strand (the strand that created it). So by line 20, we have three active strands: one executing the default worker of the process function, and two others executing workers w1 and w2.

A function returns when its default worker returns a value and terminates its execution. The state of any other named workers does not affect this behavior: they may have already terminated, or they may still be running when the function returns. In this case, we explicitly wait for the named workers to terminate and collect their return values using the wait action.
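A hedged sketch of a function with two named workers and explicit waits is shown below. It is not the original listing; the arithmetic in each worker and the parameter n are assumptions chosen only to give the workers something to return.

```ballerina
import ballerina/io;

public function main() {
    io:println(process(4));
}

// process has three workers: the default worker (the code outside the
// worker blocks) and the named workers w1 and w2.
function process(int n) returns int {
    worker w1 returns int {
        return n * n;
    }

    worker w2 returns int {
        return n + n;
    }

    // Default worker: wait for both named workers and combine their results.
    int squared = wait w1;
    int doubled = wait w2;
    return squared + doubled;
}
```

Without the two wait actions, the default worker could return while w1 and w2 are still running, which is the behavior the paragraph above describes.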

In a multiple worker scenario, communication between them can be done using a message passing mechanism. This is done using worker send (->) and receive (<-) actions.

Listing 03

Here, in line 14 we do a worker send operation, which sends the value result to the worker w2. In worker w2, at line 18, there is a corresponding worker receive operation to read in the value sent by worker w1. To reduce the chance of deadlocks, the compiler does static analysis of the code to see if every worker “send” is matched with a worker “receive” in the recipient worker.
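The send and receive actions can be sketched as follows. This is an illustrative example rather than the original listing, so the cited line numbers do not apply; the computation and the names result and received are assumptions.

```ballerina
import ballerina/io;

public function main() {
    io:println(process(4));
}

function process(int n) returns int {
    worker w1 {
        int result = n * n;
        // Worker send: pass the value to w2.
        result -> w2;
    }

    worker w2 returns int {
        // Worker receive: pairs with the send in w1.
        int received = <- w1;
        return received + 1;
    }

    // The default worker returns whatever w2 produces.
    return wait w2;
}
```

Removing the receive in w2 while keeping the send in w1 should be rejected at compile time, in line with the static analysis mentioned above.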

Non-Blocking I/O

Ballerina’s concurrency model shines through in its use in non-blocking I/O operations. The coroutine-based execution allows I/O operations to not block execution threads, but rather suspend a function execution and resume its execution after its I/O operations are done. All of this is possible without the use of any obscure callback-based or reactive programming model. Let’s use an example to demonstrate this behavior.

Listing 04

Here, we only have the default worker in the main function, so there is only one strand executing. In this scenario, we are using an HTTP client to make a network request to a remote endpoint. At line 6, we invoke the get remote method, which makes a non-blocking I/O request over HTTP. At this point, the pending I/O operation is handed over to the OS, and the executing strand yields its executing thread. By yielding, we effectively suspend our worker’s execution, and the execution thread is released to be used by other strands. This behavior makes sure we use OS threads optimally by keeping fewer threads active, resulting in fewer context switches and better efficiency.
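A minimal sketch of such a client call is given below, assuming a recent Swan Lake release of the ballerina/http module. The endpoint URL is a placeholder and not the one used in the original listing, and the cited line number does not apply to this sketch.

```ballerina
import ballerina/http;
import ballerina/io;

public function main() returns error? {
    // Placeholder endpoint, used only for illustration.
    http:Client clientEp = check new ("https://example.com");

    // Remote method call: while the response is pending, the strand is
    // expected to yield its thread rather than block it.
    http:Response resp = check clientEp->get("/");

    io:println("status: ", resp.statusCode);
}
```

The code reads as an ordinary sequential call; no callback or future is visible at the call site, which is the point made in the paragraph above.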

Ballerina keeps a separate thread pool for I/O operations that cannot be implemented in a non-blocking manner (some database connectors, etc.) since they will block the calling thread.

The non-blocking I/O scenario will be more apparent when we use multiple workers in our code. This is demonstrated below.

Listing 05

Here, in the main function, we have the default worker and two named workers, which results in three execution strands. By default, all the strands belong to the same group, thus a single execution thread is assigned to execute the strands. Let’s take a look at the ordering of instruction execution in this scenario.

  • Create strand s1 for the default worker.
  • Create strand s2 for worker w1.
  • Create strand s3 for worker w2.
  • Strand s1 is active and executes instructions in the default worker.
  • Workers w1 and w2 are still suspended since their strands do not have a free thread to execute their instructions: the available thread in the group is occupied by strand s1.
  • At line 16, the wait action is executed, yielding the execution of s1, thus suspending the default worker.
  • The execution thread is now free to schedule any other pending strands. Let’s assume strand s2 is scheduled, thus executing operations in worker w1.
  • Worker w1 executes a non-blocking I/O operation at line 7, which results in yielding its strand. This allows the strand s3 to execute and do its non-blocking I/O operation, which in turn yields s3 and suspends its worker.
  • At this point, all the workers are suspended, and the execution thread of the strand group is released. However, both network calls have already been made concurrently to the remote endpoints, and both workers are waiting at the same time for their results to be communicated back.
  • When one of the I/O operations returns, the corresponding strand is again scheduled in a free execution thread, and its code is executed to return from the worker.

As we can see from the flow above, the I/O operations are executed concurrently without a single I/O operation blocking the execution thread until its result is received.
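The flow above can be approximated without a network by using a sleep as a stand-in for the non-blocking I/O call, since sleep operations are among the yield points listed earlier. The sketch below is an assumption-laden illustration, not the original listing: the one-second duration, the runBoth function, and the timing measurement are all hypothetical additions.

```ballerina
import ballerina/io;
import ballerina/lang.runtime;
import ballerina/time;

public function main() {
    io:println("elapsed seconds: ", runBoth());
}

// Runs two workers that each pause for one second and returns the
// wall-clock time taken for both to finish.
function runBoth() returns decimal {
    time:Utc begin = time:utcNow();

    worker w1 {
        // Stand-in for a non-blocking I/O call: the strand yields here.
        runtime:sleep(1);
    }

    worker w2 {
        // Same stand-in; it can start once w1's strand has yielded.
        runtime:sleep(1);
    }

    // The default worker yields at each wait, letting w1 and w2 run.
    wait w1;
    wait w2;
    return time:utcDiffSeconds(time:utcNow(), begin);
}
```

If the two pauses overlap as the walkthrough describes, the elapsed time should be close to one second rather than two, even though all three strands share one thread.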

As we now know, by default, all the strands created from a calling function (parent) belong to the same group, and so they all share a single execution thread. However, if the workers perform CPU-bound operations rather than I/O operations, and these need to run concurrently, then their strands need to execute on their own execution threads. This can be done by directing the runtime to create a strand with its own strand group instead of inheriting the group from its parent strand. Let’s take a look at some sample code that does this.

Listing 06

In the code above, we have used the @strand annotation to set the property that allows a worker’s strand to be scheduled on any available thread. The default value of this property tells the runtime to schedule the strand on the same thread as the parent strand (that is, in the same group as the parent strand). On a multi-core CPU, we can see that execution is faster with this annotation, since two execution threads are allocated to our workers w1 and w2, which then run in parallel. Without it, the workers are executed one after another, which makes little sense for a workload that does no I/O.
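A hedged sketch of annotated workers follows. It assumes the annotation takes the form @strand {thread: "any"}, which matches the strand annotation in the Ballerina language library as far as we know; the summing workload and the function names sumUpTo and sumBoth are hypothetical, and scheduling details may differ between Ballerina versions.

```ballerina
import ballerina/io;

public function main() {
    io:println(sumBoth(1000000));
}

// A CPU-bound helper: adds the integers from 1 to n.
function sumUpTo(int n) returns int {
    int total = 0;
    foreach int i in 1 ... n {
        total += i;
    }
    return total;
}

function sumBoth(int n) returns int {
    // thread: "any" asks the runtime to schedule this worker's strand on
    // any available thread instead of the parent's thread.
    @strand {thread: "any"}
    worker w1 returns int {
        return sumUpTo(n);
    }

    @strand {thread: "any"}
    worker w2 returns int {
        return sumUpTo(n);
    }

    int a = wait w1;
    int b = wait w2;
    return a + b;
}
```

The result is the same with or without the annotations; only the scheduling, and therefore the running time on a multi-core machine, is expected to change.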

In Ballerina, the start action provides a convenient way to invoke a function on its own dedicated strand; it returns a future that represents the invoked function’s default worker. The following shows an example of this scenario.

Listing 07

In the example above, we implement a scenario similar to Listing 06. However, all the strands created for the workers belong to a single thread. This behavior again can be customized by using the @strand annotation in the following manner.

This makes sure the strands for the default worker executions of the calc function invocations are created in their own groups, each with its own execution thread.
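The two forms of the start action can be sketched side by side as below. This is an illustration under assumptions rather than the original code: the body of calc, its single parameter, and the calcBoth wrapper are hypothetical, and the annotation is again assumed to be written as @strand {thread: "any"}.

```ballerina
import ballerina/io;

public function main() {
    io:println(calcBoth());
}

// A hypothetical CPU-bound function: adds the integers from 1 to n.
function calc(int n) returns int {
    int total = 0;
    foreach int i in 1 ... n {
        total += i;
    }
    return total;
}

function calcBoth() returns int|error {
    // Plain start: the new strand joins the caller's strand group.
    future<int> f1 = start calc(10);

    // Annotated start: the new strand may be scheduled on any available thread.
    future<int> f2 = @strand {thread: "any"} start calc(20);

    // Accept either a result or an error from each future.
    int|error r1 = wait f1;
    int|error r2 = wait f2;
    if r1 is int && r2 is int {
        return r1 + r2;
    }
    return error("a calc invocation did not complete normally");
}
```

The results are declared as int|error so the sketch stays valid whether or not waiting on a future can yield an error.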

Summary

In this article, we have gone through the concurrency model of Ballerina. We delved into some of the aspects that make it unique and how it provides a user-friendly experience and features for efficient computational and I/O-related operations.

For more information on Ballerina language and its features, refer to the Ballerina Learn pages.
