Thread Synchronization Strategies

Rishabh
9 min read · Feb 3, 2018

Abstract

Thread synchronization is required to maintain data consistency in multi-threaded/parallel programs. As locks are used to prevent race conditions, this article highlights a few optimization techniques/strategies for effective lock utilization.

In this article, we first look at why multi-threading is required and at the problem of a race condition. We then highlight a few locking techniques/strategies: using the critical section to the minimum, spin locks, blocking the waiting threads, using more than one lock, using high-level APIs, using a lock manager, and using read/write locks. Their advantages and disadvantages are also discussed.

Background

A single-threaded process (especially a GUI-based one) will sometimes appear unresponsive to the user if its single thread is performing a complex operation (such as printing a large text file, performing a processor-intensive calculation, or attempting to connect to a remote server thousands of miles away). But with the advent of Intel Hyper-Threading technology [ref. W1b] and multi-core processors, this drawback can be addressed with the help of the multi-threading/parallel programming paradigm.

The multi-threading/parallel programming paradigm makes it possible for the primary thread of execution to spawn additional secondary threads in the background. Each thread (primary or secondary) becomes a unique path of execution in the process and has concurrent access to all shared data.

Although multi-threading increases the performance of the application, it introduces a new problem: the race condition.

Problem Statement

A race condition is an undesirable situation that occurs when a device or system attempts to perform two or more operations at the same time (either by using Hyper-Threading technology or on a multi-core processor), but, because of the nature of the system, the operations must be performed in the proper sequence to be done correctly.

For example, let us assume that two threads, say T1 and T2, each want to increment the value of a shared integer variable, say i (whose initial value is 0) by 1. Also, let us assume that thread T1 uses a register, r_A, with an initial value of 0 and thread T2 uses a register, r_B, with an initial value of 0.

Ideally, we wish the following sequence of operations (pseudo-machine instructions) to take place:

Ideal sequence of operations
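In pseudo-instructions (a reconstruction assuming simple load/increment/store operations; the comments show the values after each step):

    T1: load  r_A, i     ; r_A = 0
    T1: incr  r_A        ; r_A = 1
    T1: store r_A, i     ; i = 1
    T2: load  r_B, i     ; r_B = 1
    T2: incr  r_B        ; r_B = 2
    T2: store r_B, i     ; i = 2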

The final value of i is 2. But due to the race condition problem, the following operations may take place instead:

Again, assume that the initial value of i is 0. Please note that instructions placed side by side are executed in parallel (either by using Hyper-Threading technology or on a multi-core processor).

A typical race condition sequence
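One such problematic interleaving, again as reconstructed pseudo-instructions (instructions on the same line run in parallel):

    T1: load  r_A, i   |  T2: load  r_B, i    ; r_A = 0, r_B = 0
    T1: incr  r_A      |  T2: incr  r_B       ; r_A = 1, r_B = 1
    T1: store r_A, i   |  T2: store r_B, i    ; i = 1 (one increment is lost)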

The final value of i is 1, instead of the expected value of 2.

This occurred because both the threads read and modified the shared variable ‘i’ at the same time, leading to invalid data. In other words, the shared memory resource was not accessed in a controlled manner due to lack of synchronization between the two threads.

Locking techniques and strategies

The key to synchronization is the concept of the monitor (sometimes loosely called a semaphore, although the two are distinct primitives). A monitor is an object that is used as a mutually exclusive lock, or mutex. Only one thread can own a monitor at a given time. When a thread acquires the lock, it is said to have entered the monitor (or the critical section). All other threads attempting to enter the locked monitor are suspended (or some other action is taken, depending on the operating system) until the first thread exits the monitor. These other threads are said to be waiting for the monitor.

Use critical section to the minimum

Ideally, a thread should execute only the minimum required set of instructions inside its critical section.

For example, instead of the following sequence of steps:

Inefficient method

Preferably, consider re-structuring them to:

Preferred method
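A minimal C sketch of this idea (the shared counter, the local computation, and the lock name are illustrative assumptions, not taken from the original figures):

    #include <pthread.h>

    pthread_mutex_t counterLock = PTHREAD_MUTEX_INITIALIZER;
    long sharedCounter;                     /* shared data */

    /* Inefficient: purely local work is done while holding the lock. */
    void updateInefficient(void)
    {
        pthread_mutex_lock(&counterLock);
        long delta = 0;
        for (int i = 0; i < 1000; i++)      /* local computation */
            delta += i;
        sharedCounter += delta;             /* the only step that needs the lock */
        pthread_mutex_unlock(&counterLock);
    }

    /* Preferred: only the shared update is inside the critical section. */
    void updatePreferred(void)
    {
        long delta = 0;
        for (int i = 0; i < 1000; i++)      /* local computation, lock-free */
            delta += i;
        pthread_mutex_lock(&counterLock);
        sharedCounter += delta;
        pthread_mutex_unlock(&counterLock);
    }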

Spin locks

When a thread is in its critical section, any other thread that tries to enter the critical section must loop continuously in the entry code. This type of lock is called a spin lock because the waiting threads ‘spin’ while waiting for the lock.

A sample implementation of spin locks using POSIX.1c PThreads is given below.
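A minimal sketch of such a program (the names echoLock and echoThis() come from the description below; the thread-creation details and error handling are simplified assumptions):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Spin lock shared by all threads of this process. */
    pthread_spinlock_t echoLock;

    /* Prints its argument to the standard output; the spin lock
       ensures that only one thread prints at a time. */
    void *echoThis(void *arg)
    {
        pthread_spin_lock(&echoLock);      /* spin until the lock is free */
        printf("%s\n", (const char *)arg);
        pthread_spin_unlock(&echoLock);
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        /* PTHREAD_PROCESS_PRIVATE: the lock is used only by
           threads of this one process. */
        pthread_spin_init(&echoLock, PTHREAD_PROCESS_PRIVATE);

        pthread_t *threads =
            (pthread_t *)malloc((argc - 1) * sizeof(pthread_t));

        for (int i = 1; i < argc; i++)     /* one thread per argument */
            pthread_create(&threads[i - 1], NULL, echoThis, argv[i]);

        for (int i = 1; i < argc; i++)
            pthread_join(threads[i - 1], NULL);

        pthread_spin_destroy(&echoLock);
        free(threads);
        return 0;
    }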

The echoThis() function may be called by any thread to print some text to the standard output. This function ensures that only one thread can print the text to the standard output at a time.

In the above example, the main function initializes the spin lock echoLock, to be used by all the threads in the same process. It then creates multiple threads, one per command-line argument, each of which calls the echoThis() function to display its argument string on the standard output. It does not matter how many threads call the echoThis() function simultaneously, as the function serves only one thread at a time.

Two runs of the above program are shown below.
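A run of the following form would be representative (hypothetical output; the actual ordering varies from run to run):

    $ ./spinlock alpha beta gamma
    beta
    alpha
    gamma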

The computer, on which this program was run, had the following configuration:

  • Processor: Intel(R) Pentium D CPU 2.80GHz and 2.80GHz
  • RAM: 1024 MB
  • OS: Knoppix 5.1 (with X Window System running a KDE desktop)
  • Thread model: POSIX
  • Compiler: gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1–19)

It is worth mentioning that although the echoThis() function serves only one thread at a time, it does not preserve the order in which the command-line arguments are printed, as is clear from the sample runs above.

Blocking the waiting threads

This method addresses the same problem as the busy waiting of spin locks, but instead of spinning for the lock, the thread blocks itself. The block operation places the thread into a waiting queue associated with the lock (known as a turnstile in Solaris), and the thread is switched to a waiting state. Control is then transferred to the CPU scheduler, which selects another thread to execute.

The above spinlock.cpp program, when implemented using this methodology, is shown below.

The only differences between the spin-lock and blocking-lock versions are in the function and data-type names: mutex in place of spin (see the sketch after this list). For example:

  • Spin lock function: pthread_spin_lock()
  • Blocking lock function: pthread_mutex_lock()
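A minimal sketch of the mutex variant, assuming the same structure as the spin-lock program above:

    #include <pthread.h>
    #include <stdio.h>

    /* A blocking mutex in place of the spin lock. */
    pthread_mutex_t echoLock = PTHREAD_MUTEX_INITIALIZER;

    void *echoThis(void *arg)
    {
        pthread_mutex_lock(&echoLock);    /* blocks (sleeps) instead of spinning */
        printf("%s\n", (const char *)arg);
        pthread_mutex_unlock(&echoLock);
        return NULL;
    }

The main() function is unchanged, except that pthread_spin_init() and pthread_spin_destroy() are no longer needed, since the mutex is statically initialized here.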

Using more than one lock

Mutex locks tend to become inefficient as the critical section they are locking becomes longer. A simple solution is to split the critical section into smaller ones and provide a lock for each sub-section.

In the figure below, one big critical section protects 3 mutually independent shared data/resources. If one thread occupies the critical section, all other threads have to wait for it to exit.

A big critical section is protecting 3 mutually independent shared data/resources

It would be more efficient to provide a separate lock for each shared data/resource, so that at most 3 threads can enter their respective critical sections simultaneously.

3 critical sub-sections are protecting 3 mutually independent shared data/resources
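A minimal C sketch of the finer-grained version, with one lock per resource (the resource and lock names are illustrative):

    #include <pthread.h>

    /* One lock per mutually independent shared resource,
       instead of a single coarse lock over all three. */
    pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t lockC = PTHREAD_MUTEX_INITIALIZER;

    int resourceA, resourceB, resourceC;   /* independent shared data */

    void updateA(void)
    {
        pthread_mutex_lock(&lockA);        /* does not block threads using B or C */
        resourceA++;
        pthread_mutex_unlock(&lockA);
    }

    /* updateB() and updateC() are analogous, using lockB and lockC. */

With this arrangement, up to 3 threads can be inside their respective critical sub-sections at the same time.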

Using high-level APIs

Although semaphores provide a convenient and effective mechanism for thread synchronization, their incorrect usage can still result in timing errors that are difficult to detect, since these errors happen only when particular execution sequences take place, and those sequences do not always occur. Such errors may be the result of honest programming mistakes. High-level thread synchronization APIs present in Java (the synchronized keyword [ref. TB3]) and C# (the lock keyword [ref. TB2]) provide simple yet robust synchronization tools. This leads to easier programming and fewer programming errors.

A sample implementation in Java is shown below. The main() function uses the ThreadCreator class to create 4 sample threads. Each of these threads, in turn, calls the Printer class's printThis() function, which prints its arguments.

It is important to note that in the printThis() function, the threads sleep for some time. This allows the thread scheduler to context-switch to another thread, introducing a race condition. But because the synchronized keyword is used with the printThis() function, only one thread can execute the function at a time. Access to the printThis() function is thus said to be serialized.

Using the synchronized keyword in Java

A sample run of the above program is shown below.

Output with the synchronized keyword

Notice that the order in which the threads are created and the order in which the text is printed need not be the same.

Now, if we code the program without using the synchronized keyword, i.e. if we change...

class Printer
{
    synchronized void printThis(Thread t, String text)
    {

… to the following, by removing the synchronized keyword …

class Printer
{
    void printThis(Thread t, String text)
    {

… then the function will no longer be synchronized. A sample output, without synchronization, is shown below.

All the threads enter the printThis() function simultaneously and race against each other to complete the function, mixing up their outputs.

Using a lock manager

This method is analogous to having a waiter at the table in the Dining Philosophers’ Problem [ref. TB1]. The lock manager delegates the mutex to one of the lock-contending threads. This thread, upon exiting the critical section, returns the mutex to the lock manager. The lock manager then selects another thread from the remaining contenders, and so on. The lock manager may maintain a queue of contending threads (either blocked or spinning) and may also order them according to some priority.

A simplified diagram of the working of a lock manager is shown below.

A Lock Manager
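A minimal ticket-based sketch of the idea in C (a simplification: contenders are served strictly first-come-first-served, and the "manager" is shared bookkeeping rather than a separate thread; all names are illustrative):

    #include <pthread.h>

    /* Threads take a ticket and are admitted strictly in ticket order,
       mimicking a manager handing the lock to one queued contender at a time. */
    typedef struct {
        pthread_mutex_t m;
        pthread_cond_t  cv;
        unsigned long   next_ticket;    /* next ticket to hand out     */
        unsigned long   now_serving;    /* ticket currently allowed in */
    } lock_manager_t;

    lock_manager_t manager = { PTHREAD_MUTEX_INITIALIZER,
                               PTHREAD_COND_INITIALIZER, 0, 0 };

    void manager_acquire(lock_manager_t *lm)
    {
        pthread_mutex_lock(&lm->m);
        unsigned long my_ticket = lm->next_ticket++;
        while (my_ticket != lm->now_serving)      /* wait in the queue until */
            pthread_cond_wait(&lm->cv, &lm->m);   /* this ticket is served   */
        pthread_mutex_unlock(&lm->m);
    }

    void manager_release(lock_manager_t *lm)
    {
        pthread_mutex_lock(&lm->m);
        lm->now_serving++;                        /* pass the lock to the next */
        pthread_cond_broadcast(&lm->cv);          /* waiting contender         */
        pthread_mutex_unlock(&lm->m);
    }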

Using read/write locks

Read/write locks are similar to mutex locks, but they can be acquired for read-only or write-only purposes. A read-only lock on the same read/write lock can be held by one or more threads simultaneously.

Only one thread is allowed to hold a write-only lock on a read/write lock at a given instant; any other thread trying to acquire a read-only or write-only lock must wait. Also, any thread that tries to acquire a write-only lock on a read/write lock must wait until all the read-only locks on that read/write lock have been released.
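A minimal sketch using the pthread_rwlock interface (these locks were added to POSIX after POSIX.1c, as noted in the discussion below; the shared variable is illustrative):

    #include <pthread.h>

    pthread_rwlock_t dataLock = PTHREAD_RWLOCK_INITIALIZER;
    int sharedData;                          /* illustrative shared resource */

    int readData(void)
    {
        pthread_rwlock_rdlock(&dataLock);    /* many readers may hold this at once */
        int value = sharedData;
        pthread_rwlock_unlock(&dataLock);
        return value;
    }

    void writeData(int value)
    {
        pthread_rwlock_wrlock(&dataLock);    /* exclusive: waits for all readers
                                                and writers to release the lock */
        sharedData = value;
        pthread_rwlock_unlock(&dataLock);
    }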

Key results and discussion

Using the critical section to the minimum definitely improves performance and should be employed wherever possible.

Spin locks require busy waiting. The continual looping of the thread is a problem on single CPU computers because it wastes CPU cycles that some other thread might be able to use productively. The advantage of spin locks is that no context-switch and thread re-scheduling is required, because the thread is still running. In some cases, the CPU may spend more time performing context-switches than executing instructions. Thus, when locks are expected to be held for short times, spin locks are useful.

The blocking-waiting-threads strategy may be used instead of spin locks, but it suffers from thread context-switch overhead. So, this strategy is better suited to longer critical sections.

Splitting critical regions and using more locks can be advantageous; but care must be taken not to use too many locks, as the threads may spend too much time contending for the locks at each sub-section, which may degrade performance.

High-level APIs hide the specific synchronization implementation details, but generally use primitive synchronization tools in their inner implementations.

The lock-manager method is useful, as it addresses the synchronization problem in an orderly way. But maintaining another thread for the lock manager can be an overhead.

Read/write locks can lead to writer starvation: as long as at least one reading thread holds the lock, a writer thread will not be able to acquire it. A variant of read/write locks, known as write-preferred locks, prevents any new readers from acquiring the lock if a writer is queued and waiting for it. The advantage of read/write locks, of course, is the ability to read shared data simultaneously. POSIX.1c does not define read/write locks.

Conclusion

No single locking strategy can be selected as the best one; it is up to the programmer to select the best strategy for his/her application. As the multi-threaded/parallel programming paradigm becomes more popular, the search for better synchronization tools will continue.

References

Web

Textbooks

  • [TB1] Operating System Concepts (6th edition) by Silberschatz, Galvin, Gagne (2002, John Wiley & Sons, Inc.)
  • [TB2] C# and the .NET Platform (2nd edition) by Andrew Troelsen (2003, Apress, Berkeley, CA, USA)
  • [TB3] The Complete Reference Java 2 (5th edition) by Herbert Schildt (2002, Tata McGraw Hill)
  • [TB4] Unix System Programming Using C++ (1st impression) by Terrence Chan (2008, Pearson Education, Inc.)

This article was published as an entry to the Intel Threading Champion 2009 contest.
