TTAS Spinlock with Yield

Last week, I implemented the Test-And-Test-And-Set (TTAS) spinlock in C++. The algorithm is simple and elegant, but it wasn’t very efficient under high contention.

Ideally, we would like a spinlock algorithm to have zero overhead when an additional thread is added. To do this, we need a throttling mechanism that decrease the amount of contention.

Yield Under Contention

Recall that CTtaslock performs two steps.

1. It repeatedly tries to test the state of the mutex to see if it is unlocked.

2. If the mutex is unlocked, perform a compare-and-swap (CAS) to commit to the variable. If the commit fails, it goes back to step 1.

It is not difficult to see that more threads only leads to more contention with this algorithm.

Now what if we implement a simple backoff algorithm to allow a thread to yield under high contention. The amount to yield would be based on the number of times a thread failed to acquire a lock.

// This algorithm is based on the boost smart_ptr's yield_k
// algorithm.
void yield( unsigned int k )
	if( k < 2)
	else if( k < 16)
		Sleep( 0 );
		Sleep( 1 );

This algorithm is used in boost smart_ptr yield_k. The number here are chosen arbitrarily based on my test, but the idea is clear. If the thread fails to acquire the lock more than twice, we perform Sleep(0). If we still fail, then we perform Sleep(1) for more aggressive yielding in case we are running into priority induced starvation.

TTAS with Yield Implementation

Here’s the implementation of CTTASLock with yield.

// atomic read is just a volatile read in C
inline LONG atomicRead(volatile LONG *mem) { return *mem; }

class CTtasLock
	// mutex class that holds the lock state
	class CTtasMutex
		CTtasMutex() : m_state(0) {};
		LONG m_state;
		LONG const static Locked = 1;
		LONG const static NotLocked = 0;

	// RAII constructor that acquire lock upon construction
	explicit CTtasLock(CTtasMutex&m) : m_mutex(m) { lock(); }

	// RAII destructor that performs the "scoped" lock behavior
	~CTtasLock() { unlock(); }

	// spin lock that performs the test-test-and-set algorithm
	void lock()
			// test until state is cleared
			unsigned int k = 0;
			while(atomicRead(&m_mutex.m_state) == CTtasMutex::Locked)
				yield(k); // yield algorithm to avoid high contention

			// test-and-set
				CTtasMutex::NotLocked) == CTtasMutex::NotLocked)
	// unlock the mutex with atomic set
	void unlock()
		InterlockedExchange(&m_mutex.m_state, CTtasMutex::NotLocked);

	void yield( unsigned int k )
		if( k < 2 ) { }
		else if( k < 8 ) { Sleep( 0 ); }
		else { Sleep( 1 ); }
	CTtasMutex &m_mutex;

Performance Under Contention

So how does this implementation perform under contention?

Fortunately, much better. 🙂

When yield is introduced, the TTAS lock scales much better.

Final Thoughts

When threads yield under high contention, the TTAS lock scales well.

The only drawback is that TTAS lock has no sense of fairness. A thread could be starved because of luck. But then again, fairness could be overrated.


The source and the data sheet can be downloaded here.

Tools: Visual Studio 2008, Boost 1.41, Window 7

One comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s