Spin-Wait vs. Condition Variable

Recently at work, I ran into an interesting threading problem.

I have two threads – A and B, running concurrently. Thread A needs a piece of data from Thread B. Normally, Thread B would just pass a message to Thread A. But in this scenario, every millisecond counts. Thread A needs the data A.S.A.P!

In theory, Thread B should have the result within couple milliseconds. Since the wait is so short, the synchronous approach is feasible. I would block Thread A until Thread B gets the data. But should I use a condition varialbe, or should I just spin-wait (assuming that I have a multi-core processor)?

This is very much a real life threading problem that even expert programmer encounters. (And I am no expert. :???:)

Speed Comparison

I expect spin-wait to have lower latency than condition variable. But by how much?

It has been awhile since I ran any performance test. So I cooked up some test code to get a sense of the latency.

Here’s the spin-wait code.

long global_spin_var = 0;
void spin_thread()
{
	while(1)
	{
		// memory barrier for visibility semantics
		_ReadWriteBarrier();

		// spin wait
		if(global_spin_var == 1){ break; }
	}
}
void time_spin_thread()
{
	// ... set up timing code here

	// update the global variable with cmpxchg
	InterlockedExchange(&global_spin_var,0);

	thread t1(bind(&spin_thread));
	// wait for a second for thread start, sloppy but good enough
	Sleep(1000);
	t1.join();
}

Here’s the condition variable code.

long global_condition_wait_var = 0;
condition_variable global_condition;
mutex global_mutex;

void condition_wait_thread()
{
	mutex::scoped_lock lock(m_mutex);
	while(global_condition_wait_var == 0)
		global_condition.wait(lock);
}

void time_condition_wait_thread()
{
	// ... set up timing code here

	// update the global variable with cmpxchg
	InterlockedExchange(&global_condition_wait_var,0);

	thread t2(bind(&condition_wait_thread));

	// wait for a second for thread start, sloppy but good enough
	Sleep(1000);

	global_condition.notify_all();
	t2.join();
}

It turns out that latency of condition variable is higher, but not absurdly high.

Spin-wait’s latency is about half of condition variable. This is a reasonable trade-off for spending a bunch of CPU cycles spinning.

Latency comparison between spin-wait and condition variable

Final Thoughts

So which approach did I end up using? Neither.

Apparently Thread A doesn’t really need the “right” answer from Thread B. It just needs “some” answer to make things look good.

Another lesson from the real world – it doesn’t matter if it works, as long as it looks good. 🙄

Source

The source and the data sheet can be downloaded here.

Tools: Visual Studio 2008, Boost 1.41, Window 7, Intel I5-750 (quad core)

Const Reference To Temporary Is Useless

C++ is always full of surprises. This week, I learned that C++ has (yet another) special feature that allows the lifetime of a temporary (rvalue) to be lengthen on the stack if it is bound to a reference-to-const.

It means that the following example is legal C++. In this example, the temporary i’s lifetime is extended across its original scope.

int f()
{
	int i = 0;
	return i;
}
void main()
{
	int const &cr = f(); // lifespan extended
	int &cr = f(); // illegal (crash) because f() does not return lvalue
}

This is supposed to minimize the number of copies performs.

More Ammo To Blow Your Leg Away

You would think that this feature prevents the temporary from being copied. Apparently not so.

If you run the following program on VC9.0 and VC10.0, obj destructor is called twice, implies that it was copied. [sidenote: gcc 4.3 did not perform the copy]

#include <iostream>
class obj
{
public:
	~obj() { std::cout <<"destroyed" << std::endl; }
};
void main()
{
	obj const &rco = obj();
}

This is because the standard allows the temporary to be fully copied, and renders the reference-to-const declaration useless.

Here’s the C++ standard’s guideline in section 8.5.3.

If the initializer expression is an rvalue, with T2 a class type, and “cv1 T1” is reference-compatible with “cv2 T2,” the reference is bound in one of the following ways (the choice is implementation defined):

The reference is bound to the object represented by the rvalue or to a subobject with that object

A temporary of type “cv1 T2” is created, and a constructor is called to copy the entire rvalue object into the temporary. This reference is bound to the temporary or to a subobject with the temporary.

No Better Than RVO

Dave Abraham’s copy ellision example demonstrates how smart modern compilers have become. Return Value Optimization works well in both named and unnamed temporaries, and compose nicely.

So why bother adding reference-to-const if it works just as well without it?

Final Thoughts

C++0x (C++0a?) is supposed to address this problem by enforcing the reference-to-const to always bind to the rvalue, instead of its copy. But if a copy can be elided, I expect RVO would have done it as well.