Overriding Thread Context

While goofing around with Win32 threads, I bumped into a function called SetThreadContext. The documentation is light: the function is described in a single sentence.

Sets the context for the specified thread.

My first thought was: “There is no way you can override a thread’s context. That’s just crazy.”

I was wrong. This function is a hell of a lot of fun.

The CONTEXT structure

The most interesting part of SetThreadContext is its input argument: a pointer to a CONTEXT structure, which MSDN mentions only briefly.

According to the documentation, the structure is specific to each processor architecture, and its actual definition can only be found in WinNT.h.

Digging into WinNT.h, I found the definition of this really cool structure (x86 version, abridged):

typedef struct _CONTEXT {

// ... many things here

    DWORD   Edi;
    DWORD   Esi;
    DWORD   Ebx;
    DWORD   Edx;
    DWORD   Ecx;
    DWORD   Eax;
    DWORD   Ebp;
    DWORD   Eip;

// ... many things here

} CONTEXT;

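The CONTEXT structure holds a thread's general-purpose x86 registers, its instruction and stack pointers, and its segment, flags, floating-point, and debug registers. Any of them can be overwritten with a single SetThreadContext call, with ContextFlags selecting which groups of registers are applied.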

Fun Hacking

Putting my black hat on, what if I …

  1. Spin up a thread, and let it run for a bit.
  2. Hijack its thread context and force it to do something malicious.
  3. When that malicious task is completed, revert its thread context to its original state.
  4. The thread seamlessly resumes its original task, and the hijacker leaves absolutely no trace.

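Below is such a program (minus the malicious part). A thread is spun up to calculate pi. Partway through, its instruction pointer is hijacked so that it calculates e instead. Once e has been calculated, the thread is restored and goes back to calculating pi. Because it writes the Eip register directly, the program only works as a 32-bit x86 build.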

#include <windows.h>
#include <iostream>

// A simple E approximator found online.
double calculateE(int n)
{
    double value = 0;
    double factorial = 1;
    for(int i = 0; i <= n; i++)
    {
        for(int j = 1; j <= i; j++)
        {
            factorial *= j;
        }
        value += 1 / factorial;
        factorial = 1;
    }
    return value;
}

// A simple Pi approximator found online.
double calculatePi()
{
	std::cout << "Calculating Pi" << std::endl;
	double retPi = 0;
	for (LONGLONG denom = 1; denom <= 300000000; denom += 2)
	{
		if ((denom - 1) % 4)
			retPi -= (4.0 / denom);
		else
			retPi += (4.0 / denom);

	}
	return retPi;
}

CONTEXT originalContext; // thread's original context
HANDLE threadHandle;     // handle of the victim thread
volatile int intrusionDone; // global flag to indicate intrusion is completed

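void Intrusion()
{
	// Intrusion is entered by overwriting Eip, not through a CALL, so there is
	// no valid return address on the stack. It must never return: once its
	// work is done it parks in a sleep loop until main() suspends the thread
	// and restores the original context.
	std::cout << "Running intrusion" << std::endl;
	std::cout << "E is " << calculateE(105) << std::endl;
	std::cout << "Completed intrusion" << std::endl;
	intrusionDone = 1;
	for (;;)
	{
		Sleep(1000);
	}
}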

DWORD WINAPI ThreadProc( LPVOID lpParam )
{
	double pi = calculatePi();
	std::cout << "Pi is " << pi << std::endl;
	return 0;
}

int main()
{
	intrusionDone = 0;
	threadHandle = CreateThread(
		NULL,
		0,
		ThreadProc,
		NULL,
		CREATE_SUSPENDED,
		NULL);

	// Let the thread run a little bit, then suspend it.
	ResumeThread(threadHandle);
	Sleep(100);
	SuspendThread(threadHandle);

	// Save the thread's context object
	originalContext.ContextFlags = CONTEXT_ALL;
	GetThreadContext(threadHandle, &originalContext);

	// Get the thread's context object again, but overwrite it this time.
	CONTEXT c;
	ZeroMemory(&c, sizeof(c));
	c.ContextFlags = CONTEXT_ALL;
	GetThreadContext(threadHandle, &c);

	// Overwrite the instruction pointer so the thread jumps straight into Intrusion.
	c.Eip = reinterpret_cast<DWORD>(Intrusion);

	// Update the thread context, and let it run again
	SetThreadContext(threadHandle, &c);
	ResumeThread(threadHandle);

	// Wait until the intrusion reports completion, then suspend the thread.
	while(0 == intrusionDone) { Sleep(1); }
	SuspendThread(threadHandle);

	// Now revert the thread to its original context and let it finish its
	// job
	SetThreadContext(threadHandle, &originalContext);
	ResumeThread(threadHandle);

	WaitForSingleObject(threadHandle, INFINITE);

	return 0;
}
Output of the program

	Calculating Pi
	Running intrusion
	E is 2.71828
	Completed intrusion
	Pi is 3.14159
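The program above only builds as a 32-bit Windows executable. To play with the same save, redirect, and restore idea elsewhere, here is a rough sketch that swaps in a different mechanism: POSIX ucontext (getcontext, makecontext, swapcontext), which glibc provides on Linux. It is only an analog, not SetThreadContext. Nothing is suspended from the outside. Instead, the "victim" loop hands over its own context halfway through, a detour function runs on its own stack, and uc_link resumes the saved context when the detour returns. The names hijack_demo, RunVictim, and Intrusion are made up for this illustration.

```cpp
// Portable analog (Linux/glibc) of the save -> redirect -> restore steps.
// Uses POSIX ucontext instead of Win32 Get/SetThreadContext; everything runs
// on one thread and nothing is suspended. All names are hypothetical.
#include <ucontext.h>
#include <cassert>
#include <cstdio>
#include <vector>

namespace hijack_demo {

ucontext_t originalCtx;                         // the victim's saved context
ucontext_t intrusionCtx;                        // the detour's context
std::vector<char> intrusionStack(256 * 1024);   // the detour runs on its own stack
int intrusionRuns = 0;                          // how many times the detour ran

// The "intrusion": does its work, then simply returns. Because
// intrusionCtx.uc_link points at originalCtx, returning resumes the victim.
void Intrusion() {
    ++intrusionRuns;
    std::puts("Running intrusion");
}

// The "victim": sums 0..work-1 and detours into Intrusion once, halfway through.
long RunVictim(long work) {
    long sum = 0;
    for (long i = 0; i < work; ++i) {
        sum += i;
        if (i == work / 2) {
            getcontext(&intrusionCtx);                 // start from a valid context
            intrusionCtx.uc_stack.ss_sp = intrusionStack.data();
            intrusionCtx.uc_stack.ss_size = intrusionStack.size();
            intrusionCtx.uc_link = &originalCtx;       // resume here when Intrusion returns
            makecontext(&intrusionCtx, Intrusion, 0);  // point the new context at Intrusion
            swapcontext(&originalCtx, &intrusionCtx);  // save our context, jump to the detour
            // Execution continues here once Intrusion has returned.
            assert(intrusionRuns > 0);
        }
    }
    return sum;
}

}  // namespace hijack_demo
```

Calling hijack_demo::RunVictim(1000) prints "Running intrusion" once and still returns 499500 (the sum of 0 through 999), so the loop's state survives the detour. Note that the ucontext functions were removed from POSIX.1-2008, although glibc still ships them.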

Thoughts

After some internet searching, it turns out this technique is one of the building blocks of DLL injection.

Tools: Visual Studio 2008, Windows 7

Choppy GStreamer

A couple of weeks ago, I was trying to extend an application with GStreamer to stream live audio (8 kHz A-law and 16-bit PCM) over UDP.

GStreamer’s pipeline framework is extremely powerful, but it is also poorly documented. I was able to get GStreamer to read from a udpsrc without much effort, but the audio playback was choppy and would stop entirely after a minute.

The choppy playback did not occur when the source was a file on the hard drive. So I concluded that the audio path in GStreamer was sensitive to the bursty nature of the network stream, and that some buffering was necessary to keep the data rate smooth.

Searching around the GStreamer developer forum, I found a few poor souls who had posted similar issues, with no solution.

So after several days of trial and error, I worked out a combo that solved my problem.

Lots of Queues

Disclaimer: I am a n00b with GStreamer, so my solution may be wrong or sub-optimal.

Here’s my GStreamer pipeline for 16-bit PCM audio at 8 ksps, received over UDP on port 50379.

gst-launch.exe -v udpsrc port=50379 ! queue leaky=downstream max-size-buffers=10 ! audio/x-raw-int, rate=(int)8000, channels=(int)1 ! queue leaky=downstream max-size-buffers=10 ! audiorate ! queue leaky=downstream max-size-buffers=10 ! audiopanorama panorama=0.00 ! queue leaky=downstream max-size-buffers=10 ! autoaudiosink

For A-law audio, the command is the following:

gst-launch.exe -v udpsrc port=50379 ! queue leaky=downstream max-size-buffers=10 ! audio/x-alaw, rate=(int)8000, channels=(int)1 ! alawdec ! queue leaky=downstream max-size-buffers=10 ! audiorate ! queue leaky=downstream max-size-buffers=10 ! audiopanorama panorama=0.00 ! queue leaky=downstream max-size-buffers=10 ! autoaudiosink

Result

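As you can see, I put a queue between every stage of the pipeline to smooth out the data flow. Each queue is set to leaky=downstream with max-size-buffers=10, so when a queue is full it drops its oldest buffers instead of blocking.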
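To make that queue policy concrete, here is a rough C++ model of a bounded queue that leaks on the downstream side. Once it holds the maximum number of buffers, a new arrival pushes out the oldest one rather than blocking the producer. This only illustrates the policy and is not GStreamer's implementation. LeakyQueue and its methods are hypothetical names, and real GStreamer queues also enforce byte and time limits (max-size-bytes, max-size-time).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>

// Hypothetical model of a queue with leaky=downstream semantics:
// when full, the oldest buffer is discarded to make room for the newest one.
template <typename T>
class LeakyQueue {
public:
    explicit LeakyQueue(std::size_t maxBuffers) : maxBuffers_(maxBuffers) {
        assert(maxBuffers > 0 && "a zero-capacity queue would drop everything");
    }

    // Producer side (e.g. the UDP source). Never blocks.
    void Push(T item) {
        if (items_.size() == maxBuffers_) {
            items_.pop_front();  // drop the oldest buffer
            ++dropped_;
        }
        items_.push_back(std::move(item));
    }

    // Consumer side (e.g. the next element). Empty optional if nothing is queued.
    std::optional<T> Pop() {
        if (items_.empty()) return std::nullopt;
        T front = std::move(items_.front());
        items_.pop_front();
        return front;
    }

    std::size_t Size() const { return items_.size(); }
    std::size_t Dropped() const { return dropped_; }

private:
    std::size_t maxBuffers_;
    std::size_t dropped_ = 0;
    std::deque<T> items_;
};
```

With max-size-buffers=10, a burst of 15 buffers leaves the 10 newest queued and discards the first 5, so the consumer never falls more than 10 buffers behind real time. That trade (dropping stale audio instead of stalling) is usually what you want for live playback.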

In addition, I am buffering audio in my application and pacing the outgoing UDP packets with a Windows multimedia timer.
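The sender-side pacing comes down to simple arithmetic: at 8000 samples per second, 16-bit mono PCM is 16,000 bytes per second and 8-bit A-law is 8,000. Below is a minimal sketch of a fixed-rate sender loop. It assumes 20 ms packets (my choice for illustration, not a value from my setup) and uses std::chrono with std::this_thread::sleep_until as a portable stand-in for a Windows multimedia timer. PacketBytes and PaceSends are hypothetical names, and the send callback is where a real sendto() on the UDP socket would go.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Bytes of audio carried by one packet of `packetMs` milliseconds.
std::size_t PacketBytes(int sampleRate, int bytesPerSample, int channels, int packetMs) {
    assert(sampleRate > 0 && bytesPerSample > 0 && channels > 0 && packetMs > 0);
    return static_cast<std::size_t>(sampleRate) * bytesPerSample * channels * packetMs / 1000;
}

// Calls send(i) for i = 0..count-1 at a fixed period. Deadlines advance from a
// single start time, so one late wake-up does not push later packets back.
void PaceSends(int count, std::chrono::milliseconds period,
               const std::function<void(int)>& send) {
    auto next = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) {
        send(i);  // hypothetical: sendto() one packet from the app's audio buffer
        next += period;
        std::this_thread::sleep_until(next);
    }
}
```

With 20 ms packets (50 per second), that works out to 320 bytes per packet for 16-bit PCM and 160 bytes per packet for A-law. Scheduling against absolute deadlines keeps the average rate locked to the audio clock even when individual wake-ups are late.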

So the sender (my application) and receiver (GStreamer) are both buffering to ensure the smoothest data rate.

The result is excellent. The many layers of queuing allow a (fairly) high tolerance for bursty data. I am keeping this setup for now until I find something better.