Kagami (鏡) is the sacred mirror of Shinto shrines, the object that reflects the kami, the divine spirit, back at the observer. In Japanese folklore, a mirror reveals what is truly there, beneath the surface. In offensive security, the opposite is needed: during sleep, you want the mirror to reflect nothing. No code. No stack. No data. Just emptiness where a beacon used to be.
A modern beacon spends more than 95 percent of its life asleep. On an hourly check-in, that is twenty seconds of activity and fifty-nine minutes and forty seconds of waiting. During that waiting period, the beacon sits frozen in memory, and memory scanners have all the time they need to dismantle it. Every byte of code, every frame of the thread stack, every allocation on the heap is a potential indicator of compromise.
The best-known sleep obfuscation techniques (Ekko, Foliage, Zilean) are showing their age. EDRs have caught up: a simple code encryption pass no longer buys you a clean sleep. The stack still attributes you, the heap still betrays your config, and the protection-change telemetry lights up every correlation engine on the market. A new generation of sleep obfuscation has to address all three surfaces at once.
This post walks through the complete architecture of such a system. Three layers, three distinct problems, three solutions that only work when combined. I will show you what each layer closes, what it leaves open, and why skipping any single one collapses the entire evasion.
This is a theoretical deep dive. I will not release the source of my implant. But I will show enough of the mechanics, with partial code where necessary, to make the architecture reproducible by anyone who understands the Windows memory manager.
Part I: Why Sleep Is the Critical Moment
What actually happens when a beacon sleeps
When a beacon calls Sleep(60000) or any of its siblings, the following chain executes:
Sleep(60000)
└→ kernelbase!SleepEx
   └→ ntdll!NtDelayExecution(FALSE, &interval)
      └→ syscall → KiSystemCall64 → nt!NtDelayExecution
         └→ thread state = Waiting:DelayExecution
The kernel places the thread in a wait state. Its register context is saved. Its stack is frozen in place, exactly as it was at the moment of the call. For the next sixty seconds, nothing executes in that thread. But everything is still visible: the code section, the stack, the heap, the thread environment block, the module list.
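The interval handed to NtDelayExecution in the chain above follows the native NT convention: 100-nanosecond units, with a negative value meaning "relative to now". A minimal sketch of that conversion in portable C; the function name is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a millisecond delay into a relative NT interval.
   NT intervals count 100 ns ticks; negative means relative to now.
   The function name is hypothetical. */
int64_t ms_to_relative_interval(uint32_t ms)
{
    /* 1 ms = 10,000 ticks of 100 ns each */
    int64_t interval = -((int64_t)ms * 10000);
    assert(interval <= 0);
    return interval;
}
```

Under this convention, Sleep(60000) corresponds to an interval of -600,000,000 ticks.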
The three attack surfaces
A process in memory has three distinct regions that can attribute it to a malicious beacon:
Surface A, the code section. The .text of the implant contains all the interesting logic: the syscall resolver, the C2 routines, the encryption primitives, the beacon loop. It sits in a MEM_PRIVATE region (if the beacon is shellcode) or MEM_IMAGE (if it is an EXE or uses module stomping). Any memory scanner with YARA rules will find signature matches. Any scanner that knows the Moneta checks will flag a MEM_PRIVATE region with executable permissions as "Private Executable Memory" or "Unbacked Shellcode" with high confidence.
Surface B, the thread stack. Every thread has a stack. When a thread sleeps on NtWaitForSingleObject, its stack contains the chain of return addresses that led it there. A scanner walks this stack. The deepest frame points to your module, in non-signed private memory. That pointer alone is enough to attribute the beacon. Hunt-Sleeping-Beacons does exactly this: enumerates every thread in Waiting:DelayExecution and walks its stack looking for return addresses inside MEM_PRIVATE.
Surface C, the heap. The heap contains the runtime data: the C2 configuration (URL, User-Agent, sleep delay, jitter), session keys, buffered command output, captured credentials. Even if the code is encrypted, the heap remains readable. PE-sieve dumps the entire process. Every string, every allocated block, every piece of config is there in plain text.
A memory scan at any moment during sleep reads all three surfaces simultaneously. Each is sufficient, on its own, to burn the beacon. A real sleep obfuscation system has to close all three.
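Seen from the scanner's side, the Surface B check reduces to a lookup: take each return address from a waiting thread's stack and ask what kind of region it falls in. The sketch below models that lookup over a hand-built region table in portable C. The Region struct, the RegionType enum, and both function names are hypothetical; a real scanner would get the same facts from VirtualQueryEx rather than from a static array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a memory region, standing in for the fields a
   scanner would read out of MEMORY_BASIC_INFORMATION. */
typedef enum { REGION_IMAGE, REGION_MAPPED, REGION_PRIVATE } RegionType;

typedef struct {
    uintptr_t  base;
    size_t     size;
    RegionType type;
    int        executable;
} Region;

/* Return the region that contains addr, or NULL if no region does. */
const Region *find_region(const Region *regions, size_t count, uintptr_t addr)
{
    assert(regions != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= regions[i].base && addr - regions[i].base < regions[i].size)
            return &regions[i];
    }
    return NULL;
}

/* Count return addresses that land in private executable memory,
   or in no known region at all. */
size_t count_suspicious_frames(const Region *regions, size_t region_count,
                               const uintptr_t *return_addrs, size_t frame_count)
{
    size_t hits = 0;
    for (size_t i = 0; i < frame_count; i++) {
        const Region *r = find_region(regions, region_count, return_addrs[i]);
        if (r == NULL || (r->type == REGION_PRIVATE && r->executable))
            hits++;
    }
    return hits;
}
```

A single hit is enough for attribution, which is why the check is cheap to run against every waiting thread.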
Why one layer is not enough
It is tempting to think of sleep obfuscation as a single technique. Encrypt the code, done. But that only closes Surface A. The stack still attributes you. The heap still betrays your config. A thorough scanner sees all three, and the weakest layer is always the one that burns you.
The three layers are not additive. They are multiplicative. Each one closes a surface the others cannot reach. Skipping any single layer collapses the entire evasion because the scanner simply pivots to the remaining open surface.
The rest of this post walks through each layer in order: what it does, what it closes, what it leaves open, and how the next layer picks up where the previous one stops.
Part II: Layer 1, Code Encryption
The principle
The first layer encrypts the .text section of the beacon before sleeping, and decrypts it after waking. While sleeping, the code is unreadable gibberish. A memory scanner that reads the region finds high-entropy bytes instead of x86 instructions.
In pseudocode:
VirtualProtect(.text, size, PAGE_READWRITE)
XOR/RC4/AES the content with a key
NtWaitForSingleObject(timer)
XOR/RC4/AES to decrypt (same key)
VirtualProtect(.text, size, PAGE_EXECUTE_READ)
This is what Ekko, Foliage, Zilean, and most first-generation sleep obfuscators implement. It is the foundational layer, and every serious implant needs it.
The real cost: kernel callbacks on every protection change
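The naive code above shows two protection changes. Variants that also mark the pages PAGE_NOACCESS for the duration of the sleep make four calls to NtProtectVirtualMemory per sleep cycle: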
RX → RW (so we can write the XOR)
RW → NOACCESS (hide the pages during sleep)
NOACCESS → RW (on wake, reopen for XOR)
RW → RX (restore executable)
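Every one of these calls is visible from the kernel. Not through a userland hook. Through kernel-side instrumentation. (ObRegisterCallbacks is often cited here, but it only covers handle operations on process, thread, and desktop objects; the protection-change signal comes from the memory manager's own instrumentation, chiefly the Threat-Intelligence ETW provider.) This post calls each such notification a kernel callback for short. For every protection change, the EDR sees: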
- The process handle
- The base address of the region
- The old protection and the new protection
- A captured stack trace of who called NtProtectVirtualMemory
On top of that, Windows emits ETW events on the Microsoft-Windows-Kernel-Memory and Microsoft-Windows-Threat-Intelligence providers. EDR agents are subscribed to these from a kernel-mode ETW session. Even with a perfect indirect syscall that hides the userland call stack, the protection change itself is visible from the kernel side.
The pattern detectors built around this telemetry are specific and brutal:
- Elastic Security flags the sequence RX → RW → RX on a MEM_PRIVATE region within a short time window as "Memory Protection Flip-Flop" with high confidence
- CrowdStrike Falcon correlates the protection change with the return address of the caller. If the return address points to a non-image region, the threat score jumps significantly
- Microsoft Defender for Endpoint classifies the PAGE_EXECUTE_READ → PAGE_READWRITE → PAGE_EXECUTE_READ pattern as "self-modifying code" by default
Four kernel callbacks per cycle. On a sleep of thirty seconds, that is eight callbacks per minute, or 480 per hour, on the same memory region. The correlation window closes, and the alert fires.
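Both the rate arithmetic and the correlation logic are easy to sketch from the defender's side. The portable C below is an illustrative model, not any vendor's actual detection: the ProtEvent struct, the EV_* constants, and both function names are hypothetical, and events are assumed to arrive sorted by time. It computes the hourly event count for a given sleep interval, then flags a change to RW followed by a change back to RX on the same region inside a time window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified protection states for the model. */
typedef enum { EV_RX, EV_RW, EV_NOACCESS } Prot;

/* One observed protection change on a region (hypothetical layout). */
typedef struct {
    uint64_t  time_ms;
    uintptr_t base;
    Prot      new_prot;
} ProtEvent;

/* Protection-change events per hour for a fixed sleep interval. */
unsigned long events_per_hour(unsigned calls_per_cycle, unsigned sleep_seconds)
{
    assert(sleep_seconds > 0);
    return (3600UL / sleep_seconds) * calls_per_cycle;
}

/* Return 1 if the region at base went to RW and then back to RX within
   window_ms. Events must be sorted by time_ms. */
int has_flip_flop(const ProtEvent *ev, size_t n, uintptr_t base, uint64_t window_ms)
{
    int      saw_rw  = 0;
    uint64_t rw_time = 0;

    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ev[i].base != base)
            continue;
        if (ev[i].new_prot == EV_RW) {
            saw_rw  = 1;                  /* remember the latest switch to RW */
            rw_time = ev[i].time_ms;
        } else if (ev[i].new_prot == EV_RX) {
            if (saw_rw && ev[i].time_ms - rw_time <= window_ms)
                return 1;                 /* RW then RX inside the window */
            saw_rw = 0;
        }
        /* EV_NOACCESS leaves the state untouched in this simple model */
    }
    return 0;
}
```

With four calls per cycle and a thirty-second sleep, events_per_hour returns the 480 quoted above. The wake-up half of each cycle (RW, then RX a few milliseconds later) is what trips the window check.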
Who wakes up the sleeping beacon
A question most articles do not address. If the code is encrypted, what decrypts it? The encryption logic itself must live somewhere executable. This "something" is the wake-up mechanism, and it has to satisfy three constraints:
- It must be executable at wake time (since the beacon's main code is encrypted)
- It must contain the decryption routine
- It must be triggered by something outside the encrypted code (because encrypted code cannot wake itself up)
Different approaches solve this differently:
- Ekko uses a Timer Queue callback. CreateTimerQueueTimer schedules a function to run inside the Windows thread pool. When the timer fires, a system worker thread calls into a ROP chain built from gadgets inside signed DLLs, which performs the decryption.
- Foliage uses an APC queued through NtQueueApcThread. When the beacon thread is alertable, the APC fires and executes the decryption ROP chain.
- Waitable timer + APC routine: combines NtCreateTimer with an APC callback.
In every case, the wake-up mechanism is external to the encrypted code, because encrypted code cannot execute. This external trigger is always the residual, the part that cannot be hidden. It becomes the new target for the scanner.
What Layer 1 alone leaves open
With only Layer 1, several failures remain:
Failure 1: the stack still attributes you. Your code is encrypted, but the stack of your thread still contains return addresses that point to your module. The scanner does not need to disassemble. It just verifies that the address falls inside a MEM_PRIVATE region. It does. Detection.
Failure 2: the heap is naked. Config, keys, tokens, output buffers, all sit in plain text on the heap. Any dump reads them.
Failure 3: the wake-up trampoline is an indicator by itself. A timer queue callback that runs every thirty seconds, calling NtProtectVirtualMemory on the same region, is a textbook beacon signature.
Failure 4: the flip-flop telemetry. Four NtProtectVirtualMemory calls per cycle, at a thirty-second sleep, add up to 480 per hour on the same region. No cryptography hides that from kernel-side observation.
Layer 1 is necessary. It is not sufficient. The next layer picks up the stack problem.
Part III: Layer 2, Thread Stack Spoofing
The principle
While the thread sleeps, rewrite the saved RBP and return addresses on its stack. Replace them with addresses that point to legitimate system functions. After the rewrite, a stack walk reads something like this (offsets are illustrative and vary by Windows build):
Frame 1: ntdll!NtWaitForSingleObject+0x14
Frame 2: kernelbase!WaitForSingleObjectEx+0x8E
Frame 3: kernel32!BaseThreadInitThunk+0x14
Frame 4: ntdll!RtlUserThreadStart+0x21
Every frame points inside a Microsoft-signed DLL loaded legitimately. None of them points into your module. The thread looks like a system worker waiting on a handle.
I covered the full theory of stack frame construction and the six EDR validation checks in the Kagura-StackWalker article. If you have not read it, the summary is this: you cannot just push random return addresses. Each address must pass RtlLookupFunctionEntry, the instruction before the return address must be a real CALL, the frame size must match the UNWIND_INFO recorded in the PE's .pdata section, and the whole chain must terminate at RtlUserThreadStart with a return address of zero.
Targeted resolution beats generic gadget hunting
Most public stack spoofers (CallStackMasker, generic .pdata gadget scanners) follow the same shortcut: scan every function in kernelbase/kernel32/ntdll, score them by unwind simplicity (low CountOfCodes, no FrameRegister, no handler), pick the smallest matches, and call them "gadgets." The fake chain ends up looking like:
kernelbase!RandomTinyHelper+0x12
kernel32!AnotherShortFunc+0x08
ntdll!ObscureWrapper+0x1A
This passes RtlVirtualUnwind's arithmetic, but the function names are nonsense in the context of a sleeping thread. A defender who builds a baseline of real sleep-chain frames sees garbage names that no real worker thread would ever produce. The spoof is geometrically valid and semantically wrong.
The Shinkiro approach is the opposite: do not search for gadgets. Resolve the actual functions that a real sleeping thread would have on its stack, and parse their real UNWIND_INFO at runtime.
A genuine thread sleeping on WaitForSingleObjectEx has a chain of exactly this shape (the offsets shown come from one build and differ on others):
ntdll!NtWaitForSingleObject+0x14 ← the syscall return point
kernelbase!WaitForSingleObjectEx+0x5f
kernel32!BaseThreadInitThunk+0x20
ntdll!RtlUserThreadStart+0x2c
Each of those return addresses is the byte immediately after a specific CALL instruction inside a specific function. They are not interchangeable with random gadgets. A defender who reads frame names recognizes this chain instantly. A defender who replays the unwind sees frame sizes that match the real PDB-documented layouts.
Resolving the four CALL sites
The resolver walks each target function's body looking for the specific CALL that leads to the next link in the chain. Four x86 patterns cover essentially everything in modern Windows:
| Pattern | Bytes | Where it shows up |
|---|---|---|
| call [rip+disp32] | FF 15 ?? ?? ?? ?? | IAT calls, e.g. WaitForSingleObjectEx → NtWaitForSingleObject |
| call rel32 | E8 ?? ?? ?? ?? | Direct intra-module calls |
| mov reg, [rip+disp32] then call | 48 8B ?? ?? ?? ?? ?? then E8 / FF Dx | CFG dispatch, e.g. RtlUserThreadStart → guard_dispatch_icall → BaseThreadInitThunk |
| call reg | FF D0..D7 | Register-indirect, e.g. BaseThreadInitThunk → thread_entry |
Each match is verified by resolving the target pointer and confirming it lands in the expected module. No hardcoded offsets, no version-specific magic numbers. The resolver works on every Windows 10 and 11 build because it locates the CALL by structure, not by address.
Parsing the real UNWIND_INFO
Once the resolver has identified the four target return addresses, it parses each function's actual UNWIND_INFO from .pdata to compute the exact frame layout. This is the same logic Kagura applies, but at runtime, inside the implant, with no precomputed metadata:
// Skeleton, not the full implementation
typedef struct {
DWORD allocSize; // total bytes of stack reserved by the prolog
DWORD numPushed; // number of PUSH_NONVOL operations
BOOL usesFramePtr; // does the function set a frame pointer?
BYTE frameReg;
DWORD frameOffset;
} TS_FRAME_INFO;
BOOL ParseFrameLayout(PVOID moduleBase, PVOID funcAddr, TS_FRAME_INFO* info) {
RUNTIME_FUNCTION* rf = FindRuntimeFunction(moduleBase, funcAddr);
UNWIND_INFO* uwi = (UNWIND_INFO*)((BYTE*)moduleBase + rf->UnwindData);
UNWIND_CODE* codes = (UNWIND_CODE*)((BYTE*)uwi + sizeof(UNWIND_INFO));
for (DWORD i = 0; i < uwi->CountOfCodes; ) {
BYTE op = codes[i].UnwindOp_OpInfo & 0x0F;
BYTE opInfo = (codes[i].UnwindOp_OpInfo >> 4) & 0x0F;
switch (op) {
case UWOP_PUSH_NONVOL: info->numPushed++; i += 1; break;
case UWOP_ALLOC_SMALL: info->allocSize += opInfo*8 + 8; i += 1; break;
case UWOP_ALLOC_LARGE: /* read slot count, advance i+=2 or i+=3 */ break;
case UWOP_SET_FPREG: info->usesFramePtr = TRUE; /* ... */ i += 1; break;
/* SAVE_NONVOL, SAVE_XMM128, etc. */
default: i += 1; break;
}
}
return TRUE;
}
The output is the exact frame size, push count, and frame-pointer status of the real function as the compiler emitted it. The fake stack uses these numbers verbatim. When the EDR replays the unwind, the arithmetic matches because the layout was copied from the real function's metadata, not approximated from a generic gadget.
SET_FPREG: bridging the stub into the fake chain
One subtle problem remains: how does the unwinder transition from the spoofing stub's own frame into the fabricated chain? The trick is UWOP_SET_FPREG, declared in the stub's prolog as .setframe rbp, 0. During unwind, the operations replay in this order:
1. UWOP_ALLOC_SMALL: RSP += 0x40 (the stub's local allocation, ignored because SET_FPREG overrides it)
2. UWOP_SET_FPREG: RSP = RBP, which the stub previously set to point into the fake-frame zone
3. UWOP_PUSH_NONVOL: pop RBP, RSP += 8
4. Return address = [RSP] = WaitForSingleObjectEx+0x5f
From step 4 onwards, the unwinder is inside real kernelbase code, walking real .pdata. Every subsequent frame is geometrically and semantically perfect, because the unwinder is no longer being tricked, it is following genuine metadata that just happens to point to addresses we deliberately seeded in our fabricated stack zone.
The result, validated against WinDbg's kv command, is indistinguishable from a real sleeping thread.
The chicken and egg of the spoofer
To spoof the stack, code must execute. To encrypt code, code must execute. Who runs first?
The sequence is fixed:
1. Encrypt the heap (Layer 3, covered next)
2. Save the real stack context (so we can restore it on wake)
3. Write the fake stack frames on top of the real ones
4. Encrypt the .text (Layer 1)
5. Enter the wait
(sleep)
6. Wake-up mechanism fires
7. Decrypt the .text (Layer 1, reverse)
8. Restore the real stack (Layer 2, reverse)
9. Decrypt the heap (Layer 3, reverse)
10. Resume normal execution
The wake-up trampoline must live somewhere in memory that can execute while everything else is encrypted. It must also restore the saved stack context and call the decryption routines in order. This is the core residue of Layer 2. No matter how clever you are, something has to stay in clear and be callable externally.
What Layer 2 still leaves open
Failure 1: the heap is still naked. Same problem as before. Now that the code and the stack are both protected, the heap becomes the single largest remaining surface, and all the interesting data lives there.
Failure 2: the spoofer's own code is visible. The spoofing routine, the targeted resolver, the unwind parser, and the XOR primitive all need to execute during the sleep transition. Where does this code live, and what protects it?
Failure 3: MEM_PRIVATE does not change. Layer 2 hides your code from the stack, but the region itself is still MEM_PRIVATE with executable permissions. Moneta still flags it as "Private Executable Memory." The scanner just has less context to act on.
The last one, Failure 3, is the one most articles skip. Encrypting the content does not change the memory type. The next layer, Layer 3, does not fix it either. There is a fourth trick, and I will cover it in the final section.
Part IV: Layer 3, Heap Encryption
The principle
Before sleeping, encrypt every tracked heap allocation. On wake, decrypt. While sleeping, the heap contains only ciphertext. YARA rules looking for the C2 URL find nothing. String scrapers find nothing. A full process dump yields a blob of high-entropy bytes where the configuration used to be.
Simple on the surface. Three non-trivial constraints in practice.
Constraint 1: you cannot encrypt the process default heap
If the beacon is injected into explorer.exe or any host process, the default heap belongs to the host. It is used constantly by system threads. If you encrypt it, you break the host process. Crash. Game over.
The solution is a private heap created via RtlCreateHeap. Every allocation used by the beacon goes through a wrapper that allocates on this private heap and tracks the block in a linked list:
// Private heap creation
g_privateHeap = RtlCreateHeap(HEAP_GROWABLE, NULL, 0, 0, NULL, NULL);
// Tracked allocation
void* kg_alloc(size_t sz) {
void* p = RtlAllocateHeap(g_privateHeap, 0, sz);
kg_track(p, sz); // add to tracked-blocks list
return p;
}
The benefit is twofold. First, encrypting only the private heap leaves the host untouched. Second, a scanner walking the default heap of the host process finds nothing of yours. Another surface removed.
Constraint 2: where the encryption routine executes
This is the constraint that most implementations get wrong. If your XOR or AES routine lives in the beacon's .text, then during the encryption loop:
- Your crypto code must be in clear (otherwise it cannot execute)
- The thread stack during the encryption points into your module
- Moneta observes a MEM_PRIVATE region running tight loops over heap memory. This is, itself, a behavioral signature
The elegant solution is to use a crypto primitive inside a Microsoft-signed DLL. advapi32!SystemFunction032 is RC4 exposed from a signed module. bcrypt!BCryptEncrypt is AES exposed from another signed module. When you call these functions, the encryption runs from MEM_IMAGE memory, not yours.
Wrap the call in a call stack spoof from Layer 2, and during encryption the EDR sees:
advapi32!SystemFunction032+0x??
kernelbase!SomeGadget+0x?? ← fake frame
kernel32!BaseThreadInitThunk+0x14 ← fake frame
ntdll!RtlUserThreadStart+0x21 ← terminator
No frame of the stack points to your beacon. The encryption looks like legitimate use of a system crypto primitive.
Constraint 3: ephemeral keys and clean free
The encryption key is itself a secret. Store it in a global variable and it lives in your BSS segment in clear while sleeping. That is a key-recovery vector for anyone who snapshots the process at any point.
Two rules fix this:
Rule A: derive a fresh key every sleep cycle. Source of entropy: __rdtsc(), optionally mixed with QueryPerformanceCounter(). The key exists from the moment encryption begins until decryption completes, then it is zeroed.
// Fresh 16-byte key per cycle
for (int i = 0; i < 16; i++)
key[i] = (BYTE)(__rdtsc() >> (i * 4));
Rule B: zero with volatile to defeat the compiler. A plain memset(key, 0, 16) followed by no further reads is an optimization target. The compiler will eliminate it because the bytes are never read again. The fix:
// Volatile zero, survives optimization
for (int i = 0; i < 16; i++)
((volatile BYTE*)key)[i] = 0;
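A process snapshot taken during sleep finds no key in any global or heap location. The key cannot vanish entirely, because decryption on wake needs the same bytes: it survives the wait only as a stack local of the sleeping thread, and it is zeroed the moment decryption completes.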
Same principle applies to free. A freed block sits in the private heap's free list with its contents intact until it is reallocated. If you free a buffer containing a credential and then sleep, the credential survives in the freed block. Volatile-zero the block before calling RtlFreeHeap and the residue disappears.
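The dead-store problem is a general secure-coding issue rather than anything specific to this design, so the zeroing loop is worth factoring into one helper. A minimal sketch in standard C; the name zero_buffer is hypothetical. On Windows, SecureZeroMemory serves the same purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite len bytes at buf with zeros through a volatile pointer, so the
   compiler cannot discard the writes as dead stores.
   The function name is hypothetical. */
void zero_buffer(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--)
        *p++ = 0;
}
```

The same helper covers both cases described above: a key that has gone out of use, and a heap block that is about to be freed.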
What Layer 3 still leaves open
After three layers, here is what a memory scan during sleep sees:
- Code: ciphertext, high entropy, unrecognizable (Layer 1)
- Stack: legitimate-looking chain of system frames (Layer 2)
- Heap: ciphertext, no readable strings, no config (Layer 3)
But the region types have not changed. The beacon's .text is still MEM_PRIVATE + RX even if its contents are encrypted. Moneta still flags "Private Executable Memory." The content is opaque but the region itself remains suspicious.
The wake-up trampoline is still a residue. It must stay in clear. It is small and signed-looking if you used ROP through advapi32, but it is there.
Every protection change, if you used Layer 1 with NtProtectVirtualMemory, still triggers kernel callbacks.
Three layers have closed the content-based detections. But memory-type detections and behavioral detections remain. This is where the dual-mapping technique comes in.
Part V: The Fourth Trick, Dual Mapping
The unsolvable contradiction
Layer 1 as described above relies on NtProtectVirtualMemory to switch between executable and writable, because XOR encryption needs write access and normal execution needs execute access. These two permissions cannot coexist on the same page. Or can they?
The contradiction is actually a constraint of a single view. One virtual address range with one set of permissions. If you had two views, each with different permissions, both pointing at the same physical memory, the problem disappears.
This is exactly what the Windows memory manager lets you do.
The primitive
A section object in Windows, created with NtCreateSection, can be mapped into a process multiple times. Each mapping is a view, and each view has its own set of Page Table Entries with its own permission bits. All views of the same section share the same physical pages.
Physical RAM
+--------------+
| beacon code |
| (plaintext |
| or cipher) |
+--------------+
^ ^
| same | same
| PFN | PFN
View_RW ---+ +--- View_RX
(PAGE_READWRITE) (PAGE_EXECUTE_READ)
random addr original .text addr
Any write to View_RW lands on the same physical page that View_RX reads from. There is no Copy-on-Write, because the section is pagefile-backed (FileHandle = NULL at creation) and both views use compatible protections (not WRITECOPY). The write is instantly visible through View_RX.
This is not a new Windows feature. Multiple views of the same section are the mechanism behind shared memory, inter-process communication, and Perun's Fart-style ntdll unhooking (where two views of ntdll.dll are mapped, one of them naturally unhooked). The innovation in the context of sleep obfuscation is the specific application: map View_RX at the exact virtual address the beacon's .text was originally loaded at.
The workflow
At beacon startup, before the first sleep, the remap runs once:
Step 1: NtCreateSection(PAGE_EXECUTE_READWRITE, SEC_COMMIT,
size = beaconTextSize, file = NULL)
Step 2: NtMapViewOfSection(section, addr = NULL, PAGE_READWRITE)
→ View_RW at random address
Step 3: memcpy(View_RW, beacon.text, size) ← copy live code
into the section pages
Step 4: NtFreeVirtualMemory(beacon_allocation, MEM_RELEASE)
→ original .text and everything around it is released
Step 5: NtMapViewOfSection(section, addr = original_text_addr,
PAGE_EXECUTE_READ)
→ View_RX at the EXACT address the .text used to be
Step 6: reallocate and restore the surrounding regions (PE headers,
.data, .rdata) from a prior backup
After this sequence, the beacon has two views of the same physical pages. View_RX sits at the exact address the code was originally linked against, so every internal pointer, every relocation, every vtable, every function pointer still resolves correctly. View_RW sits at a random address that only the sleep routine knows about.
The sleep loop becomes trivial
With the dual-mapping in place, the actual sleep cycle collapses to:
void ss_sleep(DWORD sleepMs) {
// Generate a fresh 16-byte key
BYTE key[16];
for (int i = 0; i < 16; i++) key[i] = (BYTE)(__rdtsc() >> (i * 4));
// Encrypt via View_RW (PAGE_READWRITE)
// Instantly visible as ciphertext through View_RX (PAGE_EXECUTE_READ)
BYTE* p = (BYTE*)g_viewRW;
for (size_t i = 0; i < g_textSize; i++) p[i] ^= key[i & 0xF];
// Wait (indirect syscall via matched gadget, fake RBP chain)
LARGE_INTEGER t;
t.QuadPart = -(LONGLONG)sleepMs * 10000LL;
NtWaitForSingleObject(g_hSleepEvent, FALSE, &t);
// Decrypt (XOR is symmetric)
for (size_t i = 0; i < g_textSize; i++) p[i] ^= key[i & 0xF];
// Volatile zero of the key
for (int i = 0; i < 16; i++) ((volatile BYTE*)key)[i] = 0;
}
Zero calls to NtProtectVirtualMemory per sleep cycle. No kernel callbacks on protection changes. No ETW protection-modification events. No flip-flop pattern. The View_RX permissions never change. Only the bytes underneath flip between plaintext and ciphertext.
Bonus: MEM_MAPPED instead of MEM_PRIVATE
Pages created through NtMapViewOfSection are of type MEM_MAPPED, not MEM_PRIVATE. This is a significant OPSEC improvement:
| Detection | MEM_PRIVATE + RX | MEM_MAPPED + RX |
|---|---|---|
| Moneta "Unbacked Shellcode" | Flagged | Not triggered by type |
| Elastic "Shellcode Thread" | High confidence | Not triggered |
| MDE "Anomalous Executable Memory" | Classified suspicious | Classified low severity |
| SentinelOne "Suspicious Memory Region" | Detected | Not detected in default configs |
MEM_MAPPED + RX is common in every Windows process: memory-mapped files, IPC regions, shared memory between processes. The beacon's executable region, once dual-mapped, looks exactly like any legitimate section-backed mapping.
The tricky part: self-contained remap
Step 4 of the workflow releases the beacon's own allocation. If the remap function is inside that allocation, the instant NtFreeVirtualMemory succeeds, the code executing it disappears. Crash.
The solution is to copy the remap function into a separate RWX buffer and execute it from there:
// Measure the function size via a sentinel symbol
size_t funcSize = (size_t)((BYTE*)&_DoRemapEnd - (BYTE*)&_DoRemap);
// Allocate a buffer outside the beacon's main allocation
void* funcBuf = VirtualAlloc(NULL, funcSize,
MEM_COMMIT | MEM_RESERVE,
PAGE_EXECUTE_READWRITE);
memcpy(funcBuf, (void*)&_DoRemap, funcSize);
// Execute from the separate buffer
typedef NTSTATUS (*RemapFn)(REMAP_CTX*);
RemapFn fn = (RemapFn)funcBuf;
NTSTATUS st = fn(ctx);
The buffer survives the release. The function copied inside it does too. But it has to be written under strict rules:
- No references to global variables, because globals live in .data which is about to be freed. Every [rip + global] becomes a dangling pointer
- No memcpy or memset, because these resolve to RIP-relative calls into the freed .text. Byte-by-byte copies with volatile pointers replace them
- No debugging macros, no logger calls, no runtime dependencies
- All the ntdll pointers it needs (NtCreateSection, NtMapViewOfSection, NtAllocateVirtualMemory, etc.) must be passed in through a context pointer, not resolved from globals
The discipline is similar to writing position-independent shellcode, because for a few dozen microseconds, the function is exactly that.
The critical-zone sentinel
Not all of the beacon's .text can be encrypted during sleep. The sleep routine itself, the indirect syscall wrappers, the call stack spoofing stubs, all have to remain executable while the rest of the code is ciphertext. If they were encrypted, the beacon could not even perform NtWaitForSingleObject to enter the sleep.
The trick is to split .text in two by placing a sentinel symbol at the boundary:
__attribute__((noinline))
void SleepSentinel(void) { asm volatile("ret\n"); }
Link order is arranged so that every offensive module (agent, commander, crypto, C2) sits before the sentinel, and every infrastructure module (syscall wrappers, stack spoofing, sleep logic) sits after:
[ Encryptable zone | SleepSentinel | Critical zone ]
[ Agent, Commander, Crypt,... | | Syscalls, CallSpoof, ]
[ | | WaitMask, SleepLogic ]
[ encrypted during sleep | | stays in clear ]
[ 80 to 90 percent of .text | | 10 to 20 percent ]
The sleep routine computes the encryptable size as &SleepSentinel - g_textBase, and only encrypts up to that point. The critical zone remains plaintext and serves as the permanent wake-up trampoline. No timer queue, no APC, no ROP chain. The beacon's own thread performs the encryption, waits, and performs the decryption, all from code that was never encrypted.
The complete sleep cycle, end to end
One diagram, the full theoretical wiring of a sleep cycle once the dual-mapping is in place. Entry, dark phase, exit. No source code, just the order of operations.
+--------------------------------------------------------------------+
| SLEEP ENTRY (active thread) |
+--------------------------------------------------------------------+
Beacon main loop calls Sleep(ms)
|
v
[1] Generate ephemeral 16-byte key
- RtlGenRandom + __rdtsc mix for entropy
- Key lives only in a stack local
|
v
[2] Encrypt the heap (private heap only)
- Iterate tracked allocations
- Call advapi32!SystemFunction032 (RC4)
wrapped in CallStack Spoofing
===> crypto runs from MEM_IMAGE signed DLL
stack shows advapi32 -> ntdll, not the beacon
|
v
[3] Spoof the thread stack
- Save real RBP/RSP context to a hidden slot
- Write fake frames pointing to:
ntdll!NtWaitForSingleObject+0x14
kernelbase!WaitForSingleObjectEx+0x5f
kernel32!BaseThreadInitThunk+0x20
ntdll!RtlUserThreadStart+0x2c
- SET_FPREG bridges the spoof stub into the real
.pdata chain so RtlVirtualUnwind walks cleanly
|
v
[4] Encrypt the .text via View_RW (PAGE_READWRITE)
for i in 0..encryptable_size:
View_RW[i] ^= key[i & 0xF]
===> instantly visible as ciphertext through View_RX
because both views share the same physical pages
zero NtProtectVirtualMemory calls involved
|
v
[5] Issue the wait
NtWaitForSingleObject(sleepEvent, FALSE, timeout)
via Shinkiro indirect syscall:
- matched syscall;ret gadget at +0x12 of the real stub
- 3 fake RBP-linked frames built before the syscall
|
v
+--------------------------------------------------------------------+
| SLEEP DARK PHASE (thread waiting) |
+--------------------------------------------------------------------+
==========================================================
=== .text body : ciphertext (View_RX, MEM_MAPPED) ===
=== private heap : ciphertext (RC4) ===
=== thread stack : legitimate sleep chain ===
=== region type : MEM_MAPPED + RX (no Unbacked SC) ===
=== key : stack local only, zeroed on wake ===
=== protections : ZERO NtProtectVirtualMemory calls ===
==========================================================
Memory scanner observation during this window:
- YARA on .text : no match (ciphertext)
- Moneta : MEM_MAPPED, no Unbacked alert
- Hunt-Sleeping : stack walks to RtlUserThreadStart
- PE-sieve dump : heap blocks are RC4 ciphertext
- ETW protect-flip : no events emitted
- Stack-walk EDR : every frame in a signed module
|
v
+--------------------------------------------------------------------+
| SLEEP EXIT (wait returns) |
+--------------------------------------------------------------------+
[6] Wait returns (timeout reached or event signaled)
|
v
[7] Decrypt the .text via View_RW (XOR symmetric, same key)
===> code is executable again through View_RX
|
v
[8] Restore the real thread stack from saved context
- RBP/RSP back to the pre-spoof values
- Fake frames overwritten (zeroed)
|
v
[9] Decrypt the heap (SystemFunction032 again, RC4 symmetric)
Same CallStack Spoofing wrapper
|
v
[10] Volatile-zero the key
for i in 0..15: ((volatile BYTE*)key)[i] = 0
===> defeats the optimizer that would otherwise
remove the dead store
|
v
Back to main loop -- the beacon is fully reconstituted
in memory as if nothing happened. Ready for the next
C2 check-in.
Five layers of evasion stacked on a single execution path: dual-mapped section for the code, targeted-resolution stack spoofing for the thread, signed-DLL crypto wrapped in call stack spoofing for the heap, ephemeral keys with volatile zero, and a matched-gadget indirect syscall for the wait itself. No external trampoline. No timer queue. No APC. The thread that goes to sleep is the same thread that wakes up, and during the sleep window every observable surface of the beacon is either encrypted, spoofed, or backed by a signed module.
Part VI: The Full Picture
What each layer closes, what remains
| Detection vector | Layer that closes it |
|---|---|
| YARA on .text (static content) | Layer 1 (code encryption) |
| Moneta "Private Executable Memory" by content | Layer 1 (content is ciphertext) |
| Moneta "Unbacked Shellcode" by memory type | Dual mapping (MEM_PRIVATE becomes MEM_MAPPED) |
| Hunt-Sleeping-Beacons stack walk | Layer 2 (stack spoofing) |
| CrowdStrike unwind validation | Layer 2 (targeted resolution + real UNWIND_INFO) |
| Elastic RtlVirtualUnwind check | Layer 2 (SET_FPREG bridge into real .pdata chain) |
| Protection change flip-flop telemetry | Dual mapping (zero NtProtectVirtualMemory calls) |
| ETW Microsoft-Windows-Kernel-Memory | Dual mapping (no protection changes emitted) |
| YARA on heap strings (URL, config) | Layer 3 (heap encryption) |
| PE-sieve process dump of heap | Layer 3 (heap encryption) |
| Key recovery from memory snapshot | Ephemeral keys + volatile zero |
| Default-heap fingerprint of beacon data | Private heap (RtlCreateHeap) |
What still remains visible
Honesty matters here. The following signals are still detectable:
The critical zone. Between 10 and 30 kilobytes of MEM_PRIVATE + RX code remains in clear after the sentinel. It contains the syscall wrappers, the call stack spoofing stubs, and the sleep routine itself. Moneta still flags this small residue as "Private Executable Memory." The mitigation requires a custom linker script that places the critical zone in a 64KB-aligned section suitable for a second dual-mapping, with the associated globals relocated to the heap. That is future work.
Behavioral patterns. The beacon wakes up, makes a DNS query, issues an HTTP request, sleeps again. This rhythm is detectable independently of all the memory-level protections. Modern defense is moving away from content inspection and toward behavioral correlation. No amount of encryption hides an obvious periodic callback.
The wake-up micro-window. For a few microseconds between XOR and NtWaitForSingleObject, and again between NtWaitForSingleObject returning and the decryption XOR, the full state of the beacon is in clear. An EDR that triggers memory scans on specific events (thread alert, protection change, APC queue) can, in theory, land inside this window. The probability is low but not zero.
Full MEM_MAPPED auditing. A theoretically possible detection correlates every MEM_MAPPED region with its backing file. A pagefile-backed section without any corresponding file on disk is distinguishable from a legitimate memory-mapped file. I have not seen an EDR product implement this correlation in real time, likely because it is expensive. It is on the horizon.
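Seen from the defender's side, the correlation described in the last paragraph reduces to a small predicate over a process's memory regions. The following is a minimal sketch under stated assumptions, not a description of how any EDR product implements it. The `region_info` struct and the function names are hypothetical. In a real scanner the `type` and `protect` fields would presumably come from `VirtualQueryEx`, and `has_backing_file` from a lookup such as `GetMappedFileName` failing or succeeding for the region. The numeric constants mirror the Windows SDK values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the Windows SDK constants of the same names;
   the K_ prefix only avoids clashing with <windows.h>. */
#define K_MEM_PRIVATE            0x00020000u
#define K_MEM_MAPPED             0x00040000u
#define K_MEM_IMAGE              0x01000000u
#define K_PAGE_EXECUTE           0x10u
#define K_PAGE_EXECUTE_READ      0x20u
#define K_PAGE_EXECUTE_READWRITE 0x40u
#define K_PAGE_EXECUTE_WRITECOPY 0x80u

/* Hypothetical snapshot of one memory region, filled in by the scanner. */
struct region_info {
    uintptr_t base;             /* region base address */
    size_t    size;             /* region size in bytes */
    uint32_t  type;             /* K_MEM_PRIVATE, K_MEM_MAPPED or K_MEM_IMAGE */
    uint32_t  protect;          /* page protection flags */
    bool      has_backing_file; /* true if a file on disk backs the mapping */
};

/* True if any of the executable protection bits is set. */
bool is_executable(uint32_t protect)
{
    const uint32_t exec_mask = K_PAGE_EXECUTE | K_PAGE_EXECUTE_READ |
                               K_PAGE_EXECUTE_READWRITE | K_PAGE_EXECUTE_WRITECOPY;
    return (protect & exec_mask) != 0;
}

/* Flags an executable MEM_MAPPED region that has no file behind it,
   i.e. a pagefile-backed section mapped with execute rights. */
bool is_unbacked_mapped_exec(const struct region_info *r)
{
    return r->type == K_MEM_MAPPED &&
           is_executable(r->protect) &&
           !r->has_backing_file;
}

/* Counts the flagged regions in a snapshot of n regions. */
size_t count_flagged(const struct region_info *regions, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_unbacked_mapped_exec(&regions[i])) {
            hits++;
        }
    }
    return hits;
}
```

The predicate itself is cheap. The cost mentioned above lies in building the snapshot, since every region of every process has to be queried and resolved to a file name. A real deployment would also need an allowlist, because some legitimate software maps executable pagefile-backed sections.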
Closing thoughts
A beacon that bypasses CrowdStrike Falcon, Microsoft Defender for Endpoint, SentinelOne, and Elastic Security during sleep is not the product of a single technique. It is the intersection of three layers that together close the content-based detection surfaces (code, stack, heap), plus a fourth trick, the dual mapping, that closes the memory-type and protection-change telemetry the three layers leave exposed.
If you skip Layer 1, your code is a YARA target. If you skip Layer 2, Hunt-Sleeping-Beacons finds you in ten seconds. If you skip Layer 3, your C2 URL sits in plain text on the heap for anyone to dump. If you skip the dual-mapping, your MEM_PRIVATE + RX region and your 480 NtProtectVirtualMemory calls per hour are both independent high-confidence signals.
This full sleep obfuscation system is now integrated into Kagemusha, my custom C2 framework built from scratch in C and x86-64 assembly. With Kagemusha, the beacon sleeps undetected under Elastic Security with full detection rules enabled, CrowdStrike Falcon at Prevention level 3, Microsoft Defender for Endpoint, and SentinelOne. The architecture described in this post is not theoretical anymore. It runs in production on real engagements.
The tooling behind this architecture (Shinkirō for matched-gadget indirect syscalls, Kagura for stack frame metadata, the dual-mapping remap, the private heap with signed-DLL encryption) came out of two years of studying C2 frameworks and the last eight months spent grinding through implant and malware development. I built it because off-the-shelf beacons burn on every top-tier engagement. I will not release the source, because that would accelerate the exact detection research I am trying to stay ahead of. But the architecture is not a secret. Every idea in this post is implementable by anyone who reads the Windows memory-manager documentation carefully and has patience for assembly-level detail.
The mirror reflects nothing during sleep. That is the goal. Three layers of protection, a dual-mapped section, a signed crypto primitive, a private heap, an ephemeral key, a sentinel-bounded critical zone, and an indirect-syscall engine underneath. Subtract any one, and the reflection returns.
Find me on LinkedIn // Enenra
References
- Microsoft, NtCreateSection and NtMapViewOfSection, MSDN
- Microsoft, Section Objects and Views, Windows Internals 7th ed., Russinovich et al.
- Microsoft, ObRegisterCallbacks kernel memory callbacks, MSDN
- Upping the Ante: Detecting In-Memory Threats with Kernel Call Stacks, Elastic Security Labs
- Doubling Down: ETW Callstacks, Elastic Security Labs
- Moneta, Forrest Orr, Live Memory Usermode IOC Scanner
- PE-sieve, hasherezade, Process Memory Scanner
- Hunt-Sleeping-Beacons, thefLink
- Ekko, Cracked5pider, Sleep Obfuscation via Timer Queue
- Foliage, Sleep Obfuscation via APC
- Zilean, Sleep Obfuscation via Thread Hijacking
- AceLdr, Kyle Avery, Cobalt Strike UDRL with Sleep Mask
- DeathSleep, CONTEXT manipulation sleep obfuscation
- Kagura-StackWalker: The Stack Is a Dance, Enenra, 2026
- Shinkirō: Matched-Gadget Indirect Syscalls, Enenra, 2026