
AuroBreeze Blog


Investigating issues in implementing a minimal U mode, and questions about the xv6 implementation

Trampoline Placement in .text

Why does xv6 choose to place trampsec in .text? Can't it be mapped normally? Why are the mapped trampoline and uservec at the same address? Won't this overwrite code?

First, we need to understand that going from U mode to S mode involves a page table switch. On a trap from U mode to S mode, the hardware sets the PC to the address stored in stvec, namely uservec. The hardware does not change satp, however, so the active page table is still the user page table. If stvec held uservec's kernel address, which the user page table does not map, the very first instruction fetch in the handler would fault.

The solution is to map one tiny, special page of code into both the user page table and the kernel page table at the same virtual address. This page is what we call the trampoline.

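To map the trampoline we use mappages, which works on whole pages. A PTE records only the physical page number, so the trampoline code must start on a 4096-byte boundary and fit within a single page. That is why trampsec cannot simply be mixed into the rest of .text. Instead, the linker script places the ordinary .text code first, aligns to a page boundary, places trampsec, and then aligns again. As a result, the trampoline occupies exactly one page of its own inside the .text output section.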

Another question is where these pages sit. In the kernel page table, xv6 maps the trampoline at the highest page of the virtual address space:

kvmmap(kpgtbl, TRAMPOLINE, (uint64)trampoline, PGSIZE, PTE_R | PTE_X);

where TRAMPOLINE is defined as:

#define TRAMPOLINE (MAXVA - PGSIZE)

uservec does not write data into the trampoline page itself. It loads the trapframe address with li a0, TRAPFRAME and stores the user registers there. TRAPFRAME is defined as:

#define TRAPFRAME (TRAMPOLINE - PGSIZE)

The trapframe is the page just below the trampoline, so saving registers never overwrites the trampoline code.

In xv6, TRAMPOLINE and TRAPFRAME are mapped into each process's user page table by the proc_pagetable function.

Virtual Address Space
      +----------------------------+ <--- MAXVA
      |                            |
      |   Trampoline Page (Code)   | <--- PC pointer runs here (reading instructions)
      |   (uservec resides here)   |
      |                            |
      +----------------------------+ <--- TRAMPOLINE (Base Address)
      |                            |
      |   Trapframe Page (Data)    | <--- a0 register points here
      |   (saved ra, sp, etc.)     | <--- sd instruction writes data here
      |                            |
      +----------------------------+ <--- TRAPFRAME (base address)
      |                            |
      |           ...              |

Kernel Page Table and Trapframe

Why does the kernel page table map only the trampoline and not the trapframe?

In xv6, uservec first saves all user registers into the trapframe, and only then writes the kernel page table to satp to switch page tables. userret does the reverse. It receives the user page table as an argument and writes it to satp first, and only then restores the user registers from the trapframe.

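Therefore, the trampoline code only ever touches TRAPFRAME while the user page table is active, and the kernel page table has no need to map it. Kernel C code such as usertrap still reaches the same page, but through p->trapframe. That pointer holds the page's physical address, which the kernel page table maps directly.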

Trapframe Kernel Data Storage

Why does the trapframe need to store kernel data such as satp, sp, and the trap handler address in xv6?

Because the trapframe is essentially a "transfer station." It holds both the data required to enter S mode and the data needed to return to U mode.

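For instance, every time the kernel returns from S mode to U mode (including the first time), usertrapret saves several values into the trapframe:
- the kernel page table (kernel_satp)
- the address of the trap handler usertrap (kernel_trap)
- the current hart (CPU) ID (kernel_hartid)
- the top of the process's kernel stack (kernel_sp)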

On every trap from U mode to S mode, uservec reads these values back out of the trapframe into registers. The user program counter must also be preserved because, as noted earlier, the trap itself overwrites the PC with the value of stvec.

Alright, another problem arises: the PC has been overwritten instantly. How do we preserve the PC the program was executing in U mode?

Now I'm going to bring out the "original legal text."

When a trap is taken into S-mode, sepc is written with the virtual address of the instruction that was interrupted or that encountered the exception. Otherwise, sepc is never written by the implementation, though it may be explicitly written by software.

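This passage comes from the sepc section of the RISC-V Privileged Architecture Manual. On every trap into S mode, the hardware writes sepc with the address of the interrupted or faulting instruction, so the kernel can recover the user PC from sepc. xv6's usertrap saves it with p->trapframe->epc = r_sepc(). usertrapret later writes it back with w_sepc(p->trapframe->epc), so that sret resumes the user program there.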

Uservec and stvec Calculation

What is this, and why is it written this way? Can't we just write uservec's address directly to stvec?

// trap.c -> void usertrapret(void)
// send syscalls, interrupts, and exceptions to uservec in trampoline.S
uint64 trampoline_uservec = TRAMPOLINE + (uservec - trampoline);
w_stvec(trampoline_uservec);

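The trampoline label marks the start of the trampsec section, and labels in a .S file are laid out from low to high addresses in the order they appear. uservec comes right after trampoline; in xv6 it sits at the very same address, which is why the mapped trampoline and uservec share an address. So uservec - trampoline is uservec's offset within the trampoline page, and adding that offset to TRAMPOLINE gives uservec's address in the mapped copy.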

The symbols uservec and trampoline hold kernel link addresses, which only the kernel page table maps. When a trap arrives, satp still holds the user page table, so stvec must contain an address that is also valid in the user page table. The only such copy of the handler is the trampoline page, mapped at the same TRAMPOLINE address in both tables. That is why the handler lives inside the trampoline, and why its address is computed as TRAMPOLINE + (uservec - trampoline) rather than using uservec directly. The same calculation locates userret.

sfence.vma Usage in Uservec

Why does uservec execute sfence.vma zero, zero twice?

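sfence.vma serves two purposes:
- It orders earlier stores, such as page-table updates, before the MMU's later page-table reads. Informally, this means forcing pending writes out of the CPU's write buffer to memory.
- It invalidates cached address translations in the TLB.

With zero, zero as operands, both effects apply to every virtual address and every address space.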

Therefore, the first sfence.vma is there mainly to make all earlier writes visible in memory before satp changes. The TLB flush it also performs is purely a side effect.

The second flush is specifically for refreshing the TLB, ensuring the old address mappings are no longer used.

Why must the first fence push buffered data out to memory?

The core issue is that the MMU is an "outsider." A useful mental model is that the page-table walker is a separate hardware unit outside the CPU core's internal pipeline. It typically cannot peek into the core's private store buffer. Instructions like sfence.vma are therefore needed to "push" data from the core's private domain into the shared memory the MMU reads.

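Page table switching involves more than merely updating a register. The Privileged Architecture Manual says so explicitly: writing satp implies no ordering between page-table updates and subsequent address translations, and it does not invalidate any address-translation caches. An SFENCE.VMA may therefore be needed after the write, or in some cases before it.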

The moment csrw satp completes, the CPU fetches the next instruction. Our situation at that point is:

  • The PC points into uservec at its TRAMPOLINE address, and we are in the middle of switching page tables. The very next instruction fetch must be translated through the new page table. If the first fence has not pushed earlier writes out to memory, the MMU's page-table walk can miss data the CPU considers already written and read stale page-table contents instead. That raises a page fault.