Friday, January 26, 2018

5-Level Paging and 5-Level EPT

IA-32e mode is a mode of processor execution that extends the older 32-bit operation, known as legacy mode. Software can enter IA-32e mode with the following algorithm.
  1. Use the MOV CR instruction to set CR4.PAE[bit 5]. (Physical-address extension must be enabled to enter IA-32e mode.)
  2. Use the WRMSR instruction to set bit 8 (LME) of the IA32_EFER MSR (index C0000080H).
  3. Use the MOV CR instruction to load CR3 with the address of a PML4 table (see below).
  4. Use the MOV CR instruction to set CR0.PG[bit 31].
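
The four steps above can be modeled as plain bit operations on register values. This is only a sketch: real code performs these steps with privileged MOV CR and WRMSR instructions, and the `regs` dictionary and the `enter_ia32e` function are hypothetical names used here for illustration.

```python
# Bit positions and MSR index taken from the steps above.
CR4_PAE = 1 << 5              # CR4.PAE, bit 5
EFER_LME = 1 << 8             # IA32_EFER.LME, bit 8
CR0_PG = 1 << 31              # CR0.PG, bit 31
IA32_EFER_INDEX = 0xC0000080  # MSR index used by WRMSR in step 2


def enter_ia32e(regs, pml4_address):
    """Return a copy of `regs` after the four steps (hypothetical model)."""
    new = dict(regs)
    new["cr4"] |= CR4_PAE      # step 1: MOV CR4 sets PAE
    new["efer"] |= EFER_LME    # step 2: WRMSR sets LME in IA32_EFER
    new["cr3"] = pml4_address  # step 3: MOV CR3 loads the PML4 table address
    new["cr0"] |= CR0_PG       # step 4: MOV CR0 sets PG
    return new


before = {"cr0": 0x00000011, "cr3": 0, "cr4": 0, "efer": 0}
after = enter_ia32e(before, 0x1000)
```

The order matters in the model as it does in the algorithm: paging is enabled last, once CR4.PAE, IA32_EFER.LME, and CR3 are already in place.
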
IA-32e paging does not use all 64 linear-address bits because processors limit the size of linear addresses. This limit is enumerated by the CPUID instruction. Specifically, CPUID.80000008H:EAX[bits 15:8] enumerates the number of linear-address bits (the maximum linear-address width) supported by the processor. Existing processors enumerate this value as 48.

Processors also limit the size of physical addresses and enumerate the limit using CPUID. CPUID.80000008H:EAX[bits 7:0] enumerates the number of physical-address bits supported by the processor (the maximum physical-address width). Existing processors have enumerated values up to 46. Software can use more than 32 physical-address bits only if physical-address extension has been enabled by setting CR4.PAE, bit 5 of control register CR4.
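
As a small illustration of the two CPUID fields just described, the sketch below decodes a raw EAX value from CPUID leaf 80000008H. It does not execute CPUID itself; `decode_address_widths` is a hypothetical helper name, and the sample EAX value is constructed by hand.

```python
def decode_address_widths(eax):
    """Split CPUID.80000008H:EAX into (linear_bits, physical_bits)."""
    physical_bits = eax & 0xFF        # EAX[bits 7:0]
    linear_bits = (eax >> 8) & 0xFF   # EAX[bits 15:8]
    return linear_bits, physical_bits


# Hand-built sample: 48 linear-address bits, 46 physical-address bits.
example_eax = (48 << 8) | 46
linear, physical = decode_address_widths(example_eax)
```
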

IA-32e mode comprises two sub-modes: compatibility mode and 64-bit mode. In compatibility mode, software uses 32-bit addresses, which the processor zero-extends to 64-bit linear addresses. In 64-bit mode, software uses 64-bit addresses directly.

VM entries and VM exits manipulate numerous processor registers that contain linear addresses. The transitions respect the processor’s linear-address width in a manner based on canonicality.

Most Intel 64 processors supporting VMX also support an additional layer of address translation called extended page tables (EPT).

VM entry can be configured to activate EPT for guest software. When EPT is active, the addresses used and produced by paging are not used as physical addresses to reference memory. Instead, the processor interprets them as guest-physical addresses, and translates them to physical addresses in a manner determined by the VMM. (This translation from guest-physical to physical applies not only to the output of paging but also to the addresses that the processor uses to reference the guest paging structures.)

If the EPT translation process cannot translate a guest-physical address, it causes an EPT violation. (EPT violations may also occur when an access to a guest-physical address violates the permissions established by EPT for that guest-physical address.) An EPT violation is a VMX-specific exception, usually causing a VM exit.

Processors supporting the Intel 64 architecture typically use the CPUID instruction to enumerate to software specific processor functionality. Those processors that support 5-level paging enumerate that fact through a new feature flag as well as through changes in how related features are reported:
  • CPUID.(EAX=07H, ECX=0):ECX[bit 16] is a new feature flag that will enumerate basic support for 5-level paging. All older processors clear this bit. A processor will set this bit if and only if it supports 5-level paging.
  • As noted in Section 1.1, CPUID.80000008H:EAX[bits 15:8] enumerates the maximum linear-address width supported by the processor. All older processors that support Intel 64 architecture enumerated this value as 48. Processors that support 5-level paging will instead enumerate this value as 57.
  • As noted in Section 1.1, CPUID.80000008H:EAX[bits 7:0] enumerates the maximum physical-address width supported by the processor. Processors that support Intel 64 architecture have enumerated at most 46 for this value. Processors that support 5-level paging are expected to enumerate higher values, up to 52.
  • CPUID.(EAX=07H,ECX=0H):ECX.MAWAU[bits 21:17] is an existing field that enumerates the user MPX address-width adjust (MAWAU). This value specifies the number of linear-address bits above 48 on which the BNDLDX and BNDSTX instructions operate in 64-bit mode when CPL = 3. Older processors that support Intel® MPX enumerated 0 for this value. Processors that support 5-level paging may enumerate either 0 or 9, depending on configuration by system software. See Section 2.7 for more details on how BNDLDX and BNDSTX use MAWAU and how system software determines its value.
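
The feature flag and the MAWAU field from the list above both live in ECX of CPUID leaf 07H (sub-leaf 0). The following sketch decodes them from a raw ECX value; the function name `decode_leaf7_ecx` and the sample value are assumptions made for illustration.

```python
def decode_leaf7_ecx(ecx):
    """Return (five_level_paging, mawau) from CPUID.(EAX=07H,ECX=0):ECX."""
    five_level_paging = bool((ecx >> 16) & 1)  # ECX[bit 16]
    mawau = (ecx >> 17) & 0x1F                 # ECX[bits 21:17]
    return five_level_paging, mawau


# Hand-built sample: 5-level paging supported, MAWAU enumerated as 9.
sample_ecx = (1 << 16) | (9 << 17)
supported, mawau = decode_leaf7_ecx(sample_ecx)
```
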

Processors with a linear-address width of 48 bits reserve linear-address bits 63:48 for future expansion. Linear addresses that use only bits 47:0 (because bits 63:48 are a sign-extension of bit 47) are called canonical.

Processors that support 5-level paging can translate 57-bit linear addresses when 5-level paging is enabled. But if software has enabled only 4-level paging, such a processor can translate only 48-bit linear addresses. This fact motivates the definition of two levels of canonicality.
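
The two levels of canonicality can be expressed as one check parameterized by the address width. This is a minimal sketch; `is_canonical` is a hypothetical helper, with `width` being 48 for 4-level paging and 57 for 5-level paging.

```python
MASK64 = (1 << 64) - 1


def is_canonical(address, width):
    """True if bits 63:width are a sign-extension of bit width-1."""
    address &= MASK64
    upper = address >> (width - 1)             # bits 63:(width-1)
    all_ones = (1 << (64 - (width - 1))) - 1   # the same bits, all set
    return upper == 0 or upper == all_ones


# Bit 47 set, bits 63:48 clear: not 48-bit canonical, but 57-bit canonical.
address = 0x0000800000000000
```

An address that is 48-bit canonical is always 57-bit canonical as well, but not the other way around, as the sample address shows.
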

Like 4-level paging, 5-level paging translates linear addresses using a hierarchy of in-memory paging structures. Because 5-level paging increases the linear-address width to 57 bits (from the 48 bits supported by 4-level paging), 5-level paging allows up to 128 PBytes of linear-address space to be accessed at any given time.
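
The 128-PByte figure follows directly from the address width. The short calculation below checks it, and shows the corresponding size for 48-bit addresses for comparison; the constant names are illustrative only.

```python
TBYTE = 1 << 40  # bytes in one TByte (binary units)
PBYTE = 1 << 50  # bytes in one PByte (binary units)

space_5level = 1 << 57  # bytes addressable with 57-bit linear addresses
space_4level = 1 << 48  # bytes addressable with 48-bit linear addresses

pbytes_5level = space_5level // PBYTE  # 128
tbytes_4level = space_4level // TBYTE  # 256
```
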

Also like 4-level paging, 5-level paging uses CR3 to locate the first paging-structure in the hierarchy. (CR3 has the same mode-specific format with 5-level paging as it does with 4-level paging.) The following items describe in more detail the changes that 5-level paging makes to the translation process.
• Translation begins by identifying a 4-KByte naturally aligned PML5 table. It is located at the physical address specified in bits 51:12 of CR3. A PML5 table comprises 512 64-bit entries (PML5Es). A PML5E is selected using the physical address defined as follows.
— Bits 51:12 are from CR3.
— Bits 11:3 are bits 56:48 of the linear address.
— Bits 2:0 are all 0.
• The next step of the translation process identifies a 4-KByte naturally aligned PML4 table. It is located at the physical address specified in bits 51:12 of the PML5E (see Table 2-1). A PML4 table comprises 512 64-bit entries (PML4Es). A PML4E is selected using the physical address defined as follows.
— Bits 51:12 are from the PML5E.
— Bits 11:3 are bits 47:39 of the linear address.
— Bits 2:0 are all 0.
As is normally the case when accessing a paging-structure entry, the memory type used to access the PML4E is based in part on the PCD and PWT bits in the PML5E.
Because a PML4E is identified using bits 56:39 of the linear address, it controls access to a 512-GByte region of the linear-address space.
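
The two entry-selection rules above reduce to a mask and a shifted index. The sketch below computes the physical addresses of the PML5E and the PML4E for a given linear address; the function names and sample values are assumptions, and the PML5E is passed in as a raw 64-bit value rather than read from memory.

```python
ADDR_MASK_51_12 = ((1 << 52) - 1) & ~0xFFF  # keeps bits 51:12


def pml5e_address(cr3, linear):
    """Physical address of the PML5E selected by `linear`."""
    index = (linear >> 48) & 0x1FF                 # linear-address bits 56:48
    return (cr3 & ADDR_MASK_51_12) | (index << 3)  # bits 2:0 stay 0


def pml4e_address(pml5e, linear):
    """Physical address of the PML4E selected by `linear`."""
    index = (linear >> 39) & 0x1FF                 # linear-address bits 47:39
    return (pml5e & ADDR_MASK_51_12) | (index << 3)


# Sample: PML5 index 3, PML4 index 5; low PML5E bits hold flags, not address.
linear = (3 << 48) | (5 << 39)
pml5e_at = pml5e_address(0x1000, linear)
pml4e_at = pml4e_address(0x2000 | 0x7, linear)
region_bytes = 1 << 39  # span of linear addresses sharing bits 56:39
```

Each index is multiplied by 8 (shifted left by 3) because entries are 64 bits wide, which is why bits 2:0 of the resulting address are always 0.
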

Certain fields in the VMCS correspond to registers that contain linear addresses. VM entries confirm that those fields contain values that are canonical. This checking is based on the linear-address width supported by the processor (e.g., it is based on 57-bit canonicality if the processor supports 5-level paging). The following are the fields to which this applies.
• In the host-state area:
— The fields for the IA32_SYSENTER_EIP and IA32_SYSENTER_ESP MSRs.
— The base-address fields for FS, GS, TR, GDTR, and IDTR.
• In the guest-state area:
— The fields for the IA32_SYSENTER_EIP and IA32_SYSENTER_ESP MSRs.
— The base-address fields for FS, GS, TR, GDTR, and IDTR.
— The base-address field for LDTR (if LDTR will be usable).
— The field for the IA32_BNDCFGS MSR (if VM entry is loading that MSR).
A VM entry to 64-bit mode also performs a check on the RIP field in the guest-state area of the current VMCS. If the VM entry would result in 4-level paging, it checks that bits 63:48 of the guest RIP field are identical; if it would result in 5-level paging, that check is on bits 63:57.
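
The guest RIP check described above differs slightly from a full canonicality check: it compares only the upper bits with each other. A minimal sketch, assuming a hypothetical helper `rip_upper_bits_identical` whose `five_level` argument says which paging mode the VM entry would result in:

```python
MASK64 = (1 << 64) - 1


def rip_upper_bits_identical(rip, five_level):
    """Check bits 63:57 (5-level) or bits 63:48 (4-level) are all equal."""
    low = 57 if five_level else 48
    upper = (rip & MASK64) >> low
    all_ones = (1 << (64 - low)) - 1
    return upper == 0 or upper == all_ones


# Bit 48 set, everything above it clear.
rip = 0x0001000000000000
passes_4level = rip_upper_bits_identical(rip, five_level=False)
passes_5level = rip_upper_bits_identical(rip, five_level=True)
```
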

VM exits load the state of certain registers from the host-state area of the VMCS. Some of these registers contain linear addresses. Each VM exit ensures that the value of each of the following registers is canonical: the IA32_SYSENTER_EIP and IA32_SYSENTER_ESP MSRs; and the base addresses for FS, GS, TR, GDTR, and IDTR. How this is done depends on whether the processor supports 5-level paging.
  • If the processor does not support 5-level paging, bits 47:0 of the register are loaded from the field in the host-state area; the value of bit 47 is then sign-extended into bits 63:48 of the register.
  • If the processor does support 5-level paging, bits 56:0 of the register are loaded from the field in the host-state area; the value of bit 56 is then sign-extended into bits 63:57 of the register.
Again, there is a special case for LDTR. LDTR is always unusable after a VM exit. Its base address may be loaded with an undefined value. This undefined value is always 48-bit canonical on processors that do not support 5-level paging and is always 57-bit canonical on processors that do support 5-level paging.
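
The two loading rules for host-state address registers can be written as a single truncate-and-sign-extend step. The sketch below models that step only; `load_host_address` is a hypothetical name, and `field` stands for the raw 64-bit value in the host-state area.

```python
MASK64 = (1 << 64) - 1


def load_host_address(field, supports_5level):
    """Model the register value a VM exit produces from a host-state field."""
    width = 57 if supports_5level else 48
    low_mask = (1 << width) - 1
    value = field & low_mask          # bits (width-1):0 come from the field
    if value >> (width - 1):          # top loaded bit is the sign bit
        value |= MASK64 & ~low_mask   # sign-extend it into the upper bits
    return value


field = 0x1234800000000000
without_5level = load_host_address(field, supports_5level=False)
with_5level = load_host_address(field, supports_5level=True)
```

Whatever the field holds, the result is canonical for the width the processor supports, which is how each VM exit ensures canonical register values.
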

5-level EPT is a new mode for EPT. As its name suggests, it will translate guest-physical addresses by traversing a 5-level hierarchy of EPT paging structures. Because the process is otherwise unmodified, 5-level EPT extends the processor’s guest-physical-address width to 57 bits. (The additional 9 bits are used to select an entry from the fifth level of the hierarchy.)

Like 4-level EPT, 5-level EPT translates guest-physical addresses using a hierarchy of in-memory paging structures. Because 5-level EPT increases the guest-physical-address width to 57 bits (from the 48 bits supported by 4-level EPT), 5-level EPT allows up to 128 PBytes of guest-physical-address space to be accessed at any given time.
The following items describe in more detail the changes that 5-level EPT makes to the translation process.
• Translation begins by identifying a 4-KByte naturally aligned EPT PML5 table. It is located at the physical address specified in bits 51:12 of EPTP. An EPT PML5 table comprises 512 64-bit entries (EPT PML5Es). An EPT PML5E is selected using the physical address defined as follows.
— Bits 63:52 are all 0.
— Bits 51:12 are from EPTP.
— Bits 11:3 are bits 56:48 of the guest-physical address.
— Bits 2:0 are all 0.
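
The EPT PML5E selection rule mirrors the one for ordinary paging, with EPTP in place of CR3 and a guest-physical address in place of a linear address. A minimal sketch, with `ept_pml5e_address` and the sample values being assumptions for illustration:

```python
ADDR_MASK_51_12 = ((1 << 52) - 1) & ~0xFFF  # keeps bits 51:12


def ept_pml5e_address(eptp, guest_physical):
    """Physical address of the EPT PML5E selected by `guest_physical`."""
    index = (guest_physical >> 48) & 0x1FF  # guest-physical bits 56:48
    # Masking EPTP leaves bits 63:52 at 0; shifting by 3 leaves bits 2:0 at 0.
    return (eptp & ADDR_MASK_51_12) | (index << 3)


# Sample: table at 0x100000 (low EPTP bits are not address bits), index 7.
entry_at = ept_pml5e_address(0x100000 | 0x5E, 7 << 48)
```
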
