context/rnd: paging/stack/heap/virtualization

steveej 2017-09-21 21:53:48 +02:00
parent 12b71b3744
commit 83c5540a42
8 changed files with 972 additions and 382 deletions


@ -1,6 +1,32 @@
% // vim: set ft=tex:
\chapter{Topic Refinement}
% TODO: is this chapter required?
\chapter{Refined Research Questions}
\section{Software Tests}
% TODO: describe that tests are mostly semantics as opposed to static checks being mostly syntactical and technical
% TODO: Are they necessary in addition to static checks to cover the well-known use-cases and edge-cases.
% TODO: example?
\section{Definition Of Additional Analysis Rules To Extend Safety Checks}
% TODO: How can Business Logical
% Examples:
% TLB needs to be reset on Task Change
% Registers need to be
\subsection{Paging}
Setting up and maintaining the paging structure, as well as allocating physical memory for the virtual pages, is a complex task in the \gls{os}.
Developing this part of the \gls{os} is error-prone and not well supported by mainstream \glspl{proglang}.
\section{Software Fault Isolation}
% TODO: content from \cite{Balasubramanian2017}
% TODO Which language items help with managing memory?
% TODO How generic can the memory allocators be written?
% TODO Guarantees to be statically checked:
% TODO * Control access to duplicates in page tables
% TODO * Tasks can't access unallocated (physical) memory
% TODO * Tasks can't access other tasks memory
\chapter{System Programming Conventions}
\label{rnd::sysprog-conventions}
@ -17,7 +43,7 @@ PUSH takes value operand which is to be pushed onto the stack.
The address in RSP moves towards numerically lower addresses with every PUSH instruction, which stores a new data entry on top.
PUSH first changes the address in RSP and then copies the value to that new address.
POP takes a storage reference operand - \gls{CPU} register or memory address.
POP takes a storage reference operand - \gls{cpu} register or memory address.
It works in the opposite direction to PUSH.
It first copies the top-most data entry to the operand location and then moves the address in RSP towards the numerically higher RBP address.
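The ordering of the two steps in PUSH and POP can be sketched with a toy model in \gls{Rust}.
This is only an illustration under simplifying assumptions: \texttt{ToyStack} is a hypothetical type, \texttt{rsp} is an index into an array of 8-byte slots rather than a real address, and no overflow or underflow handling is included.

\begin{minted}[autogobble,linenos,breaklines=true]{rust}
/// Toy model of a downward-growing stack (hypothetical, for illustration).
/// `rsp` is an index into `mem`, counted in 8-byte slots, not a real address.
struct ToyStack {
    mem: [u64; 8],
    rsp: usize,
}

impl ToyStack {
    fn new() -> Self {
        // Empty stack: `rsp` points just past the highest slot.
        ToyStack { mem: [0; 8], rsp: 8 }
    }

    /// PUSH: first move `rsp` to the next lower slot, then store the value there.
    fn push(&mut self, value: u64) {
        self.rsp -= 1;
        self.mem[self.rsp] = value;
    }

    /// POP: first read the top-most entry, then move `rsp` to the next higher slot.
    fn pop(&mut self) -> u64 {
        let value = self.mem[self.rsp];
        self.rsp += 1;
        value
    }
}

fn main() {
    let mut stack = ToyStack::new();
    stack.push(1);
    stack.push(2);
    // Two pushes moved `rsp` two slots towards lower indices.
    assert_eq!(stack.rsp, 6);
    // Entries come back in reverse order.
    assert_eq!(stack.pop(), 2);
    assert_eq!(stack.pop(), 1);
    assert_eq!(stack.rsp, 8);
}
\end{minted}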
@ -37,13 +63,14 @@ For example, PUSHing some value onto the stack before the end of the function wo
\paragraph{Called Procedure Setup} \emph{not} with ENTER and LEAVE.
When a procedure is called the stack is set up with the following four components
When a procedure is called, the stack is set up with the \gls{sf}, the four components listed in \cref{lst:amd64-stack-frame-components}.
\cite[p.~48]{AMD64Vol1}:
\begin{listing}[h]
\begin{enumerate}
\item{%
Parameters passed to the called procedure (created by the calling procedure). \\
\textit{Only if parameters don't fit the \gls{CPU} registers}
\textit{Only if parameters don't fit the \gls{cpu} registers}
}
\item{%
Return address (created by the CALL instruction). \\
@ -55,10 +82,13 @@ For example, PUSHing some value onto the stack before the end of the function wo
}
\item{%
Local variables used by the called procedure. \\
\textit{This includes the variables passed via \gls{CPU} registers}
\textit{This includes the variables passed via \gls{cpu} registers}
}
\end{enumerate}
only necessary when there aren't enough \gls{CPU} to pass the parameters.
\caption{\glsentrytext{amd64} Stack-Frame Components}
\label{lst:amd64-stack-frame-components}
\end{listing}
only necessary when there aren't enough \gls{cpu} registers to pass the parameters.
Item 3 is only necessary when
The \gls{amd64} manual also lists ENTER and LEAVE as instructions to \textit{"provide support for procedure calls, and are mainly used in high-level languages."}\cite[p.~48]{AMD64Vol1}.
@ -70,75 +100,13 @@ These instruction groups within the called procedure are called prologue and epi
\subsection{Full Procedure Call Example}
\label{context::introduction::hw-supported-mm::procedure-call-example}
This section combines the separate categories into one complete example that shows how the \gls{stack} is used by various \gls{CPU} instructions to perform procedure calls.
This section combines the separate categories into one complete example that shows how the \gls{stack} is used by various \gls{cpu} instructions to perform procedure calls.
The following code samples are extracted from a disassembled binary which was originally created using \gls{Rust}.
The assembly code shown uses Intel syntax, in which the destination operand is written first, so data generally flows from right to left.
For example, \mint{nasm}{mov a, b} copies b to a.
\cref{code::context::examples::func-callee} shows the \gls{Rust} source code of the function \textit{sum}.
\cref{code::context::examples::func-callee-rust} shows the \gls{Rust} source code of the function \textit{sum}.
\section{4-Level Paging Hierarchy on \glsentrytext{amd64}}
\label{rnd::sysprog-conventions::paging-amd64}
On \gls{amd64} "a four-level page-translation data structure is provided to allow long-mode operating systems to translate a 64-Bit virtual-address space into a 52-Bit physical-address space."\cite[p.~18]{AMD64Vol2}.
This allows the system to only hold the \textit{PML4} table, the which is currently referenced by the \textit{Page Map Base Register (CR3)}, available in main memory.
\cref{fig:virtual-addr-transl} shows the 64-Bit virtual address composition on \gls{amd64}, which uses four-levels of page tables.
Counterintuitively the page-tables are not called level-\textit{n}-page-table, but the levels received distinct names in \citetitle{AMD64Vol2}.
The most-significant Bits labelled as \textit{Sign Extend} are not used for addressing purposes, but must adhere the canonical address form and simply repeat the value of the most-significant implemented Bit \cite[p.~130]{AMD64Vol2}.
The least significant Bits represent the offset within the physical page.
The four groups in between are used to index the page-table at their respective level.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/Virtual-to-Physical-Address-Translation-Long-Mode.png}
\caption{Virtual to Physical Address in Long Mode\cite{AMD64Vol2}}
\label{fig:virtual-addr-transl}
\end{figure}
\subsubsection{Translation Scheme 4 KiB and 2 MiB Pages}
The \gls{amd64} architecture allows configuring the page-size, two of which will be introduced in this section.
\cref{tab:page-transl-vaddr-composition} displays the virtual address composition for the 4KiB and 2MiB page-size modes on \gls{amd64}.
The direction from top to bottom in the table corresponds to most significant to least significant - left to right - in the virtual address.
The \textit{sign extension} Bits cannot be used for actual information but act as a reservation for future architectural changes.
\begin{table}
\begin{tabular}{l | c | c}
Description & Bits in 4 KiB Pages & Bits in 2 MiB Pages \\
\hline
Sign Extend & 12 & 12 \\
Page-Map-Level-4 Offeset & 9 & 9 \\
Page-Directory-Pointer Offeset & 9 & 9 \\
Page-Directory Offeset & 9 & 9 \\
Page-Table Offeset & 9 & - \\
Physical Page Offset & 9 & 21 \\
\end{tabular}
\caption{Paging on \gls{amd64}: Virtual Address Composition 4KiB/2MiB pagesizes}
\label{tab:page-transl-vaddr-composition}
\end{table}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/amd64-4kb-page-translation-long-mode}
\caption{4-Kbyte Page Translation—Long Mode\cite{AMD64Vol2}}
\label{fig:4kb-page-transl}
\end{figure}
\cref{fig:4kb-page-transl} shows the detailed virtual address composition for 4 KiB pages, using four levels of page-tables.
It uses four sets of 9-Bit indices in the virtual address, one per hierarchy level, followed by the 9 Bit page-internal offset.
An alternative approach is displayed in \cref{fig:2mb-page-transl}, using 2 MiB sized pages.
It uses three sets of 9-Bit indices for the page-tables, and a 21-Bit page-internal offset.
Increasing the page-size improves speed and memory-usage and decreases the granularity.
In this specific example the hierarchy is reduced by one level of page-tables.
This reduces the amount of storage required for the page-tables in overall and causes the lookup algorithm to finish faster.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/amd64-2mb-page-translation-long-mode}
\caption{2-Mbyte Page Translation—Long Mode\cite{AMD64Vol2}}
\label{fig:2mb-page-transl}
\end{figure}
The other supported page sizes, 4 MiB and 1 GiB, as well as intermixing page sizes through the different levels don't add new insight into the mechanism and don't need to be detailed here.
% \subsubsection{Top-Level Page Table Self-Reference}
% \subsubsection{Caching Lookups}
@ -149,27 +117,30 @@ The other supported page sizes, 4 MiB and 1 GiB, as well as intermixing page siz
\begin{listing}[htb]
\tikzset{/minted/basename=callee-c}
\begin{minted}[autogobble,linenos,breaklines=true]{rust}
TODO
\end{minted}
\caption{The called function in \gls{Rust}}
\label{code::context::examples::func-callee-c}
\label{code::context::examples::func-callee-rust}
\end{listing}
\cref{code::context::examples::func-call} shows a snippet snippet of the calling function.
\cref{code::context::examples::func-call-asm} shows a snippet of the calling function.
It stores the arguments within the registers according to the System V X86\_64 calling convention. %TODO REFERENCE
The caller doesn't alter the stack-frame pointer (RBP) or the stack pointer (RSP) registers before the call, hence the called function must restore them if it alters them.
\begin{listing}
\begin{minted}[escapeinside=??,highlightlines={},autogobble,linenos,breaklines=true]{rust}
TODO
\end{minted}
\caption{Procedure Call Example: Caller Rust}
\label{code::context::examples::func-call}
\label{code::context::examples::func-call-asm}
\end{listing}
\begin{listing}
\begin{minted}[escapeinside=??,highlightlines={},autogobble,linenos,breaklines=true]{nasm}
\end{minted}
TODO
\caption{Procedure Call Example: Caller Assembly}
\label{code::context::examples::func-call}
\label{code::context::examples::func-call-rust}
\end{listing}
% \balloon{comment}{
@ -250,18 +221,110 @@ $74f7: ret ; return to the caller, following the add
\caption{Memory Layout Throughout The Procedure Call Steps}
\label{fig:proc-call-example-mem}
\end{figure}
\FloatBarrier
\section{4-Level Paging Hierarchy on \glsentrytext{amd64}}
\label{rnd::sysprog-conventions::paging-amd64}
On \gls{amd64} "a four-level page-translation data structure is provided to allow long-mode operating systems to translate a 64-Bit virtual-address space into a 52-Bit physical-address space."\cite[p.~18]{AMD64Vol2}.
This allows the system to hold only the \textit{PML4} table, the one currently referenced by the \textit{Page Map Base Register (CR3)}, in main memory.
\cref{fig:virtual-addr-transl} shows the 64-Bit virtual address composition on \gls{amd64}, which uses four levels of page tables.
Counterintuitively, the page-tables are not called level-\textit{n}-page-table; instead, each level received a distinct name in \citetitle{AMD64Vol2}.
The most-significant Bits labelled as \textit{Sign Extend} are not used for addressing purposes, but must adhere to the canonical address form and simply repeat the value of the most-significant implemented Bit \cite[p.~130]{AMD64Vol2}.
The least significant Bits represent the offset within the physical page.
The four groups in between are used to index the page-table at their respective level.
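The canonical-form requirement can be expressed as a small check in \gls{Rust}.
The following is a minimal sketch, assuming 48 implemented virtual-address Bits (Bits 0 to 47); the names \texttt{IMPLEMENTED\_BITS} and \texttt{is\_canonical} are hypothetical and chosen for illustration only.

\begin{minted}[autogobble,linenos,breaklines=true]{rust}
/// Assumption: 48 implemented virtual-address bits (bits 0..=47).
const IMPLEMENTED_BITS: u32 = 48;

/// Returns true if all bits above the most-significant implemented bit
/// are copies of that bit (canonical address form).
fn is_canonical(vaddr: u64) -> bool {
    let shift = 64 - IMPLEMENTED_BITS;
    // Shifting left drops the upper bits; the arithmetic right shift on i64
    // then sign-extends bit 47 back into them.
    let sign_extended = (((vaddr << shift) as i64) >> shift) as u64;
    sign_extended == vaddr
}

fn main() {
    assert!(is_canonical(0x0000_7fff_ffff_ffff)); // highest lower-half address
    assert!(is_canonical(0xffff_8000_0000_0000)); // lowest upper-half address
    assert!(!is_canonical(0x0000_8000_0000_0000)); // bit 47 set, upper bits clear
}
\end{minted}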
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/Virtual-to-Physical-Address-Translation-Long-Mode.png}
\caption{Virtual to Physical Address in Long Mode\cite{AMD64Vol2}}
\label{fig:virtual-addr-transl}
\end{figure}
\subsubsection{Translation Scheme 4 KiB and 2 MiB Pages}
The \gls{amd64} architecture supports several page sizes, two of which are introduced in this section.
\cref{tab:page-transl-vaddr-composition} displays the virtual address composition for the 4 KiB and 2 MiB page-size modes on \gls{amd64}.
The direction from top to bottom in the table corresponds to most significant to least significant - left to right - in the virtual address.
The \textit{sign extension} Bits cannot be used for actual information but act as a reservation for future architectural changes.
\begin{table}
\begin{tabular}{l | c | c}
Description & Bits in 4 KiB Pages & Bits in 2 MiB Pages \\
\hline
Sign Extend & 16 & 16 \\
Page-Map-Level-4 Offset & 9 & 9 \\
Page-Directory-Pointer Offset & 9 & 9 \\
Page-Directory Offset & 9 & 9 \\
Page-Table Offset & 9 & - \\
Physical Page Offset & 12 & 21 \\
\end{tabular}
\caption{Paging on \gls{amd64}: Virtual Address Composition for 4 KiB and 2 MiB Page Sizes}
\label{tab:page-transl-vaddr-composition}
\end{table}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/amd64-4kb-page-translation-long-mode}
\caption{4-Kbyte Page Translation—Long Mode\cite{AMD64Vol2}}
\label{fig:4kb-page-transl}
\end{figure}
\cref{fig:4kb-page-transl} shows the detailed virtual address composition for 4 KiB pages, using four levels of page-tables.
It uses four sets of 9-Bit indices in the virtual address, one per hierarchy level, followed by the 12-Bit page-internal offset.
An alternative approach is displayed in \cref{fig:2mb-page-transl}, using 2 MiB sized pages.
It uses three sets of 9-Bit indices for the page-tables, and a 21-Bit page-internal offset.
Increasing the page-size speeds up address translation and reduces the memory needed for the page-tables, at the cost of a coarser allocation granularity.
In this specific example the hierarchy is reduced by one level of page-tables.
This reduces the overall amount of storage required for the page-tables and causes the lookup algorithm to finish faster.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/amd64-2mb-page-translation-long-mode}
\caption{2-Mbyte Page Translation—Long Mode\cite{AMD64Vol2}}
\label{fig:2mb-page-transl}
\end{figure}
The other supported page sizes, 4 MiB and 1 GiB, as well as intermixing page sizes through the different levels, don't add new insight into the mechanism and are not detailed here.
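The two translation schemes can be illustrated by splitting a virtual address into its index fields in \gls{Rust}.
This is a sketch rather than a reference implementation: the type \texttt{PageIndices} and the functions \texttt{split\_4kib} and \texttt{split\_2mib} are hypothetical names, and the field widths assume 48 implemented address Bits, with a 12-Bit offset for 4 KiB pages ($2^{12}$ bytes) and a 21-Bit offset for 2 MiB pages ($2^{21}$ bytes).

\begin{minted}[autogobble,linenos,breaklines=true]{rust}
/// Index fields of a virtual address (hypothetical helper type).
/// `pt` is `None` for 2 MiB pages, which skip the page-table level.
#[derive(Debug, PartialEq)]
struct PageIndices {
    pml4: u16,       // Page-Map-Level-4 index, bits 47..=39
    pdp: u16,        // Page-Directory-Pointer index, bits 38..=30
    pd: u16,         // Page-Directory index, bits 29..=21
    pt: Option<u16>, // Page-Table index, bits 20..=12 (4 KiB pages only)
    offset: u32,     // offset within the physical page
}

const INDEX_MASK: u64 = 0x1ff; // 9 bits per table index

/// Split an address for 4 KiB pages: four 9-bit indices and a 12-bit offset.
fn split_4kib(vaddr: u64) -> PageIndices {
    PageIndices {
        pml4: ((vaddr >> 39) & INDEX_MASK) as u16,
        pdp: ((vaddr >> 30) & INDEX_MASK) as u16,
        pd: ((vaddr >> 21) & INDEX_MASK) as u16,
        pt: Some(((vaddr >> 12) & INDEX_MASK) as u16),
        offset: (vaddr & 0xfff) as u32,
    }
}

/// Split an address for 2 MiB pages: three 9-bit indices and a 21-bit offset.
fn split_2mib(vaddr: u64) -> PageIndices {
    PageIndices {
        pml4: ((vaddr >> 39) & INDEX_MASK) as u16,
        pdp: ((vaddr >> 30) & INDEX_MASK) as u16,
        pd: ((vaddr >> 21) & INDEX_MASK) as u16,
        pt: None,
        offset: (vaddr & 0x1f_ffff) as u32,
    }
}

fn main() {
    // Build an address from known field values to check the decomposition.
    let vaddr: u64 = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5;
    let small = split_4kib(vaddr);
    assert_eq!((small.pml4, small.pdp, small.pd), (1, 2, 3));
    assert_eq!(small.pt, Some(4));
    assert_eq!(small.offset, 5);
    // With 2 MiB pages, the former page-table index becomes part of the offset.
    let large = split_2mib(vaddr);
    assert_eq!(large.pt, None);
    assert_eq!(large.offset, (4 << 12) | 5);
}
\end{minted}

In the 2 MiB case the nine Bits that would index the page-table simply become the upper part of the page-internal offset, which is why one table lookup is saved.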
\section{Interrupt Driven Preemptive Context Switches on \glsentrytext{amd64}}
\label{rnd::sysprog-conventions::ir-driven-preemptive-cs-amd64}
On \gls{amd64}, the \gls{CPU}'s interrupt mechanism does not switch the full context described previously, but only handles the registers that are necessary to successfully jump to the interrupt function: RFLAGS, RSP, RBP, RIP\footnote{Segment registers are neglected}.
On \gls{amd64}, the \gls{cpu}'s interrupt mechanism does not switch the full context described previously, but only handles the registers that are necessary to successfully jump to the interrupt function: RFLAGS, RSP, RIP\footnote{The segment registers CS and SS are pushed as well but are not considered here.}.
\subsection{Interrupts}
% TODO https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf p. 2848
\subsection{Context Content}
A description for \gls{amd64} is given in \cref{tab:task-minimum-context-registers}.
\begin{table}
\begin{tabularx}{\textwidth}{| c | X | X |}
\hline
\textbf{descriptive name} &
\textbf{register names on amd64} &
\textbf{description} \\
\hline
the instruction pointer register & RIP & address of the next instruction to be fetched \\
\hline
the stack pointer register & RSP & address of current position in stack \\
\hline
the flags register & RFLAGS & various attributes, e.g. the interrupt flag \\
\hline
all general-purpose registers & RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8--R15 & arbitrary data \\
\hline
\end{tabularx}
\caption{Minimum Context Registers on amd64\cite[p.~28]{AMD64Vol2}}
\label{tab:task-minimum-context-registers}
\end{table}
\subsection{Storing The Context On The Stack}
In this scenario, the context is stored on the \gls{stack} of the function that is interrupted.
\Cref{fig:amd64-long-mode-interrupt-stac} pictures the \gls{stack} layout on interrupt entry.
In order to leverage an interrupt for a context switch, the interrupt function needs to replace these values on the \gls{stack} with values for the new context.
CS (Code-Segment) and SS (Stack-Segment) have no effect in \gls{amd64} 64-Bit mode\cite[p.~20]{AMD64Vol1} and can remain unchanged.
The \gls{OS} developer needs to know the exact address where on the \gls{stack} this data structure has been pushed by the \gls{CPU}, and must then manipulate these addresses directly.
The \gls{os} developer needs to know the exact address on the \gls{stack} at which the \gls{cpu} has pushed this data structure, and must then manipulate the values stored there directly.
This type of manipulation is inherently dangerous and cannot be easily checked by the \gls{compiler}.
The function that handles the interrupt must then use the instruction \textit{iretq}\cite[p.~252]{AMD64Vol2}, to make the \gls{CPU} restore the partial context from the \gls{stack} and continue to function pointed to by the RIP.
The function that handles the interrupt must then use the instruction \textit{iretq}\cite[p.~252]{AMD64Vol2} to make the \gls{cpu} restore the partial context from the \gls{stack} and continue with the function that the restored RIP points to.
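The replacement of the saved values can be sketched in \gls{Rust} as follows.
This is a simplified illustration, not the implementation used in this work: \texttt{InterruptStackFrame} and \texttt{switch\_to} are hypothetical names, the layout assumes an interrupt in long mode without an error code, and obtaining a valid reference to the frame on the real \gls{stack} is left out because it requires unsafe, platform-specific code.

\begin{minted}[autogobble,linenos,breaklines=true]{rust}
/// Values pushed by the CPU on interrupt entry in long mode, lowest address
/// first (hypothetical type; assumes no error code was pushed).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct InterruptStackFrame {
    rip: u64,
    cs: u64,
    rflags: u64,
    rsp: u64,
    ss: u64,
}

/// Overwrite the saved values so that a later `iretq` resumes a different
/// task. CS and SS are intentionally left unchanged.
fn switch_to(frame: &mut InterruptStackFrame, rip: u64, rsp: u64, rflags: u64) {
    frame.rip = rip;
    frame.rsp = rsp;
    frame.rflags = rflags;
}

fn main() {
    // A frame as it might look on interrupt entry (all values are made up).
    let mut frame = InterruptStackFrame {
        rip: 0x1000,
        cs: 0x08,
        rflags: 0x202,
        rsp: 0x8000,
        ss: 0x10,
    };
    switch_to(&mut frame, 0x2000, 0x9000, 0x202);
    assert_eq!(frame.rip, 0x2000);
    assert_eq!(frame.rsp, 0x9000);
    assert_eq!(frame.cs, 0x08); // unchanged
    // Five 8-byte entries, matching what the CPU pushes.
    assert_eq!(std::mem::size_of::<InterruptStackFrame>(), 40);
}
\end{minted}

Even in this form, nothing prevents the caller from passing an arbitrary RIP or RSP value, which illustrates why the \gls{compiler} cannot easily check this kind of manipulation.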
\begin{figure}
@ -271,40 +334,7 @@ The function that handles the interrupt must then use the instruction \textit{ir
\label{fig:amd64-long-mode-interrupt-stac}
\end{figure}
For a full context-switch, the other registers that are part of the context need to be handled by the \gls{OS}'s interrupt function.
\chapter{Research Questions}
Setting up and maintaining the paging-structure, as well as allocating physical memory for the virtual pages is a complex task in the \gls{OS}.
Developing this part of the \gls{OS} is error-prone, and is not well-supported by mainstream \glspl{proglang}.
\section{Definition Of Additional Analysis Rules To Extend Safety Checks}
% TODO: How can Business Logical
% Examples:
% TLB needs to be reset on Task Change
% Registers need to be
\subsubsection{Software Fault Isolation}
% TODO: content from \cite{Balasubramanian2017}
\subsection{More Detailed Research Questions}
% TODO Which language items help with managing memory?
% TODO How generic can the memory allocators be written?
% TODO Guarantees to be statically checked:
% TODO * Control access to duplicates in page tables
% TODO * Tasks can't access unallocated (physical) memory
% TODO * Tasks can't access other tasks memory
\subsection{Interrupts}
% TODO https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf p. 2848
\section{Software Tests}
% TODO: describe that tests are mostly semantics as opposed to static checks being mostly syntactical and technical
% TODO: They necessary in addition to static checks to cover the well-known use-cases and edge-cases.
% TODO: example?
For a full context-switch, the other registers that are part of the context need to be handled by the \gls{os}'s interrupt function.
\chapter{Porting \glsentrytext{C} Vulnerabilities}
\label{rnd::porting-c-vulns}
@ -312,8 +342,8 @@ In this chapter, the weakness manifestations from \cref{context::common-mem-safe
\chapter{\glsentrytext{LX} Modules Written In \glsentrytext{Rust}}
\chapter{Existing \glsentrytext{OS}-Development Projects Based On Rust}
\label{rnd::existing-os-dev-wity-rust}
\chapter{Existing \glsentrytext{os}-Development Projects Based On Rust}
\label{rnd::existing-os-dev-with-rust}
\section{Libraries}
@ -326,8 +356,9 @@ In this chapter, the weakness manifestations from \cref{context::common-mem-safe
\subsection{Blog OS}
\subsection{Redox}
\subsection{Tock}
%TODO: mention papers by the Tock OS team
\chapter{\glsentrytext{imezzos}: Adding Preemptive \glsentrytext{OS}-Level Multitasking}
\chapter{\glsentrytext{imezzos}: Adding Preemptive \glsentrytext{os}-Level Multitasking}
\label{rnd::imezzos-preemptive-multitasking}
\section{Timed Interrupts For Scheduling and Dispatching}