context: more on memory management and weaknesses

This commit is contained in:
steveej 2017-09-11 22:54:22 +02:00
parent e69951fe71
commit ebc4bcb8bb
8 changed files with 547 additions and 312 deletions

@ -36,6 +36,20 @@
first = {\glsentrylong{addrspace}}
}
\newglossaryentry{stack}{
name = stack,
description = {
TODO
},
}
\newglossaryentry{heap}{
name = heap,
description = {
TODO
},
}
\newglossaryentry{api}{
name = API,
long = {Application Programming Interface},
@ -106,6 +120,13 @@
plural=Linuces
}
\newglossaryentry{android}{
name = Android,
description = {a mobile \gls{OS} based on \gls{LX}},
first = {\glsentryname{android}, \glsentrydesc{android}},
}
\newglossaryentry{imezzos}{
name = intermezzOS,
description = {


@ -2,9 +2,12 @@
\chapter{Introduction}
\label{context::introduction}
This thesis studies the feasibility of using compile-time code analysis, as found in \gls{Rust}'s \gls{compiler}, for ensuring memory-safety within an \gls{OS} kernel.
Because an \gls{OS} is nothing but a \gls{app}, this study could be applied to all \glspl{app}, but the focus is on the implementation of \glspl{OS} which is the \gls{app} that is responsible for managing the system's resources and provide abstractions for higher level applications.
The \gls{OS} is the only \gls{app} that required unrestricted access to these resources, with the task of managing them safely according to the rules that are either hard-coded or set up by the \gls{sysadmin}.
This study could be applied to all \glspl{app}, but the focus is on the implementation of \glspl{OS}, the \gls{app} that is responsible for managing the system's resources and providing abstractions for all other \glspl{app}.
For this purpose, the \gls{OS} is the only \gls{app} that requires unrestricted access to these resources, with the job of managing them safely according to rules that are either hard-coded or set up by the \gls{sysadmin}.
The increasing number of vulnerabilities based on memory-safety issues in \glspl{app}, as presented in \autoref{context::common-mem-safety-mistakes::cwe::statistics}, is a major motivator for working on this topic.
\section{Motivational Hypothesis}
% Primary Research Questions
@ -20,10 +23,10 @@ This is to my surprise, because as explained in \autoref{context::introduction::
The hypothesis cannot be trivially approved or denied, which drives the research efforts for my final thesis project.
Besides this specific hypothesis, many implementations of \glspl{OS} with \gls{Rust} have appeared in public.
Their purposes range from proof-of-concept and educational work like \gls{imezzos} and \gls{blogos}, to implementations that aim to be production grade software like \gls{redoxos} and \gls{tockos}.
These implementations are subject to evaluation in \ref{part:rnd}.
Their purposes range from proof-of-concept and educational work like \gls{imezzos} and \gls{blogos}, to implementations that aim to be production grade software like \gls{redoxos} and \gls{tockos} \cite{Levy2015a}.
These implementations are subject to evaluation in \autoref{rnd::existing-os-in-rust}.
The final results presented will be of qualitative nature, captured by analyzing the existing and a self-developed \gls{Rust}-implementations of popular memory management techniques.
The final results will be of qualitative nature, captured by analyzing the existing and a self-developed \gls{Rust}-implementations of popular memory management techniques.
In addition to the sole analysis of \gls{Rust}-implementations, comparisons will be made, discerning the level of memory safety guarantees gained over similarly intending implementations in \gls{C}.
\section{Assessing Memory-Safety}
@ -36,12 +39,12 @@ These instructions are themselves able to alter the very main memory they are st
As any other \gls{app}, the \gls{OS} is loaded and executed in form of one or multiple sets of logically grouped instructions, called \glspl{program}.
Loading the \gls{OS}'s program into memory is not the responsibility of the \gls{OS}, it belongs to the components earlier in the boot process, namely the boot loader and system firmware.
The \gls{OS} takes over the responsibility to protect the main and secondary memory from the point where it is being handed control over by the bootloader.
Loading further programs into main memory is done by the \gls{OS}, either according to scheduled jobs set up by the \gls{sysadmin}, or based on well-defined events which can be triggered by any form of input via the system's interfaces.
For example, the \gls{OS} can load and execute a program stored on the hard-disk, after the user has gave the appropriate instructions via a terminal.
The execution of other programs is potentially dangerous, because they might attempt to access the memory content of other programs and their data.
The \gls{OS} takes over the responsibility to protect the main and secondary memory as soon as the bootloader has loaded the \gls{OS} and has jumped to its first instruction.
From this point, loading further programs into main memory is done by the \gls{OS}, either according to scheduled jobs set up by the \gls{sysadmin}, or based on well-defined events which can be triggered by any form of input via the system's interfaces.
For example, the \gls{OS} can load and execute a program stored on the hard-disk, after the user has given the appropriate instructions via a terminal.
It is the responsibility of the \gls{OS} to prevent executed programs from being able to mutually interfere with memory content that is not theirs, keeping the memory in a safe state at all times \footnote{This does not include memory-safety \textit{within} each of these executed programs, as the \gls{OS} has no pertinent knowledge of the program's intentions.}.
The execution of other programs is potentially dangerous, because they might attempt to access the memory content of other programs and their data.
It is the responsibility of the \gls{OS} to prevent such executed programs from being able to mutually interfere with memory content that is not theirs, keeping the memory in a safe state at all times \footnote{This does not include memory-safety \textit{within} each of these executed programs, as the \gls{OS} has no pertinent knowledge of the program's intentions.}.
This requires an extensive amount of care and foresight from the developers of the \gls{OS}, to ensure memory consistency in any of the various events and combinations thereof that might possibly occur at runtime.
\subsection{A Definition Of Memory-Safety For \glsentryplural{OS}}
@ -60,7 +63,7 @@ Any existing or hypothetical solution to this dilemma is not in scope of this th
First, public statistics in the area of software vulnerabilities are questionable with regard to their completeness.
Second, and more importantly, memory-safety related software mistakes should be detected as early as possible, ideally before the software is released and installed anywhere.
\subsection{Human Aspects}
\subsection{Human Aspect}
\label{context::introduction::human-aspect}
To detect software mistakes early, it is helpful to analyze where they originate.
This section emphasizes the fact that software - even if software-generators are interleaved - is ultimately produced by humans.
@ -86,33 +89,209 @@ I realized similar mindset in some of the other teams.
This personal experience is no scientific proof nor is it statistically significant.
It does create a feeling of insecurity, because if their software is distributed widely a few of these people are enough to risk the security of thousands of systems.
A professor and co-author of \citetitle{Arpaci-Dusseau2015} gives the following warning about this issue:
\textit{"Just because a program compiled(!) or even ran once or many times correctly does not mean the program is correct. Many events may have conspired to get you to a point where you believe it works, but then some- thing changes and it stops. A common student reaction is to say (or yell) “But it worked before!” and then blame the compiler, operating system, hardware, or even (dare we say it) the professor. But the problem is usually right where you think it would be, in your code. Get to work and debug it before you blame those other components."}\cite[p.~127]{Arpaci-Dusseau2015}
Plenty of educational, economical or methodological solutions are imaginable for this problem.
Higher focus on safety and testing in education, enforced internal company guidelines, or industry wide third party software certification requirements can be attempted.
For this thesis such constraints are out of scope, and the focus is on examining technical methods that detect and indicate mistakes as early as possible.
\subsection{Technical Aspect}
The problem on the technical side is that the \gls{compiler} was not able to detect all errors that are in the source code and the human was able to produce an executable program.
The problem on the technical side is that the \gls{compiler} is not able to detect all errors that are in the source code and the human was able to produce an executable program.
The resulting executable program might merely serve its purpose, and can contain severe technical mistakes that are not considered an error by the \gls{compiler}.
This is especially likely in low-abstraction languages like \gls{C}, where technical mistakes and intended behavior are difficult to distinguish.
This is especially likely using low-abstraction languages like \gls{C} and \gls{C++} for \gls{OS} development, where technical mistakes and intended behavior are very difficult to distinguish.
The goal of this thesis is to find out if the \gls{Rust} \gls{compiler} is able to mitigate this specific problem.
\chapter{OS Development Concepts}
This chapter explains concepts used in \gls{OS} development today, and is a direct preparation for the upcoming \autoref{context::common-mem-safety-mistakes}, which explains specific weaknesses that result from memory-management mistakes made in the attempt to implement these concepts.
Since the \gls{OS} manages the system's hardware directly, some of the implementation and design choices depend on the underlying hardware architecture.
For a full understanding the hardware implications are also outlined in this document.
To bound the extent of this and the following chapters, the explanations are limited to one contemporary architecture, \gls{amd64}, and further narrowed down by focusing on the operation in 64-Bit long mode\cite[p.~18]{AMD64Vol2} it provides.
\section{Resource Management by Virtualization}
Resource management in \gls{OS} development is different than in generic \glspl{app} development.
The \gls{OS}, typically the lowest software layer, must know the very details of the system's hardware and perform raw access to it.
\subsection{Layers}
The \gls{OS} creates a virtualization\footnote{The term \textit{virtualization} in \gls{OS} jargon can be understood as abstraction} layer on top of architecture specific code and abstracts it in form of an internal \gls{api}.
This layer abstracts at least the \gls{CPU} and memory\cite{Arpaci-Dusseau2015}.
Higher-level, complex management algorithms can then be implemented hardware-independently on top of this \gls{api}, making them reusable across different architectures.
The \gls{OS} then provides an \gls{api} through which \glspl{app} can request access to these virtualized resources.
This allows \gls{app} developers to develop and run different programs easily and presumably safely on the \gls{OS}, agnostic of the architecture.
\subsection{Resource Specifics}
Virtualization has different technical implications for different resource types, depending on their nature and available count.
To give an example, the \gls{CPU} is not explicitly requested, because any instruction by the program implicitly requires the \gls{CPU} to execute it.
In contrast, a program could ask the \gls{OS} for a specific amount of memory, or ask it to write text to the display output on its behalf.
\section{Hardware-supported Memory-Management}
This section provides an overview of hardware-supported memory-management and protection techniques, which are necessary to understand in order to reason about memory-safety in the \gls{OS}.
To keep this section as short as possible, 64-Bit mode as described in \cite{AMD64Vol2} is assumed.
The effects of this are, in short, that the system relies primarily on paging memory management, thus memory segmentation can be neglected in this context.
\label{context::introduction::hw-supported-mm}
Activating the 64-Bit long mode on \gls{amd64} makes the system rely primarily on paging memory management, thus the technique of memory segmentation can be neglected in this context.
This section provides information about hardware-supported memory paging and protection techniques.
To improve the efficiency and safety of memory-management, developers of hardware and software have been collaborating to offload some memory-management operations from the \gls{OS} to the \gls{CPU}'s \gls{MMU}.
This improves speed and adds runtime memory permission checks\cite[p. 117]{AMD64Vol2}.
To improve the efficiency and safety of memory-management, developers of hardware and software have been collaborating to offload the page look-up from the \gls{OS} software to the hardware, namely the \gls{CPU}'s \gls{MMU}.
A hardware-implementation of the lookup algorithm is fast, and allows rudimentary memory permission runtime-checks to protect pages by leveraging \gls{CPU}'s security rings\cite[p.~117,~p.~145]{AMD64Vol2}.
\subsection{Virtual Address Translation and Paging}
Paging with virtual addresses is one method of virtualizing the system's memory and thereby transparently sharing it among running programs and the \gls{OS} itself, presumably in a safe way.
On \gls{amd64}, the software's instructions use virtual memory addresses, which are translated to physical memory addresses by the \gls{MMU} of the \gls{CPU} at the time the instructions are executed.
Even when using a language that supports direct memory addressing, \gls{app} developers don't have to consider paging and address translation in the logic of their programs, because all addresses in their program are virtual and are translated at runtime by the \gls{MMU}.
The translation itself is performed by the \gls{MMU} according to a map that is called page table, which is a structure maintained by the \gls{OS} in the main memory.
This memory structure can be stored anywhere in memory, and the address is handed to the \gls{MMU} via a specific \gls{CPU} register, \textit{CR3} on \gls{amd64}.
The \gls{OS} can maintain multiple page table structures, and can create different virtual address spaces by changing \gls{MMU}'s page-table pointer - the \textit{CR3} register.
\subsection{Virtualization - Challenges Of Multitasking}
In order to concurrently run multiple programs easily and presumably safely, the \gls{OS} conducts virtualization of the \gls{CPU}, memory and other resources\cite{Arpaci-Dusseau2015}.
This allows to perform preemptive multitasking transparently to the programs at runtime, which means that it has no side-effects on the running programs and it needs not be considered during \gls{app} development.
To avoid the need for storing a translation mapping for every possible address, mappings are grouped into fixed-size pieces, the \textit{page}s.
This works by encoding the offset within the page in the virtual address, together with the index into the page table.
\subsubsection{Task Switching}
When the \gls{OS} preempts a task it needs to store and preserve the current task's context in a well-known and protected memory location, so that it can be restored when this task is resumed.
The offset size depends on the chosen page-size, and can be calculated with the following formula, given page-size in bytes as $p$:
\begin{equation}
\textrm{offset\_bits}(p) = \log_2(p), \quad p \in \{ 2^n \mid n \in N \}
\end{equation}
For example, the \gls{amd64} default page-size of 4 KiB has a 12-bit offset, which leaves $64-12 = 52$ bits for page-table indexing.
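The formula can be sketched in \gls{Rust} as follows. This is a minimal illustration, not part of any existing \gls{OS}; the function name \texttt{offset\_bits} is chosen here for the example. It assumes that only powers of two are valid page-sizes and rejects every other input.

```rust
/// Number of offset bits needed to address every byte in a page of
/// `page_size` bytes. Hypothetical helper for illustration only.
/// Returns `None` unless `page_size` is a power of two.
fn offset_bits(page_size: u64) -> Option<u32> {
    if page_size.is_power_of_two() {
        // For a power of two, log2 equals the number of trailing zero bits.
        Some(page_size.trailing_zeros())
    } else {
        None
    }
}
```

Calling \texttt{offset\_bits(4096)} yields 12 and \texttt{offset\_bits(2 * 1024 * 1024)} yields 21, matching the 4 KiB and 2 MiB page-sizes discussed in this chapter.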
\paragraph{Page-Faults}
If an instruction uses a virtual address that indexes a page which is not present in memory, the \gls{CPU} will generate a page-fault exception to give control back to the \gls{OS}.
The \gls{OS} must then react accordingly, e.g. by finding free physical memory and mapping it to the page by modifying the page's page-table entry.
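The interplay of translation and page-fault can be modelled with a toy single-level table. The sketch below is a simulation in ordinary \gls{Rust}, not real \gls{MMU} behaviour: the names \texttt{Translation} and \texttt{translate} are hypothetical, the table is a plain slice, and an absent mapping is represented by \texttt{None} instead of a cleared present bit.

```rust
/// Outcome of a simulated address translation (hypothetical type).
#[derive(Debug, PartialEq)]
enum Translation {
    Physical(u64),
    PageFault,
}

/// Assumed page-size of 4 KiB for this sketch.
const PAGE_SIZE: u64 = 4096;

/// Translates `vaddr` using a toy single-level page table in which
/// entry `i` holds the physical base address of virtual page `i`,
/// or `None` if the page is not present.
fn translate(table: &[Option<u64>], vaddr: u64) -> Translation {
    let index = (vaddr / PAGE_SIZE) as usize;
    let offset = vaddr % PAGE_SIZE;
    match table.get(index).copied().flatten() {
        Some(frame_base) => Translation::Physical(frame_base + offset),
        // Missing entry or index out of range: control returns to the OS.
        None => Translation::PageFault,
    }
}
```

In this model the \gls{OS} would handle \texttt{PageFault} by storing a frame address in the table entry and retrying the access.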
\paragraph{Hypothetical 1-level-page-table example.}
If only one page-table per context were used, consisting of $2^{52}$ page-table entries which must at minimum store the physical address they map to, it would require $\frac{52 * 2^{52} [Bit]}{8*1024^4 [Bit/TiB]} = 26624$ TiB of memory for each context.
Even if only a handful of additional pages were allocated and mapped, the \gls{OS} would still have to allocate this huge page-table.
This vast consumption of main memory is impractical and impossible for average systems, which rarely surpass 100 GiB of main memory.
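The arithmetic of this hypothetical example can be checked with a few lines of \gls{Rust}. The function name and its parameters are assumptions made for this sketch; it simply multiplies the number of entries by the entry width and converts bits to TiB.

```rust
/// Size in TiB of a hypothetical single-level page table with
/// `2^index_bits` entries of `entry_bits` bits each.
/// Illustration only; the result is rounded down to whole TiB.
fn single_level_table_tib(index_bits: u32, entry_bits: u64) -> u64 {
    let entries: u128 = 1u128 << index_bits;
    let total_bits: u128 = entries * entry_bits as u128;
    // 8 bits per byte, 1024^4 bytes per TiB.
    let bits_per_tib: u128 = 8 * 1024u128.pow(4);
    (total_bits / bits_per_tib) as u64
}
```

With 52 index bits and 52-bit entries, \texttt{single\_level\_table\_tib(52, 52)} evaluates to 26624, the figure stated above.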
\subsubsection{Swapping}
The finite primary memory can only hold a finite number of virtual pages, and the \gls{OS} is responsible for having the required pages present.
Besides the pages that contain the page-table itself, the pages that aren't required by the current instruction might be moved to secondary memory.
Swapping pages in and out of primary memory is risky as it requires transferring large amounts of raw memory content, but these safety analyses exceed the scope of this thesis.
\subsubsection{Multi-Level Paging}
\label{context::introduction::hw-supported-mm::multilevel-paging}
On \gls{amd64} "a four-level page-translation data structure is provided to allow long-mode operating systems to translate a 64-Bit virtual-address space into a 52-Bit physical-address space."\cite[p.~18]{AMD64Vol2}.
Using a hierarchical translation structure saves significant amounts of memory, as not every page-table of every level and address space has to be allocated and present in memory.
Only the \textit{PML4} which is currently referenced by the \textit{Page Map Base Register (CR3)} is required to be present.
\autoref{fig:virtual-addr-transl} shows the 64-Bit virtual address composition on \gls{amd64}, which uses four-levels of page tables.
Counterintuitively the page-tables are not called level-\textit{n}-page-table, but the levels received distinct names in \citetitle{AMD64Vol2}.
The most-significant Bits labelled as \textit{Sign Extend} are not used for addressing purposes, but must adhere to the canonical address form and simply repeat the value of the most-significant implemented Bit \cite[p.~130]{AMD64Vol2}.
The least significant Bits represent the offset within the physical page.
The four groups in between are used to index the page-table at their respective level.
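The canonical address form can be expressed as a short check. The sketch below assumes 48 implemented virtual-address Bits, so that Bit 47 is the most-significant implemented Bit; the function name \texttt{is\_canonical} is hypothetical.

```rust
/// Returns true if `vaddr` is in canonical form, assuming 48
/// implemented address bits (bits 63..48 must repeat bit 47).
fn is_canonical(vaddr: u64) -> bool {
    // Shift bit 47 into the sign position, then shift back
    // arithmetically so that it is replicated into bits 63..48.
    let sign_extended = (((vaddr << 16) as i64) >> 16) as u64;
    sign_extended == vaddr
}
```

Under this assumption \texttt{0x0000\_7FFF\_FFFF\_FFFF} and \texttt{0xFFFF\_8000\_0000\_0000} are canonical, while \texttt{0x0000\_8000\_0000\_0000} is not.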
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/Virtual-to-Physical-Address-Translation-Long-Mode.png}
\caption{Virtual to Physical Address in Long Mode\cite{AMD64Vol2}}
\label{fig:virtual-addr-transl}
\end{figure}
\subsubsection{Translation Scheme 4 KiB and 2 MiB Pages}
The \gls{amd64} architecture allows configuring the page-size, two of which will be introduced in this section.
\autoref{tab:page-transl-vaddr-composition} displays the virtual address composition for the 4KiB and 2MiB page-size modes on \gls{amd64}.
The direction from top to bottom in the table corresponds to most significant to least significant - left to right - in the virtual address.
The \textit{sign extension} Bits cannot be used for actual information but act as a reservation for future architectural changes.
\begin{table}
\begin{tabular}{l | c | c}
Description & Bits in 4 KiB Pages & Bits in 2 MiB Pages \\
\hline
Sign Extend & 16 & 16 \\
Page-Map-Level-4 Offset & 9 & 9 \\
Page-Directory-Pointer Offset & 9 & 9 \\
Page-Directory Offset & 9 & 9 \\
Page-Table Offset & 9 & - \\
Physical Page Offset & 12 & 21 \\
\end{tabular}
\caption{Paging on \gls{amd64}: Virtual Address Composition 4KiB/2MiB pagesizes}
\label{tab:page-transl-vaddr-composition}
\end{table}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/amd64-4kb-page-translation-long-mode}
\caption{4-Kbyte Page Translation—Long Mode\cite{AMD64Vol2}}
\label{fig:4kb-page-transl}
\end{figure}
\autoref{fig:4kb-page-transl} shows the detailed virtual address composition for 4 KiB pages, using four levels of page-tables.
It uses four sets of 9-Bit indices in the virtual address, one per hierarchy level, followed by the 12-Bit page-internal offset.
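The decomposition for 4 KiB pages can be sketched as a function that extracts the individual fields from a virtual address. The type \texttt{PageIndices} and the function \texttt{split\_4k} are names invented for this illustration; the Bit positions assume four 9-Bit indices above a 12-Bit page-internal offset.

```rust
/// Fields of a virtual address for 4 KiB pages (hypothetical type).
#[derive(Debug, PartialEq)]
struct PageIndices {
    pml4: usize,   // Page-Map-Level-4 index, bits 47..39
    pdp: usize,    // Page-Directory-Pointer index, bits 38..30
    pd: usize,     // Page-Directory index, bits 29..21
    pt: usize,     // Page-Table index, bits 20..12
    offset: usize, // offset within the physical page, bits 11..0
}

/// Splits `vaddr` into its table indices and page offset.
fn split_4k(vaddr: u64) -> PageIndices {
    // Each table index is 9 bits wide (512 entries per table).
    let index = |shift: u32| ((vaddr >> shift) & 0x1FF) as usize;
    PageIndices {
        pml4: index(39),
        pdp: index(30),
        pd: index(21),
        pt: index(12),
        offset: (vaddr & 0xFFF) as usize,
    }
}
```

The sign-extension Bits are ignored by this sketch, as they carry no addressing information.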
An alternative approach is displayed in \autoref{fig:2mb-page-transl}, using 2 MiB sized pages.
It uses three sets of 9-Bit indices for the page-tables, and a 21-Bit page-internal offset.
Increasing the page-size improves speed and memory-usage and decreases the granularity.
In this specific example the hierarchy is reduced by one level of page-tables.
This reduces the overall amount of storage required for the page-tables and causes the lookup algorithm to finish faster.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/amd64-2mb-page-translation-long-mode}
\caption{2-Mbyte Page Translation—Long Mode\cite{AMD64Vol2}}
\label{fig:2mb-page-transl}
\end{figure}
The other page sizes the architecture supports, 4 MiB (in legacy mode only) and 1 GiB, as well as intermixing page sizes across the different levels, add no new insight into the mechanism and are not detailed here.
% \subsubsection{Top-Level Page Table Self-Reference}
% \subsubsection{Caching Lookups}
% \subsubsection{Full Example}
% * http://taptipalit.blogspot.de/2013/10/theory-recursive-mapping-page.html
% * https://www.coresecurity.com/blog/getting-physical-extreme-abuse-of-intel-based-paging-systems-part-2-windows
\subsection{Premised Trust In Hardware}
The algorithms that are implemented in hardware can't be verified and need to be trusted to work exactly like the manual describes them.
% TODO: remove this chapter of write something interesting
\subsection{The \textit{Stack} And \textit{Heap} Concept}
In \gls{proglang} and \gls{OS} design and literature, the terms \gls{stack} and \gls{heap} are ubiquitous and assumed to be known.
To avoid ambiguities in the first place, this document refers to \gls{heap} as the memory zone, not the data structure.
From a perspective of developing \glspl{app} and studying \gls{OS} course content, there is still a certain vagueness in the understanding of these concepts.
After the research for their origin it is clear that they are mere concepts, that might be implemented and used differently in the various \glspl{OS} and \glspl{proglang}.
The hardware manuals \citetitle{AMD64Vol1} and \citetitle{AMD64Vol2} refer to \gls{stack} but have no mention of \gls{heap}.
\subsubsection{Stack: Hardware-Backed Abstract Type}
The \gls{amd64} manuals collectively describe how the \gls{stack} is used and influenced by various instructions.
In summary, it is a memory model for a structured contiguous memory region which grows by storing new data entries on top of each other.
It grows from numerically higher to numerically lower addresses, where the numerically highest address is called the stack bottom, and the current numerically lowest address is the stack top.
The usage of the \gls{stack} is coupled with control flow instructions, in conjunction with two registers, the Stack-Frame Base Pointer (RBP) and the Stack Pointer (RSP).
The instructions that reference the stack\cite[p.~83]{AMD64Vol1} can be grouped into the following three categories.
\paragraph{Data Storage}
The address in RSP moves towards numerically lower addresses with every PUSH instruction, which stores a new data entry on top.
POP instructions work in the opposite direction, consuming the top-most data entry and moving the RSP address towards the numerically higher RBP address.
When RBP and RSP have the same value, the stack is empty.
In 64-Bit long mode \gls{amd64} doesn't consider the stack to be sized, so it is up to the \gls{OS} developer to ensure that it doesn't grow into other foreign memory regions.
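The data-storage behaviour can be modelled with a small simulation. The sketch below is not how a real \gls{stack} is implemented: the type \texttt{ToyStack} is hypothetical, memory is a vector of 64-Bit slots, and RSP and RBP are slot indices instead of addresses. Unlike the hardware, the model checks the lower bound explicitly, which illustrates the check the \gls{OS} developer has to provide.

```rust
/// Toy model of a downward-growing stack (hypothetical type).
struct ToyStack {
    memory: Vec<u64>,
    rbp: usize, // stack bottom: one past the highest slot
    rsp: usize, // stack top: index of the most recent entry
}

impl ToyStack {
    fn new(slots: usize) -> Self {
        ToyStack { memory: vec![0; slots], rbp: slots, rsp: slots }
    }

    /// PUSH: move RSP towards lower addresses, then store the value.
    fn push(&mut self, value: u64) -> Result<(), &'static str> {
        if self.rsp == 0 {
            // The hardware would not stop here; this check is the model's.
            return Err("stack overflow");
        }
        self.rsp -= 1;
        self.memory[self.rsp] = value;
        Ok(())
    }

    /// POP: load the top-most value, then move RSP towards RBP.
    fn pop(&mut self) -> Option<u64> {
        if self.is_empty() {
            return None;
        }
        let value = self.memory[self.rsp];
        self.rsp += 1;
        Some(value)
    }

    /// The stack is empty when RSP and RBP have the same value.
    fn is_empty(&self) -> bool {
        self.rsp == self.rbp
    }
}
```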
\paragraph{Procedure Calls}
% TODO CALL
% TODO RET
\Glspl{stack} are used for procedure calls\footnote{a different word for function call}, specifically for passing data from the calling to the called procedure.
\paragraph{Procedure Setup}
% TODO ENTER
% TODO LEAVE
\section{Preemptive Multitasking}
Virtualization as previously explained is the foundation for the \gls{OS} to perform preemptive multitasking inconspicuously towards the \glspl{app}.
This means that when a task is preempted and continued later, it observes no side-effects other than an elapse of time.
Preemptive multitasking need not be considered during the development of single-threaded \glspl{app}.
Multi-threading and
\subsection{Resource Characteristics}
Switching tasks has different technical implications for different resource types, depending on their nature and quantity.
For example, a single \gls{CPU} system cannot be utilized by more than one program at the same time, as it runs instructions one-by-one and implicitly holds the program state in form of the \gls{CPU} registers, which are preserved in between the instructions.
In contrast, main memory resources are only limited by their capacity and can otherwise be shared by several programs simultaneously, so that tasks that are not executed by \gls{CPU} can still have data stored in memory.
The \gls{OS} must ensure that switching tasks is done properly for all resources to prevent interference and unintended behavior.
To ensure memory safety in this scenario, all data in the memory must be protected from unintended access, according to the definition of memory safety in \autoref{context::introduction::memory-safety::def}.
\subsection{Context Switching}
When the \gls{OS} preempts a task, it needs to store and preserve the current task's context.
The context consists of all volatile resources that can possibly be overwritten by another task.
This is at minimum a set of \gls{CPU} registers depending on the specific architecture.
For \gls{amd64}, see \autoref{tab:task-minimum-context-registers}.
A description for \gls{amd64} is given in \autoref{tab:task-minimum-context-registers}.
The \gls{OS} stores the preempted context in a well-known and protected memory location, so that it can be restored when this task is resumed.
\begin{table}
\begin{tabularx}{\textwidth}{| c | X | X |}
@ -123,74 +302,53 @@ For \gls{amd64}, see \autoref{tab:task-minimum-context-registers}.
\hline
the instruction pointer register & RIP & address of the next instruction to be fetched \\
\hline
the instruction pointer register & RIP & address of the next instruction to be fetched \\
\hline
all general-purpose registers & RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8--R15 & any data \\
\hline
the instruction pointer register & RIP & address of the next instruction to be fetched \\
\hline
the stack pointer register & RSP & address of current position in stack \\
\hline
the flags register & RFLAGS & various attributes, e.g. the interrupt flag \\
\hline
the instruction pointer register & RIP & address of the next instruction to be fetched \\
all general-purpose registers & RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8--R15 & arbitrary data \\
\hline
\end{tabularx}
\caption{Minimum Context Registers on amd64\cite[p. 28]{AMD64Vol2}}
\caption{Minimum Context Registers on amd64\cite[p.~28]{AMD64Vol2}}
\label{tab:task-minimum-context-registers}
\end{table}
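The minimum context can be pictured as a plain data structure that the \gls{OS} copies on every task switch. The following sketch is a simulation under assumptions: the type \texttt{TaskContext} and the function \texttt{switch\_context} are invented for this example, the \gls{CPU} state is modelled as an ordinary value, and a real implementation would have to read and write the registers with architecture specific code.

```rust
/// Simulated minimum task context (hypothetical type).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct TaskContext {
    rip: u64,    // address of the next instruction
    rsp: u64,    // current position in the stack
    rflags: u64, // flags, e.g. the interrupt flag
    // The remaining general-purpose registers; RSP is stored separately.
    general_purpose: [u64; 15],
}

/// Saves the simulated CPU state into `save_slot` and loads `next`.
fn switch_context(cpu: &mut TaskContext, save_slot: &mut TaskContext, next: &TaskContext) {
    // Preserve the preempted task's context so it can be resumed later.
    *save_slot = *cpu;
    // Install the context of the task that runs next.
    *cpu = *next;
}
```

The model makes the safety requirement visible: the save slot must be protected memory, because any modification to it changes where and how the preempted task resumes.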
\subsection{Virtual Address Translation and Paging}
% TODO: why virtual addressing?
On \gls{amd64}, the software's instructions use virtual memory addresses, which are translated to physical memory addresses by the \gls{MMU} of the \gls{CPU} at the time the instructions are executed.
The responsibility falls onto the \gls{OS}, thus \gls{app} developers don't have to consider paging in the logic of their programs.
\subsubsection{Using Hardware Induced Interrupts}
In preemptive multitasking, context switches are not voluntary but forced upon the running task.
This works by using the \gls{CPU}'s interrupt mechanism which has the ability to jump to an \gls{OS} function in the event of an interrupt.
Interrupts for this use-case are usually triggered by programmed timer interrupts, occurring continuously and regularly.
The interrupt mechanism itself is part of the \gls{CPU} and must be used by the \gls{OS} as is.
On \gls{amd64}, the \gls{CPU}'s interrupt mechanism does not switch the full context described previously, but only handles the registers that are necessary to successfully jump to the interrupt function: RFLAGS, RSP, and RIP\footnote{Segment registers are neglected}.
In this scenario, the context is stored on the \gls{stack} of the function that is interrupted.
\autoref{fig:amd64-long-mode-interrupt-stac} pictures the \gls{stack} layout on interrupt entry.
In order to leverage an interrupt for a context switch, the interrupt function needs to replace these values on the \gls{stack} with values for the new context.
CS (Code-Segment) and SS (Stack-Segment) have no effect in \gls{amd64} 64-Bit mode\cite[p.~20]{AMD64Vol1} and can remain unchanged.
The \gls{OS} developer needs to know the exact address where on the \gls{stack} this data structure has been pushed by the \gls{CPU}, and must then manipulate these addresses directly.
This type of manipulation is inherently dangerous and cannot be easily checked by the \gls{compiler}.
The function that handles the interrupt must then use the instruction \textit{iretq}\cite[p.~252]{AMD64Vol2}, to make the \gls{CPU} restore the partial context from the \gls{stack} and continue at the instruction pointed to by RIP.
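The values pushed by the \gls{CPU} can be described as a structure with a fixed layout. The sketch below assumes an interrupt without error code and the field order RIP, CS, RFLAGS, RSP, SS from numerically lowest to highest address; the names \texttt{InterruptFrame} and \texttt{redirect} are hypothetical. In a real \gls{OS} the mutable reference would be derived from a raw \gls{stack} address, which is exactly the step the \gls{compiler} cannot verify.

```rust
/// Assumed layout of the values the CPU pushes on interrupt entry
/// in long mode when no error code is present (hypothetical type).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct InterruptFrame {
    rip: u64,    // where execution continues after iretq
    cs: u64,     // code segment, left unchanged here
    rflags: u64, // flags of the interrupted task
    rsp: u64,    // stack pointer of the interrupted task
    ss: u64,     // stack segment, left unchanged here
}

/// Overwrites the frame so that iretq would resume a different task.
fn redirect(frame: &mut InterruptFrame, next_rip: u64, next_rsp: u64, next_rflags: u64) {
    frame.rip = next_rip;
    frame.rsp = next_rsp;
    frame.rflags = next_rflags;
}
```

Wrapping the raw address in a typed structure once, and working with references afterwards, narrows the unchecked code to a single conversion.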
To avoid the need for storing a translation mapping for every possible address, mappings are grouped into fixed-size pieces, called \textit{page}s.
This works by encoding the offset within the page into virtual address, together with the index into the translation array, which is an array commonly called the \textit{page table}.
The translation itself is performed by the \gls{MMU} according to a map that is called page table, which is a structure maintained in memory by the \gls{OS}.
This memory structure can be stored anywhere in memory, and the address is handed to the \gls{MMU} via a specific \gls{CPU} register, which is \textit{CR3} on \gls{amd64}.
Safety could be increased if the \gls{compiler} or in a more general sense the \gls{proglang} could assist in architecture specific code.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/Virtual-to-Physical-Address-Translation-Long-Mode.png}
\caption{Virtual to Physical Address in Long Mode\cite{AMD64Vol2}}
\label{fig:virtual-addr-transl}
\includegraphics[width=0.8\textwidth]{gfx/amd64-long-mode-stack-after-interrupt.png}
\caption{Long-Mode Stack After Interrupt\cite[p.~252]{AMD64Vol2}}
\label{fig:amd64-long-mode-interrupt-stac}
\end{figure}
\subsubsection{Multi-Level Paging}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/amd64-4kb-page-translation-long-mode}
\caption{4-Kbyte Page Translation—Long Mode\cite{AMD64Vol2}}
\label{fig:4kb-page-transl}
\end{figure}
\subsubsection{Top-Level Page Table Self-Reference}
\subsubsection{Caching Lookups}
\subsubsection{Full Example}
* http://taptipalit.blogspot.de/2013/10/theory-recursive-mapping-page.html
* https://www.coresecurity.com/blog/getting-physical-extreme-abuse-of-intel-based-paging-systems-part-2-windows
\subsubsection{Swapping}
The physical memory can only hold a limited number of pages, and the \gls{OS} is responsible to swap the pages into and from physical memory from and to a persistent memory.
Swapping is only mentioned for the sake of completeness, and is not further pursued in this thesis.
\subsection{Premised Trust In Hardware}
\subsection{Stack And Heap Concept}
\subsection{Memory Allocation}
For a full context-switch, the other registers that are part of the context need to be handled by the \gls{OS}'s interrupt function.
This function is critical for safety and, like any other function in the \gls{OS}, is written by the \gls{OS} developer.
\chapter{Common Memory-Safety Mistakes}
\label{chap:context:common-mem-safety-mistakes}
Building upon \autoref{context::introduction}, which describes the basic mechanics of memory usage and how mistakes come into existence, this chapter explains some of the most common software vulnerabilities that are related to memory-safety.
\label{context::common-mem-safety-mistakes}
Building upon \autoref{context::introduction}, which describes the basic mechanics of memory usage and how mistakes come into existence, this chapter presents and explains common software vulnerabilities that are related to memory-safety.
The relevant vulnerability classes are explained alongside exemplary manifestations in \gls{C}/\gls{C++}.
In \autoref{rnd::porting-c-vulns}, these are ported and compared to functionally equivalent versions written in \gls{Rust}.
\section{\glsentrylong{CWE}}
\label{context::common-mem-safety-mistakes::cwe}
\textit{The MITRE Corporation} maintains an ongoing effort of collecting, analyzing and classifying vulnerabilities and their underlying weaknesses in the form of the \gls{CWE}.
It has grown to a large relational database of typed weaknesses.
The following information is provided for enumerations of the type weakness class:
@ -204,31 +362,45 @@ The following information is provided for enumerations of the type weakness clas
\item Relationships
\end{itemize}
\subsection{Relevant Weaknesses}
The relevant weaknesses for this thesis are \gls{CWE-633} and all of its children, as it serves as an umbrella weakness.
The relevant weaknesses for this thesis are children of the umbrella weakness \citetitle{MITRE-CWE-633}.
% TODO test the autocite command with footnotes
One of its children, \citep{MITRE-CWE-119}, is particularly interesting.
If this weakness is manifested, a direct violation of the memory-safety defined in \autoref{context::introduction::memory-safety::def} must have occurred, which "can cause read or write operations to be performed on memory locations that may be associated with other variables, data structures, or internal program data.
\subsection{\citetitle{MITRE-CWE-119}}
\label{context::common-mem-safety-mistakes::cwe::119}
One of its children weaknesses, \citetitle{MITRE-CWE-119}, is particularly interesting.
A manifestation of this weakness is a direct violation of the memory-safety defined in \autoref{context::introduction::memory-safety::def}, which "can cause read or write operations to be performed on memory locations that may be associated with other variables, data structures, or internal program data.
As a result, an attacker may be able to execute arbitrary code, alter the intended control flow, read sensitive information, or cause the system to crash"\cite{MITRE-CWE-119}.
This can happen on certain languages, which "allow direct addressing of memory locations and do not automatically ensure that these locations are valid for the memory buffer that is being referenced.
\gls{C}, \gls{C++}, \gls{asm} and languages without memory management support"\cite{MITRE-CWE-119}.
The documented formulation of languages prone to this weakness is incorrect, as it doesn't conform with the earlier statement of languages that "allow direct addressing of memory locations".
\gls{C}, \gls{C++}, \gls{asm} and languages without memory management support"\autocite{MITRE-CWE-119}.
This formulation of the languages prone to this weakness is incorrect, as it is inconsistent with the earlier statement about languages that "allow direct addressing of memory locations".
Direct memory addressing support doesn't imply a lack of memory management support.
Interestingly there are languages - like \gls{Rust} - that provide memory management support and still allow direct memory addressing.
This will be explained in \autoref{context::rust} in more detail.
There are languages that provide memory management support and still allow direct memory addressing, which is interesting for \gls{OS} development.
\gls{Rust} is one of these languages, although it requires the developer to explicitly acknowledge all direct memory access operations with the \textit{unsafe} keyword.
More information on \gls{Rust} follows in \autoref{context::rust}.
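The explicit acknowledgement can be illustrated with a minimal sketch.
The function name \texttt{read\_via\_raw\_pointer} is hypothetical; the sketch only assumes the standard behavior of the language, in which creating a raw pointer is permitted in safe code while dereferencing it is only accepted inside an \textit{unsafe} block.

```rust
/// Reads a byte through a raw pointer that is derived from a valid reference.
fn read_via_raw_pointer(value: &u8) -> u8 {
    // Creating a raw pointer is allowed in safe code.
    let ptr: *const u8 = value;

    // Dereferencing it outside of an `unsafe` block is rejected by the compiler:
    // let byte = *ptr;

    // Inside the block, the developer takes over responsibility for validity.
    // The read is sound here because `ptr` originates from a live reference.
    unsafe { *ptr }
}

fn main() {
    let value: u8 = 42;
    println!("{}", read_via_raw_pointer(&value));
}
```

The \textit{unsafe} block does not disable the remaining checks of the \gls{compiler}; it marks the places where memory-safety depends on the developer and which therefore deserve the most careful review.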
\subsection{Statistics}
This section presents data with the intention of expressing the weakness's severity in real-world software.
The data is based on publicly available sources, so its completeness is questionable, because many organizations might choose not to disclose their vulnerabilities, either to protect their reputation or for security reasons as already explained in \autoref{context::introduction::memory-safety-violation-in-sw}.
\label{context::common-mem-safety-mistakes::cwe::statistics}
One of the main reasons for me to work on this topic is the increasing number of vulnerabilities based on memory-safety issues.
\subsubsection{NVD's CWE-119 Statistics}
This section is intended to express the weakness's severity in real-world software based on available statistics.
The only data available is based on public sources, so its completeness is questionable, because many organizations might choose not to disclose their vulnerabilities, either to protect their reputation or for security reasons as explained in \autoref{context::introduction::memory-safety-violation-in-sw}.
The data and visualizations are supplied by the \gls{NVD}, which collects the data based on the umbrella weakness CWE-635\footnote{http://cwe.mitre.org/data/definitions/635.html} that was specifically created for the \gls{NVD}.
The numbers of these selected weaknesses are detailed in the following figures; the rest is grouped as \textit{other}.
\autoref{fig:vulnerability-ratio-history} and \autoref{fig:vulnerability-counts-history} display statistics on vulnerabilities grouped by their \gls{CWE} category.
Only the most significant categories are labeled in these figures; the rest is grouped as \textit{other}.
The category \textit{buffer\footnote{A limited chunk of memory used by programs to store various data} errors} represents \autocite{MITRE-CWE-119}.
\autoref{fig:vulnerability-ratio-history} and \autoref{fig:vulnerability-counts-history} display a decade of data on vulnerabilities grouped by their \gls{CWE} category.
The category called \textit{buffer\footnote{A bounded chunk of memory used by programs to store and exchange data} errors} represents \autocite{MITRE-CWE-119}.
In \autoref{fig:vulnerability-ratio-history} it has the color light blue, 2nd from the bottom in the legend, and in \autoref{fig:vulnerability-counts-history} it has the color blue, 2nd from the top in the legend.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/Relative-Vulnerability-Type-Totals-By-Year}
\caption{Vulnerability Relative Counts History}
\label{fig:vulnerability-ratio-history}
\includegraphics[width=\textwidth]{gfx/Vulnerability-Type-Change-by-Year}
\caption{Vulnerability Absolute Counts History}
\label{fig:vulnerability-counts-history}
\end{figure}
\begin{table}
\centering
@ -253,20 +425,40 @@ The category \textit{buffer\footnote{A limited chunk of memory used by programs
\label{tab:vulnerability-buffer-error-by-history}
\end{table}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/Relative-Vulnerability-Type-Totals-By-Year}
\caption{Vulnerability Relative Counts History}
\label{fig:vulnerability-ratio-history}
\includegraphics[width=\textwidth]{gfx/Vulnerability-Type-Change-by-Year}
\caption{Vulnerability Absolute Counts History}
\label{fig:vulnerability-counts-history}
\end{figure}
In \autoref{tab:vulnerability-buffer-error-by-history}, the column \textit{relative count} represents \autoref{fig:vulnerability-ratio-history}, and the column \textit{absolute count} represents \autoref{fig:vulnerability-counts-history}.
With 16.34 percent of all vulnerabilities known by 2017, and an average of 12.92 percent over the last 10 years, \gls{CWE-119} is to be taken seriously.
With 16.34 percent of all vulnerabilities known by 2016, and an average of 12.92 percent over ten years, \autocite{MITRE-CWE-119} makes up a significant part of real-world weaknesses.
\section{Example Manifestations}
\subsection{Vulnerable APIs in Linux and C/C++}
\label{context::common-mem-safety-mistakes::vuln-apis-linux-c}
\Glspl{api} are a ubiquitous means for programmers to access all kinds of functionality, serving as interfaces to network services, providing existing algorithms in the form of libraries and frameworks, or interfacing with the local \gls{OS}.
It is inherently dangerous to expose any sort of functionality through an \gls{api}, as it might contain bugs that will be spread widely with rising popularity.
Every \gls{OS} needs to provide an \gls{api} for its core functionality to be useful and extensible.
A very popular and widely supported \gls{OS} is \gls{LX}.
The system libraries and the kernel are written in \gls{C}, the latter containing some hardware specific \gls{asm} code.
\gls{LX} is very popular for embedded systems, network servers and large-scale computers. % TODO: reference
Through \gls{android}, \gls{LX} has been distributed to a huge number of mobile devices within the last decade. % TODO: reference
The list of vulnerabilities that are found in \gls{LX} device drivers which were written by \gls{android} device vendors is very concerning.
Device drivers are not necessarily complex per se, as they essentially just copy data to and from the hardware they target; the difficulty lies in performing these transfers only under safe circumstances.
\gls{LX} has a huge ecosystem with existing libraries for any imaginable use-case, from cryptography to artificial intelligence, to give random examples.
It is necessary to investigate manifestations of these errors in detail in order to analyze whether they might be prevented by using \gls{Rust}.
The manifestations of memory-safety related vulnerabilities in the \gls{LX} ecosystem are given in the next section.
\section{Manifestations}
\label{context::common-mem-safety-mistakes::manifestations}
% Significance of the Study
% The significance is a statement of why it is important to determine the answer to the gap in the knowledge, and is related to improving the human condition. The contribution to the body of knowledge is described, and summarizes who will be able to use the knowledge to make better decisions, improve policy, advance science, or other uses of the new information. The “new” data is the information used to fill the gap in the knowledge.
This section contains real-world and \textit{re}constructed example manifestations of memory-safety related weaknesses.
% TODO
\subsection{The Stack Clash}
A recent, high-impact vulnerability named \textit{Stack Clash}\footnote{https://blog.qualys.com/securitylabs/2017/06/19/the-stack-clash} is briefly described as \textit{"a vulnerability in the memory management of several operating systems. It affects Linux, OpenBSD, NetBSD, FreeBSD and Solaris, on i386 and amd64. It can be exploited by attackers to corrupt memory and execute arbitrary code."}
The \gls{LX} specific vulnerability is listed as CVE-2017-1000364\footnote{http://www.cvedetails.com/cve/CVE-2017-1000364/}, where \textit{"an issue was discovered in the size of the stack guard page on Linux, specifically a 4k stack guard page is not sufficiently large and can be "jumped" over (the stack guard page is bypassed)"}.
It is assigned to the \autocite{MITRE-CWE-119} explained in \autoref{context::common-mem-safety-mistakes::cwe::119}.
% TODO explain that this CWE-119 vulnerability is also "Execute Code"
% TODO: more references and deeper explanation of what happens: see introduction in https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
\subsection{Uninitialized Pointers}
@ -300,11 +492,6 @@ if (ptr == NULL) {
}
\end{lstlisting}
\section{The Stack Clash}
A recent, high-impact vulnerability named \textit{Stack Clash}\footnote{https://blog.qualys.com/securitylabs/2017/06/19/the-stack-clash} is briefly described as \textit{"a vulnerability in the memory management of several operating systems. It affects Linux, OpenBSD, NetBSD, FreeBSD and Solaris, on i386 and amd64. It can be exploited by attackers to corrupt memory and execute arbitrary code."}
The \gls{LX} specific vulnerability is listed as CVE-2017-1000364\footnote{http://www.cvedetails.com/cve/CVE-2017-1000364/}, where \textit{"an issue was discovered in the size of the stack guard page on Linux, specifically a 4k stack guard page is not sufficiently large and can be "jumped" over (the stack guard page is bypassed)"}.
% TODO: more references and deeper explanation of what happens: see introduction in https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
\chapter{Safe \gls{OS} Development}
\label{context::introduciton::safe-os-dev}
@ -314,13 +501,11 @@ In order to protect the memory of each executed program according to \autoref{co
\subsection{Detecting Memory-Safety Violations ASAP}
\label{context::safe-os-dev::detecting-safety-violations-asap}
Individuals cannot be prevented from typing erroneous code into their code editors.
Ideally, the \gls{compiler} should be able to detect the programmer's technical mistakes, especially the ones that have a negative impact on memory-safety.
Not only beginners or sloppy programmers, but advanced programmers can profit too.
Everybody makes mistakes from time to time, depending on the level of focus which is not a constant.
Advanced programmers can profit too, as everybody makes mistakes from time to time, depending on their level of focus, which is not constant.
The human aspect suggests that systems need to be designed to be testable, and then tested thoroughly in order to mitigate the risks of erroneous software being executed by the end-user.
@ -328,13 +513,11 @@ In addition to the presence and quality of tests, their timing in the software l
The earliest tests can be as soon as the process of software development itself, and the latest ones can be at the time of execution on the production system of the end-user.
It is desirable to place tests as early as possible in the software life cycle, to prevent errors from compromising running systems that hold sensitive data and offer important services.
The dimension of time can also be translated to hierarchically lower system components at run-time.
This suggests that the \gls{OS} must be tested before the other executed \glspl{app}, etc.
This can be easily explained.
From a \gls{app} perspective, testing every permutation of \gls{OS} runtime states can be impossible, because the \gls{app} can not freely mutate the system's state.
Even if it could, testing all possible permutations of system state is limited by time and resource restrictions.
That's why even disciplined software engineers write tests that only target common error cases, like system memory exhaustion, and ensure syntactic and semantic correctness for the \gls{app} being developed.
Edge cases that happen only under specific system circumstances, possibly influenced by other components on the system as described in the beginning of \autoref{context::introduction::memory-safety}, are at high risk of remaining untested, and the \gls{app} developer is forced to trust the underlying \gls{OS}.
This suggests that, since the \gls{OS} is lower in the hierarchy of system components at runtime, testing of the \gls{OS} must happen independently of specific \glspl{app} and their development time.
This applies especially to the \gls{OS}'s internal states, which cannot be directly mutated via the \gls{api} exposed to the \glspl{app}.
From the \gls{app} perspective, testing the \gls{OS}'s runtime states is not feasible, because the \gls{app} cannot freely mutate the system's state.
Even if it could, testing all possible permutations of system state in every possible \gls{app} would be highly redundant and would still leave the risk of untested edge cases that happen only under specific system circumstances, possibly influenced by other components on the system as described in the beginning of \autoref{context::introduction::memory-safety}.
The \gls{app} developer is forced to trust the underlying \gls{OS}.
This puts high importance on the safety of the \gls{OS} design and implementation.
\subsection{The Effects Of \Glspl{proglang} on Memory-Safety}
@ -356,15 +539,10 @@ By defining an abstraction layer in form of a programming language, the language
In \autoref{context::introduction::memory-safety}, specifically in \autoref{context::introduction::memory-safety::detection}, it was explained that programming languages have a direct impact on memory-safety.
This section gives an example of how severe this impact is and explains the requirements on an \gls{OS} language.
\chapter{CWE Examples} % TODO is this chapter required?
% Significance of the Study
% The significance is a statement of why it is important to determine the answer to the gap in the knowledge, and is related to improving the human condition. The contribution to the body of knowledge is described, and summarizes who will be able to use the knowledge to make better decisions, improve policy, advance science, or other uses of the new information. The “new” data is the information used to fill the gap in the knowledge.
One of the main reasons for me to work on this topic is the increasing number of vulnerabilities based on memory-safety issues, represented by the statistics shown in \autoref{TODO}
\section{Linux and C}
A very popular and widespread \gls{OS} is \gls{LX} which is written in \gls{C} and some hardware specific \gls{asm} code.
Recent years have shown how prone it is to vulnerabilities that result from programming errors related to memory management.
\chapter{Mitigation Attempts}
\section{\glsentrytext{C}}
With the growing number of vulnerabilities, various solutions have been proposed to increase the safety of C, either with static code analysis or via \gls{compiler}-generated checks imposed at runtime. (TODO: reference).
Static analyses are not very effective on a language that has not been designed to be safety-analyzed. TODO? reference?

View file

@ -1,4 +1,17 @@
% // vim: set ft=tex:
\chapter{Result Evaluation}
\chapter{Result Generalization}
\chapter{Conclusion}
\section{Low-Level Safe Abstractions in Rust}
% TODO: Is the static analysis of hardware specific assembly code possible and useful at all?
% TODO: LLVM knows about the target and can potentially give hints about hardware specific instructions
\section{Tracking \textit{'static}ally allocated Resources}
\section{The Necessary Evils of \textit{unsafe}}
\chapter{Result Evaluation}
% TODO: repeat that rust *can* be used to increase safety in the OS, but it doesn't guarantee it per-se
\chapter{Summary}
\chapter{Final Conclusion}

View file

@ -2,7 +2,10 @@
\chapter{Topic Refinement}
% TODO: is this chapter required?
\chapter{Derived Research Questions}
\chapter{Research Questions}
Setting up and maintaining the paging-structure, as well as allocating physical memory for the virtual pages, is a complex task in the \gls{OS}.
Developing this part of the \gls{OS} is error-prone, and is not well-supported by mainstream \glspl{proglang}.
\subsection{Definition Of Additional Analysis Rules To Extend Safety Checks}
% TODO: How can Business Logical
@ -32,13 +35,13 @@
\chapter{Porting \glsentrytext{C} Vulnerabilities}
\label{rnd:porting-c-vulns}
\label{rnd::porting-c-vulns}
In this chapter, the examples from \autoref{TODO} are ported to \gls{Rust} for evaluation.
\chapter{\glsentrytext{LX} Modules Written In \glsentrytext{Rust}}
% TODO: describe Difficulties with the GPL Macros used Within Kernel Modules
\chapter{Existing \glsentrytext{OS}-Development Projects Based On Rust}
\label{rnd::existing-os-in-rust}
\section{Libraries}
@ -54,6 +57,7 @@ In this chapter, the examples from \autoref{TODO} ported to \gls{Rust} for evalu
\chapter{\glsentrytext{imezzos}: Adding Preemptive \glsentrytext{OS}-Level Multitasking}
\section{Timed Interrupts For Scheduling and Dispatching}
\section{Simple Stack Allocation Scheme}
\section{Risk Of Stack-Overflow}
@ -64,17 +68,3 @@ In this chapter, the examples from \autoref{TODO} ported to \gls{Rust} for evalu
% Stack size for each function: calculated,
% Call-Tree: calculated,
\chapter{Result Generalization}
\section{Low-Level Safe Abstractions in Rust}
% TODO: Is the static analysis of hardware specific assembly code possible and useful at all?
% TODO: LLVM knows about the target and can potentially give hints about hardware specific instructions
\section{Tracking \textit{'static}ally allocated Resources}
\section{The Necessary Evils of \textit{unsafe}}
\chapter{Result Evaluation}
% TODO: repeat that rust *can* be used to increase safety in the OS, but it doesn't guarantee it per-se
\chapter{Summary}

View file

@ -3,13 +3,15 @@ Any changes to this file will be lost if it is regenerated by Mendeley.
BibTeX export options can be customized via Options -> BibTeX in Mendeley Desktop
@misc{MITRE-CWE-119,
author = {MITRE},
booktitle = {2.11},
title = {{CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer}},
url = {http://cwe.mitre.org/data/definitions/119.html},
urldate = {2017-08-31},
year = {2017}
@article{Getreu2016,
annote = {- runtime checks are expensive
- critical with energy restriction on the target device},
author = {Getreu, Jens},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Embedded System Security with Rust - Case Study of Heartbleed.pdf:pdf},
pages = {1--24},
title = {{Embedded System Security with Rust}},
year = {2016}
}
@article{Xu2015,
abstract = {Since vulnerabilities in Linux kernel are on the increase, attackers have turned their interests into related exploitation techniques. However, compared with numerous researches on exploiting use-after-free vulnerabilities in the user applications, few efforts studied how to exploit use-after-free vulnerabilities in Linux kernel due to the difficulties that mainly come from the uncertainty of the kernel memory layout. Without specific information leakage, attackers could only conduct a blind memory overwriting strategy trying to corrupt the critical part of the kernel, for which the success rate is negligible. In this work, we present a novel memory collision strategy to exploit the use-after-free vulnerabilities in Linux kernel reliably. The insight of our exploit strategy is that a probabilistic memory collision can be constructed according to the widely deployed kernel memory reuse mechanisms, which significantly increases the success rate of the attack. Based on this insight, we present two practical memory collision attacks: An object-based attack that leverages the memory recycling mechanism of the kernel allocator to achieve freed vulnerable object covering, and a physmap-based attack that takes advantage of the overlap between the physmap and the SLAB caches to achieve a more flexible memory manipulation. Our proposed attacks are universal for various Linux kernels of different architectures and could successfully exploit systems with use-after-free vulnerabilities in kernel. Particularly, we achieve privilege escalation on various popular Android devices (kernel version{\textgreater}=4.3) including those with 64-bit processors by exploiting the CVE-2015-3636 use-after-free vulnerability in Linux kernel. To our knowledge, this is the first generic kernel exploit for the latest version of Android. Finally, to defend this kind of memory collision, we propose two corresponding mitigation schemes.},
@ -25,139 +27,6 @@ title = {{From Collision To Exploitation: Unleashing Use-After-Free Vulnerabilit
url = {http://dl.acm.org/citation.cfm?doid=2810103.2813637},
year = {2015}
}
@article{Szekeres2013,
abstract = {Memory corruption bugs in software written in low-level languages like C or C++ are one of the oldest problems in computer security. The lack of safety in these languages allows attackers to alter the program's behavior or take full control over it by hijacking its control flow. This problem has existed for more than 30 years and a vast number of potential solutions have been proposed, yet memory corruption attacks continue to pose a serious threat. Real world exploits show that all currently deployed protections can be defeated. This paper sheds light on the primary reasons for this by describing attacks that succeed on today's systems. We systematize the current knowledge about various protection techniques by setting up a general model for memory corruption attacks. Using this model we show what policies can stop which attacks. The model identifies weaknesses of currently deployed techniques, as well as other proposed protections enforcing stricter policies. We analyze the reasons why protection mechanisms implementing stricter polices are not deployed. To achieve wide adoption, protection mechanisms must support a multitude of features and must satisfy a host of requirements. Especially important is performance, as experience shows that only solutions whose overhead is in reasonable bounds get deployed. A comparison of different enforceable policies helps designers of new protection mechanisms in finding the balance between effectiveness (security) and efficiency. We identify some open research problems, and provide suggestions on improving the adoption of newer techniques.},
author = {Szekeres, L{\'{a}}szl{\'{o}} and Payer, Mathias and Wei, Tao and Song, Dawn},
doi = {10.1109/SP.2013.13},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/SoK$\backslash$: Eternal War in Memory.pdf:pdf},
isbn = {9780769549774},
issn = {10816011},
journal = {Proceedings - IEEE Symposium on Security and Privacy},
pages = {48--62},
title = {{SoK: Eternal war in memory}},
year = {2013}
}
@article{Chisnall2015,
abstract = {We propose a new memory-safe interpretation of the C abstract machine that provides stronger protection to benefit security and debugging. Despite ambiguities in the specification intended to provide implementation flexibility, contemporary implementations of C have converged on a memory model similar to the PDP-11, the original target for C. This model lacks support for memory safety despite well-documented impacts on security and reliability. Attempts to change this model are often hampered by assumptions embedded in a large body of existing C code, dating back to the memory model exposed by the original C compiler for the PDP-11. Our experience with attempting to implement a memory-safe variant of C on the CHERI experimental microprocessor led us to identify a number of problematic idioms. We describe these as well as their interaction with existing memory safety schemes and the assumptions that they make beyond the requirements of the C specification. Finally, we refine the CHERI ISA and abstract model for C, by combining elements of the CHERI capability model and fat pointers, and present a softcore CPU that implements a C abstract machine that can run legacy C code with strong memory protection guarantees.},
author = {Chisnall, David and Rothwell, Colin and Watson, Robert N M and Woodruff, Jonathan and Vadera, Munraj and Moore, Simon W and Roe, Michael and Davis, Brooks and Neumann, Peter G},
doi = {10.1145/2694344.2694367},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Beyond the PDP-11$\backslash$: Architectural support for a memory-safe C abstract machine.pdf:pdf},
isbn = {9781450328357},
issn = {01635964},
journal = {Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems},
pages = {117--130},
title = {{Beyond the PDP-11 : Architectural support for a memory-safe C abstract machine}},
url = {http://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201503-asplos2015-cheri-cmachine.pdf},
year = {2015}
}
@article{Lattner2005,
abstract = {The LLVM Compiler Infrastructure (http://llvm.cs.uiuc.edu) is a robust system that is well suited for a wide variety of research and development work. This brief paper introduces the LLVM system and provides pointers to more extensive documentation, complementing the tutorial presented at LCPC.},
archivePrefix = {arXiv},
arxivId = {9780201398298},
author = {Lattner, Chris and Adve, Vikram},
doi = {10.1007/11532378_2},
eprint = {9780201398298},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/The LLVM Compiler Framework and Infrastructure Tutorial.pdf:pdf},
isbn = {978-3-540-28009-5},
issn = {03029743},
journal = {Languages and Compilers for High Performance Computing},
number = {Part 1},
pages = {15--16},
pmid = {4520227},
title = {{The LLVM Compiler Framework and Infrastructure Tutorial}},
url = {http://dx.doi.org/10.1007/11532378{\_}2},
year = {2005}
}
@article{Caballero2012,
abstract = {Use-after-free vulnerabilities are rapidly growing in popularity, especially for exploiting web browsers. Use-after-free (and double-free) vulnerabilities are caused by a program operating on a dangling pointer. In this work we propose early detection, a novel runtime approach for finding and diagnosing use-after-free and double-free vulnerabilities. While previous work focuses on the creation of the vulnerability (i.e., the use of a dangling pointer), early detection shifts the focus to the creation of the dangling pointer(s) at the root of the vulnerability. Early detection increases the effectiveness of testing by identifying unsafe dangling pointers in executions where they are created but not used. It also accelerates vulnerability analysis and minimizes the risk of incomplete fixes, by automatically collecting information about all dangling pointers involved in the vulnerability. We implement our early detection technique in a tool called Undangle. We evaluate Undangle for vulnerability analysis on 8 real-world vulnerabilities. The analysis uncovers that two separate vulnerabilities in Firefox had a common root cause and that their patches did not completely fix the underlying bug. We also evaluate Undangle for testing on the Firefox web browser identifying a potential vulnerability.},
author = {Caballero, Juan and Grieco, Gustavo and Marron, Mark and Nappa, Antonio},
doi = {10.1145/2338965.2336769},
isbn = {9781450314541},
issn = {1450314546},
journal = {ISSTA},
keywords = {automated testing,binary analysis,debugging,dynamic analysis},
pages = {133},
title = {{Undangle: early detection of dangling pointers in use-after-free and double-free vulnerabilities}},
url = {http://dl.acm.org/citation.cfm?doid=2338965.2336769},
year = {2012}
}
@article{Dhurjati2003,
abstract = {Traditional approaches to enforcing memory safety of programs rely heavily on runtime checks of memory accesses and on garbage collection, both of which are unattractive for embedded applications. The long-term goal of our work is to enable 100{\%} static enforcement of memory safety for embedded programs through advanced compiler techniques and minimal semantic restrictions on programs. The key result of this paper is a compiler technique that ensures memory safety of dynamically allocated memory without programmer annotations, runtime checks, or garbage collection, and works for a large subclass of type-safe C programs. The technique is based on a fully automatic pool allocation (i.e., region-inference) algorithm for C programs we developed previously, and it ensures safety of dynamically allocated memory while retaining explicit deallocation of individual objects within regions (to avoid garbage collection). For a diverse set of embedded C programs (and using a previous technique to avoid null pointer checks), we show that we are able to statically ensure the safety of pointer and dynamic memory usage in all these programs. We also describe some improvements over our previous work in static checking of array accesses. Overall, we achieve 100{\%} static enforcement of memory safety without new language syntax for a significant subclass of embedded C programs, and the subclass is much broader if array bounds checks are ignored.},
author = {Dhurjati, D and Kowshik, S and Adve, V and Lattner, C},
doi = {10.1145/780742.780743},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Memory Safety Without Runtime Checks or Garbage.pdf:pdf},
isbn = {0362-1340},
issn = {03621340},
journal = {ACM SIGPLAN Notices},
keywords = {automatic pool allocation,compilers,embedded systems,languages,programming languages,region management,security,static analysis},
number = {7},
pages = {69--80},
title = {{Memory safety without runtime checks or garbage collection}},
volume = {38},
year = {2003}
}
@book{AMD64Vol1,
author = {AMD},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/AMD64 Architecture Programmer's Manual Volume 1$\backslash$: Application Programming.pdf:pdf},
keywords = {AMD64,SIMD,extended media instructions,legacy m},
number = {26568},
title = {{AMD64 Architecture Programmer's Manual Volume 1: Application Programming}},
volume = {4},
year = {2012}
}
@article{Corporation2011a,
abstract = {The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of Intel 64 and IA-32 processors. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A {\&} 2B, describe the instruction set of the processor and the opcode structure. These volumes apply to application programmers and to programmers who write operating systems or executives. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A {\&} 3B, describe the operating-system support environment of Intel 64 and IA-32 processors. These volumes target operating-system and BIOS designers. In addition, the Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, addresses the programming environment for classes of software that host operating systems.},
author = {{Intel Corporation}},
doi = {10.1109/MAHC.2010.22},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/64-ia-32-architectures-software-developer-vol-1-manual.pdf:pdf},
isbn = {253665-057US},
issn = {15222594},
journal = {System},
keywords = {253665,64,ia 32 architecture},
number = {253665},
title = {{Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual Volume 1}},
volume = {1},
year = {2011}
}
@inproceedings{Kuznetsov2014,
abstract = {Systems code is often written in low-level languages like C/C++, which offer many benefits but also delegate memory management to programmers. This invites memory safety bugs that attackers can exploit to divert control flow and compromise the system. Deployed defense mechanisms (e.g., ASLR, DEP) are incomplete, and stronger defense mechanisms (e.g., CFI) often have high overhead and limited guarantees [19, 15, 9]. We introduce code-pointer integrity (CPI), a new design point that guarantees the integrity of all code pointers in a program (e.g., function pointers, saved return addresses) and thereby prevents all control-flow hijack attacks, including return-oriented programming. We also introduce code-pointer separation (CPS), a relaxation of CPI with better performance properties. CPI and CPS offer substantially better security-to-overhead ratios than the state of the art, they are practical (we protect a complete FreeBSD system and over 100 packages like apache and postgresql), effective (prevent all attacks in the RIPE benchmark), and efficient: on SPEC CPU2006, CPS averages 1.2{\%} overhead for C and 1.9{\%} for C/C++, while CPI's overhead is 2.9{\%} for C and 8.4{\%} for C/C++. A prototype implementation of CPI and CPS can be obtained from http://levee.epfl.ch.},
author = {Kuznetsov, Volodymyr and Szekeres, L{\'{a}}szl{\'{o}} and Payer, Mathias},
booktitle = {Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation},
isbn = {9781931971164},
pages = {147--163},
title = {{Code-pointer integrity}},
url = {https://www.usenix.org/conference/osdi14/technical-sessions/presentation/kuznetsov},
year = {2014}
}
@article{Getreu2016,
author = {Getreu, Jens},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Embedded System Security with Rust - Case Study of Heartbleed.pdf:pdf},
pages = {1--24},
title = {{Embedded System Security with Rust}},
year = {2016}
}
@article{Affairs2015,
author = {Beingessner, Alexis},
file = {:home/steveej/src/steveej/msc-thesis/docs/You can't spell trust without Rust.pdf:pdf},
title = {{You Can't Spell Trust Without Rust}},
note = {Master's thesis in Computer Science, Carleton University},
year = {2015}
}
@book{AMD64Vol2,
author = {AMD},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/AMD64 Architecture Programmer's Manual Volume 2$\backslash$: System Programming.pdf:pdf},
keywords = {24593,AMD64 Architecture Programmer's Manual Volume 2: S},
number = {24592},
title = {{AMD64 Architecture Programmer's Manual Volume 2: System Programming}},
volume = {1},
year = {2012}
}
@misc{MITRE-CWE-633,
author = {MITRE},
title = {{CWE-633: Weaknesses that Affect Memory}},
url = {http://cwe.mitre.org/data/definitions/633.html},
urldate = {2017-08-31},
year = {2017}
}
@inproceedings{Ma2013,
abstract = {Aiming at the problem of higher memory consumption and lower execution efficiency during the dynamic detecting to C/C++ programs memory vulnerabilities, this paper presents a dynamic detection method called ISC. The ISC improves the Safe-C using pointer analysis technology. Firstly, the ISC defines a simple and efficient fat pointer representation instead of the safe pointer in the Safe-C. Furthermore, the ISC uses the unification-based analysis algorithm with one level flow static pointer. This identification reduces the number of pointers that need to be converted to fat pointers. Then in the process of program running, the ISC detects memory vulnerabilities through constantly inspecting the attributes of fat pointers. Experimental results indicate that the ISC could detect memory vulnerabilities such as buffer overflows and dangling pointers. Comparing with the Safe-C, the ISC dramatically reduces the memory consumption and lightly improves the execution efficiency.},
author = {Ma, Rui and Chen, Lingkui and Hu, Changzhen and Xue, Jingfeng and Zhao, Xiaolin},
@ -170,6 +39,11 @@ pages = {52--57},
title = {{A dynamic detection method to C/C++ programs memory vulnerabilities based on pointer analysis}},
year = {2013}
}
@misc{Endler,
author = {Endler, Matthias},
title = {{A curated list of static analysis tools, linters and code quality checkers for various programming languages}},
url = {https://github.com/mre/awesome-static-analysis}
}
@misc{MITRE-CWE,
author = {MITRE},
title = {{CWE - Common Weakness Enumeration}},
@ -177,19 +51,6 @@ url = {http://cwe.mitre.org},
urldate = {2017-08-31},
year = {2017}
}
@article{Levy2015a,
abstract = {Rust, a new systems programming language, provides compile-time memory safety checks to help eliminate runtime bugs that manifest from improper memory management. This feature is advantageous for operating system development, and especially for embedded OS development, where recovery and debugging are particularly challenging. However, embedded platforms are highly event-based, and Rust's memory safety mechanisms largely presume threads. In our experience developing an operating system for embedded systems in Rust, we have found that Rust's ownership model prevents otherwise safe resource sharing common in the embedded domain, conflicts with the reality of hardware resources, and hinders using closures for programming asynchronously. We describe these experiences and how they relate to memory safety as well as illustrate our workarounds that preserve the safety guarantees to the largest extent possible. In addition, we draw from our experience to propose a new language extension to Rust that would enable it to provide better memory safety tools for event-driven platforms.},
author = {Levy, Amit and Andersen, Michael P. and Campbell, Bradford and Culler, David and Dutta, Prabal and Ghena, Branden and Levis, Philip and Pannuto, Pat},
doi = {10.1145/2818302.2818306},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/tock-plos2015.pdf:pdf},
isbn = {9781450339421},
journal = {PLOS: Workshop on Programming Languages and Operating Systems},
keywords = {embedded operating systems,linear types,ownership,rust},
pages = {21--26},
title = {{Ownership is Theft: Experiences Building an Embedded OS in Rust}},
url = {http://dl.acm.org/citation.cfm?id=2818302.2818306},
year = {2015}
}
@article{Corporation2011,
abstract = {The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of Intel 64 and IA-32 processors. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A {\&} 2B, describe the instruction set of the processor and the opcode structure. These volumes apply to application programmers and to programmers who write operating systems or executives. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A {\&} 3B, describe the operating-system support environment of Intel 64 and IA-32 processors. These volumes target operating-system and BIOS designers. In addition, the Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, addresses the programming environment for classes of software that host operating systems.},
author = {{Intel Corporation}},
@ -210,16 +71,17 @@ file = {:home/steveej/src/github/steveej/msc-thesis/docs/A Rust-based Runtime fo
title = {{A Rust-based Runtime for the Internet of Things}},
year = {2017}
}
@article{Arpaci-Dusseau2015,
abstract = {A book covering the fundamentals of operating systems, including virtualization of the CPU and memory, threads and concurrency, and file and storage systems. Written by professors active in the field for 20 years, this text has been developed in the classrooms of the University of Wisconsin-Madison, and has been used in the instruction of thousands of students.},
author = {Arpaci-Dusseau, Remzi and Arpaci-Dusseau, Andrea},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/operating{\_}systems{\_}{\_}three{\_}easy{\_}pieces{\_}{\_}electronic{\_}version{\_}0{\_}91{\_}.pdf:pdf},
journal = {Arpaci-Dusseau},
number = {0.91},
pages = {665},
title = {{Operating Systems: Three Easy Pieces}},
volume = {Electronic},
year = {2015}
@article{Szekeres2013,
abstract = {Memory corruption bugs in software written in low-level languages like C or C++ are one of the oldest problems in computer security. The lack of safety in these languages allows attackers to alter the program's behavior or take full control over it by hijacking its control flow. This problem has existed for more than 30 years and a vast number of potential solutions have been proposed, yet memory corruption attacks continue to pose a serious threat. Real world exploits show that all currently deployed protections can be defeated. This paper sheds light on the primary reasons for this by describing attacks that succeed on today's systems. We systematize the current knowledge about various protection techniques by setting up a general model for memory corruption attacks. Using this model we show what policies can stop which attacks. The model identifies weaknesses of currently deployed techniques, as well as other proposed protections enforcing stricter policies. We analyze the reasons why protection mechanisms implementing stricter policies are not deployed. To achieve wide adoption, protection mechanisms must support a multitude of features and must satisfy a host of requirements. Especially important is performance, as experience shows that only solutions whose overhead is in reasonable bounds get deployed. A comparison of different enforceable policies helps designers of new protection mechanisms in finding the balance between effectiveness (security) and efficiency. We identify some open research problems, and provide suggestions on improving the adoption of newer techniques.},
author = {Szekeres, L{\'{a}}szl{\'{o}} and Payer, Mathias and Wei, Tao and Song, Dawn},
doi = {10.1109/SP.2013.13},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/SoK$\backslash$: Eternal War in Memory.pdf:pdf},
isbn = {9780769549774},
issn = {10816011},
journal = {Proceedings - IEEE Symposium on Security and Privacy},
pages = {48--62},
title = {{SoK: Eternal war in memory}},
year = {2013}
}
@article{Reed2015,
abstract = {Rust is a new systems language that uses some advanced type system features, specifically affine types and regions, to statically guarantee memory safety and eliminate the need for a garbage collector. While each individual addition to the type system is well understood in isolation and are known to be sound, the combined system is not known to be sound. Furthermore, Rust uses a novel checking scheme for its regions, known as the Borrow Checker, that is not known to be correct. Since Rust's goal is to be a safer alternative to C/C++, we should ensure that this safety scheme actually works. We present a formal semantics that captures the key features relevant to memory safety, unique pointers and borrowed references, specifies how they guarantee memory safety, and describes the operation of the Borrow Checker. We use this model to prove the soundness of some core operations and justify the conjecture that the model, as a whole, is sound. Additionally, our model provides a syntactic version of the Borrow Checker, which may be more understandable than the non-syntactic version in Rust.},
@ -230,10 +92,155 @@ pages = {1--37},
title = {{Patina: A Formalization of the Rust Programming Language}},
year = {2015}
}
@misc{Endler,
author = {Endler, Matthias},
title = {{A curated list of static analysis tools, linters and code quality checkers for various programming languages}},
url = {https://github.com/mre/awesome-static-analysis}
@book{AMD64Vol1,
author = {AMD},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/AMD64 Architecture Programmer's Manual Volume 1$\backslash$: Application Programming.pdf:pdf},
keywords = {AMD64,SIMD,extended media instructions,legacy m},
number = {26568},
title = {{AMD64 Architecture Programmer's Manual Volume 1: Application Programming}},
volume = {4},
year = {2012}
}
@misc{MITRE-CWE-119,
author = {MITRE},
booktitle = {2.11},
title = {{CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer}},
url = {http://cwe.mitre.org/data/definitions/119.html},
urldate = {2017-08-31},
year = {2017}
}
@article{Backus1962,
author = {Backus, J. W. and Bauer, F. L. and Green, J. and Katz, C. and McCarthy, J. and Naur, P. and Perlis, A. J. and Rutishauser, H. and Samelson, K. and Vauquois, B. and Wegstein, J. H. and van Wijngaarden, A. and Woodger, M. and van der Poel, W. L.},
doi = {10.1007/BF01386340},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Revised report on the algorithmic language Algol 60.pdf:pdf},
isbn = {9780521193993},
issn = {0029599X},
journal = {Numerische Mathematik},
number = {1},
pages = {420--453},
title = {{Revised report on the algorithmic language Algol 60}},
volume = {4},
year = {1962}
}
@book{AMD64Vol2,
author = {AMD},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/AMD64 Architecture Programmer's Manual Volume 2$\backslash$: System Programming.pdf:pdf},
keywords = {24593,AMD64 Architecture Programmer's Manual Volume 2: S},
number = {24592},
title = {{AMD64 Architecture Programmer's Manual Volume 2: System Programming}},
volume = {1},
year = {2012}
}
@article{Corporation2011a,
abstract = {The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of Intel 64 and IA-32 processors. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A {\&} 2B, describe the instruction set of the processor and the opcode structure. These volumes apply to application programmers and to programmers who write operating systems or executives. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A {\&} 3B, describe the operating-system support environment of Intel 64 and IA-32 processors. These volumes target operating-system and BIOS designers. In addition, the Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, addresses the programming environment for classes of software that host operating systems.},
author = {{Intel Corporation}},
doi = {10.1109/MAHC.2010.22},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/64-ia-32-architectures-software-developer-vol-1-manual.pdf:pdf},
isbn = {253665-057US},
issn = {15222594},
journal = {System},
keywords = {253665,64,ia 32 architecture},
number = {253665},
title = {{Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual Volume 1}},
volume = {1},
year = {2011}
}
@article{Caballero2012,
abstract = {Use-after-free vulnerabilities are rapidly growing in popularity, especially for exploiting web browsers. Use-after-free (and double-free) vulnerabilities are caused by a program operating on a dangling pointer. In this work we propose early detection, a novel runtime approach for finding and diagnosing use-after-free and double-free vulnerabilities. While previous work focuses on the creation of the vulnerability (i.e., the use of a dangling pointer), early detection shifts the focus to the creation of the dangling pointer(s) at the root of the vulnerability. Early detection increases the effectiveness of testing by identifying unsafe dangling pointers in executions where they are created but not used. It also accelerates vulnerability analysis and minimizes the risk of incomplete fixes, by automatically collecting information about all dangling pointers involved in the vulnerability. We implement our early detection technique in a tool called Undangle. We evaluate Undangle for vulnerability analysis on 8 real-world vulnerabilities. The analysis uncovers that two separate vulnerabilities in Firefox had a common root cause and that their patches did not completely fix the underlying bug. We also evaluate Undangle for testing on the Firefox web browser identifying a potential vulnerability.},
author = {Caballero, Juan and Grieco, Gustavo and Marron, Mark and Nappa, Antonio},
doi = {10.1145/2338965.2336769},
isbn = {9781450314541},
issn = {1450314546},
journal = {ISSTA},
keywords = {automated testing,binary analysis,debugging,dynamic analysis},
pages = {133},
title = {{Undangle: early detection of dangling pointers in use-after-free and double-free vulnerabilities}},
url = {http://dl.acm.org/citation.cfm?doid=2338965.2336769},
year = {2012}
}
@article{Balasubramanian2017,
abstract = {Rust is a new system programming language that offers a practical and safe alternative to C. Rust is unique in that it enforces safety without runtime overhead, most importantly, without the overhead of garbage collection. While zero-cost safety is remarkable on its own, we argue that the super-powers of Rust go beyond safety. In particular, Rust's linear type system enables capabilities that cannot be implemented efficiently in traditional languages, both safe and unsafe, and that dramatically improve security and reliability of system software. We show three examples of such capabilities: zero-copy software fault isolation, efficient static information flow analysis, and automatic checkpointing. While these capabilities have been in the spotlight of systems research for a long time, their practical use is hindered by high cost and complexity. We argue that with the adoption of Rust these mechanisms will become commoditized.},
author = {Balasubramanian, Abhiram and Baranowski, Marek S. and Burtsev, Anton and Rakamari{\'{c}}, Zvonimir and Ryzhyk, Leonid},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/DRAFT$\backslash$: System Programming in Rust$\backslash$: Beyond Safety.pdf:pdf},
title = {{DRAFT: System Programming in Rust: Beyond Safety}},
year = {2017}
}
@inproceedings{Kuznetsov2014,
abstract = {Systems code is often written in low-level languages like C/C++, which offer many benefits but also delegate memory management to programmers. This invites memory safety bugs that attackers can exploit to divert control flow and compromise the system. Deployed defense mechanisms (e.g., ASLR, DEP) are incomplete, and stronger defense mechanisms (e.g., CFI) often have high overhead and limited guarantees [19, 15, 9]. We introduce code-pointer integrity (CPI), a new design point that guarantees the integrity of all code pointers in a program (e.g., function pointers, saved return addresses) and thereby prevents all control-flow hijack attacks, including return-oriented programming. We also introduce code-pointer separation (CPS), a relaxation of CPI with better performance properties. CPI and CPS offer substantially better security-to-overhead ratios than the state of the art, they are practical (we protect a complete FreeBSD system and over 100 packages like apache and postgresql), effective (prevent all attacks in the RIPE benchmark), and efficient: on SPEC CPU2006, CPS averages 1.2{\%} overhead for C and 1.9{\%} for C/C++, while CPI's overhead is 2.9{\%} for C and 8.4{\%} for C/C++. A prototype implementation of CPI and CPS can be obtained from http://levee.epfl.ch.},
author = {Kuznetsov, Volodymyr and Szekeres, L{\'{a}}szl{\'{o}} and Payer, Mathias},
booktitle = {Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation},
isbn = {9781931971164},
pages = {147--163},
title = {{Code-pointer integrity}},
url = {https://www.usenix.org/conference/osdi14/technical-sessions/presentation/kuznetsov},
year = {2014}
}
@article{Levy2015a,
abstract = {Rust, a new systems programming language, provides compile-time memory safety checks to help eliminate runtime bugs that manifest from improper memory management. This feature is advantageous for operating system development, and especially for embedded OS development, where recovery and debugging are particularly challenging. However, embedded platforms are highly event-based, and Rust's memory safety mechanisms largely presume threads. In our experience developing an operating system for embedded systems in Rust, we have found that Rust's ownership model prevents otherwise safe resource sharing common in the embedded domain, conflicts with the reality of hardware resources, and hinders using closures for programming asynchronously. We describe these experiences and how they relate to memory safety as well as illustrate our workarounds that preserve the safety guarantees to the largest extent possible. In addition, we draw from our experience to propose a new language extension to Rust that would enable it to provide better memory safety tools for event-driven platforms.},
author = {Levy, Amit and Andersen, Michael P. and Campbell, Bradford and Culler, David and Dutta, Prabal and Ghena, Branden and Levis, Philip and Pannuto, Pat},
doi = {10.1145/2818302.2818306},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/tock-plos2015.pdf:pdf},
isbn = {9781450339421},
journal = {PLOS: Workshop on Programming Languages and Operating Systems},
keywords = {embedded operating systems,linear types,ownership,rust},
pages = {21--26},
title = {{Ownership is Theft: Experiences Building an Embedded OS in Rust}},
url = {http://dl.acm.org/citation.cfm?id=2818302.2818306},
year = {2015}
}
@article{Lattner2005,
abstract = {The LLVM Compiler Infrastructure (http://llvm.cs.uiuc.edu) is a robust system that is well suited for a wide variety of research and development work. This brief paper introduces the LLVM system and provides pointers to more extensive documentation, complementing the tutorial presented at LCPC.},
archivePrefix = {arXiv},
arxivId = {9780201398298},
author = {Lattner, Chris and Adve, Vikram},
doi = {10.1007/11532378_2},
eprint = {9780201398298},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/The LLVM Compiler Framework and Infrastructure Tutorial.pdf:pdf},
isbn = {978-3-540-28009-5},
issn = {03029743},
journal = {Languages and Compilers for High Performance Computing},
number = {Part 1},
pages = {15--16},
pmid = {4520227},
title = {{The LLVM Compiler Framework and Infrastructure Tutorial}},
url = {http://dx.doi.org/10.1007/11532378{\_}2},
year = {2005}
}
@misc{IEEEspectrum-proglangs,
author = {IEEE},
title = {{Interactive: The Top Programming Languages 2017}},
url = {https://spectrum.ieee.org/static/interactive-the-top-programming-languages-2017},
urldate = {2017-09-08},
year = {2017}
}
@article{Chisnall2015,
abstract = {We propose a new memory-safe interpretation of the C abstract machine that provides stronger protection to benefit security and debugging. Despite ambiguities in the specification intended to provide implementation flexibility, contemporary implementations of C have converged on a memory model similar to the PDP-11, the original target for C. This model lacks support for memory safety despite well-documented impacts on security and reliability. Attempts to change this model are often hampered by assumptions embedded in a large body of existing C code, dating back to the memory model exposed by the original C compiler for the PDP-11. Our experience with attempting to implement a memory-safe variant of C on the CHERI experimental microprocessor led us to identify a number of problematic idioms. We describe these as well as their interaction with existing memory safety schemes and the assumptions that they make beyond the requirements of the C specification. Finally, we refine the CHERI ISA and abstract model for C, by combining elements of the CHERI capability model and fat pointers, and present a softcore CPU that implements a C abstract machine that can run legacy C code with strong memory protection guarantees.},
author = {Chisnall, David and Rothwell, Colin and Watson, Robert N M and Woodruff, Jonathan and Vadera, Munraj and Moore, Simon W and Roe, Michael and Davis, Brooks and Neumann, Peter G},
doi = {10.1145/2694344.2694367},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Beyond the PDP-11$\backslash$: Architectural support for a memory-safe C abstract machine.pdf:pdf},
isbn = {9781450328357},
issn = {01635964},
journal = {Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems},
pages = {117--130},
title = {{Beyond the PDP-11: Architectural support for a memory-safe C abstract machine}},
url = {http://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201503-asplos2015-cheri-cmachine.pdf},
year = {2015}
}
@article{Dhurjati2003,
abstract = {Traditional approaches to enforcing memory safety of programs rely heavily on runtime checks of memory accesses and on garbage collection, both of which are unattractive for embedded applications. The long-term goal of our work is to enable 100{\%} static enforcement of memory safety for embedded programs through advanced compiler techniques and minimal semantic restrictions on programs. The key result of this paper is a compiler technique that ensures memory safety of dynamically allocated memory without programmer annotations, runtime checks, or garbage collection, and works for a large subclass of type-safe C programs. The technique is based on a fully automatic pool allocation (i.e., region-inference) algorithm for C programs we developed previously, and it ensures safety of dynamically allocated memory while retaining explicit deallocation of individual objects within regions (to avoid garbage collection). For a diverse set of embedded C programs (and using a previous technique to avoid null pointer checks), we show that we are able to statically ensure the safety of pointer and dynamic memory usage in all these programs. We also describe some improvements over our previous work in static checking of array accesses. Overall, we achieve 100{\%} static enforcement of memory safety without new language syntax for a significant subclass of embedded C programs, and the subclass is much broader if array bounds checks are ignored.},
author = {Dhurjati, D and Kowshik, S and Adve, V and Lattner, C},
doi = {10.1145/780742.780743},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Memory Safety Without Runtime Checks or Garbage.pdf:pdf},
isbn = {0362-1340},
issn = {03621340},
journal = {ACM SIGPLAN Notices},
keywords = {automatic pool allocation,compilers,embedded systems,languages,programming languages,region management,security,static analysis},
number = {7},
pages = {69--80},
title = {{Memory safety without runtime checks or garbage collection}},
volume = {38},
year = {2003}
}
@article{Merity2016,
abstract = {Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state of the art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. In order to evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora we also introduce the freely available WikiText corpus.},
@ -246,10 +253,27 @@ title = {{Pointer Sentinel Mixture Models}},
url = {http://arxiv.org/abs/1609.07843},
year = {2016}
}
@article{Balasubramanian2017,
abstract = {Rust is a new system programming language that offers a practical and safe alternative to C. Rust is unique in that it enforces safety without runtime overhead, most importantly, without the overhead of garbage collection. While zero-cost safety is remarkable on its own, we argue that the super-powers of Rust go beyond safety. In particular, Rust's linear type system enables capabilities that cannot be implemented efficiently in traditional languages, both safe and unsafe, and that dramatically improve security and reliability of system software. We show three examples of such capabilities: zero-copy software fault isolation, efficient static information flow analysis, and automatic checkpointing. While these capabilities have been in the spotlight of systems research for a long time, their practical use is hindered by high cost and complexity. We argue that with the adoption of Rust these mechanisms will become commoditized.},
author = {Balasubramanian, Abhiram and Baranowski, Marek S. and Burtsev, Anton and Rakamari{\'c}, Zvonimir and Ryzhyk, Leonid},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/DRAFT$\backslash$: System Programming in Rust$\backslash$: Beyond Safety.pdf:pdf},
title = {{DRAFT: System Programming in Rust: Beyond Safety}},
@misc{MITRE-CWE-633,
author = {MITRE},
title = {{CWE-633: Weaknesses that Affect Memory}},
url = {http://cwe.mitre.org/data/definitions/633.html},
urldate = {2017-08-31},
year = {2017}
}
@article{Arpaci-Dusseau2015,
abstract = {A book covering the fundamentals of operating systems, including virtualization of the CPU and memory, threads and concurrency, and file and storage systems. Written by professors active in the field for 20 years, this text has been developed in the classrooms of the University of Wisconsin-Madison, and has been used in the instruction of thousands of students.},
author = {Arpaci-Dusseau, Remzi H. and Arpaci-Dusseau, Andrea C.},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/operating{\_}systems{\_}{\_}three{\_}easy{\_}pieces{\_}{\_}electronic{\_}version{\_}0{\_}91{\_}.pdf:pdf},
journal = {Arpaci-Dusseau},
number = {0.91},
pages = {665},
title = {{Operating Systems: Three Easy Pieces}},
volume = {Electronic},
year = {2015}
}
@article{Affairs2015,
author = {Beingessner, Alexis},
file = {:home/steveej/src/steveej/msc-thesis/docs/You can't spell trust without Rust.pdf:pdf},
title = {{You Can't Spell Trust Without Rust}},
year = {2015}
}

@ -13,7 +13,7 @@
\usepackage{multirow,tabularx,tabu}
\usepackage{ctable,multirow,spreadtab}
\usepackage[backend=biber,style=numeric,url=true]{biblatex}
\usepackage[backend=biber,style=numeric,citestyle=numeric,url=true]{biblatex}
\addbibresource{thesis.bib}
%\usepackage[hyphens]{url}
@ -29,6 +29,14 @@
\usepackage{graphicx}
\usepackage{color}
\usepackage[parfill]{parskip}
\usepackage{amsmath}
\newcommand{\iitemA}{\setlength\itemindent{0pt}\item}
\newcommand{\iitemB}{\setlength\itemindent{25pt}\item}
\newcommand{\iitemC}{\setlength\itemindent{50pt}\item}
\newcommand{\topic}{Guarantees On In-Kernel Memory-Safety Using Rust's Static Code Analysis}
\newcommand{\authorOne}{Stefan Junker}
@ -137,17 +145,18 @@
\tableofcontents
\part{Context}
\label{context}
\printnoidxglossary
\include{parts/context/context}
\part{Research And Development}
\label{part:rnd}
\label{rnd}
\include{parts/research_and_development/research_and_development}
\part{Evaluation And Conclusion}
\label{part:eval_and_conclusion}
\label{eval_and_conclusion}
\include{parts/eval_and_conclusion/eval_and_conclusion}
\newpage