thesis/context: work on introduction

* Explain some of Memory-Management
* Explain CWE with relevant examples
* Add CWE-119 Statistics
steveej 2017-08-31 22:31:56 +02:00
parent c32d440432
commit f50dd56fff
13 changed files with 638 additions and 395 deletions

10
.gitignore vendored

@ -5,10 +5,10 @@
*.fls *.fls
*.lof *.lof
*.log *.log
*.lol
*.lot *.lot
*.synctex.gz *.synctex.gz
*.toc *.toc
src/docs/*.pdf
*.dvi *.dvi
*.glo *.glo
*.ist *.ist
@ -16,6 +16,12 @@ src/docs/*.pdf
.dot/* .dot/*
*.out *.out
*.bbl *.bbl
*-blx.bib
*.bcf
*.blg *.blg
*.run.xml
docs/ .vscode/
/src/docs/thesis.pdf
/docs/

7 binary image files not shown; sizes after this commit: 23 KiB, 51 KiB, 68 KiB, 12 KiB, 70 KiB, 79 KiB, 18 KiB.


@ -97,7 +97,7 @@
}, },
} }
\newglossaryentry{Linux}{ \newglossaryentry{LX}{
name = Linux, name = Linux,
description = { description = {
is a generic term referring to the family of Unix-like is a generic term referring to the family of Unix-like
@ -159,7 +159,7 @@
\newglossaryentry{lxvfs}{ \newglossaryentry{lxvfs}{
name = Linux VFS, name = Linux VFS,
description = { description = {
Virtual Filesystem Switch, a filesystem abstraction layer in \gls{Linux}. Virtual Filesystem Switch, a filesystem abstraction layer in \gls{LX}.
}, },
} }
@ -186,6 +186,13 @@
} }
} }
\newglossaryentry{program} {
name = {program},
description = {
A set of logically grouped instructions.
},
}
\newglossaryentry{pm}{ \newglossaryentry{pm}{
name=package manager, name=package manager,
description={ description={
@ -255,15 +262,35 @@
\newglossaryentry{appc}{ \newglossaryentry{appc}{
name=App Container, name=App Container,
description={ description={
Specific variant of an \gls{sac} defined by the \gls{appcorg}. Specific variant of an \glsentrytext{sac} defined by the \glsentrytext{appcorg}.
} }
} }
\newglossaryentry{NVD}{
name = {NVD},
description = {https://nvd.nist.gov/},
long = {National Vulnerability Database},
first = {\glsentrylong{NVD}}
}
\newglossaryentry{CWE}{ \newglossaryentry{CWE}{
name=Common Weakness Enumeration, name = {CWE™},
description={ long = Common Weakness Enumeration,
A formal list of software weakness types. description = {a community-developed list of common software security weaknesses. It serves as a common language, a measuring stick for software security tools, and as a baseline for weakness identification, mitigation, and prevention efforts},
} first = {\glsentrylong{CWE}, "\glsentrydesc{CWE}"\cite{MITRE-CWE}}
}
\newglossaryentry{CWE-633}{
name = CWE-633,
description = {Weaknesses in this category affect memory resources},
first = {CWE-633: \glsentrydesc{CWE-633}\cite{MITRE-CWE-633}}
}
\newglossaryentry{CWE-119}{
name = CWE-119,
description = {Improper Restriction of Operations within the Bounds of a Memory Buffer},
short = {buffer error},
first = {CWE-119: \glsentrydesc{CWE-119}\cite{MITRE-CWE-119}}
} }
\newglossaryentry{C}{ \newglossaryentry{C}{
@ -273,25 +300,45 @@
} }
} }
\newglossaryentry{C++}{
name=C++,
description={
A \glsentrytext{proglag} based on \glsentrytext{C}, enhanced by features like object-orientation, lambdas, and much more.
}
}
\newglossaryentry{asm}{ \newglossaryentry{asm}{
name=Assembly programming language, name=Assembly programming language,
, description={ description={
TODO ASM TODO ASM
} }
} }
\newglossaryentry{amd64}{
name = AMD64,
long = AMD64,
description={
TODO AMD64
},
first = {\glsentrylong{amd64}},
}
\newglossaryentry{CPU}{ \newglossaryentry{CPU}{
name=Central Processing Unit name = CPU,
, description={ long = Central Processing Unit,
description={
TODO CPU TODO CPU
} },
first = {\glsentrylong{CPU}},
} }
\newglossaryentry{MMU}{ \newglossaryentry{MMU}{
name=Memory Management Unit name = MMU,
, description={ long = Memory Management Unit,
description={
TODO MMU TODO MMU
} },
first = {\glsentrylong{MMU}},
} }
\newglossaryentry{sysadmin}{ \newglossaryentry{sysadmin}{
@ -307,3 +354,4 @@
TODO realtime TODO realtime
} }
} }


@ -6,195 +6,267 @@ This thesis studies the feasibility of using compile-time code analysis, as foun
Because an \gls{OS} is nothing but a \gls{app}, this study could be applied to all \glspl{app}, but the focus is on the implementation of \glspl{OS}, which is the \gls{app} that is responsible for managing the system's resources and providing abstractions for higher-level applications. Because an \gls{OS} is nothing but a \gls{app}, this study could be applied to all \glspl{app}, but the focus is on the implementation of \glspl{OS}, which is the \gls{app} that is responsible for managing the system's resources and providing abstractions for higher-level applications.
The \gls{OS} is the only \gls{app} that requires unrestricted access to these resources, with the task of managing them safely according to the rules that are either hard-coded or set up by the \gls{sysadmin}. The \gls{OS} is the only \gls{app} that requires unrestricted access to these resources, with the task of managing them safely according to the rules that are either hard-coded or set up by the \gls{sysadmin}.
\section{Memory And Safety} \section{Motivational Hypothesis}
\label{context::introduction::memory-safety}
% In Chapter 1 this is a summary of the methodology and contains a brief outline of three things: (a) the participants in a qualitative study or the subjects of a quantitative study (human participants are referred tyo as participants, non-human subjects are referred to as subjects), (b) the instrumentation used to collect data, and (c) the procedure that will be followed. All of these elements will be reported in detail in Chapter 3. In a quantitative study, the instrumentation will be validated in Chapter 3 in detail. In a qualitative study, if it is a researcher-created questionnaire, validating the correctness of the interview protocol is usually accomplished with a pilot study. For either a quantitative or a qualitative study, using an already validated survey instrument is easier to defend and does not require a pilot study; however, Chapter 3 must contain a careful review of the instrument and how it was validated by the creator.
% In a qualitative study, which usually involves interviews, the instrumentation is an interview protocol a pre-determined set of questions that every participant is asked that are based on the primary research questions. A qualitative interview should contain no less than 10 open-ended questions and take no less than 1 hour to administer to qualify as “robust” research.
% In the humanities, a demographic survey should be circulated with most quantitative and qualitative studies to establish the parameters of the participant pool. Demographic surveys are nearly identical in most dissertations. In the sciences, a demographic survey is rarely needed.
Memory-safety is a term that is only vaguely defined in general, thus a definition is given for the context of this thesis.
For a thorough understanding of the issues discussed further in this document, it might be helpful to review the basics of how memory is used in current computer systems.
For decades computer systems or more specifically their \glspl{CPU} were designed to execute instructions that were previously loaded into volatile main memory, typically from a secondary, persistent memory.
These instructions are themselves able to alter the very main memory they are stored at, which allows for great flexibility but also involves the risk of corrupting a consistent chain of instructions or other memory content like data.
Like any other \gls{app}, the \gls{OS} is executed in the form of a set of logically grouped instructions, called a program.
Loading the \gls{OS}'s program into memory is not the responsibility of the \gls{OS}; it belongs to the components earlier in the boot process, namely the bootloader and system firmware.
The \gls{OS} takes over the responsibility to protect the main and secondary memory from the point where the bootloader hands over control.
Loading further programs into main memory is done by the \gls{OS}, either according to scheduled jobs set up by the \gls{sysadmin}, or based on well-defined events which can be triggered by any form of input via the system's interfaces.
For example, the \gls{OS} can load and execute a program stored on the hard disk after the user has given the appropriate instructions via a terminal.
The execution of other programs is potentially dangerous, because the program might access the memory content of other programs and their data.
It is the responsibility of the \gls{OS} to prevent programs from being able to interfere with each other under any circumstances, keeping the memory content in a consistent state at all times.
This requires an extensive amount of care and foresight from the developers of the \gls{OS}, to ensure memory consistency in any of the various events and combinations thereof that might possibly occur at runtime.
\subsection{A Definition Of Memory-Safety in the \glsentrytext{OS}}
\label{context::introduction::memory-safety::def}
If the \gls{OS} is memory-safe, the memory access of any program, whether it is part of the \gls{OS} or any other \gls{app}, is restricted to memory regions that have been allocated for this specific program, preventing it from reading and writing to memory regions of other programs.
\subsection{The Human Aspect}
\label{context::introduction::memory-safety::human-aspect}
Programs are written by humans, which is an important factor working against memory-safety.
No human is born as a flawless software engineer.
Beginners will start writing programs before they master this skill to perfection.
Also, with each generation of humans there will always be new beginners that will start learning from scratch.
This requires a sustainable method to prevent mistakes, especially such that have an impact on memory-safety.
Advanced programmers can profit too, as they also make mistakes on a regular basis, depending on their level of focus which can vary momentarily.
\subsection{Detecting Memory-Safety Violations - Before They Occur}
\label{context::introduction::memory-safety::detection}
The human aspect suggests that systems need to be designed to be testable in the first place, and then tested thoroughly in order to mitigate the risk of erroneous software being executed by the end-user.
Besides the presence and quality of tests, their point in the software life cycle plays an important role.
The earliest tests can run during software development itself, and the latest ones at the time of execution on the production system of the end-user.
It is desirable to place tests as early as possible in the software life cycle, to prevent errors from compromising running systems that hold sensitive data and offer important services.
The dimension of time can also be translated to hierarchically lower system components at run-time.
This suggests that the \gls{OS} must be tested before the other executed \glspl{app}, etc.
This can be easily explained.
From a \gls{app} perspective, testing every permutation of \gls{OS} runtime states can be impossible, because the \gls{app} cannot freely mutate the system's state.
Even if it could, testing all possible permutations of system state is limited by time and resource restrictions.
That's why even disciplined software engineers write tests that only target common error cases, like system memory exhaustion, and ensure syntactic and semantic correctness for the \gls{app} being developed.
Edge cases that happen only under specific system circumstances, possibly influenced by other components on the system as described in the beginning of \autoref{context::introduction::memory-safety}, are at high risk of remaining untested, and the \gls{app} developer is forced to trust the underlying \gls{OS}.
This puts high importance on the safety of the \gls{OS} design and implementation.
\subsection{Abstraction: Safety vs. Functionality}
\label{context::introduction::memory-safety::abstr-safety-function}
In computer systems, safety and functionality are inversely related, because complexity grows with increased functionality, and error cases become more difficult to find.
Applying this analogy to software development, during which the errors are created in the first place, might be misleading.
It might seem that the more abstraction is provided by a language, the higher the functionality is.
In fact, the opposite is the case.
Abstraction can be used to impose limits on what the programmer can instruct the system to do.
By defining an abstraction layer in form of a programming language, the language defines which of the underlying functionality will be exposed through it.
In addition, the language can introduce mandatory rules that make the written program easier to analyze in an automated fashion, before it gets compiled into the underlying representation.
\section{Safety In Language Compilers And Static Analyzers}
\label{context::introduction::language-compilers-analyzers}
% The theoretical framework is the foundational theory that is used to provide a perspective upon which the study is based. There are hundreds of theories in the literature. For instance, if a study in the social sciences is about stress that may be causing teachers to quit, Apples Intensification Theory could be cited as the theory was that stress is cumulative and the result of continuing overlapping, progressively stringent responsibilities for teachers that eventually leads to the desire to quit. In the sciences, research about new species that may have evolved from older, extinct species would be based on the theory of evolution pioneered by Darwin.
% Some departments put the theoretical framework explanation in Chapter 1; some put it in Chapter 2.
In \autoref{context::introduction::memory-safety}, specifically in \autoref{context::introduction::memory-safety::detection}, it was explained that programming languages have a direct impact on memory-safety.
This section gives an example of how severe this impact is and explains the requirements for a \gls{OS} language.
\subsection{\glsentrytext{Linux} and \glsentrytext{C}: Zero Memory-Safety A Day}
% Significance of the Study
% The significance is a statement of why it is important to determine the answer to the gap in the knowledge, and is related to improving the human condition. The contribution to the body of knowledge is described, and summarizes who will be able to use the knowledge to make better decisions, improve policy, advance science, or other uses of the new information. The “new” data is the information used to fill the gap in the knowledge.
A very popular and widespread \gls{OS} is \gls{Linux} which is written in \gls{C} and some hardware specific \gls{asm} code.
Recent years have shown how prone it is to vulnerabilities that result from programming errors related to memory management.
A very recent and high impact vulnerability is known as CVE-2017-1000364\footnote{http://www.cvedetails.com/cve/CVE-2017-1000364/}, where \textit{"an issue was discovered in the size of the stack guard page on Linux, specifically a 4k stack guard page is not sufficiently large and can be "jumped" over (the stack guard page is bypassed)"}.
With the growing number of vulnerabilities, various solutions have been proposed to increase the safety of C, either with static code analysis or via generated checks imposed at runtime. (TODO: reference).
Static analyses are not very effective on a language that has not been designed to be safety-analyzed. TODO? reference?
For this reason there have been attempts to define subsets of the \gls{C} language that can be safety checked, TODO: references of Cyclone, CCured, etc.
Safety checks that are performed at runtime introduce a high degree of overhead, which makes them a nonviable option in the domain of \gls{OS} development, where many code paths must be very fast to ensure the operation of high speed I/O devices\cite{Balasubramanian2017} or tasks with \gls{realtime} requirements. (TODO: explain realtime requirements)
This has been forcing \gls{OS} developers to prioritize performance over safety. (TODO: reference)
Details about the challenge of writing code that does memory management safely, and related vulnerabilities are given further along in \autoref{chap:mmt}.
\subsection{\glsentrytext{OS} Programming Language Choice}
Criteria for the choice of programming language are much different from choosing a language for other types of \glspl{app}.
The following is required for implementing an \gls{OS}:
\begin{itemize}
\item{Raw access to \gls{CPU} instructions}
\item{Deterministic temporal behavior}
\end{itemize}
* TODO: put in some scientific background about static checks
* affine types
\section{Academic And Industrial Activities}
% Primary Research Questions % Primary Research Questions
% The primary research question is the basis for data collection and arises from the Purpose of the Study. There may be one, or there may be several. When the research is finished, the contribution to the knowledge will be the answer to these questions. Do not confuse the primary research questions with interview questions in a qualitative study, or survey questions in a quantitative study. The research questions in a qualitative study are followed by both a null and an alternate hypothesis. % The primary research question is the basis for data collection and arises from the Purpose of the Study. There may be one, or there may be several. When the research is finished, the contribution to the knowledge will be the answer to these questions. Do not confuse the primary research questions with interview questions in a qualitative study, or survey questions in a quantitative study. The research questions in a qualitative study are followed by both a null and an alternate hypothesis.
% Hypotheses % Hypotheses
% A hypothesis is a testable prediction for an observed phenomenon, namely, the gap in the knowledge. Each research question will have both a null and an alternative hypothesis in a quantitative study. Qualitative studies do not have hypotheses. The two hypotheses should follow the research question upon which they are based. Hypotheses are testable predictions to the gap in the knowledge. In a qualitative study the hypotheses are replaced with the primary research questions. % A hypothesis is a testable prediction for an observed phenomenon, namely, the gap in the knowledge. Each research question will have both a null and an alternative hypothesis in a quantitative study. Qualitative studies do not have hypotheses. The two hypotheses should follow the research question upon which they are based. Hypotheses are testable predictions to the gap in the knowledge. In a qualitative study the hypotheses are replaced with the primary research questions.
* TODO: mention papers by tockos team
* TODO: mention electrolyte, formal verification for Rust %TODO: mention papers by tockos team
%TODO: mention electrolyte, formal verification for Rust
According to my best-effort literature research in Q1/2017, the hypothesis that \textit{Rust's static code analysis can guarantee memory safety in the \gls{OS}} has not been studied explicitly. According to my best-effort literature research in Q1/2017, the hypothesis that \textit{Rust's static code analysis can guarantee memory safety in the \gls{OS}} has not been studied explicitly.
This is to my surprise, because as explained in more details in this chapter the situation in This is to my surprise, because as explained in \autoref{context::introduction::memory-safety}, memory-safety in \gls{OS} development is critical, and \gls{Rust} offers attractive features that might bring improvements, which is covered in \autoref{context::rust}.
\gls{OS} is critical and \gls{Rust} offers attractive features to help improve this situation. The hypothesis cannot be trivially approved or denied, which drives the research efforts for my final thesis project.
However, the hypothesis cannot be trivially approved or denied, which drives the research efforts for my final thesis project.
Besides this specific hypothesis, many implementations of \glspl{OS} with \gls{Rust} have appeared in public. Besides this specific hypothesis, many implementations of \glspl{OS} with \gls{Rust} have appeared in public.
These range from proof-of-concept and educational work like \gls{imezzos} and \gls{blogos}, to implementations that aim to be production grade software like \gls{redoxos} and \gls{tockos}. Their purposes range from proof-of-concept and educational work like \gls{imezzos} and \gls{blogos}, to implementations that aim to be production grade software like \gls{redoxos} and \gls{tockos}.
These implementations are subject to evaluation in \ref{part:rnd}.
% Purpose of the Study The final results presented will be of qualitative nature, captured by analyzing the existing and a self-developed \gls{Rust}-implementations of popular memory management techniques.
%The Purpose of the Study is a statement contained within one or two paragraphs that identifies the research design, such as qualitative, quantitative, mixed methods, ethnographic, or another design. The research variables, if a quantitative study, are identified, for instance, independent, dependent, comparisons, relationships, or other variables. The population that will be used is identified, whether it will be randomly or purposively chosen, and the location of the study is summarized. Most of these factors will be discussed in detail in Chapter 3.
The results will be of qualitative nature, captured by analyzing existing and a self-developed \gls{Rust}-implementations of popular memory management techniques.
In addition to the sole analysis of \gls{Rust}-implementations, comparisons will be made, discerning the level of memory safety guarantees gained over implementations with similar intent in \gls{C}. In addition to the sole analysis of \gls{Rust}-implementations, comparisons will be made, discerning the level of memory safety guarantees gained over implementations with similar intent in \gls{C}.
\section{Assumptions, Limitations, and Scope (Delimitations)} \section{Assessing Memory-Safety}
% Assumptions are self-evident truths. In a qualitative study, it may be assumed that participants be highly qualified in the study is about administrators. It can be assumed that participants will answer truthfully and accurately to the interview questions based on their personal experience, and that participants will respond honestly and to the best of their individual abilities. \label{context::introduction::memory-safety}
% Limitations of a study are those things over which the research has no control. Evident limitations are potential weaknesses of a study. Researcher biases and perceptual misrepresentations are potential limitations in a qualitative study; in a quantitative study, a limitation may be the capability of an instrument to accurately record data. Memory-safety is a term that is only vaguely defined in general, thus a definition is given for the context of this thesis.
For a thorough understanding of the issues discussed further in this document, it might be helpful to review the basics of how memory is used in current computer systems.
% Scope is the extent of the study and contains measurements. In a qualitative study this would include the number of participants, the geographical location, and other pertinent numerical data. In a quantitative study the size of the elements of the experiment are cited. The generalizability of the study may be cited. The word generalizability, which is not in the Word 2007 dictionary, means the extent to which the data are applicable in places other than where the study took place, or under what conditions the study took place. For decades computer systems, more specifically their \glspl{CPU}, were designed to execute instructions that were previously loaded into volatile main memory, typically from a secondary, persistent memory.
% Delimitations are limitations on the research design imposed deliberately by the researcher. Delimitations in a social sciences study would be such things as the specific school district where a study took place, or in a scientific study, the number of repetitions. These instructions are themselves able to alter the very main memory they are stored at, which allows for great flexibility but also involves the risk of corrupting a consistent chain of instructions or other memory content like data.
\section{Premised Trust In Hardware} Like any other \gls{app}, the \gls{OS} is loaded and executed in the form of one or multiple sets of logically grouped instructions, called \glspl{program}.
* TODO: is it worth to explain ECC? Loading the \gls{OS}'s program into memory is not the responsibility of the \gls{OS}; it belongs to the components earlier in the boot process, namely the bootloader and system firmware.
* TODO: explain that the hardware might be unsafe but this is not in scope of the thesis The \gls{OS} takes over the responsibility to protect the main and secondary memory from the point where the bootloader hands over control.
Loading further programs into main memory is done by the \gls{OS}, either according to scheduled jobs set up by the \gls{sysadmin}, or based on well-defined events which can be triggered by any form of input via the system's interfaces.
For example, the \gls{OS} can load and execute a program stored on the hard disk after the user has given the appropriate instructions via a terminal.
The execution of other programs is potentially dangerous, because they might attempt to access the memory content of other programs and their data.
\section{Recap} It is the responsibility of the \gls{OS} to prevent executed programs from being able to mutually interfere with memory content that is not theirs, keeping the memory in a safe state at all times \footnote{This does not include memory-safety \textit{within} each of these executed programs, as the \gls{OS} has no pertinent knowledge of the program's intentions.}.
% Summarize the content of Chapter 1 and preview of content of Chapter 2. This requires an extensive amount of care and foresight from the developers of the \gls{OS}, to ensure memory consistency in any of the various events and combinations thereof that might possibly occur at runtime.
\label{chap:mmt}
The \autoref{chap:mmt} gives a detailed introduction to memory management in contemporary architectures and \glspl{OS}.
\chapter{Sophisticated Memory Management Techniques} \subsection{A Definition Of Memory-Safety For \glsentryplural{OS}}
* TODO: in the beginnings application software had full control over memory \label{context::introduction::memory-safety::def}
* TODO: from single-job via batch systems to multiprocessing If the \gls{OS} is memory-safe, any program, whether it is part of the \gls{OS} or any installed \gls{app}, is only able to access its allocated memory regions.
Additionally, if the \gls{OS} supports shared memory regions, each shared memory region may only be accessible by programs that have been granted access to it.
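The definition above can be read as a predicate over memory accesses. The following Rust sketch is only a minimal model of that predicate under stated assumptions, not part of any \gls{OS} implementation: the names `ProgramId`, `Region`, `may_access` and `example_regions`, as well as the addresses used, are hypothetical and chosen for illustration.

```rust
// Minimal model of the memory-safety definition; all names are hypothetical.

/// Identifier of a program (assumption: a plain integer).
type ProgramId = u32;

/// A contiguous memory region covering [start, start + len).
struct Region {
    start: usize,
    len: usize,
    /// Program the region was allocated for.
    owner: ProgramId,
    /// Programs that were explicitly granted access (shared memory).
    shared_with: Vec<ProgramId>,
}

impl Region {
    /// True if `addr` lies inside this region.
    fn contains(&self, addr: usize) -> bool {
        addr >= self.start && addr - self.start < self.len
    }

    /// True if `program` owns the region or was granted access to it.
    fn grants(&self, program: ProgramId) -> bool {
        self.owner == program || self.shared_with.contains(&program)
    }
}

/// An access is permitted only if some region contains the address
/// and that region is accessible to the requesting program.
fn may_access(regions: &[Region], program: ProgramId, addr: usize) -> bool {
    regions.iter().any(|r| r.contains(addr) && r.grants(program))
}

/// Example layout: two private regions and one region shared by programs 1 and 2.
fn example_regions() -> Vec<Region> {
    vec![
        Region { start: 0x1000, len: 0x1000, owner: 1, shared_with: vec![] },
        Region { start: 0x2000, len: 0x1000, owner: 2, shared_with: vec![] },
        Region { start: 0x3000, len: 0x1000, owner: 1, shared_with: vec![2] },
    ]
}

fn main() {
    let regions = example_regions();
    println!("program 1 -> 0x1000: {}", may_access(&regions, 1, 0x1000));
    println!("program 2 -> 0x1000: {}", may_access(&regions, 2, 0x1000));
    println!("program 2 -> 0x3000: {}", may_access(&regions, 2, 0x3000));
}
```

In this model an access to an address outside every region is rejected as well, which corresponds to a program touching memory that was never allocated to it.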
As the result of collaborations between hard- and software developers, the memory management task in the \gls{OS} can be partially delegated to the \gls{CPU}'s \gls{MMU}. \section{Memory-Safety Violation in Software}
A complete understanding of this task is necessary in order to reason about its safety. \label{context::introduction::memory-safety-violation-in-sw}
This chapter provides an introduction to hardware-supported memory-management and protection techniques for the x86\_64 architecture. Software that has memory-safety violations is vulnerable to random crashes and intentional attacks.
This is why information on safety related mistakes in software shouldn't be publicly available immediately.
Ideally, before the vulnerability is publicly known, all systems that run the erroneous software should have the chance to update it, so that potential attackers cannot leverage the known vulnerability.
This introduces a dilemma, because software updates usually contain publicly known information, at least in the open-source sector.
\section{Resource Abstraction: Protection And Efficiency} Any existing or hypothetical solution to this dilemma is not in scope of this thesis, but two conclusions can be made.
* TODO: recap that management has been motivated by multiprocessing without side-effects First, public statistics in the area of software vulnerabilities are questionable with regard to their completeness.
* TODO: brief history and market share of x86\_64 processors and ARM Second, and more importantly, memory-safety related software mistakes should be detected as early as possible, ideally before the software is released and installed anywhere.
\section{Virtual Addresses} \subsection{Human Aspects}
* TODO: describe dynamic (relocatable) addresses \label{context::introduction::human-aspect}
* TODO: describe swapping To detect software mistakes early, it is helpful to analyze where they originate.
* TODO: describe virtual address This section emphasizes the fact that software - even if software-generators are interleaved - is ultimately produced by humans.
This aspect is relevant to assessing the origins of memory-safety related errors, as only errors made by humans during any stage of the development process can lead to unsafe memory access at runtime.
The following assumptions are made based on common sense:
\begin{itemize}
\item{No human is born as a flawless software engineer.}
\item{Beginners will start writing programs before they master this skill to perfection.}
\item{With each generation of humans there will always be new beginners that will start learning from scratch.}
\item{Capabilities and motivation vary significantly between individuals.}
\item{Less capable or motivated individuals will eventually write software for production use.}
\item{Education is not ideal.}
\end{itemize}
Combining these assumptions, it cannot generally be assumed that every beginner that writes software has learned about the involved risks, and is determined and capable to ensure memory-safety and other high quality standards in their software.
% * TODO: parse http://wiki.osdev.org/Memory_Management_Unit From my personal experience with software developers and students of software engineering, I have gained the impression that many do not prioritize safety in their software.
\section{Paging} The most severe example for this in my personal career is a former team partner in one of our \gls{C}/\gls{C++} programming courses.
* TODO: describe Despite the fact that the professor instructed us to use valgrind\footnote{a runtime memory analyzer and debugger} to verify our programs, my partner was satisfied with the result after writing the algorithms to his best understanding and correcting all errors detected by the \gls{compiler}.
Discussing the topic with him did not lead to any understanding on his side, and even after verifying that his program had easily detectable memory issues, he insisted on the correct result of the algorithm and pointed out the lack of time.
I noticed a similar mindset in some of the other teams.
\subsection{Multi-Level Paging} This personal experience is neither scientific proof nor statistically significant.
It does create a feeling of insecurity, because if their software is distributed widely a few of these people are enough to risk the security of thousands of systems.
\subsection{Top-Level Page Table Self-Reference} Plenty of educational, economical or methodological solutions are imaginable for this problem.
Higher focus on safety and testing in education, enforced internal company guidelines, or industry wide third party software certification requirements can be attempted.
For this thesis such constraints are out of scope, and the focus is on examining technical methods that detect and indicate mistakes as early as possible.
\subsection{Technical Aspect}
The problem on the technical side is that the \gls{compiler} was not able to detect all errors in the source code, so the human was still able to produce an executable program.
The resulting executable program might merely serve its purpose while containing severe technical mistakes that the \gls{compiler} does not consider errors.
This is especially likely in low-abstraction languages like \gls{C}, where technical mistakes and intended behavior are difficult to distinguish.
\section{Hardware-supported Memory-Management}
This section provides an overview of hardware-supported memory-management and protection techniques, which are necessary to understand in order to reason about memory-safety in the \gls{OS}.
To keep this section as short as possible, 64-bit mode as described in \cite{AMD64Vol2} is assumed.
In short, the effect of this assumption is that the system relies primarily on paging for memory management, so memory segmentation can be neglected in this context.
To improve the efficiency and safety of memory-management, developers of hardware and software have been collaborating to offload some memory-management operations from the \gls{OS} to the \gls{CPU}'s \gls{MMU}.
This improves speed and adds runtime memory permission checks\cite[p. 117]{AMD64Vol2}.
\subsection{Virtualization - Challenges Of Multitasking}
In order to run multiple programs concurrently, easily, and presumably safely, the \gls{OS} virtualizes the \gls{CPU}, memory, and other resources\cite{Arpaci-Dusseau2015}.
This allows preemptive multitasking to be performed transparently to the programs at runtime, which means that it has no side effects on the running programs and need not be considered during \gls{app} development.
\subsubsection{Task Switching}
When the \gls{OS} preempts a task, it needs to store and preserve the current task's context in a well-known and protected memory location, so that it can be restored when this task is resumed.
The context consists of all volatile resources that could be overwritten by another task.
At minimum, this is a set of \gls{CPU} registers that depends on the specific architecture.
For \gls{amd64}, see \autoref{tab:task-minimum-context-registers}.
\begin{table}
\begin{tabularx}{\textwidth}{| c | X | X |}
\hline
\textbf{descriptive name} &
\textbf{register names on amd64} &
\textbf{description} \\
\hline
the instruction pointer register & RIP & address of the next instruction to be fetched \\
\hline
all general-purpose registers & RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8--R15 & any data \\
\hline
the stack pointer register & RSP & address of current position in stack \\
\hline
the flags register & RFLAGS & various attributes, e.g. the interrupt flag \\
\hline
\end{tabularx}
\caption{Minimum Context Registers on amd64\cite[p. 28]{AMD64Vol2}}
\label{tab:task-minimum-context-registers}
\end{table}
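The following is a minimal sketch of how saving and restoring such a context could look in \gls{C}.
It is not taken from any particular kernel: the names \texttt{task\_context}, \texttt{cpu\_state} and \texttt{task\_switch} are hypothetical, and the \gls{CPU} is only simulated by a struct, whereas a real \gls{OS} would read and write the registers in assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the minimum context listed in the table. */
struct task_context {
    uint64_t rip;      /* address of the next instruction to be fetched */
    uint64_t rflags;   /* flags, e.g. the interrupt flag */
    uint64_t gpr[16];  /* RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8..R15 */
};

enum { GPR_RSP = 7 };  /* index of RSP in gpr[], following the order above */

/* Simulated CPU state; an assumption made to keep the sketch portable. */
struct cpu_state {
    struct task_context regs;
};

/* Store the preempted task's registers into its protected save area. */
void context_save(const struct cpu_state *cpu, struct task_context *slot)
{
    memcpy(slot, &cpu->regs, sizeof *slot);
}

/* Load a previously saved context back into the (simulated) CPU. */
void context_restore(struct cpu_state *cpu, const struct task_context *slot)
{
    memcpy(&cpu->regs, slot, sizeof *slot);
}

/* Preempt the running task and resume the next one. */
void task_switch(struct cpu_state *cpu,
                 struct task_context *current,
                 const struct task_context *next)
{
    context_save(cpu, current);
    context_restore(cpu, next);
}
```

The save area of each task has to be protected from all other tasks, because a corrupted saved instruction pointer or stack pointer would redirect the task as soon as it is resumed.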
\subsection{Virtual Address Translation and Paging}
% TODO: why virtual addressing?
On \gls{amd64}, the software's instructions use virtual memory addresses, which the \gls{MMU} of the \gls{CPU} translates to physical memory addresses at the time the instructions are executed.
The responsibility for setting up this translation falls to the \gls{OS}, so \gls{app} developers do not have to consider paging in the logic of their programs.
To avoid the need to store a translation mapping for every possible address, memory is divided into fixed-size pieces called \textit{page}s, and only one mapping is stored per page.
This works by encoding two parts into the virtual address: the index into the translation array, commonly called the \textit{page table}, and the offset within the page.
The \gls{MMU} performs the translation according to this page table, which is a structure maintained in memory by the \gls{OS}.
The structure can be stored anywhere in memory, and its address is handed to the \gls{MMU} via a specific \gls{CPU} register, which is \textit{CR3} on \gls{amd64}.
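The split of a virtual address into a page-table index and a page offset can be sketched in \gls{C} as follows.
This is a simplified, single-level model and not the actual \gls{amd64} mechanism: the page size of 4 KiB, the toy table size \texttt{NUM\_PAGES}, and the names \texttt{pte} and \texttt{translate} are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                    /* assumed page size: 4 KiB */
#define PAGE_SIZE  (1ull << PAGE_SHIFT)
#define NUM_PAGES  16u                    /* toy address space, hypothetical */

/* One entry per virtual page: the physical frame number and a present bit. */
struct pte {
    uint64_t frame;
    int present;
};

/* Translate a virtual address; returns 0 on success, -1 on a page fault. */
int translate(const struct pte table[NUM_PAGES], uint64_t vaddr, uint64_t *paddr)
{
    uint64_t index  = vaddr >> PAGE_SHIFT;       /* which page */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);   /* position inside the page */

    if (index >= NUM_PAGES || !table[index].present)
        return -1;

    *paddr = (table[index].frame << PAGE_SHIFT) | offset;
    return 0;
}
```

Only the page number is looked up; the offset is copied unchanged into the physical address, which is why one table entry suffices for a whole page.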
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/Virtual-to-Physical-Address-Translation-Long-Mode.png}
\caption{Virtual to Physical Address in Long Mode\cite{AMD64Vol2}}
\label{fig:virtual-addr-transl}
\end{figure}
\subsubsection{Multi-Level Paging}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/amd64-4kb-page-translation-long-mode}
\caption{4-Kbyte Page Translation—Long Mode\cite{AMD64Vol2}}
\label{fig:4kb-page-transl}
\end{figure}
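As a companion to \autoref{fig:4kb-page-transl}, the following sketch splits a 48-bit virtual address into the four 9-bit table indices and the 12-bit page offset used by 4-Kbyte page translation in long mode.
The struct and function names are hypothetical, and the code only extracts bit fields; it does not walk real page tables.

```c
#include <assert.h>
#include <stdint.h>

/* Bit fields of a virtual address for 4-Kbyte pages in long mode. */
struct va_parts {
    unsigned pml4;    /* bits 47..39: index into the PML4 table */
    unsigned pdp;     /* bits 38..30: index into the page-directory-pointer table */
    unsigned pd;      /* bits 29..21: index into the page directory */
    unsigned pt;      /* bits 20..12: index into the page table */
    unsigned offset;  /* bits 11..0:  offset within the 4-Kbyte page */
};

struct va_parts split_vaddr(uint64_t vaddr)
{
    struct va_parts p;
    p.pml4   = (unsigned)((vaddr >> 39) & 0x1ff);
    p.pdp    = (unsigned)((vaddr >> 30) & 0x1ff);
    p.pd     = (unsigned)((vaddr >> 21) & 0x1ff);
    p.pt     = (unsigned)((vaddr >> 12) & 0x1ff);
    p.offset = (unsigned)(vaddr & 0xfff);
    return p;
}
```

Each 9-bit index selects one of 512 entries in its table, so four levels together with the 12-bit offset cover the 48 address bits.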
\subsubsection{Top-Level Page Table Self-Reference}
\subsubsection{Caching Lookups}
\subsubsection{Full Example}
* http://taptipalit.blogspot.de/2013/10/theory-recursive-mapping-page.html
* https://www.coresecurity.com/blog/getting-physical-extreme-abuse-of-intel-based-paging-systems-part-2-windows
\subsubsection{Swapping}
Physical memory can only hold a limited number of pages, so the \gls{OS} is responsible for swapping pages between physical memory and persistent storage.
Swapping is only mentioned for the sake of completeness, and is not further pursued in this thesis.
\section{Stack And Heap Concept}
\section{Memory Allocation} \subsection{Premised Trust In Hardware}
\chapter{Memory-Related Software-Programming Weaknesses} \subsection{Stack And Heap Concept}
\label{chap:context.mem-weaknesses}
Software vulnerabilities can be categorized by their underlying weaknesses.
This chapter explains the weaknesses of interest for this project and gives concrete examples for their manifestation.
\section{Weakness Categories} \subsection{Memory Allocation}
This work focuses on the following weaknesses defined in the \gls{CWE}
\chapter{Common Memory-Safety Mistakes}
\label{chap:context:common-mem-safety-mistakes}
Building upon \autoref{context::introduction}, which describes the basic mechanics of memory usage and how mistakes come to existence, this chapter explains some of the most common software vulnerabilities that are related to memory-safety.
The relevant vulnerability classes are explained alongside exemplary manifestations in \gls{C}/\gls{C++}.
In \autoref{rnd:porting-c-vulns}, these are ported and compared to functionally equivalent versions written in \gls{Rust}.
\section{\glsentrylong{CWE}}
\textit{The MITRE Corporation} has made an ongoing effort to collect, analyze, and classify vulnerabilities and their underlying weaknesses in the form of the \gls{CWE}.
It has grown into a large relational database of typed weaknesses.
The following information is provided for entries of the type \textit{weakness class}:
\begin{itemize}
\item Description
\item Applicable Platforms
\item Common Consequences
\item Likelihood of Exploit
\item Demonstrative Examples
\item Potential Mitigations
\item Relationships
\end{itemize}
\subsection{Relevant Weaknesses}
The relevant weaknesses for this thesis are \gls{CWE-633} and all of its children, as it serves as an umbrella weakness.
% TODO test the autocite command with footnotes
One of its children, \citep{MITRE-CWE-119}, is particularly interesting.
If this weakness manifests, a direct violation of the memory-safety defined in \autoref{context::introduction::memory-safety::def} must have occurred, which "can cause read or write operations to be performed on memory locations that may be associated with other variables, data structures, or internal program data.
As a result, an attacker may be able to execute arbitrary code, alter the intended control flow, read sensitive information, or cause the system to crash"\cite{MITRE-CWE-119}.
This can happen in certain languages, which "allow direct addressing of memory locations and do not automatically ensure that these locations are valid for the memory buffer that is being referenced.
\gls{C}, \gls{C++}, \gls{asm} and languages without memory management support"\cite{MITRE-CWE-119}.
The documented list of languages prone to this weakness is incorrect, as it does not match the earlier characterization of languages that "allow direct addressing of memory locations".
Support for direct memory addressing does not imply a lack of memory management support.
There are languages, such as \gls{Rust}, that provide memory management support and still allow direct memory addressing.
This is explained in more detail in \autoref{context::rust}.
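To make the quoted description concrete, the following sketch contrasts an unchecked buffer write in \gls{C} with a checked one.
It is an illustration only and not taken from the \gls{CWE} entry; the buffer length \texttt{BUF\_LEN} and the function name \texttt{checked\_write} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_LEN 8u   /* hypothetical buffer length */

/* Unchecked variant, shown only as a comment because executing it with
 * index >= BUF_LEN is undefined behavior in C:
 *
 *     buf[index] = value;
 *
 * The language does not verify that the location belongs to the buffer.
 */

/* Checked write: refuses any index outside the bounds of the buffer. */
int checked_write(int buf[BUF_LEN], size_t index, int value)
{
    if (index >= BUF_LEN)
        return -1;   /* would touch memory beyond the buffer */
    buf[index] = value;
    return 0;
}
```

In \gls{C} the bounds check has to be written and maintained by the programmer, so forgetting it in a single place is enough for the weakness to manifest.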
\subsection{Statistics}
This section presents data intended to express the severity of the weakness in real-world software.
The data is based on publicly available sources, so its completeness is questionable: many organizations might choose not to disclose their vulnerabilities, either to protect their reputation or for security reasons, as already explained in \autoref{context::introduction::memory-safety-violation-in-sw}.
\subsubsection{NVD's CWE-119 Statistics}
The data and visualizations are supplied by the \gls{NVD}, which collects the data based on the umbrella weakness CWE-635\footnote{http://cwe.mitre.org/data/definitions/635.html} that was specifically created for the \gls{NVD}.
\autoref{fig:vulnerability-ratio-history} and \autoref{fig:vulnerability-counts-history} display statistics on vulnerabilities grouped by their \gls{CWE} category.
Only the most significant categories are labeled in these figures; the rest are grouped as \textit{other}.
The category \textit{buffer\footnote{A limited chunk of memory used by programs to store various data} errors} represents \autocite{MITRE-CWE-119}.
\begin{table}
\centering
\begin{spreadtab}{{tabular}{ c | c | c }}
@ Year & @ \% & @ count \\
\hline
@ 2007 & 6.75 & 490 \\
@ 2008 & 10.01 & 550 \\
@ 2009 & 9.84 & 530 \\
@ 2010 & 11.58 & 530 \\
@ 2011 & 15.95 & 600 \\
@ 2012 & 13.67 & 650 \\
@ 2013 & 14.63 & 670 \\
@ 2014 & 9.69 & 800 \\
@ 2015 & 15.18 & 1050 \\
@ 2016 & 18.46 & 1150 \\
@ 2017 & 16.34 & @ - \\
\hline
@ Average & :={round(sum([0,-11]:[0,-1])/11, 2)} & @- \\
\end{spreadtab}
\caption{Vulnerability \textit{"buffer error"} Counts History}
\label{tab:vulnerability-buffer-error-by-history}
\end{table}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{gfx/Relative-Vulnerability-Type-Totals-By-Year}
\caption{Vulnerability Relative Counts History}
\label{fig:vulnerability-ratio-history}
\includegraphics[width=\textwidth]{gfx/Vulnerability-Type-Change-by-Year}
\caption{Vulnerability Absolute Counts History}
\label{fig:vulnerability-counts-history}
\end{figure}
In \autoref{tab:vulnerability-buffer-error-by-history}, the column \textit{\%} corresponds to \autoref{fig:vulnerability-ratio-history}, and the column \textit{count} corresponds to \autoref{fig:vulnerability-counts-history}.
With 16.34 percent of all vulnerabilities recorded for 2017 so far, and an average of 12.92 percent over the eleven years from 2007 to 2017, \gls{CWE-119} is to be taken seriously.
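The average can be reproduced from the eleven yearly percentages in \autoref{tab:vulnerability-buffer-error-by-history}.
The following sketch only repeats this arithmetic; the array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Yearly share of "buffer errors" in percent, 2007 to 2017, copied from the table. */
const double share[] = {
    6.75, 10.01, 9.84, 11.58, 15.95, 13.67,
    14.63, 9.69, 15.18, 18.46, 16.34
};

/* Arithmetic mean of n values. */
double mean(const double *values, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum / (double)n;
}
```

The values sum to 142.10, and dividing by 11 gives approximately 12.92 percent.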
\section{Example Manifestations}
\subsection{Uninitialized Pointers} \subsection{Uninitialized Pointers}
@ -228,14 +300,101 @@ if (ptr == NULL) {
} }
\end{lstlisting} \end{lstlisting}
\section{The Stack Clash}
A recent, high-impact vulnerability named \textit{Stack Clash}\footnote{https://blog.qualys.com/securitylabs/2017/06/19/the-stack-clash} is briefly described as \textit{"a vulnerability in the memory management of several operating systems. It affects Linux, OpenBSD, NetBSD, FreeBSD and Solaris, on i386 and amd64. It can be exploited by attackers to corrupt memory and execute arbitrary code."}.
The \gls{LX}-specific vulnerability is listed as CVE-2017-1000364\footnote{http://www.cvedetails.com/cve/CVE-2017-1000364/}, where \textit{"an issue was discovered in the size of the stack guard page on Linux, specifically a 4k stack guard page is not sufficiently large and can be "jumped" over (the stack guard page is bypassed)"}.
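The idea of jumping over a guard page can be illustrated with a simplified model.
The sketch below assumes a stack that grows downwards, a single 4 KiB guard page directly below it, and a program that moves its stack pointer by \texttt{alloc} bytes in one step without touching the memory in between; the function name and parameters are hypothetical and do not describe the actual exploit.

```c
#include <assert.h>
#include <stdint.h>

#define GUARD_SIZE 4096u   /* the 4k guard page mentioned in the CVE text */

/*
 * The guard page covers [guard_lo, guard_lo + GUARD_SIZE).
 * Returns 1 if moving the stack pointer down by alloc bytes lands below
 * the guard page, so that the next access misses the guard entirely.
 */
int jumps_over_guard(uint64_t sp, uint64_t alloc, uint64_t guard_lo)
{
    uint64_t guard_hi = guard_lo + GUARD_SIZE;

    if (sp < guard_hi || alloc > sp)
        return 0;   /* outside the model: sp must start above the guard */

    uint64_t new_sp = sp - alloc;
    return new_sp < guard_lo;
}
```

In this model, any single stack pointer adjustment larger than the remaining stack plus the guard size skips the guard page, which is why a larger guard region makes such a jump harder.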
% TODO: more references and deeper explanation of what happens: see introduction in https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
\chapter{Safe \gls{OS} Development}
\label{context::introduciton::safe-os-dev}
This section gives a brief summary of relevant concepts of \gls{OS} development on common hardware platforms, focusing on memory management and its risks.
In order to protect the memory of each executed program according to \autoref{context::introduction::memory-safety::def}, the \gls{OS} must be designed, developed, and tested carefully.
\subsection{Detecting Memory-Safety Violations ASAP}
\label{context::safe-os-dev::detecting-safety-violations-asap}
It cannot be prevented that individuals type erroneous code into their code editors.
Ideally, the \gls{compiler} should therefore be able to detect the programmer's technical mistakes, especially the ones that have a negative impact on memory-safety.
Not only beginners or sloppy programmers but also advanced programmers can profit from this.
Everybody makes mistakes from time to time, depending on their level of focus, which is not constant.
The human aspect suggests that systems need to be designed to be testable, and then tested thoroughly, in order to mitigate the risk of erroneous software being executed by the end-user.
In addition to the presence and quality of tests, their timing in the software life cycle plays an important role.
The earliest tests can run during software development itself, and the latest ones at the time of execution on the production system of the end-user.
It is desirable to place tests as early as possible in the software life cycle, to prevent mistakes from compromising running systems that hold sensitive data and offer important services.
The dimension of time can also be translated to hierarchically lower system components at run-time.
This suggests that the \gls{OS} must be tested before the \glspl{app} that run on top of it.
The reason is straightforward.
From an \gls{app} perspective, testing every permutation of \gls{OS} runtime states can be impossible, because the \gls{app} cannot freely mutate the system's state.
Even if it could, testing all possible permutations of system state is limited by time and resource restrictions.
That is why even disciplined software engineers write tests that only target common error cases, like system memory exhaustion, and ensure syntactic and semantic correctness for the \gls{app} being developed.
Edge cases that happen only under specific system circumstances, possibly influenced by other components on the system as described in the beginning of \autoref{context::introduction::memory-safety}, are at high risk of remaining untested, and the \gls{app} developer is forced to trust the underlying \gls{OS}.
This puts high importance on the safety of the \gls{OS} design and implementation.
\subsection{The Effects Of \Glspl{proglang} on Memory-Safety}
There are dozens of \glspl{proglang} used by humans to write \glspl{app}, but only a few are used to write \glspl{OS}.
\subsubsection{Abstraction: Safety vs. Functionality}
\label{context::introduction::memory-safety::abstr-safety-function}
In computer systems, safety and functionality are inversely related, because complexity grows with increased functionality, and error cases become more difficult to find.
Applying this relation to software development, during which the errors are created in the first place, might be misleading.
It might seem that the more abstraction a language provides, the more functionality is available.
In fact, the opposite is the case.
Abstraction can be used to impose limits on what the programmer can instruct the system to do.
By defining an abstraction layer in the form of a programming language, the language defines which of the underlying functionality is exposed through it, and it can introduce mandatory rules that make the written program easier to analyze in an automated fashion, before it gets compiled into the underlying representation.
\section{Safety In Language Compilers And Static Analyzers}
\label{context::introduction::language-compilers-analyzers}
In \autoref{context::introduction::memory-safety}, specifically in \autoref{context::introduction::memory-safety::detection}, it was explained that programming languages have a direct impact on memory-safety.
This section gives an example of how severe this impact is and explains the requirements for an \gls{OS} language.
\chapter{CWE Examples} % TODO is this chapter required?
% Significance of the Study
% The significance is a statement of why it is important to determine the answer to the gap in the knowledge, and is related to improving the human condition. The contribution to the body of knowledge is described, and summarizes who will be able to use the knowledge to make better decisions, improve policy, advance science, or other uses of the new information. The “new” data is the information used to fill the gap in the knowledge.
One of my main reasons for working on this topic is the increasing number of vulnerabilities based on memory-safety issues, as shown by the statistics in \autoref{TODO}.
\section{Linux and C}
A very popular and widespread \gls{OS} is \gls{LX}, which is written in \gls{C} and some hardware-specific \gls{asm} code.
Recent years have shown how prone it is to vulnerabilities that result from programming errors related to memory management.
With the growing number of vulnerabilities, various solutions have been proposed to increase the safety of \gls{C}, either with static code analysis or via \gls{compiler}-generated checks imposed at runtime. (TODO: reference)
Static analysis is not very effective on a language that has not been designed to be safety-analyzed. (TODO: reference)
For this reason there have been attempts to define subsets of the \gls{C} language that can be safety-checked. (TODO: references for Cyclone, CCured, etc.)
Safety checks that are performed at runtime introduce a high degree of overhead, which makes them a nonviable option in the domain of \gls{OS} development, where many code paths must be very fast to ensure the operation of high-speed I/O devices\cite{Balasubramanian2017} or tasks with \gls{realtime} requirements. (TODO: explain realtime requirements)
This has been forcing \gls{OS} developers to prioritize performance over safety. (TODO: reference)
Details about the challenge of writing code that manages memory safely, and about related vulnerabilities, are given further along in \autoref{chap:mmt}.
\section{Choice of \Glsentrytext{proglang}}
The criteria for choosing a programming language for an \gls{OS} differ considerably from those for other types of \glspl{app}.
The following is required for implementing an \gls{OS}:
\begin{itemize}
\item{Raw access to \gls{CPU} instructions}
\item{Deterministic temporal behavior}
\end{itemize}
% TODO: put in some scientific background about static checks
% * affine types
\chapter{Memory-Safety Analysis Techniques}
As per the previous \autoref{chap:context.mem-weaknesses} there is general awareness of the problems, and there has been ongoing effort to develop and improve techniques that assist the programmer to detect and avoid such mistakes first- or secondhand.
\section{Static vs. Dynamic Analysis}
% TODO: explain first-/secondhand -> static/dynamic -> compile-time/runtime -> offline/online
% TODO: Explain static and dynamic checks
\section{Requirements}

View file

@ -1,74 +1,80 @@
% // vim: set ft=tex:
\chapter{Topic Refinement}
% TODO: is this chapter required?
\chapter{Derived Research Questions}
\subsection{Definition Of Additional Analysis Rules To Extend Safety Checks}
% TODO: How can Business Logical
% Examples:
% TLB needs to be reset on Task Change
% Registers need to be
\subsubsection{Software Fault Isolation}
% TODO: content from \cite{Balasubramanian2017}
\subsection{More Detailed Research Questions}
% TODO Which language items help with managing memory?
% TODO How generic can the memory allocators be written?
% TODO Guarantees to be statically checked:
% TODO * Control access to duplicates in page tables
% TODO * Tasks can't access unallocated (physical) memory
% TODO * Tasks can't access other tasks memory
\subsection{Interrupts}
% TODO https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf p. 2848
\section{Software Tests}
% TODO: describe that tests are mostly semantics as opposed to static checks being mostly syntactical and technical
% TODO: They necessary in addition to static checks to cover the well-known use-cases and edge-cases.
% TODO: example?
\chapter{Porting \glsentrytext{C} Vulnerabilities}
\label{rnd:porting-c-vulns}
In this chapter, the examples from \autoref{TODO} are ported to \gls{Rust} for evaluation.
\chapter{\glsentrytext{LX} Modules Written In \glsentrytext{Rust}}
% TODO: describe Difficulties with the GPL Macros used Within Kernel Modules
\chapter{Existing \glsentrytext{OS}-Development Projects Based On Rust}
\section{Libraries}
\subsection{Libfringe}
% TODO: https://github.com/edef1c/libfringe
\section{Systems}
\subsection{intermezzOS}
\subsection{Blog OS}
\subsection{Redox}
\subsection{Tock}
\chapter{\glsentrytext{imezzos}: Adding Preemptive \glsentrytext{OS}-Level Multitasking}
\section{Timed Interrupts For Scheduling and Dispatching}
\section{Simple Stack Allocation Scheme}
\section{Risk Of Stack-Overflow}
% TODO: The compiler doesn't check for stack overflows.
% TODO: Describe possible implementation.
% Parameters:
% Stack limit for each function: user defined constant,
% Stack size for each function: calculated,
% Call-Tree: calculated,
\chapter{Result Generalization}
\section{Low-Level Safe Abstractions in Rust}
% TODO: Is the static analysis of hardware specific assembly code possible and useful at all?
% TODO: LLVM knows about the target and can potentially give hints about hardware specific instructions
\section{Tracking \textit{'static}ally allocated Resources}
\section{The Necessary Evils of \textit{unsafe}}
\chapter{Result Evaluation}
% TODO: repeat that rust *can* be used to increase safety in the OS, but it doesn't guarantee it per-se
\chapter{Summary}

View file

@ -3,190 +3,14 @@ Any changes to this file will be lost if it is regenerated by Mendeley.
BibTeX export options can be customized via Options -> BibTeX in Mendeley Desktop
@article{Reed2015, @misc{MITRE-CWE-119,
abstract = {Rust is a new systems language that uses some advanced type system features, specifically affine types and regions, to statically guarantee memory safety and eliminate the need for a garbage collector. While each individual addition to the type system is well understood in isolation and are known to be sound, the combined system is not known to be sound. Furthermore, Rust uses a novel checking scheme for its regions, known as the Borrow Checker, that is not known to be correct. Since Rust's goal is to be a safer alternative to C/C++, we should ensure that this safety scheme actually works. We present a formal semantics that captures the key features relevant to memory safety, unique pointers and borrowed references, specifies how they guarantee memory safety, and describes the operation of the Borrow Checker. We use this model to prove the soudness of some core operations and justify the conjecture that the model, as a whole, is sound. Additionally, our model provides a syntactic version of the Borrow Checker, which may be more understandable than the non-syntactic version in Rust.}, author = {MITRE},
author = {Reed, Eric}, booktitle = {2.11},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Patina$\backslash$: A Formalization of the Rust Programming Language.pdf:pdf}, title = {{CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer}},
number = {February}, url = {http://cwe.mitre.org/data/definitions/119.html},
pages = {1--37}, urldate = {2017-08-31},
title = {{Patina: A Formalization of the Rust Programming Language}},
year = {2015}
}
@article{Dhurjati2003,
abstract = {Traditional approaches to enforcing memory safety of programs rely heavily on runtime checks of memory accesses and on garbage collection, both of which are unattractive for embedded applications. The long-term goal of our work is to enable 100{\%} static enforcement of memory safety for embedded programs through advanced compiler techniques and minimal semantic restrictions on programs. The key result of this paper is a compiler technique that ensures memory safety of dynamically allocated memory without programmer annotations, runtime checks, or garbage collection, and works for a large subclass of type-safe C programs. The technique is based on a fully automatic pool allocation (i.e., region-inference) algorithm for C programs we developed previously, and it ensures safety of dynamically allocated memory while retaining explicit deallocation of individual objects within regions (to avoid garbage collection). For a diverse set of embedded C programs (and using a previous technique to avoid null pointer checks), we show that we are able to statically ensure the safety of pointer and dynamic memory usage in all these programs. We also describe some improvements over our previous work in static checking of array accesses. Overall, we achieve 100{\%} static enforcement of memory safety without new language syntax for a significant subclass of embedded C programs, and the subclass is much broader if array bounds checks are ignored.},
author = {Dhurjati, D and Kowshik, S and Adve, V and Lattner, C},
doi = {10.1145/780742.780743},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Memory Safety Without Runtime Checks or Garbage.pdf:pdf},
isbn = {0362-1340},
issn = {03621340},
journal = {Acm Sigplan Notices},
keywords = {automatic pool allocation,compilers,embedded systems,languages,programming languages,region management,security,static analysis},
number = {7},
pages = {69--80},
title = {{Memory safety without runtime checks or garbage collection}},
volume = {38},
year = {2003}
}
@inproceedings{Kuznetsov2014,
abstract = {Systems code is often written in low-level languages like C/C++, which offer many benefits but also dele- gate memory management to programmers. This invites memory safety bugs that attackers can exploit to divert control flow and compromise the system. Deployed de- fense mechanisms (e.g., ASLR, DEP) are incomplete, and stronger defense mechanisms (e.g., CFI) often have high overhead and limited guarantees [19, 15, 9]. We introduce code-pointer integrity (CPI), a new de- sign point that guarantees the integrity of all code point- ers in a program (e.g., function pointers, saved return ad- dresses) and thereby prevents all control-flow hijack at- tacks, including return-oriented programming. We also introduce code-pointer separation (CPS), a relaxation of CPI with better performance properties. CPI and CPS offer substantially better security-to-overhead ratios than the state of the art, they are practical (we protect a complete FreeBSD system and over 100 packages like apache and postgresql), effective (prevent all attacks in the RIPE benchmark), and efficient: on SPEC CPU2006, CPS averages 1.2{\%} overhead for C and 1.9{\%} for C/C++, while CPI's overhead is 2.9{\%} for C and 8.4{\%} for C/C++. A prototype implementation of CPI and CPS can be obtained from http://levee.epfl.ch. 1},
author = {Kuznetsov, Volodymyr and Szekeres, L{\'{a}}szl{\'{o}} and Payer, Mathias},
booktitle = {Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation},
isbn = {9781931971164},
pages = {147--163},
title = {{Code-pointer integrity}},
url = {https://www.usenix.org/conference/osdi14/technical-sessions/presentation/kuznetsov{\%}5Cnhttps://www.usenix.org/system/files/conference/osdi14/osdi14-paper-kuznetsov.pdf?utm{\_}source=dlvr.it{\&}utm{\_}medium=tumblr},
year = {2014}
}
@article{Merity2016,
abstract = {Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state of the art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. In order to evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora we also introduce the freely available WikiText corpus.},
archivePrefix = {arXiv},
arxivId = {1609.07843},
author = {Merity, Stephen and Xiong, Caiming and Bradbury, James and Socher, Richard},
eprint = {1609.07843},
journal = {Arxiv},
title = {{Pointer Sentinel Mixture Models}},
url = {http://arxiv.org/abs/1609.07843},
year = {2016}
}
@article{Chisnall2015,
abstract = {We propose a new memory-safe interpretation of the C abstract machine that provides stronger protection to benefit security and debugging. Despite ambiguities in the specification intended to provide implementation flexibility, contemporary implementations of C have converged on a memory model similar to the PDP-11, the original target for C. This model lacks support for memory safety despite well-documented impacts on security and reliability. Attempts to change this model are often hampered by assumptions embedded in a large body of existing C code, dating back to the memory model exposed by the original C compiler for the PDP-11. Our experience with attempting to implement a memory-safe variant of C on the CHERI experimental microprocessor led us to identify a number of problematic idioms. We describe these as well as their interaction with existing memory safety schemes and the assumptions that they make beyond the requirements of the C specification. Finally, we refine the CHERI ISA and abstract model for C, by combining elements of the CHERI capability model and fat pointers, and present a softcore CPU that implements a C abstract machine that can run legacy C code with strong memory protection guarantees.},
author = {Chisnall, David and Rothwell, Colin and Watson, Robert N M and Woodruff, Jonathan and Vadera, Munraj and Moore, Simon W and Roe, Michael and Davis, Brooks and Neumann, Peter G},
doi = {10.1145/2694344.2694367},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Beyond the PDP-11$\backslash$: Architectural support for a memory-safe C abstract machine.pdf:pdf},
isbn = {9781450328357},
issn = {01635964},
journal = {Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems},
pages = {117--130},
title = {{Beyond the PDP-11: Architectural support for a memory-safe C abstract machine}},
url = {http://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201503-asplos2015-cheri-cmachine.pdf},
year = {2015}
}
@article{Arpaci-Dusseau2015,
abstract = {A book covering the fundamentals of operating systems, including virtualization of the CPU and memory, threads and concurrency, and file and storage systems. Written by professors active in the field for 20 years, this text has been developed in the classrooms of the University of Wisconsin-Madison, and has been used in the instruction of thousands of students.},
author = {Arpaci-Dusseau, Remzi H. and Arpaci-Dusseau, Andrea C.},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/operating{\_}systems{\_}{\_}three{\_}easy{\_}pieces{\_}{\_}electronic{\_}version{\_}0{\_}91{\_}.pdf:pdf},
journal = {Arpaci-Dusseau},
number = {0.91},
pages = {665},
title = {{Operating Systems: Three Easy Pieces}},
volume = {Electronic},
year = {2015}
}
@article{Szekeres2013,
abstract = {Memory corruption bugs in software written in low-level languages like C or C++ are one of the oldest problems in computer security. The lack of safety in these languages allows attackers to alter the program's behavior or take full control over it by hijacking its control flow. This problem has existed for more than 30 years and a vast number of potential solutions have been proposed, yet memory corruption attacks continue to pose a serious threat. Real world exploits show that all currently deployed protections can be defeated. This paper sheds light on the primary reasons for this by describing attacks that succeed on today's systems. We systematize the current knowledge about various protection techniques by setting up a general model for memory corruption attacks. Using this model we show what policies can stop which attacks. The model identifies weaknesses of currently deployed techniques, as well as other proposed protections enforcing stricter policies. We analyze the reasons why protection mechanisms implementing stricter policies are not deployed. To achieve wide adoption, protection mechanisms must support a multitude of features and must satisfy a host of requirements. Especially important is performance, as experience shows that only solutions whose overhead is in reasonable bounds get deployed. A comparison of different enforceable policies helps designers of new protection mechanisms in finding the balance between effectiveness (security) and efficiency. We identify some open research problems, and provide suggestions on improving the adoption of newer techniques.},
author = {Szekeres, L{\'{a}}szl{\'{o}} and Payer, Mathias and Wei, Tao and Song, Dawn},
doi = {10.1109/SP.2013.13},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/SoK$\backslash$: Eternal War in Memory.pdf:pdf},
isbn = {9780769549774},
issn = {10816011},
journal = {Proceedings - IEEE Symposium on Security and Privacy},
pages = {48--62},
title = {{SoK: Eternal war in memory}},
year = {2013}
}
@article{Corporation2011,
abstract = {The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of Intel 64 and IA-32 processors. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A {\&} 2B, describe the instruction set of the processor and the opcode structure. These volumes apply to application programmers and to programmers who write operating systems or executives. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A {\&} 3B, describe the operating-system support environment of Intel 64 and IA-32 processors. These volumes target operating-system and BIOS designers. In addition, the Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, addresses the programming environment for classes of software that host operating systems.},
author = {Corporation, Intel},
doi = {10.1109/MAHC.2010.22},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf:pdf},
isbn = {253665-057US},
issn = {15222594},
journal = {System},
keywords = {253665,IA-32 architecture,Intel 64},
number = {253665},
title = {{Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual Volume 3}},
volume = {3},
year = {2011}
}
@article{Caballero2012,
abstract = {Use-after-free vulnerabilities are rapidly growing in popularity, especially for exploiting web browsers. Use-after-free (and double-free) vulnerabilities are caused by a program operating on a dangling pointer. In this work we propose early detection, a novel runtime approach for finding and diagnosing use-after-free and double-free vulnerabilities. While previous work focuses on the creation of the vulnerability (i.e., the use of a dangling pointer), early detection shifts the focus to the creation of the dangling pointer(s) at the root of the vulnerability. Early detection increases the effectiveness of testing by identifying unsafe dangling pointers in executions where they are created but not used. It also accelerates vulnerability analysis and minimizes the risk of incomplete fixes, by automatically collecting information about all dangling pointers involved in the vulnerability. We implement our early detection technique in a tool called Undangle. We evaluate Undangle for vulnerability analysis on 8 real-world vulnerabilities. The analysis uncovers that two separate vulnerabilities in Firefox had a common root cause and that their patches did not completely fix the underlying bug. We also evaluate Undangle for testing on the Firefox web browser identifying a potential vulnerability.},
author = {Caballero, Juan and Grieco, Gustavo and Marron, Mark and Nappa, Antonio},
doi = {10.1145/2338965.2336769},
isbn = {9781450314541},
issn = {1450314546},
journal = {ISSTA},
keywords = {automated testing,binary analysis,debugging,dynamic analysis},
pages = {133},
title = {{Undangle: early detection of dangling pointers in use-after-free and double-free vulnerabilities}},
url = {http://dl.acm.org/citation.cfm?doid=2338965.2336769},
year = {2012}
}
@book{AMD64Vol2,
author = {AMD},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/AMD64 Architecture Programmer's Manual Volume 2$\backslash$: System Programming.pdf:pdf},
keywords = {24593,AMD64 Architecture Programmer's Manual Volume 2: S},
number = {24593},
title = {{AMD64 Architecture Programmer's Manual Volume 2: System Programming}},
volume = {2},
year = {2012}
}
@article{Levy2015a,
abstract = {Rust, a new systems programming language, provides compile-time memory safety checks to help eliminate runtime bugs that manifest from improper memory management. This feature is advantageous for operating system development, and especially for embedded OS development, where recovery and debugging are particularly challenging. However, embedded platforms are highly event-based, and Rust's memory safety mechanisms largely presume threads. In our experience developing an operating system for embedded systems in Rust, we have found that Rust's ownership model prevents otherwise safe resource sharing common in the embedded domain, conflicts with the reality of hardware resources, and hinders using closures for programming asynchronously. We describe these experiences and how they relate to memory safety as well as illustrate our workarounds that preserve the safety guarantees to the largest extent possible. In addition, we draw from our experience to propose a new language extension to Rust that would enable it to provide better memory safety tools for event-driven platforms.},
author = {Levy, Amit and Andersen, Michael P. and Campbell, Bradford and Culler, David and Dutta, Prabal and Ghena, Branden and Levis, Philip and Pannuto, Pat},
doi = {10.1145/2818302.2818306},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/tock-plos2015.pdf:pdf},
isbn = {9781450339421},
journal = {PLOS: Workshop on Programming Languages and Operating Systems},
keywords = {embedded operating systems,linear types,ownership,rust},
pages = {21--26},
title = {{Ownership is Theft: Experiences Building an Embedded OS in Rust}},
url = {http://dl.acm.org/citation.cfm?id=2818302.2818306},
year = {2015}
}
@inproceedings{Ma2013,
abstract = {Aiming at the problem of higher memory consumption and lower execution efficiency during the dynamic detecting to C/C++ programs memory vulnerabilities, this paper presents a dynamic detection method called ISC. The ISC improves the Safe-C using pointer analysis technology. Firstly, the ISC defines a simple and efficient fat pointer representation instead of the safe pointer in the Safe-C. Furthermore, the ISC uses the unification-based analysis algorithm with one level flow static pointer. This identification reduces the number of pointers that need to be converted to fat pointers. Then in the process of program running, the ISC detects memory vulnerabilities through constantly inspecting the attributes of fat pointers. Experimental results indicate that the ISC could detect memory vulnerabilities such as buffer overflows and dangling pointers. Comparing with the Safe-C, the ISC dramatically reduces the memory consumption and lightly improves the execution efficiency.},
author = {Ma, Rui and Chen, Lingkui and Hu, Changzhen and Xue, Jingfeng and Zhao, Xiaolin},
booktitle = {Proceedings - 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, DASC 2013},
doi = {10.1109/DASC.2013.37},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/A Dynamic Detection Method to C-C++ Programs Memory Vulnerabilities Based on Pointer Analysis.pdf:pdf},
isbn = {9781479933815},
keywords = {dynamic detecting,fat pointer,improved Safe-C,memory vulnerability,pointer analysis},
pages = {52--57},
title = {{A dynamic detection method to C/C++ programs memory vulnerabilities based on pointer analysis}},
year = {2013}
}
@book{AMD64Vol1,
author = {AMD},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/AMD64 Architecture Programmer's Manual Volume 1$\backslash$: Application Programming.pdf:pdf},
keywords = {AMD64,SIMD,extended media instructions,legacy m},
number = {24592},
title = {{AMD64 Architecture Programmer's Manual Volume 1: Application Programming}},
volume = {1},
year = {2012}
}
@article{Getreu2016,
author = {Getreu, Jens},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Embedded System Security with Rust - Case Study of Heartbleed.pdf:pdf},
pages = {1--24},
title = {{Embedded System Security with Rust}},
year = {2016}
}
@article{Corporation2011a,
abstract = {The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of Intel 64 and IA-32 processors. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A {\&} 2B, describe the instruction set of the processor and the opcode structure. These volumes apply to application programmers and to programmers who write operating systems or executives. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A {\&} 3B, describe the operating-system support environment of Intel 64 and IA-32 processors. These volumes target operating-system and BIOS designers. In addition, the Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, addresses the programming environment for classes of software that host operating systems.},
author = {Corporation, Intel},
doi = {10.1109/MAHC.2010.22},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/64-ia-32-architectures-software-developer-vol-1-manual.pdf:pdf},
isbn = {253665-057US},
issn = {15222594},
journal = {System},
keywords = {253665,64,ia 32 architecture},
number = {253665},
title = {{Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual Volume 1}},
volume = {1},
year = {2011}
}
@article{Nilsson2017,
author = {Nilsson, Fredrik},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/A Rust-based Runtime for the Internet of Things.pdf:pdf},
title = {{A Rust-based Runtime for the Internet of Things}},
year = {2017}
}
@article{Affairs2015,
author = {Beingessner, Alexis},
file = {:home/steveej/src/steveej/msc-thesis/docs/You can't spell trust without Rust.pdf:pdf},
title = {{You Can't Spell Trust Without Rust}},
year = {2015}
}
@article{Xu2015,
abstract = {Since vulnerabilities in Linux kernel are on the increase, attackers have turned their interests into related exploitation techniques. However, compared with numerous researches on exploiting use-after-free vulnerabilities in the user applications, few efforts studied how to exploit use-after-free vulnerabilities in Linux kernel due to the difficulties that mainly come from the uncertainty of the kernel memory layout. Without specific information leakage, attackers could only conduct a blind memory overwriting strategy trying to corrupt the critical part of the kernel, for which the success rate is negligible. In this work, we present a novel memory collision strategy to exploit the use-after-free vulnerabilities in Linux kernel reliably. The insight of our exploit strategy is that a probabilistic memory collision can be constructed according to the widely deployed kernel memory reuse mechanisms, which significantly increases the success rate of the attack. Based on this insight, we present two practical memory collision attacks: An object-based attack that leverages the memory recycling mechanism of the kernel allocator to achieve freed vulnerable object covering, and a physmap-based attack that takes advantage of the overlap between the physmap and the SLAB caches to achieve a more flexible memory manipulation. Our proposed attacks are universal for various Linux kernels of different architectures and could successfully exploit systems with use-after-free vulnerabilities in kernel. Particularly, we achieve privilege escalation on various popular Android devices (kernel version{\textgreater}=4.3) including those with 64-bit processors by exploiting the CVE-2015-3636 use-after-free vulnerability in Linux kernel. To our knowledge, this is the first generic kernel exploit for the latest version of Android. Finally, to defend this kind of memory collision, we propose two corresponding mitigation schemes.},
author = {Xu, Wen and Li, Juanru and Shu, Junliang and Yang, Wenbo and Xie, Tianyi and Zhang, Yuanyuan and Gu, Dawu},
@ -201,17 +25,30 @@ title = {{From Collision To Exploitation: Unleashing Use-After-Free Vulnerabilit
url = {http://dl.acm.org/citation.cfm?doid=2810103.2813637},
year = {2015}
}
@misc{Endler, @article{Szekeres2013,
author = {Endler, Matthias}, abstract = {Memory corruption bugs in software written in low-level languages like C or C++ are one of the oldest problems in computer security. The lack of safety in these languages allows attackers to alter the program's behavior or take full control over it by hijacking its control flow. This problem has existed for more than 30 years and a vast number of potential solutions have been proposed, yet memory corruption attacks continue to pose a serious threat. Real world exploits show that all currently deployed protections can be defeated. This paper sheds light on the primary reasons for this by describing attacks that succeed on today's systems. We systematize the current knowledge about various protection techniques by setting up a general model for memory corrup- tion attacks. Using this model we show what policies can stop which attacks. The model identifies weaknesses of currently deployed techniques, as well as other proposed protections enforcing stricter policies. We analyze the reasons why protection mechanisms imple- menting stricter polices are not deployed. To achieve wide adoption, protection mechanisms must support a multitude of features and must satisfy a host of requirements. Especially important is performance, as experience shows that only solutions whose overhead is in reasonable bounds get deployed. A comparison of different enforceable policies helps de- signers of new protection mechanisms in finding the balance between effectiveness (security) and efficiency.We identify some open research problems, and provide suggestions on improving the adoption of newer techniques.},
title = {{A curated list of static analysis tools, linters and code quality checkers for various programming languages}}, author = {Szekeres, L{\'{a}}szl{\'{o}} and Payer, Mathias and Wei, Tao and Song, Dawn},
url = {https://github.com/mre/awesome-static-analysis} doi = {10.1109/SP.2013.13},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/SoK$\backslash$: Eternal War in Memory.pdf:pdf},
isbn = {9780769549774},
issn = {10816011},
journal = {Proceedings - IEEE Symposium on Security and Privacy},
pages = {48--62},
title = {{SoK: Eternal war in memory}},
year = {2013}
} }
@article{Balasubramanian2017, @article{Chisnall2015,
abstract = {Rust is a new system programming language that offers a practical and safe alternative to C. Rust is unique in that it enforces safety without runtime overhead, most importantly, without the overhead of garbage collection. While zero-cost safety is remarkable on its own, we argue that the super-powers of Rust go beyond safety. In particular, Rust's linear type system enables capabilities that cannot be implemented efficiently in traditional languages, both safe and unsafe, and that dramatically improve security and reliability of system software. We show three examples of such capabilities: zero-copy software fault isolation, efficient static information flow analysis, and automatic checkpointing. While these capabilities have been in the spotlight of systems research for a long time, their practical use is hindered by high cost and complexity. We argue that with the adoption of Rust these mechanisms will become commoditized.}, abstract = {We propose a new memory-safe interpretation of the C ab-stract machine that provides stronger protection to benefit security and debugging. Despite ambiguities in the specifi-cation intended to provide implementation flexibility, con-temporary implementations of C have converged on a mem-ory model similar to the PDP-11, the original target for C. This model lacks support for memory safety despite well-documented impacts on security and reliability. Attempts to change this model are often hampered by as-sumptions embedded in a large body of existing C code, dat-ing back to the memory model exposed by the original C compiler for the PDP-11. Our experience with attempting to implement a memory-safe variant of C on the CHERI ex-perimental microprocessor led us to identify a number of problematic idioms. We describe these as well as their in-teraction with existing memory safety schemes and the as-sumptions that they make beyond the requirements of the C specification. 
Finally, we refine the CHERI ISA and abstract model for C, by combining elements of the CHERI capabil-ity model and fat pointers, and present a softcore CPU that implements a C abstract machine that can run legacy C code with strong memory protection guarantees.},
author = {Balasubramanian, Abhiram and Baranowski, Marek S and Burtsev, Anton and Irvine, Uc and Rakamari, Zvonimir and Ryzhyk, Leonid and Research, Vmware}, author = {Chisnall, David and Rothwell, Colin and Watson, Robert N M and Woodruff, Jonathan and Vadera, Munraj and Moore, Simon W and Roe, Michael and Davis, Brooks and Neumann, Peter G},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/DRAFT$\backslash$: System Programming in Rust$\backslash$: Beyond Safety.pdf:pdf}, doi = {10.1145/2694344.2694367},
title = {{DRAFT: System Programming in Rust: Beyond Safety}}, file = {:home/steveej/src/github/steveej/msc-thesis/docs/Beyond the PDP-11$\backslash$: Architectural support for a memory-safe C abstract machine.pdf:pdf},
year = {2017} isbn = {9781450328357},
issn = {01635964},
journal = {Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems},
pages = {117--130},
title = {{Beyond the PDP-11 : Architectural support for a memory-safe C abstract machine}},
url = {http://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201503-asplos2015-cheri-cmachine.pdf},
year = {2015}
} }
@article{Lattner2005, @article{Lattner2005,
abstract = {The LLVM Compiler Infrastructure (http://llvm.cs. uiuc.edu) is a$\backslash$nrobust system that is well suited for a wide variety of research$\backslash$nand development work. This brief paper introduces the LLVM system$\backslash$nand provides pointers to more extensive documentation, complementing$\backslash$nthe tutorial presented at LCPC.}, abstract = {The LLVM Compiler Infrastructure (http://llvm.cs. uiuc.edu) is a$\backslash$nrobust system that is well suited for a wide variety of research$\backslash$nand development work. This brief paper introduces the LLVM system$\backslash$nand provides pointers to more extensive documentation, complementing$\backslash$nthe tutorial presented at LCPC.},
@ -231,3 +68,188 @@ title = {{The LLVM Compiler Framework and Infrastructure Tutorial}},
url = {http://dx.doi.org/10.1007/11532378{\_}2}, url = {http://dx.doi.org/10.1007/11532378{\_}2},
year = {2005} year = {2005}
} }
@article{Caballero2012,
abstract = {Use-after-free vulnerabilities are rapidly growing in popularity, especially for exploiting web browsers. Use-after-free (and double-free) vulnerabilities are caused by a program operating on a dangling pointer. In this work we propose early detection, a novel runtime approach for finding and diagnosing use-after-free and double-free vulnerabilities. While previous work focuses on the creation of the vulnerability (i.e., the use of a dangling pointer), early detection shifts the focus to the creation of the dangling pointer(s) at the root of the vulnerability. Early detection increases the effectiveness of testing by identifying unsafe dangling pointers in executions where they are created but not used. It also accelerates vulnerability analysis and minimizes the risk of incomplete fixes, by automatically collecting information about all dangling pointers involved in the vulnerability. We implement our early detection technique in a tool called Undangle. We evaluate Undangle for vulnerability analysis on 8 real-world vulnerabilities. The analysis uncovers that two separate vulnerabilities in Firefox had a common root cause and that their patches did not completely fix the underlying bug. We also evaluate Undangle for testing on the Firefox web browser identifying a potential vulnerability.},
author = {Caballero, Juan and Grieco, Gustavo and Marron, Mark and Nappa, Antonio},
doi = {10.1145/2338965.2336769},
isbn = {9781450314541},
issn = {1450314546},
journal = {ISSTA},
keywords = {automated testing,binary analysis,debugging,dynamic analysis},
pages = {133},
title = {{Undangle: early detection of dangling pointers in use-after-free and double-free vulnerabilities}},
url = {http://dl.acm.org/citation.cfm?doid=2338965.2336769},
year = {2012}
}
@article{Dhurjati2003,
abstract = {Traditional approaches to enforcing memory safety of programs rely heavily on runtime checks of memory accesses and on garbage collection, both of which are unattractive for embedded applications. The long-term goal of our work is to enable 100{\%} static enforcement of memory safety for embedded programs through advanced compiler techniques and minimal semantic restrictions on programs. The key result of this paper is a compiler technique that ensures memory safety of dynamically allocated memory without programmer annotations, runtime checks, or garbage collection, and works for a large subclass of type-safe C programs. The technique is based on a fully automatic pool allocation (i.e., region-inference) algorithm for C programs we developed previously, and it ensures safety of dynamically allocated memory while retaining explicit deallocation of individual objects within regions (to avoid garbage collection). For a diverse set of embedded C programs (and using a previous technique to avoid null pointer checks), we show that we are able to statically ensure the safety of pointer and dynamic memory usage in all these programs. We also describe some improvements over our previous work in static checking of array accesses. Overall, we achieve 100{\%} static enforcement of memory safety without new language syntax for a significant subclass of embedded C programs, and the subclass is much broader if array bounds checks are ignored.},
author = {Dhurjati, D and Kowshik, S and Adve, V and Lattner, C},
doi = {10.1145/780742.780743},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Memory Safety Without Runtime Checks or Garbage.pdf:pdf},
isbn = {0362-1340},
issn = {03621340},
journal = {Acm Sigplan Notices},
keywords = {automatic pool allocation,compilers,embedded systems,languages,programming languages,region management,security,static analysis},
number = {7},
pages = {69--80},
title = {{Memory safety without runtime checks or garbage collection}},
volume = {38},
year = {2003}
}
@book{AMD64Vol1,
author = {AMD},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/AMD64 Architecture Programmer's Manual Volume 1$\backslash$: Application Programming.pdf:pdf},
keywords = {AMD64,SIMD,extended media instructions,legacy m},
number = {24592},
title = {{AMD64 Architecture Programmer's Manual Volume 1: Application Programming}},
volume = {1},
year = {2012}
}
@article{Corporation2011a,
abstract = {The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of Intel 64 and IA-32 processors. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A {\&} 2B, describe the instruction set of the processor and the opcode structure. These volumes apply to application programmers and to programmers who write operating systems or executives. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A {\&} 3B, describe the operating-system support environment of Intel 64 and IA-32 processors. These volumes target operating-system and BIOS designers. In addition, the Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, addresses the programming environment for classes of software that host operating systems.},
author = {Corporation, Intel},
doi = {10.1109/MAHC.2010.22},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/64-ia-32-architectures-software-developer-vol-1-manual.pdf:pdf},
isbn = {253665-057US},
issn = {15222594},
journal = {System},
keywords = {253665,64,ia 32 architecture},
number = {253665},
title = {{Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual Volume 1}},
volume = {1},
year = {2011}
}
@inproceedings{Kuznetsov2014,
abstract = {Systems code is often written in low-level languages like C/C++, which offer many benefits but also delegate memory management to programmers. This invites memory safety bugs that attackers can exploit to divert control flow and compromise the system. Deployed defense mechanisms (e.g., ASLR, DEP) are incomplete, and stronger defense mechanisms (e.g., CFI) often have high overhead and limited guarantees. We introduce code-pointer integrity (CPI), a new design point that guarantees the integrity of all code pointers in a program (e.g., function pointers, saved return addresses) and thereby prevents all control-flow hijack attacks, including return-oriented programming. We also introduce code-pointer separation (CPS), a relaxation of CPI with better performance properties. CPI and CPS offer substantially better security-to-overhead ratios than the state of the art, they are practical (we protect a complete FreeBSD system and over 100 packages like apache and postgresql), effective (prevent all attacks in the RIPE benchmark), and efficient: on SPEC CPU2006, CPS averages 1.2{\%} overhead for C and 1.9{\%} for C/C++, while CPI's overhead is 2.9{\%} for C and 8.4{\%} for C/C++. A prototype implementation of CPI and CPS can be obtained from http://levee.epfl.ch.},
author = {Kuznetsov, Volodymyr and Szekeres, L{\'{a}}szl{\'{o}} and Payer, Mathias},
booktitle = {Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation},
isbn = {9781931971164},
pages = {147--163},
title = {{Code-pointer integrity}},
url = {https://www.usenix.org/conference/osdi14/technical-sessions/presentation/kuznetsov},
year = {2014}
}
@article{Getreu2016,
author = {Getreu, Jens},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Embedded System Security with Rust - Case Study of Heartbleed.pdf:pdf},
pages = {1--24},
title = {{Embedded System Security with Rust}},
year = {2016}
}
@article{Affairs2015,
author = {Beingessner, Alexis},
file = {:home/steveej/src/steveej/msc-thesis/docs/You can't spell trust without Rust.pdf:pdf},
title = {{You Can't Spell Trust Without Rust}},
year = {2015}
}
@book{AMD64Vol2,
author = {AMD},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/AMD64 Architecture Programmer's Manual Volume 2$\backslash$: System Programming.pdf:pdf},
keywords = {24593,AMD64 Architecture Programmer's Manual Volume 2: S},
number = {24593},
title = {{AMD64 Architecture Programmer's Manual Volume 2: System Programming}},
volume = {2},
year = {2012}
}
@misc{MITRE-CWE-633,
author = {MITRE},
title = {{CWE-633: Weaknesses that Affect Memory}},
url = {http://cwe.mitre.org/data/definitions/633.html},
urldate = {2017-08-31},
year = {2017}
}
@inproceedings{Ma2013,
abstract = {Aiming at the problem of higher memory consumption and lower execution efficiency during the dynamic detecting to C/C++ programs memory vulnerabilities, this paper presents a dynamic detection method called ISC. The ISC improves the Safe-C using pointer analysis technology. Firstly, the ISC defines a simple and efficient fat pointer representation instead of the safe pointer in the Safe-C. Furthermore, the ISC uses the unification-based analysis algorithm with one level flow static pointer. This identification reduces the number of pointers that need to be converted to fat pointers. Then in the process of program running, the ISC detects memory vulnerabilities through constantly inspecting the attributes of fat pointers. Experimental results indicate that the ISC could detect memory vulnerabilities such as buffer overflows and dangling pointers. Comparing with the Safe-C, the ISC dramatically reduces the memory consumption and lightly improves the execution efficiency.},
author = {Ma, Rui and Chen, Lingkui and Hu, Changzhen and Xue, Jingfeng and Zhao, Xiaolin},
booktitle = {Proceedings - 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, DASC 2013},
doi = {10.1109/DASC.2013.37},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/A Dynamic Detection Method to C-C++ Programs Memory Vulnerabilities Based on Pointer Analysis.pdf:pdf},
isbn = {9781479933815},
keywords = {dynamic detecting,fat pointer,improved Safe-C,memory vulnerability,pointer analysis},
pages = {52--57},
title = {{A dynamic detection method to C/C++ programs memory vulnerabilities based on pointer analysis}},
year = {2013}
}
@misc{MITRE-CWE,
author = {MITRE},
title = {{CWE - Common Weakness Enumeration}},
url = {http://cwe.mitre.org},
urldate = {2017-08-31},
year = {2017}
}
@article{Levy2015a,
abstract = {Rust, a new systems programming language, provides compile-time memory safety checks to help eliminate runtime bugs that manifest from improper memory management. This feature is advantageous for operating system development, and especially for embedded OS development, where recovery and debugging are particularly challenging. However, embedded platforms are highly event-based, and Rust's memory safety mechanisms largely presume threads. In our experience developing an operating system for embedded systems in Rust, we have found that Rust's ownership model prevents otherwise safe resource sharing common in the embedded domain, conflicts with the reality of hardware resources, and hinders using closures for programming asynchronously. We describe these experiences and how they relate to memory safety as well as illustrate our workarounds that preserve the safety guarantees to the largest extent possible. In addition, we draw from our experience to propose a new language extension to Rust that would enable it to provide better memory safety tools for event-driven platforms.},
author = {Levy, Amit and Andersen, Michael P. and Campbell, Bradford and Culler, David and Dutta, Prabal and Ghena, Branden and Levis, Philip and Pannuto, Pat},
doi = {10.1145/2818302.2818306},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/tock-plos2015.pdf:pdf},
isbn = {9781450339421},
journal = {PLOS: Workshop on Programming Languages and Operating Systems},
keywords = {embedded operating systems,linear types,ownership,rust},
pages = {21--26},
title = {{Ownership is Theft: Experiences Building an Embedded OS in Rust}},
url = {http://dl.acm.org/citation.cfm?id=2818302.2818306},
year = {2015}
}
@article{Corporation2011,
abstract = {The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of Intel 64 and IA-32 processors. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A {\&} 2B, describe the instruction set of the processor and the opcode structure. These volumes apply to application programmers and to programmers who write operating systems or executives. The Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A {\&} 3B, describe the operating-system support environment of Intel 64 and IA-32 processors. These volumes target operating-system and BIOS designers. In addition, the Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, addresses the programming environment for classes of software that host operating systems.},
author = {Corporation, Intel},
doi = {10.1109/MAHC.2010.22},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf:pdf},
isbn = {253665-057US},
issn = {15222594},
journal = {System},
keywords = {253665,IA-32 architecture,Intel 64},
number = {253665},
title = {{Intel{\textregistered} 64 and IA-32 Architectures Software Developer's Manual Volume 3}},
volume = {3},
year = {2011}
}
@article{Nilsson2017,
author = {Nilsson, Fredrik},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/A Rust-based Runtime for the Internet of Things.pdf:pdf},
title = {{A Rust-based Runtime for the Internet of Things}},
year = {2017}
}
@article{Arpaci-Dusseau2015,
abstract = {A book covering the fundamentals of operating systems, including virtualization of the CPU and memory, threads and concurrency, and file and storage systems. Written by professors active in the field for 20 years, this text has been developed in the classrooms of the University of Wisconsin-Madison, and has been used in the instruction of thousands of students.},
author = {Arpaci-Dusseau, Remzi H. and Arpaci-Dusseau, Andrea C.},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/operating{\_}systems{\_}{\_}three{\_}easy{\_}pieces{\_}{\_}electronic{\_}version{\_}0{\_}91{\_}.pdf:pdf},
journal = {Arpaci-Dusseau},
number = {0.91},
pages = {665},
title = {{Operating Systems: Three Easy Pieces}},
volume = {Electronic},
year = {2015}
}
@article{Reed2015,
abstract = {Rust is a new systems language that uses some advanced type system features, specifically affine types and regions, to statically guarantee memory safety and eliminate the need for a garbage collector. While each individual addition to the type system is well understood in isolation and is known to be sound, the combined system is not known to be sound. Furthermore, Rust uses a novel checking scheme for its regions, known as the Borrow Checker, that is not known to be correct. Since Rust's goal is to be a safer alternative to C/C++, we should ensure that this safety scheme actually works. We present a formal semantics that captures the key features relevant to memory safety, unique pointers and borrowed references, specifies how they guarantee memory safety, and describes the operation of the Borrow Checker. We use this model to prove the soundness of some core operations and justify the conjecture that the model, as a whole, is sound. Additionally, our model provides a syntactic version of the Borrow Checker, which may be more understandable than the non-syntactic version in Rust.},
author = {Reed, Eric},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/Patina$\backslash$: A Formalization of the Rust Programming Language.pdf:pdf},
number = {February},
pages = {1--37},
title = {{Patina: A Formalization of the Rust Programming Language}},
year = {2015}
}
@misc{Endler,
author = {Endler, Matthias},
title = {{A curated list of static analysis tools, linters and code quality checkers for various programming languages}},
url = {https://github.com/mre/awesome-static-analysis}
}
@article{Merity2016,
abstract = {Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state of the art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. In order to evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora we also introduce the freely available WikiText corpus.},
archivePrefix = {arXiv},
arxivId = {1609.07843},
author = {Merity, Stephen and Xiong, Caiming and Bradbury, James and Socher, Richard},
eprint = {1609.07843},
journal = {Arxiv},
title = {{Pointer Sentinel Mixture Models}},
url = {http://arxiv.org/abs/1609.07843},
year = {2016}
}
@article{Balasubramanian2017,
abstract = {Rust is a new system programming language that offers a practical and safe alternative to C. Rust is unique in that it enforces safety without runtime overhead, most importantly, without the overhead of garbage collection. While zero-cost safety is remarkable on its own, we argue that the super-powers of Rust go beyond safety. In particular, Rust's linear type system enables capabilities that cannot be implemented efficiently in traditional languages, both safe and unsafe, and that dramatically improve security and reliability of system software. We show three examples of such capabilities: zero-copy software fault isolation, efficient static information flow analysis, and automatic checkpointing. While these capabilities have been in the spotlight of systems research for a long time, their practical use is hindered by high cost and complexity. We argue that with the adoption of Rust these mechanisms will become commoditized.},
author = {Balasubramanian, Abhiram and Baranowski, Marek S. and Burtsev, Anton and Rakamari{\'{c}}, Zvonimir and Ryzhyk, Leonid},
file = {:home/steveej/src/github/steveej/msc-thesis/docs/DRAFT$\backslash$: System Programming in Rust$\backslash$: Beyond Safety.pdf:pdf},
title = {{DRAFT: System Programming in Rust: Beyond Safety}},
year = {2017}
}

View file

@ -10,8 +10,8 @@
\usepackage{geometry} \usepackage{geometry}
\geometry{a4paper, top=25mm, left=30mm, right=35mm, bottom=35mm, headsep=10mm, footskip=12mm} \geometry{a4paper, top=25mm, left=30mm, right=35mm, bottom=35mm, headsep=10mm, footskip=12mm}
%\usepackage{multirow,tabularx,tabu} \usepackage{multirow,tabularx,tabu}
\usepackage{ctable,multirow} \usepackage{ctable,multirow,spreadtab}
\usepackage[backend=biber,style=numeric,url=true]{biblatex} \usepackage[backend=biber,style=numeric,url=true]{biblatex}
\addbibresource{thesis.bib} \addbibresource{thesis.bib}
@ -26,6 +26,8 @@
\makenoidxglossaries \makenoidxglossaries
\usepackage{listings} \usepackage{listings}
\usepackage{graphicx}
\usepackage{color}
\newcommand{\topic}{Guarantees On In-Kernel Memory-Safety Using Rust's Static Code Analysis} \newcommand{\topic}{Guarantees On In-Kernel Memory-Safety Using Rust's Static Code Analysis}