msc-thesis/src/docs/parts/context/context.tex

225 lines
15 KiB
TeX
Raw Normal View History

% // vim: set ft=tex:
\chapter{Introduction}
This thesis studies the feasibility of using static code analysis, as found in the \gls{Rust} \gls{compiler}, for ensuring safety within an \gls{OS} kernel.
Because an \gls{OS} is nothing but a \gls{app}, this study could be applied to all \glspl{app}, but the focus is on the implementation of \glspl{OS} which is the \gls{app} that is responsible for managing the system's resources and provide abstractions for higher level applications.
The \gls{OS} is the only \gls{app} that has unrestricted access to these resources, with the task of managing these safely according to the rules that were set up by the \gls{sysadmin}.
\section{A Definition of Memory Safety In The \gls{OS}}
A clear definition of memory safety is laid out in this section.
For decades computer systems are able to execute instructions that they have previously loaded into their main memory.
The details of loading the instructions for the \gls{OS} are irrelevant for the understanding, but it is assumed that the persistent memory that holds these instructions is the responsibility of the \gls{sysadmin} and not the \gls{OS}.
Once the \gls{OS} is in execution, it is responsible for loading the instructions and data for other \gls{app} into main memory.
This happens either automatically through configured jobs, or based on well-defined events which can be any form of input via the system's interfaces.
The latter is potentially dangerous because it requires an extensive amount of care and foresight from the developers of the \gls{OS} and \glspl{app} to prepare a system for the various events that might possibly occur.
This is not an easy task, especially if the interface or the environment of a system are diverse and complex.
In this context, memory safety is the ability to prevent an alteration of the memory content that would otherwise lead to malfunctioning at best, and malicious behavior at worst.
\section{Academic And Industrial Activities}
% Primary Research Questions
% The primary research question is the basis for data collection and arises from the Purpose of the Study. There may be one, or there may be several. When the research is finished, the contribution to the knowledge will be the answer to these questions. Do not confuse the primary research questions with interview questions in a qualitative study, or survey questions in a quantitative study. The research questions in a qualitative study are followed by both a null and an alternate hypothesis.
% Hypotheses
% A hypothesis is a testable prediction for an observed phenomenon, namely, the gap in the knowledge. Each research question will have both a null and an alternative hypothesis in a quantitative study. Qualitative studies do not have hypotheses. The two hypotheses should follow the research question upon which they are based. Hypotheses are testable predictions to the gap in the knowledge. In a qualitative study the hypotheses are replaced with the primary research questions.
* TODO: mention redox, tockos, intermezzOS and more activities
* TODO: mention paper's by tockos team
* TODO: mention electrolyte, formal verification for Rust
According to my best-effort literature research in Q1/2017, the hypothesis that \textit{Rust's static code analysis can guarantee memory safety in the \gls{OS}} has not been studied explicitly.
This is to my surprise, because as explained in more details in this chapter the situation in
\gls{OS} is critical and \gls{Rust} offers attractive features to help improve this situation.
However, the hypothesis cannot be trivially approved or denied, which drives the research efforts for my final thesis project.
% Purpose of the Study
%The Purpose of the Study is a statement contained within one or two paragraphs that identifies the research design, such as qualitative, quantitative, mixed methods, ethnographic, or another design. The research variables, if a quantitative study, are identified, for instance, independent, dependent, comparisons, relationships, or other variables. The population that will be used is identified, whether it will be randomly or purposively chosen, and the location of the study is summarized. Most of these factors will be discussed in detail in Chapter 3.
The results will be of qualitative nature, captured by analyzing existing and a self-developed \gls{Rust}-implementations of popular memory management techniques.
In addition to the sole analysis of \gls{Rust}-implementations, comparisons will be made, discerning the level of memory safety guarantees gained over similarly intending implementations in \gls{C}.
\section{Status Quo: Zero Memory-Safety A Day}
% Significance of the Study
% The significance is a statement of why it is important to determine the answer to the gap in the knowledge, and is related to improving the human condition. The contribution to the body of knowledge is described, and summarizes who will be able to use the knowledge to make better decisions, improve policy, advance science, or other uses of the new information. The “new” data is the information used to fill the gap in the knowledge.
A very popular and widespread is \gls{OS}, which has been developed with \gls{C} (and some assembly).
Recent years have shown how prone it is to vulnerabilities that result from the unsafe language design and programming errors.
With the growing number of vulnerabilities, various solutions have been proposed to increase the safety of C, either with static code analysis or via checks imposed at runtime. (TODO: reference).
The former is complex to perform on a language that has not been designed to be safety-analysed. TODO? reference?
Despite its complexity, attempts exist to define a subset of the \gls{C} language that can be safety checked, TODO: refernces of Cyclone, CCured, etc..
Safety checks that are performed at runtime introduce a high degree of overhead, which makes it an unviable option in the domain of \gls{OS} development, where many code paths must be very fast to ensure the operation of high speed I/O devices\cite{Balasubramanian2017} or other tasks with hard- or soft-realtime requirements.. (TODO: explain realtime requirements)
This has been forcing \gls{OS} developers to prioritize performance over safety. (TODO: reference)
Details about the challenge of writing code that does memory management safely, and related vulnerabilities are given further along in \autoref{chap:mmt}.
\section{Assessing Memory-Safety}
% In Chapter 1 this is a summary of the methodology and contains a brief outline of three things: (a) the participants in a qualitative study or the subjects of a quantitative study (human participants are referred tyo as participants, non-human subjects are referred to as subjects), (b) the instrumentation used to collect data, and (c) the procedure that will be followed. All of these elements will be reported in detail in Chapter 3. In a quantitative study, the instrumentation will be validated in Chapter 3 in detail. In a qualitative study, if it is a researcher-created questionnaire, validating the correctness of the interview protocol is usually accomplished with a pilot study. For either a quantitative or a qualitative study, using an already validated survey instrument is easier to defend and does not require a pilot study; however, Chapter 3 must contain a careful review of the instrument and how it was validated by the creator.
% In a qualitative study, which usually involves interviews, the instrumentation is an interview protocol a pre-determined set of questions that every participant is asked that are based on the primary research questions. A qualitative interview should contain no less than 10 open-ended questions and take no less than 1 hour to administer to qualify as “robust” research.
% In the humanities, a demographic survey should be circulated with most quantitative and qualitative studies to establish the parameters of the participant pool. Demographic surveys are nearly identical in most dissertations. In the sciences, a demographic survey is rarely needed.
* TODO: what is memory?
* TODO: when it is considered safe?
* TODO: Explain how memory-safety can be measured
\section{Compilers And Static Code Analysis}
% The theoretical framework is the foundational theory that is used to provide a perspective upon which the study is based. There are hundreds of theories in the literature. For instance, if a study in the social sciences is about stress that may be causing teachers to quit, Apples Intensification Theory could be cited as the theory was that stress is cumulative and the result of continuing overlapping, progressively stringent responsibilities for teachers that eventually leads to the desire to quit. In the sciences, research about new species that may have evolved from older, extinct species would be based on the theory of evolution pioneered by Darwin.
% Some departments put the theoretical framework explanation in Chapter 1; some put it in Chapter 2.
* TODO: put in some scientific background about static checks
* affine types
\section{Assumptions, Limitations, and Scope (Delimitations)}
% Assumptions are self-evident truths. In a qualitative study, it may be assumed that participants be highly qualified in the study is about administrators. It can be assumed that participants will answer truthfully and accurately to the interview questions based on their personal experience, and that participants will respond honestly and to the best of their individual abilities.
% Limitations of a study are those things over which the research has no control. Evident limitations are potential weaknesses of a study. Researcher biases and perceptual misrepresentations are potential limitations in a qualitative study; in a quantitative study, a limitation may be the capability of an instrument to accurately record data.
% Scope is the extent of the study and contains measurements. In a qualitative study this would include the number of participants, the geographical location, and other pertinent numerical data. In a quantitative study the size of the elements of the experiment are cited. The generalizability of the study may be cited. The word generalizability, which is not in the Word 2007 dictionary, means the extent to which the data are applicable in places other than where the study took place, or under what conditions the study took place.
% Delimitations are limitations on the research design imposed deliberately by the researcher. Delimitations in a social sciences study would be such things as the specific school district where a study took place, or in a scientific study, the number of repetitions.
\section{Premised Trust In Hardware}
* TODO: is it worth to explain ECC?
* TODO: explain that the hardware might be unsafe but this is not in scope of the thesis
2017-08-10 19:09:58 +02:00
\section{Recap}
% Summarize the content of Chapter 1 and preview of content of Chapter 2.
\label{chap:mmt}
The \autoref{chap:mmt} gives a detailed introduction to memory management in contemporary architectures and \glspl{OS}.
\chapter{Sophisticated Memory Management Techniques}
* TODO: in the beginnings application software had full control over memory
* TODO: from single-job via batch systems to multiprocessing
As the result of collaborations between hard- and software developers, the memory management task in the \gls{OS} is partially delegated to the \gls{CPU}'s \gls{MMU}.
A complete understand of this task is necessary in order to reason about it's safety.
This chapter starts with the provides a thorough introduction to modern memory management techniques on the x86\_64 architecture.
\section{Resource Abstraction: Protection And Efficiency}
* TODO: recap that management has been motivated by multiprocessing without side-effects
* TODO: brief history and market share of x86\_64 processors and ARM
\section{Virtual Addresses}
* TODO: describe dynamic (relocatable) addresses
* TODO: describe swapping
* TODO: describe virtual address
% * TODO: parse http://wiki.osdev.org/Memory_Management_Unit
\section{Paging}
* TODO: describe
\subsection{Multi-Level Paging}
2017-08-10 19:09:58 +02:00
\subsection{Top-Level Page Table Self-Reference}
\subsection{Caching Lookups}
\subsection{Full Example}
* http://taptipalit.blogspot.de/2013/10/theory-recursive-mapping-page.html
* https://www.coresecurity.com/blog/getting-physical-extreme-abuse-of-intel-based-paging-systems-part-2-windows
2017-08-10 19:09:58 +02:00
\section{Stack And Heap Concept}
\section{Memory Allocation}
2017-08-10 19:09:58 +02:00
\chapter{Memory-Related Software-Programming Weaknesses}
\label{chap:context.mem-weaknesses}
Software vulnerabilities can be categorized by their underlying weaknesses.
This chapter explains the weaknesses of interest for this project and gives concrete examples for their manifestation.
2017-08-10 19:09:58 +02:00
\section{Weakness Categories}
This work focuses on the following weaknesses defined in the \gls{CWE}
\begin{itemize}
\item{Improper Restriction of Operations within the Bounds of a Memory Buffer}
https://cwe.mitre.org/data/definitions/119.html
% TODO: find more
\end{itemize}
2017-08-10 19:09:58 +02:00
\section{Manifestation Examples}
\subsection{Uninitialized Pointers}
\begin{lstlisting}[language=C,
directivestyle={\color{black}}
emph={int,char,double,float,unsigned},
emphstyle={\color{blue}}
]
char *src = "hello" ; // character string constant
char *dst; // unallocated
strcpy(dst, src); // segfault and die
\end{lstlisting}
\subsection{Null-Pointers}
\begin{lstlisting}[language=C,
directivestyle={\color{black}}
emph={int,char,double,float,unsigned},
emphstyle={\color{blue}}
]
char *ptr;
int ret;
if (ret = init_ptr(ptr)) {
return ret;
}
if (ptr == NULL) {
// gracefully handles the NP
} else {
// may assume that the ptr address is not NULL
}
\end{lstlisting}
2017-08-10 19:09:58 +02:00
\subsection{TODO: more}
\chapter{Memory-Safety Analysis Techniques}
As per the previous \autoref{chap:context.mem-weaknesses} there is general awareness of the problems, and there has been ongoing effort to develop and improve techniques that assist the programmer to detect and avoid such mistakes first- or secondhand.
\section{Static vs. Dynamic Analysis}
* TODO: explain first-/secondhand -> static/dynamic -> compile-time/runtime -> offline/online
* TODO: Explain static and dynamic checks
\section{Requirements}
* TODO: which knowledge is required to analyze access to memory?
\section{Limitations}
* TODO: deadlock example
\chapter{Introduction To Rust}
\section{Compiler Architecture}
2017-08-10 19:09:58 +02:00
- TODO: Tokens? AST? LLVM? (http://embed.rs/articles/2016/arm-inline-assembly-rust/)
- TODO: BSYS SS17 GITHUB IO Rust Memory Layout - 4
\section{Static Analysis Features}
- TODO: How does static typing help with preventing programming errors
- TODO: How does the Rust's static analysis work, theoretically and practically
2017-08-10 19:09:58 +02:00
- TODO: How can memory be dynamically allocated and still safety checked?
\subsection{Ownership And Borrows}
- TODO: Who owns global 'static variables?
- https://nercury.github.io/rust/guide/2015/01/19/ownership.html
\subsection{Lifetimes}
- TODO: Where are global 'static variables allocated?
\subsection{Type Safety}
- TODO: how does casting work?
2017-08-10 19:09:58 +02:00
- TODO: demonstrate raw pointers
- TODO: what's the equivalent of void*?
\subsection{The Newtype Pattern}
\subsection{Zero-Cost Abstraction}
https://aturon.github.io/features/types/newtype.html
2017-08-10 19:09:58 +02:00
\subsection{Im/mutability}
- TODO: describe Rc, Arc, and {Ref,}Cell
\section Language Extension
\subsection{Syntax Extension}
\subsubsection{Macros}
\subsubsection{Annotations}
\subsection{Compiler Plugins}