context: restructure part

* more on virtualization
This commit is contained in:
steveej 2016-09-18 19:50:07 +02:00
parent 28f8d77200
commit ca933ff870
2 changed files with 100 additions and 78 deletions


@ -1,24 +1,96 @@
% // vim: set ft=tex:
\chapter{Introduction}
% Chapter 1, with a highly focused review of the literature, and is normally the “prospectus” that a committee approves before the “proposal” to start research is approved. After the prospectus is approved, some of the review of literature may be moved into Chapter 2, which then becomes part of the proposal to do research.
% Chapter 1 is the engine that drives the rest of the document, and it must be a complete empirical argument as is found in courts of law. It should be filled with proofs throughout. It is not a creative writing project in a creative writing class; hence, once a word or phrase is established in Chapter 1, use the same word or phrase throughout the dissertation. The content is normally stylized into five chapters, repetitive in some sections from dissertation to dissertation. A lengthy dissertation may have more than five chapters, but regardless, most universities limit the total number of pages to 350 due to microfilming and binding considerations in libraries in those institutions requiring hard copies.
% Use plenty of transitional words and sentences from one section to another, as well as subheadings, which allow the reader to follow the writers train of thought. Following is an outline of the content of the empirical argument of Chapter 1. Universities often arrange the content in a different order, but the subject matter is the same in all dissertations because it is an empirical “opening statement” as might be found in a court of law. (Note that a dissertation could also be five pages of text and 50 pages of pictures of dragonfly wings and qualify for a Doctors degree in entomology.)
%State the general field of interest in one or two paragraphs, and end with a sentence that states what study will accomplish. Do not keep the reader waiting to find out the precise subject of the dissertation.
\chapter{Overview}
This thesis is a scientific approach to analyze and solve the practical problems of packaging and deploying \glspl{app} in the context of \gls{sac} technology.
For lack of an official definition and a common understanding of what this technology is, the term \gls{sac} is defined in this chapter as a reference for the rest of the thesis.
The technology combines \gls{virtualization} techniques with new approaches to \gls{app} development and deployment.
The two main drivers for this technology have been long-standing problems in information technology: optimal utilization of hardware, and simplification of software deployment to said hardware.
Because this is a problem in the operation of computers, it is expected that solutions are attempted at the \gls{OS} level or slightly above.
The following sections explain the ideologies and give detailed insight into \gls{OS}-specific \gls{sac} technology, focusing on the \gls{Linux} \gls{OS}.
The two main drivers for this technology have been long-standing problems in information technology: optimal utilization of hardware, and correct deployment of software to said hardware, both without sacrificing security.
This is necessary to understand the problem description, given at the end of the chapter.
The optimal utilization of hardware is achieved by collocating multiple \glspl{app} and running them simultaneously on the same hardware.
In order to increase security, these applications are separated by applying \gls{virtualization} techniques.
An introduction to \gls{virtualization} is given in this chapter, as it is important to understand this aspect of the \gls{sac} technology.
\section{\glspl{sac}}
The technology that is currently available to form \glspl{sac} reuses different patterns and techniques to solve a combination of different problems all at once.
The correctness of software deployment is not measurable as a simple number, but involves many factors.
The \gls{sac} approach is to create self-contained bundles for \glspl{app} in the form of \glspl{saci}.
As the main concern of this thesis, related problems and state of the art solutions are explained in this chapter.
The following chapters explain which \gls{virtualization} technologies and \gls{OS} security mechanisms form the foundation of \glspl{sac}.
Also explained are the difficulties of correct software development and deployment in the form of \glspl{saci}.
These difficulties are the main subject of this thesis.
\chapter{Virtualization}
Since the first \gls{VM} \gls{OS} \cite[p.~217--218]{Sarton1975} was created, \gls{virtualization} has been an important field in computer science, both in research and in industry, and has been subject to continuous development, improvement, and adoption.
The basics of virtualization boil down to one principle.
It is the principle of controlling and monitoring the availability and the access to soft- and hardware resources for users, their applications and whole virtual systems running on top of existing systems.
Virtualization techniques can be grouped by two categories: \glspl{hypervisor} and \gls{osvirt}.
\section{\glspl{hypervisor}}
The term \gls{hypervisor} is synonymous with the more self-explanatory terms control program \cite[p.~217]{Sarton1975} and \gls{VM} monitor.
The \gls{hypervisor} operates on a host machine and can control multiple \glspl{VM}.
The principle is easy to understand, because one can simply picture one or many virtual computers running on a real computer.
\glspl{VM} are presented with a set of virtual hardware resources that don't necessarily exist in the presented form on the underlying hardware machine they are being executed on.
\subsection{Running \glspl{OS}}
In order to be able to boot the virtual hardware and run services, \glspl{VM} need an \gls{OS} to begin with.
Specific to the environment and features of the \gls{hypervisor}, there are different storage formats for the file(s) that contain the \gls{OS} for the \gls{VM}.
In general, these files contain the \gls{OS} itself, as well as the installed applications in order to run the desired services within the \gls{VM}.
Compared to running such a service on the host machine directly, one obvious overhead is that this requires said \gls{OS} to be installed and configured once upfront, and virtually booted for every execution of the service.
\subsubsection{Support for multiple \glspl{OS}}
Because each \gls{VM} on a host has a separate \gls{OS}, they can run whatever the \gls{hypervisor} supports, and are not tied to running the same \gls{OS}, or even the same platform, as the host machine.
This allows creating heterogeneous scenarios such as running an ARM \gls{VM} using \gls{OBSD} on an x86 \gls{Linux} host, or vice versa.
% TODO: think about if I want to show this or not. \subsection{\gls{Linux}'s Kernel Virtual Machine \gls{hypervisor}}
% it's not part of the topic.
% it's not necessary to understand containers
\subsection{Overhead In Application Virtualization}
If multiple \glspl{VM} are supposed to run the same application, e.g. with different configuration files, each of them will have a separate copy of the \gls{OS} and of the application itself.
It can be assumed that solely the applications running on top of the virtualized \gls{OS} are the required subject of virtualization.
Therefore, it might not be necessary to run a separate virtualized \gls{OS} just for the sake of virtualizing the applications, especially if they are compatible with the same type of \gls{OS}.
This is one of the main scenarios for the use of \gls{osvirt}, which is better suited because it doesn't require another full-fledged \gls{OS} to virtualize an application.
Instead, the applications run in the same \gls{OS} instance as separate processes, as explained in the following section \ref{sect:sac-osvirt}.
\section{\gls{osvirt}}
\label{sect:sac-osvirt}
% TODO: explain generically what APC is and compare to Hypervisors
The technology of \gls{osvirt} has been under active development for about a decade now~\cite{Reshetova2014}, with the purpose of supporting virtualized applications rather than virtualizing whole machines.
The popularity of the technology has spiked with the release of one specific user-facing implementation named \gls{Docker}, which was originally exclusively available for the \gls{Linux}-platform.
Section \ref{virt-advent-sac} contains an overview of \gls{Docker}'s features that presumably helped to make it popular in a relatively short amount of time, compared to how long the underlying technology and similar tools have existed.
Beforehand, it is useful to learn the low-level mechanisms implemented in the \gls{Linux} kernel.
\subsection{Process Isolation}
\gls{osvirt} allows applications to be virtualized on the \gls{OS} level instead of the machine level.
Its features are implemented in the kernel of the \gls{OS}, providing low-overhead isolation and resource control for user-space processes, in short: virtualization.
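As a minimal sketch of how this kernel-level isolation can be observed from user space, the following Python snippet lists the namespaces that a process belongs to.
It assumes a \gls{Linux} host that exposes namespace handles as symbolic links under \texttt{/proc/self/ns}; the function name \texttt{list\_namespaces} is hypothetical and serves only as an illustration, and the snippet is not taken from any of the implementations discussed in this thesis.

```python
import os


def list_namespaces(pid="self"):
    """Map each namespace type of a process to the kernel's identifier for it.

    Assumption: a Linux host where /proc/<pid>/ns contains one symbolic
    link per namespace type, e.g. "pid" -> "pid:[4026531836]".
    Returns an empty dict if that directory does not exist.
    """
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    if not os.path.isdir(ns_dir):
        return namespaces
    for name in sorted(os.listdir(ns_dir)):
        try:
            # The link target encodes the namespace type and its inode number.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Skip entries that cannot be read with the current privileges.
            continue
    return namespaces


if __name__ == "__main__":
    for name, identifier in list_namespaces().items():
        print(f"{name}: {identifier}")
```

Two processes that report the same identifier for a namespace type share that namespace, whereas a process isolated by \gls{osvirt} typically reports identifiers that differ from those of a process running directly on the host.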
% TODO: refer to Linux Namespaces and Cgroups
% TODO show chroot
% TODO compare security features
% TODO compare performance
\subsection{\gls{app} Deployment}
% TODO compare productivity and deployment
\section{Its Advent With \glspl{sac}}
\label{virt-advent-sac}
Even though the underlying technology \gls{osvirt} had been available for a relatively long time, \gls{Docker}\cite{Fink2014}, since its release in 2014\footnote{http://blog.docker.com/2014/06/its-here-docker-1-0}, has brought \glspl{sac} to the attention and hands of the masses in the \gls{OSS} community.
From a psychological standpoint this is not surprising, as it has abstracted most of the complexities of the technology, adding ease of deployment, a platform for hosting the \gls{saci} in a Docker-specific format, as well as a very convenient way of building such images using Dockerfiles (TODO reference).
Its popularity has come to a point where the term \textit{Docker} is being used interchangeably with the \gls{sac} technology itself.
% TODO: references for this claim
% TODO introduce \gls{LXC}, \gls{systemd-nspawn}, \gls{Docker}, and \gls{rkt}.
The first part of this section analyzes the \gls{sacr} aspects of the implementations, while the second part demonstrates currently popular approaches to assemble \glspl{saci}.
\chapter{Software Development \& Deployment}
\section{Bringing Together Developers \& Operators}
The technology that is currently available and combined to form \glspl{sac} reuses different patterns and techniques to solve a combination of problems that either software developers or system operators have faced separately.
These problems are all related to software deployment and system operation and can be represented by the following questions.
Only a subset of these problems and attempted solutions will be subject to research for this thesis.
\begin{enumerate}
\item How do we maximize the utilization of our hardware systems without compromising security?
@ -26,63 +98,14 @@ Only a subset of these problems and attempted solutions will be subject to resea
It is nonetheless briefly explained under the section \ref{sect:sac-osvirt}, to form a complete view on the scope of \glspl{sac} technology.
\item How do we guarantee that the application works on every target machine the same as on the developer machine?
\item How do we build multiple variants and/or versions of an application, install and run them simultaneously on the same target machine without exhibiting conflicts?
\item How do we build multiple variants and/or versions of an application?
\item How do we install, configure and run them simultaneously on the same target machine without exhibiting conflicts or sacrificing a single application's security?
\item How do we verify that an application running on the target system has not been altered maliciously at some point in the deployment chain?
\end{enumerate}
Questions 2--4 are in the research scope of this thesis, while the concern of the last question is declared optional.
Only a subset of these problems and attempted solutions will be subject to research for this thesis.
Questions 2--4 are in the research scope of this thesis, while the concern of question 5 is declared optional and is not strictly required for the completion of this project.
These questions are very important to the ideology of \glspl{sac}, and they have their origin in the conventional methods of software deployment.
A more thorough examination of these questions is given in section \ref{sect:sd-challenges} of this chapter.
\subsection{Introduction to Virtualization}
% Background of the Problem
% This section is critically important as it must contain some mention of all the subject matter in the following Chapter 2 Review of the Literature 2 and the methodology in Chapter 3. Key words should abound that will subsequently be used again in Chapter 2. The section is a brief two to four page summary of the major findings in the field of interest that cites the most current finding in the subject area. A minimum of two to three citations to the literature per paragraph is advisable. The paragraphs must be a summary of unresolved issues, conflicting findings, social concerns, or educational, national, or international issues, and lead to the next section, the statement of the problem. The problem is the gap in the knowledge. The focus of the Background of the Problem is where a gap in the knowledge is found in the current body of empirical (research) literature.
Since the development of the first \gls{VM} \gls{OS} \cite[p.~217-218]{Sarton1975}, \gls{virtualization} has been an important field in computer sciences, both in research and in the industry.
It has been subject to constant development, improvement, and adoption.
The basics of virtualization boil down to one principle.
It is the principle of controlling and monitoring the availability and the access to soft- and hardware resources for users, their applications and whole virtual systems running on top of existing systems.
Modern virtualization techniques can be grouped by two categories: \glspl{hypervisor} and \gls{osvirt}.
\subsubsection{\glspl{hypervisor}}
The modern term \gls{hypervisor} is synonymous to the more self-explanatory terms control program \cite[p. 217]{Sarton1975} and \gls{VM} monitor.
The \gls{hypervisor} operates on a host machine and can control multiple \glspl{VM}.
The principle is easy to understand, because one can simply picture one or many virtual computers running on a real computer.
\glspl{VM} are presented with a set of virtual hardware resources that don't necessarily exist in the presented form on the underlying hardware machine they are being executed on.
In order to be able to boot the virtual hardware and run services, \glspl{VM} need an \gls{OS}.
Specific to the environment and features of the \gls{hypervisor}, there are different storage formats for the \gls{OS} files.
In general, these files consist of the \gls{OS} itself and the installed application files needed to run the desired services within the \gls{VM}.
Compared to running such a service on the host machine directly, one obvious overhead is that this requires said \gls{OS} to be installed and configured once upfront, and virtually booted for every execution of the service.
In the case that several \glspl{VM} run the same application, e.g. with different configuration files, each of them will have a separate copy of the \gls{OS} and the application files.
On the other hand, there are compatibility advantages, e.g. \glspl{VM} allow running a different \gls{OS} than the one running on the host machine.
This enables heterogeneous scenarios such as running \gls{OBSD} inside a virtual machine on a \gls{Linux} host, or vice versa.
However, there are cases in which solely the applications running on top of the virtualized \gls{OS} are the required subject of virtualization.
In these cases, it's technically not necessary to have a separate virtualized \gls{OS} just to virtualize these applications, especially if they are compatible with the same type of \gls{OS}.
This is one of the main scenarios for the use of \gls{osvirt}, which is better suited because it doesn't require a full-fledged \gls{OS} to virtualize an application, as explained in the following section \ref{sect:sac-osvirt}.
\subsubsection{\gls{osvirt}}
\label{sect:sac-osvirt}
% TODO: explain generically what APC is and compare to Hypervisors
This technology has been under active development for about a decade now~\cite{Reshetova2014}, and is becoming more and more popular.
The popularity has exploded with one specific implementation named \gls{Docker}, which targets the \gls{Linux}-platform.
More about this historical and technological development is explained in section \ref{virt-advent-sac}.
\gls{osvirt} allows to virtualize applications on the \gls{OS} level instead of the machine level.
Its features are implemented in the kernel of the \gls{OS}, providing a low-overhead isolation and resource-control for user-space processes, in short: virtualization.
% TODO: refer to Linux Namespaces and Cgroups
\subsection{\gls{virtualization} Overview}
% TODO compare performance
\subsection{The Advent of \gls{sac}}
\label{virt-advent-sac}
Even though the underlying technology \gls{osvirt} had been available for a relatively long time, \gls{Docker}\cite{Fink2014}, since its release in 2014\footnote{http://blog.docker.com/2014/06/its-here-docker-1-0}, has brought \glspl{sac} to the attention and hands of the masses in the \gls{OSS} community.
From a psychological standpoint this is not surprising, as it has abstracted most of the complexities of the technology, adding ease of deployment, a platform for hosting the \gls{saci} in a Docker-specific format, as well as a very convenient way of building such images using Dockerfiles (TODO reference).
Its popularity has come to a point where the term \textit{Docker} is being used interchangeably with the \gls{sac} technology itself.
% TODO: references for this claim
% TODO introduce \gls{LXC}, \gls{systemd-nspawn}, \gls{Docker}, and \gls{rkt}.
More thorough examination of the questions and their presented problems is found in section \ref{sect:sd-challenges} of this chapter.
\section{Challenges of Software Development \& Deployment}
\label{sect:sd-challenges}
@ -92,7 +115,7 @@ In order to be executable on the target machine, the software needs to be transl
If the software is changed and updated, the cycle has to be repeated.
This represents a first challenge: software updates, or deployments in general, are not supposed to be negatively influenced by any previous version or state that exists on the target machine.
On the technical side, this process starts with software developers who write software source code.
This code is then transformed and stored in executable binary files that contain specific platform-dependent machine-code with.
This code is then transformed and stored in executable binary files that contain specific platform-dependent machine-code.
The translation is done by processing the source code with a compiler toolchain.
The binary files are then made available as software packages that can be downloaded and installed on the target machine's operating system.
Typically and ideally, this is done with the help of a software package manager, which is itself a piece of software that is included in most modern operating systems.
@ -100,14 +123,13 @@ The location where the files of the package are installed on the target machine
Another challenge is to be able to verify that software hasn't been altered to differ from its intended behavior at any point of the deployment process.
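One common building block for such verification is to compare a cryptographic digest of the delivered file against a digest obtained through a trusted channel.
The following Python sketch illustrates the idea under that assumption; the function names \texttt{sha256\_of\_file} and \texttt{is\_unaltered} are hypothetical, and a digest comparison alone does not establish who produced the file.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large package files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path, expected_hex):
    """Check whether the file matches a digest obtained from a trusted source.

    Assumption: `expected_hex` was received through a channel that an
    attacker could not modify, e.g. a signed release note.
    """
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

If the check fails, the file differs from the one the digest was computed for, either through corruption or through manipulation somewhere in the deployment chain.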
\section{State of the Art in \gls{osvirt}}
To analyze the current state of the art, different implementations of this technology are examined.
The first part of this section analyzes the \gls{sacr} aspects of the implementations, while the second part demonstrates currently popular approaches to assemble \glspl{saci}.
\chapter{\glspl{saci}}
\section{No truly declarative method to create \gls{saci}}
\subsection{No proved methods to declare, reproduce, and trust the builds of \glspl{saci}}
% Statement of the Problem
% Arising from the background statement is this statement of the exact gap in the knowledge discussed in previous paragraphs that reviewed the most current literature found. A gap in the knowledge is the entire reason for the study, so state it specifically and exactly. Use the words “gap in the knowledge.” The problem statement will contain a definition of the general need for the study, and the specific problem that will be addressed.
\section{No Independent Verification Of The Content}
\section{Customization}
\chapter{Scope}
%TODO: explain scope of OSS systems running a \gls{Linux} based \gls{OS}.


@ -86,10 +86,10 @@
\makeatletter
\renewcommand\paragraph{\@startsection{paragraph}{4}{\z@}%
{-3.25ex\@plus -1ex \@minus -.2ex}%
{0.0001pt \@plus 0.2ex}%
{\normalfont\normalsize\bfseries}}
%\renewcommand\paragraph{\@startsection{paragraph}{4}{\z@}%
% {-3.25ex\@plus -1ex \@minus -.2ex}%
% {0.0001pt \@plus 0.2ex}%
% {\normalfont\normalsize\bfseries}}
\renewcommand\subparagraph{\@startsection{subparagraph}{5}{\z@}%
{-3.25ex\@plus -1ex \@minus -.2ex}%
{0.0001pt \@plus 0.2ex}%