diff --git a/src/docs/parts/context/context.tex b/src/docs/parts/context/context.tex
index 71e3cff..49ed710 100644
--- a/src/docs/parts/context/context.tex
+++ b/src/docs/parts/context/context.tex
@@ -6,7 +6,7 @@
 % Use plenty of transitional words and sentences from one section to another, as well as subheadings, which allow the reader to follow the writer’s train of thought. Following is an outline of the content of the empirical argument of Chapter 1. Universities often arrange the content in a different order, but the subject matter is the same in all dissertations because it is an empirical “opening statement” as might be found in a court of law. (Note that a dissertation could also be five pages of text and 50 pages of pictures of dragonfly wings and qualify for a Doctor’s degree in entomology.)
 %State the general field of interest in one or two paragraphs, and end with a sentence that states what study will accomplish. Do not keep the reader waiting to find out the precise subject of the dissertation.
-This thesis is a scientific attempt to analyze and solve the practical problems of packaging and deploying \glspl{app} in the context of \gls{sac} technology.
+This thesis takes a scientific approach to analyzing and solving the practical problems of packaging and deploying \glspl{app} in the context of \gls{sac} technology.
 For a lack of an official definition and common understanding what this technology is, the term \gls{sac} is defined in this chapter as a reference for the rest of the thesis.
 The two main drivers for this technology have been long standing problems in information technology: optimal utilization of hardware, and simplification of software deployment to said hardware.
@@ -37,25 +37,28 @@ More thorough examination of these question is done in section \ref{sect:sd-chal
 \subsection{Introduction to Virtualization}
 % Background of the Problem
 % This section is critically important as it must contain some mention of all the subject matter in the following Chapter 2 Review of the Literature 2 and the methodology in Chapter 3. Key words should abound that will subsequently be used again in Chapter 2. The section is a brief two to four page summary of the major findings in the field of interest that cites the most current finding in the subject area. A minimum of two to three citations to the literature per paragraph is advisable. The paragraphs must be a summary of unresolved issues, conflicting findings, social concerns, or educational, national, or international issues, and lead to the next section, the statement of the problem. The problem is the gap in the knowledge. The focus of the Background of the Problem is where a gap in the knowledge is found in the current body of empirical (research) literature.
-Virtualization has for decades been an important field in computer sciences, both in research and in the industry.
+Since the development of the first \gls{VM} \gls{OS} \cite[pp.~217--218]{Sarton1975}, \gls{virtualization} has been an important field in computer science, both in research and in industry.
 It has been subject to constant development, improvement, and adoption.
 The basics of virtualization boil down to one principle.
-It is the principle of controlling and monitoring the availability and the access to soft- and hardware resources for users, their applications and whole virtualized systems running on top of existing systems.
+It is the principle of controlling and monitoring the availability of, and access to, software and hardware resources for users, their applications, and whole virtual systems running on top of existing systems.
 Modern virtualization techniques can be grouped by two categories: \glspl{hypervisor} and \gls{osvirt}.
 \subsubsection{\glspl{hypervisor}}
-A \gls{hypervisor}, synonymous to \gls{VM} Monitor, operates on a host machine and can control several \glspl{VM}.
+The modern term \gls{hypervisor} is synonymous with the more self-explanatory terms control program \cite[p.~217]{Sarton1975} and \gls{VM} monitor.
+The \gls{hypervisor} operates on a host machine and can control multiple \glspl{VM}.
 The principle is easy to understand, because one can simply picture one or many virtual computers running on a real computer.
-\glspl{VM} are monitored programs which are presented with a set of virtual hardware resources that don't necessarily exist in the presented form on the underlying hardware machine they are being executed on.
+\glspl{VM} are presented with a set of virtual hardware resources that do not necessarily exist in the presented form on the underlying physical machine on which they are executed.
-\glspl{VM} use a full \gls{OS} to boot the virtual hardware and run services.
-This \gls{OS} typically consists of a kernel and a set of files, or virtual-disk-drives containing a set of files, containing the operating and application software to be executed on the virtual machine.
-The virtual machine boots the kernel which then executes the operating system, typical to a machine boot process.
-It allows to run a different OS, within the virtual machine than what is running on the host machine.
-Further it allows to create heterogeneous scenarios like running \gls{OBSD} virtual machine on \gls{Linux}, or vice versa.
-One significant overhead is that this requires a full-fledged \gls{OS} to be installed and configured once upfront, and virtually booted upon every execution.
+In order to boot the virtual hardware and run services, \glspl{VM} need an \gls{OS}.
+Depending on the environment and features of the \gls{hypervisor}, the \gls{OS} files are stored in different formats.
+In general, these files comprise the \gls{OS} itself and the files of the installed applications that are needed to run the desired services within the \gls{VM}.
+Compared to running such a service on the host machine directly, one obvious overhead is that this requires said \gls{OS} to be installed and configured once upfront, and virtually booted for every execution of the service.
+If several \glspl{VM} run the same application, e.g. with different configuration files, each of them will have a separate copy of the \gls{OS} and the application files.
+On the other hand, there are compatibility advantages, e.g. \glspl{VM} make it possible to run a different \gls{OS} than the one running on the host machine.
+This enables heterogeneous scenarios such as running \gls{OBSD} inside a \gls{VM} on a \gls{Linux} host, or vice versa.
-However, there are cases when solely the applications running on top of the virtualized \gls{OS} is the required subject to virtualization, and it is technically not necessary to have a separate virtualized \gls{OS} for every virtualized application.
+However, there are cases in which only the applications running on top of the virtualized \gls{OS} need to be virtualized.
+In these cases, it is technically not necessary to have a separate virtualized \gls{OS} just to virtualize these applications, especially if they are compatible with the same type of \gls{OS}.
 This is one of the main scenarios for the use of \gls{osvirt} which is better suited, because it doesn't require a full-fledged \glspl{OS} to virtualize an application, as explained in the following section \ref{sect:sac-osvirt}.
 \subsubsection{\gls{osvirt}}
@@ -65,7 +68,7 @@ This technology has been under active development for about decade now\cite{Resh
 The popularity has exploded with one specific implementation named \gls{Docker}, which targets the \gls{Linux}-platform.
 More about this historical and technological development is explained in section \ref{virt-advent-sac}.
-\gls{osvirt} allows to virtualize applications on the \gls{OS} level instead of on the machine level.
+\gls{osvirt} allows applications to be virtualized at the \gls{OS} level instead of the machine level.
 Its features are implemented in the kernel of the \gls{OS}, providing a low-overhead isolation and resource-control for user-space processes, in short: virtualization.
 % TODO: refer to Linux Namespaces and Cgroups
diff --git a/src/docs/thesis.bib b/src/docs/thesis.bib
index 8cbe47d..3894490 100644
--- a/src/docs/thesis.bib
+++ b/src/docs/thesis.bib
@@ -1,36 +1,8 @@
-Automatically generated by Mendeley Desktop 1.16
+Automatically generated by Mendeley Desktop 1.16.3
 Any changes to this file will be lost if it is regenerated by Mendeley.
 BibTeX export options can be customized via Options -> BibTeX in Mendeley Desktop
-@inproceedings{Reshetova2014,
-abstract = {The need for flexible, low-overhead virtualization is evident on many fronts ranging from high-density cloud servers to mobile devices. During the past decade OS-level virtualization has emerged as a new, efficient approach for virtualization, with implementations in multiple different Unix-based systems. Despite its popularity, there has been no systematic study of OS-level virtualization from the point of view of security.
In this report, we conduct a comparative study of several OS-level virtualization systems, discuss their security and identify some gaps in current solutions.},
-archivePrefix = {arXiv},
-arxivId = {1407.4245},
-author = {Reshetova, Elena and Karhunen, Janne and Nyman, Thomas and Asokan, N},
-booktitle = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
-doi = {10.1007/978-3-319-11599-3_5},
-eprint = {1407.4245},
-file = {:home/steveej/.local/share/data/Mendeley Ltd./Mendeley Desktop/Downloaded/Reshetova et al. - 2014 - Security of OS-level virtualization technologies(2).pdf:pdf},
-isbn = {9783319115986},
-issn = {16113349},
-pages = {77--93},
-title = {{Security of OS-level virtualization technologies}},
-volume = {8788},
-year = {2014}
-}
-@article{Fink2014,
-abstract = {Docker is a relatively new method of virtualization available natively for 64-bit Linux. Compared to more traditional virtualization techniques, Docker is lighter on system resources, offers a git-like system of commits and tags, and can be scaled from your laptop to the cloud.},
-author = {Fink, John},
-file = {:home/steveej/src/github/steveej/msc-thesis/papers/Docker - a Software as a Service, Operating System-Level Virtualization Framework.pdf:pdf},
-journal = {Code4Lib},
-number = {25},
-pages = {3--5},
-title = {{Docker: a Software as a Service, Operating System-Level Virtualization Framework}},
-url = {http://journal.code4lib.org/articles/9669},
-volume = {1},
-year = {2014}
-}
 @book{Utrecht2006,
 abstract = {Software deployment is the set of activities related to getting$\backslash$r$\backslash$nsoftware components to work on the machines of end users.
It includes$\backslash$r$\backslash$nactivities such as installation, upgrading, uninstallation, and so on.$\backslash$r$\backslash$nMany tools have been developed to support deployment, but they all$\backslash$r$\backslash$nhave serious limitations with respect to correctness. For instance,$\backslash$r$\backslash$nthe installation of a component can lead to the failure of previously$\backslash$r$\backslash$ninstalled components; a component might require other components that$\backslash$r$\backslash$nare not present; and it is generally difficult to undo deployment$\backslash$r$\backslash$nactions. The fundamental causes of these problems are a lack of$\backslash$r$\backslash$nisolation between components, the difficulty in identifying the$\backslash$r$\backslash$ndependencies between components, and incompatibilities between$\backslash$r$\backslash$nversions and variants of components.$\backslash$r$\backslash$n $\backslash$r$\backslash$nThis thesis describes a better approach based on a purely functional$\backslash$r$\backslash$ndeployment model, implemented in a deployment system called Nix.$\backslash$r$\backslash$nComponents are stored in isolation from each other in a Nix store.$\backslash$r$\backslash$nEach component has a name that contains a cryptographic hash of all$\backslash$r$\backslash$ninputs that contributed to its build process, and the content of a$\backslash$r$\backslash$ncomponent never changes after it has been built. Hence the model is$\backslash$r$\backslash$npurely functional.$\backslash$r$\backslash$n $\backslash$r$\backslash$nThis storage scheme provides several important advantages. First, it$\backslash$r$\backslash$nensures isolation between components: if two components differ in any$\backslash$r$\backslash$nway, they will be stored in different locations and will not overwrite$\backslash$r$\backslash$neach other. 
Second, it allows us to identify component dependencies.$\backslash$r$\backslash$nUndeclared build time dependencies are prevented due to the absence of$\backslash$r$\backslash$n"global" component directories used in other deployment systems.$\backslash$r$\backslash$nRuntime dependencies can be found by scanning for cryptographic hashes$\backslash$r$\backslash$nin the binary contents of components, a technique analogous to$\backslash$r$\backslash$nconservative garbage collection in programming language$\backslash$r$\backslash$nimplementation. Since dependency information is complete, complete$\backslash$r$\backslash$ndeployment can be performed by copying closures of components under$\backslash$r$\backslash$nthe dependency relation.$\backslash$r$\backslash$n $\backslash$r$\backslash$nDevelopers and users are not confronted with components' cryptographic$\backslash$r$\backslash$nhashes directly. Components are built automatically from Nix$\backslash$r$\backslash$nexpressions, which describe how to build and compose arbitrary$\backslash$r$\backslash$nsoftware components; hashes are computed as part of this process.$\backslash$r$\backslash$nComponents are automatically made available to users through "user$\backslash$r$\backslash$nenvironments", which are synthesised sets of activated components.$\backslash$r$\backslash$nUser environments enable atomic upgrades and rollbacks, as well as$\backslash$r$\backslash$ndifferent sets of activated components for different users.$\backslash$r$\backslash$n $\backslash$r$\backslash$nNix expressions provide a source-based deployment model. However,$\backslash$r$\backslash$nsource-based deployment can be transparently optimised into binary$\backslash$r$\backslash$ndeployment by making pre-built binaries (keyed on their cryptographic$\backslash$r$\backslash$nhashes) available in a shared location such as a network server. 
This$\backslash$r$\backslash$nis referred to as transparent source/binary deployment.$\backslash$r$\backslash$n $\backslash$r$\backslash$nThe purely functional deployment model has been validated by applying$\backslash$r$\backslash$nit to the deployment of more than 278 existing Unix packages. In$\backslash$r$\backslash$naddition, this thesis shows that the model can be applied naturally to$\backslash$r$\backslash$nthe related activities of continuous integration using build farms,$\backslash$r$\backslash$nservice deployment and build management.},
 author = {Utrecht, Universiteit and Magnificus, Rector},
@@ -47,3 +19,40 @@ url = {http://www.st.ewi.tudelft.nl/{~}dolstra/pubs/phd-thesis.pdf},
 volume = {56},
 year = {2006}
 }
+@inproceedings{Reshetova2014,
+abstract = {The need for flexible, low-overhead virtualization is evident on many fronts ranging from high-density cloud servers to mobile devices. During the past decade OS-level virtualization has emerged as a new, efficient approach for virtualization, with implementations in multiple different Unix-based systems. Despite its popularity, there has been no systematic study of OS-level virtualization from the point of view of security. In this report, we conduct a comparative study of several OS-level virtualization systems, discuss their security and identify some gaps in current solutions.},
+archivePrefix = {arXiv},
+arxivId = {1407.4245},
+author = {Reshetova, Elena and Karhunen, Janne and Nyman, Thomas and Asokan, N},
+booktitle = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
+doi = {10.1007/978-3-319-11599-3_5},
+eprint = {1407.4245},
+file = {:home/steveej/.local/share/data/Mendeley Ltd./Mendeley Desktop/Downloaded/Reshetova et al.
- 2014 - Security of OS-level virtualization technologies(5).pdf:pdf},
+isbn = {9783319115986},
+issn = {16113349},
+pages = {77--93},
+title = {{Security of OS-level virtualization technologies}},
+volume = {8788},
+year = {2014}
+}
+@book{Sarton1975,
+author = {Sarton, George},
+doi = {10.1007/978-3-319-33138-6},
+file = {:home/steveej/src/github/steveej/msc-thesis/papers/A Computing History Primer.pdf:pdf},
+isbn = {0882751727 (o.c.)},
+pages = {145},
+title = {{Introduction to the history of science.}},
+year = {1975}
+}
+@article{Fink2014,
+abstract = {Docker is a relatively new method of virtualization available natively for 64-bit Linux. Compared to more traditional virtualization techniques, Docker is lighter on system resources, offers a git-like system of commits and tags, and can be scaled from your laptop to the cloud.},
+author = {Fink, John},
+file = {:home/steveej/src/github/steveej/msc-thesis/papers/Docker - a Software as a Service, Operating System-Level Virtualization Framework.pdf:pdf},
+journal = {Code4Lib},
+number = {25},
+pages = {3--5},
+title = {{Docker: a Software as a Service, Operating System-Level Virtualization Framework}},
+url = {http://journal.code4lib.org/articles/9669},
+volume = {1},
+year = {2014}
+}