{"id":6189,"date":"2023-10-26T19:11:31","date_gmt":"2023-10-26T17:11:31","guid":{"rendered":"https:\/\/i4wpdev.cs.fau.de\/?page_id=6189"},"modified":"2023-11-02T10:41:09","modified_gmt":"2023-11-02T09:41:09","slug":"pave-note","status":"publish","type":"page","link":"https:\/\/i4wpdev.cs.fau.de\/en\/research\/pave-note","title":{"rendered":"PAVE Note"},"content":{"rendered":"<h4>Objective<\/h4>\r\n<p>\r\nThe <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/pave\">PAVE<\/a> project pursues a far-reaching approach and\r\nproposes the concept of an operating system that provides a\r\n<em>power-fail aware byte-addressable virtual non-volatile memory<\/em> based on a\r\ncombination of DRAM and NVRAM. The operating system manages the virtual non-volatile memory\r\ntransparently for the machine programs (i.e., application processes) and tries,\r\nas far as possible, to take advantage of the persistence properties introduced\r\nby NVRAM. This integration of NVRAM into the VM level should\r\n<cite>pave the way<\/cite> &ndash; for application and system programs &ndash; for the use of NVRAM,\r\nboth for legacy software as well as for newly developed programs. 
In addition,\r\n<em>capacity scaling<\/em> is to be supported and achieved by the operating system\r\nin a flexible manner in order to increase the performance and improve the energy\r\nefficiency of the computing system.\r\n<\/p>\r\n\r\n<h5>Memory-hierarchy integration<\/h5>\r\n<p>\r\nAn &#8220;ideal computing system&#8221; is assumed that uses DRAM, if at all, only to\r\nhide the higher access times or higher latency that still exist with NVRAM.\r\nSimilar to the way DRAM is already used as a cache for an ordinary\r\nVM subsystem based on a kind of block-oriented non-volatile backing store,\r\nit will now become a cache for a byte-addressable non-volatile main memory.\r\nUnlike approaches that require application-level software changes for NVRAM use,\r\n<a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/pave\">PAVE<\/a> provides a transparent solution by default that\r\nautomatically makes NVRAM accessible in the virtual address space of a process,\r\neither implicitly as high-capacity volatile memory (anonymous memory) or as\r\nnamed persistent data accessible via the standard file system interface.\r\nIn addition, an &#8220;expert interface&#8221; is offered in order to break this\r\ntransparency if desired and thereby bring the NVRAM or parts thereof under the\r\ndirect administration and control of the application. 
The crucial point here is that, despite NVRAM, the\r\nprocesses within this virtual (non-volatile memory) address space\r\nare not confronted with inconsistencies resulting from\r\ninterrupted and thus incomplete, or completed but repeated, operations\r\non persistent data and\r\ndo not have to be designed to be fail-safe:\r\nProcess-state backup in the event of an interrupt makes the power failure functionally transparent!\r\n<\/p>\r\n\r\n<p>\r\nThe targeted VM subsystem transparently migrates pages between\r\nthe two types of memory according to access statistics\r\nduring normal operation and ensures\r\nthat modified pages are never propagated back into the storage hierarchy\r\nbefore a consistent, recoverable state is reached in NVRAM.\r\nIn serious exceptional cases, such as a power failure,\r\nall dirty pages related to logically persistent data that currently reside\r\nin fast DRAM are flushed back into NVRAM.\r\nStatic and dynamic analysis of the width of the remaining energy\r\nwindow on the one hand and the duration of the backup of the entire relevant\r\nvolatile system state on the other hand are used to define parameters that\r\nguarantee the backup process.\r\nBased on these parameters, the VM subsystem always enforces an upper bound on the number of modified\r\nNVRAM pages in the cache to guarantee fail-safety.\r\nLast but not least, the\r\navailability of such an NVRAM page cache should also facilitate and accelerate\r\nthe management of typical persistent metadata of an operating system. 
This\r\nprimarily concerns metadata for restarting the system, the VM metadata related\r\nto non-volatile memory regions but also metadata of a\r\nfile system that usually ensures the persistence of data in the computer\r\nsystem.\r\nIn contrast to a solution\r\nwhich is cast in hardware or corresponds to a combined suspend to RAM\/disk model,\r\n<a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/pave\">PAVE<\/a> is anchored in the operating system,\r\nallows using large amounts of NVRAM just like DRAM, and\r\npursues a needs-based approach,\r\nmaking it more flexible and transferable to common computer systems\r\nof almost any size &ndash; assuming a stock paging MMU.\r\n<\/p>\r\n\r\n<h5>Transparency for legacy software<\/h5>\r\n<p>\r\n<a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/pave\">PAVE<\/a> will provide an operating-system platform\r\nthat allows billions of lines of legacy code to benefit from NVRAM immediately and transparently\r\nas high capacity main memory and I\/O-intensive applications implicitly get access\r\nto large amounts of readily available file data cached in NVRAM.\r\nModified file data held in the persistent VM page cache simultaneously doubles\r\nas an implicit redo log without extra costs when all operations on a file completed\r\nsuccessfully. 
In this case, the changes can be propagated from NVRAM to the underlying\r\nstorage hierarchy any time the system sees fit, such as in times of low system activity\r\nor when the amount of available NVRAM is running low.\r\nOtherwise, all changes can simply be dropped and the persistent data are\r\nautomatically reverted to the last consistent state.\r\nThus, common <tt>open()<\/tt>\/<tt>close()<\/tt> operations receive transaction-like semantics as\r\nlong as the underlying file system does not crash and corrupt the file&#8217;s data.\r\n<\/p>\r\n\r\n<p>\r\nWe will integrate the above-mentioned mechanisms and strategies into the\r\nVM subsystem of FreeBSD rather than implementing a completely new system from scratch.\r\nWe chose FreeBSD because it provides an efficient and mature VM subsystem with\r\na well-documented VM design rooted in the Mach VM subsystem.\r\nThe latter offers a clear distinction between hardware-dependent MMU tables\r\nand the logical structure of address spaces. 
This design significantly eases\r\nour approach of making the VM subsystem NVRAM-aware with minimally invasive changes.\r\nIn particular, we intend to reuse most existing and highly tuned mechanisms like\r\nthe gathering of access statistics and replacement strategies\r\nreadily available in FreeBSD without disturbing its complex &#8220;VM machinery&#8221;.\r\nThe VM metadata structures related to the page cache (such as data describing the resident set of pages) then need to be kept persistent as well and manipulated in a highly efficient, transactional manner.\r\n<\/p>\r\n\r\n<h5>Overview chart<\/h5>\r\n<p>\r\nAs just mentioned,\r\nthe approach followed with PAVE as outlined here uses FreeBSD as the base operating system.\r\nThe adjacent figure roughly outlines the PAVE function blocks.\r\nThe first block is <strong>capacity scaling<\/strong> of the main memory, which is intended to maintain the growth in performance and enable new applications.\r\n\r\n<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-1024x576.png\" alt=\"pave overview\" align=\"right\"\/>\r\n\r\n<!--\r\n<a href=\"https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-1024x576.png\" alt=\"pave overview\" class=\"aligncenter size-large wp-image-3910\" srcset=\"https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-1024x576.png 1024w, https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-300x169.png 300w, https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-768x432.png 768w, https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-1536x864.png 1536w, https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-60x34.png 60w, https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-427x240.png 427w, https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-480x270.png 480w, 
https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out-836x470.png 836w, https:\/\/i4wpdev.cs.fau.de\/wp-content\/uploads\/2022\/07\/out.png 1920w\" \/><\/a>\r\n-->\r\n\r\nRight next to it is <strong>NVRAM virtualisation<\/strong>, that is, the provision of a system abstraction in order to hide the pitfalls of NVRAM from machine programs.\r\nBoth modules use a DRAM-based cache of the working set of pages of the processes running virtually directly in the NVRAM. A computing system is assumed that needs DRAM only to hide the higher latency or access times that still exist with NVRAM.\r\n<\/p>\r\n\r\n<p>\r\nThe physically volatile but logically non-volatile contents of this page cache together with the volatile contents of the CPU registers and hardware caches form the system state that is backed up to the NVRAM in the event of a power failure (<a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/redos-note\">REDOS<\/a>). It must be guaranteed that the time required to back up this state does not exceed the time for which the residual energy window of the power supply unit will keep the computing system alive.\r\nFor this purpose, both the energy costs of the backup procedure and the NVRAM write bandwidth are determined, and the electrical characteristics of the computer\u2019s power supply unit are measured. In addition, the FreeBSD-based system software is subjected to power-failure sensitivity detection that identifies critical instruction sequences whose semantic integrity is endangered by a sudden power failure during write access to NVRAM. These examinations rely on in-house tools for compiler-based operating-system tailoring as well as static program analysis to predict time and energy costs.\r\n<\/p>\r\n\r\n<p>\r\nA two-level hierarchy of software-managed caches is appropriate for memory-hierarchy integration: it uses NVRAM as a buffer for data in conventional storage and DRAM as a buffer for NVRAM pages. 
The buffering of the pages in the DRAM is subject to <strong>strict time guarantees<\/strong> so that this logically persistent data can be reliably consolidated in the NVRAM again in exceptional cases. That is, in order to survive power failures, the maximum size of this DRAM page cache must be matched to the size of the remaining energy window in the power supply.\r\n<!--\r\nIn addition, energetic methods are designed that improve the mutual interaction of the operating system and NVRAM in such a way that the persistence properties of NVRAM are used to increase energy efficiency. Static NVRAM sleep modes are provided that actively reduce power consumption orthogonally to dynamic runtime improvements by NVRAM governors in the operating system.\r\n-->\r\n<\/p>\r\n\r\n<p>\r\nLast but not least,\r\na persistent runtime system suitable for use within operating-system kernels offers supporting functions to enable the PAVE modules of FreeBSD to be executed directly in the NVRAM. The focus is on abstractions that help critical \/ sensitive sections run non-blocking or even wait-free.\r\n<\/p>\r\n\r\n<h4>Project results and findings<\/h4>\r\n<p>\r\nThe first comprehensive <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/pave\">PAVE<\/a> measure was the <strong>&#8220;<a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/nvux-oid\">NVRAM-ification<\/a>&#8221;<\/strong> of FreeBSD, that is, the provision of a FreeBSD that, including the machine programs run by it, operates exclusively from NVRAM. The concept for this was presented at <a href=\"https:\/\/www.dagstuhl.de\/en\/seminars\/seminar-calendar\/seminar-details\/22341\">Dagstuhl Seminar 22341<\/a> prior to the work; the implementation in practice as well as an evaluation of &#8220;NVM-only&#8221; FreeBSD are documented in a <a href=\"https:\/\/doi.org\/10.25593\/issn.2191-5008\/CS-2023-01\">Technical Report<\/a> and a contribution to <a href=\"http:\/\/dx.doi.org\/10.1007\/978-3-031-42785-5_11\">ARCS 2023<\/a>. 
However, the FreeBSD variant described in these papers is not yet <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/redos-note\">REDOS<\/a>-ed.\r\n<\/p>\r\n\r\n<p>\r\nFor the <strong>customization<\/strong> of FreeBSD, investigations were carried out into the dynamic updating and specialization of programs in general and operating systems in particular. A basic consideration for this was the question of whether the undoubtedly application-dependent strategies for <strong>capacity scaling<\/strong> and <strong>NVRAM-page caching<\/strong> &ndash; as typical candidates for dynamic reconfiguration at runtime &ndash;\r\nrequire dynamic adaptations to the respective call environment, since no prior knowledge is available at FreeBSD build time. Work on such update and specialization techniques was published at <a href=\"https:\/\/www.usenix.org\/system\/files\/atc23-heinloth.pdf\">USENIX ATC 2023<\/a> and <a href=\"https:\/\/doi.org\/10.1145\/3623759.3624551\">PLOS 2023<\/a>.\r\n<\/p>\r\n\r\n<p>\r\nAs convenient as FreeBSD&#8217;s dynamic customization may be depending on the conditions defined by the particular application profile, it has turned out that such a measure is not absolutely necessary for the already demanding work on <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/pave\">PAVE<\/a>. Rather, this &#8220;nice to have&#8221; work for the purpose of <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/pave\">PAVE<\/a> is now being pursued in the context of <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/doss\">DOSS<\/a>. 
\r\n<\/p>\r\n\r\n<!--\r\n[cris show=publications publication=302416058]\r\n[cris show=publications publication=310058062]\r\n-->\r\n<\/p>\r\n\r\n<h4>Ongoing work<\/h4>\r\n<p>\r\n<strong>Suspend\/Resume with NVRAM<\/strong>&nbsp;&nbsp;\r\nBased on the FreeBSD kernel fully running in NVRAM, generally applicable <em>suspend-to-NVRAM<\/em> and <em>resume-from-NVRAM<\/em> functions are under development &ndash; providing the minimal subset of <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/redos-note\">REDOS<\/a> system functions.\r\nUpon a <em>suspend<\/em> request, the mechanism will write all volatile CPU state into NVRAM and shut down the system.\r\nWhen powered on, the system will retrieve the saved state from NVRAM and restore the volatile hardware state.\r\nThus, execution of kernel and userland can resume as if no interruption of service had occurred.\r\n<\/p>\r\n\r\n<p>\r\n<strong>Analysis of IRQ-induced latency<\/strong>&nbsp;&nbsp;\r\nInterrupts are a mechanism by which devices can signal the occurrence of events.\r\nOne such event is an imminent power failure caused by a power outage.\r\nIn order to estimate the response time for handling such an event, it is necessary to estimate the possible delay in the interrupt handling routine.\r\nThis work, which is required for <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/redos-note\">REDOS<\/a>, indicates how much time remains to save the processor state within the residual power window available at power failure.\r\nDelays can be caused by other interrupts being processed or by critical sections, which can be recognized by the CPU masking all incoming interrupt requests.\r\nThese sections can be identified in the source code and may even be assigned a maximum cost they will incur before a power-failure interrupt can be serviced.\r\n<\/p>\r\n\r\n<h4>Related projects<\/h4>\r\n<p>\r\nThe sister project <a 
href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/NEON\">NEON<\/a> looks at stripping operating systems of persistence measures that are unnecessary if the operating system in question runs entirely from NVRAM.\r\nThe subject of the investigation is the additional effort of such measures typical for volatile main memory (e.g. DRAM) during runtime in terms of time, energy and space.\r\nThrough such <strong>purification<\/strong> of an NVM-only operating system, lower background noise (less time\/energy consumption) and a smaller trusted computing base (fewer lines of code) are expected.\r\nThis work uses Linux as the base operating system.\r\n<\/p>\r\n\r\n<p>\r\nThe <a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/respect\">ResPECT<\/a> project, which focuses on <strong>embedded communication systems<\/strong>, is developing a holistic operating system and communication protocol concept, which assumes that the transfer of information (receiving control data for actuators or sending sensor data) is the core task of almost all networked nodes.\r\n<\/p>\r\n\r\n<p>\r\nWhat all three projects have in common is the approach to a <em>residual energy dependent NVRAM-based operation shutdown<\/em> (<a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/research\/redos-note\">REDOS<\/a>).\r\n<\/p>\r\n\r\n<h4>Project staff<\/h4>\r\n<ul>\r\n<strong>Principal investigators:<\/strong>\r\n<a href=\"https:\/\/www.b-tu.de\/en\/fg-betriebssysteme\/team\/professor\">J\u00f6rg Nolte<\/a>,\r\n<a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/person\/schroeder-preikschat\">Wolfgang Schr\u00f6der-Preikschat<\/a><br>\r\n<strong>Postgraduates:<\/strong>\r\n<a href=\"https:\/\/www.b-tu.de\/en\/fg-betriebssysteme\/team\/employees\/oliver-giersch\">Oliver Giersch<\/a>,\r\n<a href=\"https:\/\/i4wpdev.cs.fau.de\/en\/person\/nguyen\">Dustin Nguyen<\/a><br>\r\n<strong>Student assistants:<\/strong>\r\nKarl Bartholom\u00e4us,\r\nOle Wiedemann<br>\r\n<\/ul>\r\n\r\n<!--\r\n<h4>Introductory 
Video<\/h4>\r\n[i4include]research\/pave\/introduction.html[\/i4include]\r\n-->","protected":false},"excerpt":{"rendered":"Objective The PAVE project pursues a far-reaching approach and proposes the concept of an operating system that provides a power-fail aware byte-addressable virtual non-volatile memory based on a combination of DRAM and NVRAM. The operating system manages the virtual non-volatile memory transparently for the machine programs (i.e., application processes) and tries, as far as possible, [&hellip;]","protected":false},"author":10,"featured_media":0,"parent":2103,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_crdt_document":"","_rrze_multilang_single_locale":"en_US","_rrze_multilang_single_source":"https:\/\/i4wpdev.cs.fau.de\/?page_id=6180","footnotes":""},"page_category":[],"class_list":["post-6189","page","type-page","status-publish","hentry","en-US"],"_links":{"self":[{"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/pages\/6189","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/comments?post=6189"}],"version-history":[{"count":56,"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/pages\/6189\/revisions"}],"predecessor-version":[{"id":6381,"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/pages\/6189\/revisions\/6381"}],"up":[{"embeddable":true,"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/pages\/2103"}],"wp:attachment":[{"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/media?parent=6189"}],"wp:term":[{"taxonomy":"page_category","embeddable":true,"href":"https:\/\/i4wpdev.cs.fau.de\/wp-json\/wp\/v2\/page_category?post=6189"}],"curies":[{"name":"wp","href":"https:\/\
/api.w.org\/{rel}","templated":true}]}}