«Efficient, scalable consistency for highly fault-tolerant storage GARTH GOODSON August 2004 CMU-CS-04-111 Dept. of Electrical and Computer ...»

Second, we assume that comprehensive object versioning at each metadata node is efficient. Previous studies have shown that versioning nodes can offer performance that is typically within 10% of a non-versioning node [Strunk et al. 2000]. As well, modern disks have the capacity required to version objects comprehensively [Strunk et al. 2000; Soules et al. 2003].

Third, we assume that objects exported through the protocols, designed properly, will experience low access concurrency. Most file system studies conclude that file sharing is rare. For example, our R/CW objects support conditional write operations that update multiple objects atomically. This, in turn, permits us to utilize fine-grained metadata objects, which reduces access concurrency for these objects. Thus, a separate attribute object can be maintained for each file, rather than including file attributes in directory objects.

System model

There are a number of system model assumptions that hold for all protocols developed. The system model is more formally described in Section 3.1, but can be summarized as follows.

Each data-item is hosted by a static number of storage-nodes; i.e., once the data-item has been created, the set of storage-nodes on which that data-item can exist is fixed. There are an arbitrary number of clients in the system. Both storage-nodes and clients may suffer Byzantine faults [Lamport et al. 1982].

All protocols are developed within an asynchronous model of time (i.e., no assumptions are made about message transmission delays or execution rates). Channels are assumed to be point-to-point, authenticated, and to adhere to finite duplication and fair loss properties [Aguilera et al. 2000]; see Section 3.1 for a complete description of the system model.

1.3.3 Applying the protocols to the PASIS storage system

The R/W protocol underlies the PS service. It provides block granularity read/write access to data objects. Data objects are variable length data containers named by a unique object identifier. The R/W protocol allows for the use of space-efficient data encodings.

To demonstrate that our protocol is efficient in practice, we compare its performance to BFT [Castro and Liskov 2001; 2002], the Byzantine fault-tolerant replicated state machine implementation that Castro and Liskov have made available [Castro and Rodrigues 2003]. Experiments show that the PS scales better than BFT in terms of network utilization at the server and in terms of work performed by the server. Experiments also show that response times of PASIS and BFT are comparable. Additionally, experiments show that the response time graphs of the PASIS R/W prototype are flat as the number of faults tolerated is scaled up.

Two types of metadata objects are implemented: attributes and directories. Attribute objects exist for both directories and files. The attributes map directly to typical UNIX file permissions. Directory objects hold multiple directory entries. Each directory entry stores the name and access information for a file or directory stored within the storage system. The access information specifies how the named object can be accessed. If the named object is a file, the access information is specific to the PS service implementation (e.g., where the file is located, the encoding of the file, etc.).

The PMD service is evaluated in the context of a complete file system implemented as an NFS server. It can either use the PS service to store data or be configured to store data in its local file system. When storing data locally, experiments show that the PMD service’s throughput scales as load (number of NFS servers) is increased and that response time increases only gradually as the number of faults tolerated is scaled up. As well, experiments show that performance degrades gracefully when concurrency is introduced, even at very high concurrency levels. Finally, when the PS service is used in conjunction with the PMD service in a configuration capable of tolerating a single Byzantine fault, the run time of an OpenSSH build is within a factor of two of that of a non-fault-tolerant user-level NFS server.

1.4 Organization

The remainder of this thesis is organized as follows. Chapter 2 describes background and related work. It is broken into a discussion of atomic read/write objects (or registers), which pertains to block-based storage, and a discussion of systems and protocols capable of providing consistency and fault-tolerance for operations performed on arbitrary objects. Chapter 3 develops the R/W protocol for block-based storage. The system model, constraints on the number of storage-nodes, and the implementation and evaluation of the protocol are described. Chapter 4 describes the R/CW protocol for block-based storage. The protocol is developed similarly to the R/W protocol. Chapter 5 extends the R/CW protocol to provide consistency for operations performed over arbitrary objects (i.e., the Q/U protocol). As well, the chapter describes the design and implementation of the PASIS storage system, which utilizes both the Q/U protocol and the R/W protocol to provide strong consistency, fault-tolerance, and scalability to its clients. The storage system is then evaluated in terms of a distributed NFSv3 storage system. The last chapter, Chapter 6, concludes and provides future directions for this work. Finally, a set of appendices provides proofs of safety for the consistency protocols developed within.

2 Background and Related Work

This chapter describes background and work related to the construction of scalable and fault-tolerant distributed storage systems. First, the components that comprise a storage system are described. Second, data encoding schemes that can be used to improve space-efficiency are introduced. Third, consistency semantics and protocols for tolerating benign and Byzantine faults are described. Fourth, and lastly, work related to the scalability of metadata services is discussed.

2.1 Storage system overview

Traditionally, disk-based storage systems have been built around a centralized monolithic disk array or mainframe. While these systems have been shown to provide good reliability and performance, they have a number of weaknesses. First, the hardware is highly customized and very expensive to build. Second, these systems are hard to scale to very large sizes. Third, the range of faults they are able to handle is limited (e.g., benign single, or possibly double, disk failures).

This thesis describes protocols that can be used to build a Byzantine fault-tolerant, decentralized storage architecture to help solve these problems. First, by tolerating Byzantine faults, cheaper off-the-shelf components can be used, since hardware and software bugs can be masked by the fault-tolerance provided by the underlying storage protocols. Second, these systems are more scalable in that the addition of new storage-nodes yields improvements in the capacity, throughput, or fault-tolerance of the service. Third, fault-tolerance is gained by designing the storage protocols to withstand arbitrary (Byzantine) failures of clients and a limited number of metadata-nodes, and by requiring no timing (synchrony) assumptions for correctness. However, in this type of architecture, there is no centralized control, making it difficult to provide consistency in the face of faults and concurrency.

2.1.1 File service

This work focuses on developing protocols that can be used to construct a decentralized, fault-tolerant file based storage-system. Traditional file systems are comprised of both metadata and data services. The data service is responsible for storing file data, while the metadata service stores data about how and where the file data is stored (e.g., block pointers within inodes), as well as other metadata that describes the file (e.g., attributes, access control information, etc.). Metadata is often stored within the data service and is accessed by recursing through a set of structures rooted at a well-known location.

In these systems, fault-tolerance for both the data and metadata can be obtained by distributing the data service in a fault-tolerant manner. Frangipani [Thekkath et al. 1997] is an example of this type of system. It is a distributed file system that is built above a virtual disk interface exported by Petal [Lee and Thekkath 1996] and a distributed lock service. Petal can tolerate one or more disk or storage-node failures, as long as the majority of the storage-nodes are up and communicating, and as long as at least one replica of each data-item remains.

Other systems explicitly separate the metadata service from the data service. For example, NASD [Gibson et al. 1998] demonstrated that by separating metadata access from data access, greater scalability could be achieved at a lower cost. Instead of forcing all operations through a centralized file server, NASD eliminated the file server from the data flow path by allowing clients to directly access the data storage-nodes. To increase fault-tolerance, the centralized metadata server can be distributed as a fault-tolerant service. For example, Farsite [Adya et al. 2002] utilizes a Byzantine fault-tolerant agreement protocol (BFT [Castro and Liskov 1998a]) to protect the integrity of its metadata, while allowing file data to be stored on a user’s desktop machine.


Consistency semantics

Consistency semantics can differ for data versus metadata. Most block based data services, disk drives being the most common, expect whole block updates (i.e., an entire block is always overwritten). On the other hand, metadata services often allow arbitrary data regions to be updated independently (e.g., a single directory entry may be altered within a directory).

For block updates, it is sufficient to support read–write update semantics. Read-write update semantics make no guarantees about the value of the data block between the time the block was read and later written. These semantics are sufficient for block stores, since consistency is guaranteed on a block-level and blocks are usually read and written as atomic units. The PASIS read–write (R/W) protocol is described in Chapter 3 and provides the consistency semantics required for block based storage.

In order to support consistent updates to metadata, metadata objects (e.g., directories) require update operations that modify their existing contents, rather than blindly overwriting their previous contents; otherwise, their integrity may not be preserved. Read–modify–write semantics guarantee that the data region has not been modified between a read and a successive write operation to the same data region. It is also necessary to support atomic updates across multiple objects (e.g., when renaming or moving files from one directory to another). Metadata services are often built upon protocols that provide consistent access to objects that can be manipulated through arbitrary operations (i.e., not just read and write operations). In the PASIS metadata service, the underlying read–conditional write (R/CW) protocol is described in Chapter 4, while the query/update (Q/U) protocol, which extends the R/CW protocol to provide replicated state-machine semantics, and the metadata service itself are described in Chapter 5.
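The difference between blind read–write updates and read–modify–write updates can be illustrated with a minimal, single-node sketch. It is not the PASIS R/CW protocol, which is replicated and tolerates Byzantine faults; the names `VersionedObject`, `conditional_write`, and `add_entry` are hypothetical, and a simple version counter is assumed as the condition that a write is checked against.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedObject:
    """A single, non-replicated object with a version counter (hypothetical)."""
    value: dict = field(default_factory=dict)
    version: int = 0

    def read(self):
        # Return a copy of the contents plus the version they were read at.
        return dict(self.value), self.version

    def write(self, new_value):
        # Read-write semantics: blind overwrite, no check on prior contents.
        self.value = dict(new_value)
        self.version += 1

    def conditional_write(self, new_value, expected_version):
        # Read-modify-write semantics: succeed only if the object has not
        # changed since the caller's read.
        if self.version != expected_version:
            return False
        self.value = dict(new_value)
        self.version += 1
        return True


def add_entry(directory, name, target, max_retries=10):
    """Add one directory entry, retrying if a concurrent update intervenes."""
    for _ in range(max_retries):
        entries, seen = directory.read()
        entries[name] = target
        if directory.conditional_write(entries, seen):
            return True
    return False
```

In this sketch, a writer holding a stale copy of a directory has its conditional write rejected and must re-read, whereas a blind `write` would silently discard entries added by others in the meantime.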

This thesis describes a set of protocols that provide the consistency necessary to implement fault-tolerant data and metadata services.

2.1.2 Storage-system goals

One central goal in the design of storage systems is to simultaneously provide efficiency, scalability, and fault-tolerance. Current storage systems, and their underlying protocols, fall short in one or more of the following areas:

– High fault-tolerance: To provide access to data in the event of multiple client and/or server failures (in the case of both crash and Byzantine faults), as opposed to tolerating only a single failure as can be handled by most other distributed storage systems. First, data must be spread redundantly across the set of storage-nodes. Second, no central points of failure should exist. This can be achieved by using decentralized consistency protocols with no single points of failure.

– Strong consistency: To provide strong consistency in the face of failures (of clients or servers) and concurrent operations (e.g., read-write concurrency, write-write concurrency). In decentralized storage systems, where data is spread across multiple storage-nodes, it is usually important to ensure that readers and writers always see a consistent view of data, especially in the face of concurrency and failures. Although this is a goal that we want of our storage systems, not all applications require strong consistency. As well, the consistency semantics required of block-level storage and of metadata differ. At the metadata level, it is important to offer consistency for metadata operations, which may span multiple objects.

the worst case generally provides support for many system and failure model assumptions, efficiency and scalability are always limited to that of the worst case environment.

2.2 Data encodings

A common data distribution scheme used in distributed storage systems is replication, in which a writer stores a replica of the new data-item value at each storage-node to which it sends a write request. Since each storage-node has a complete instance of the data-item, the main difficulty is identifying and retaining the most recent instance. It is often necessary for a reader to contact multiple storage-nodes to ensure that it sees the most recent instance. Examples of distributed storage systems that use this design include Harp [Liskov et al. 1991], Petal [Lee and Thekkath 1996], BFS [Castro and Liskov 1998a], and Farsite [Adya et al. 2002].
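The reason a reader may need to contact several storage-nodes can be shown with a small sketch. It assumes benign behavior only (no Byzantine nodes) and that each write carries a writer-assigned logical timestamp; `Replica`, `write_replicas`, and `read_latest` are hypothetical names used for illustration and do not correspond to any of the systems cited above.

```python
from typing import NamedTuple, Optional


class Replica(NamedTuple):
    timestamp: int   # logical time assigned by the writer (an assumption)
    value: bytes


def write_replicas(nodes, targets, timestamp, value):
    # Store a complete copy of the value at each contacted storage-node.
    for i in targets:
        nodes[i] = Replica(timestamp, value)


def read_latest(nodes, targets) -> Optional[Replica]:
    # Collect the responses and keep the one with the highest timestamp.
    responses = [nodes[i] for i in targets if nodes[i] is not None]
    return max(responses, key=lambda r: r.timestamp, default=None)


# Five storage-nodes; two writes reach different subsets of them.
nodes = [None] * 5
write_replicas(nodes, [0, 1, 2], 1, b"old")
write_replicas(nodes, [2, 3, 4], 2, b"new")
```

Here, reading from node 0 alone returns the stale instance, while reading from nodes 0, 1, and 2 overlaps the most recent write set and returns the latest instance.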
