WWW.ABSTRACT.DISLIB.INFO
FREE ELECTRONIC LIBRARY - Abstracts, online materials
 
<< HOME
CONTACTS



Pages:   || 2 | 3 | 4 | 5 |   ...   | 18 |

«Efficient, scalable consistency for highly fault-tolerant storage GARTH GOODSON August 2004 CMU-CS-04-111 Dept. of Electrical and Computer ...»

-- [ Page 1 ] --

Efficient, scalable consistency for

highly fault-tolerant storage

GARTH GOODSON

August 2004

CMU-CS-04-111

Dept. of Electrical and Computer Engineering

Carnegie Mellon University

Pittsburgh, PA 15213

Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy.

Thesis committee

Prof. Gregory R. Ganger, Chair

Prof. Michael K. Reiter

Prof. Priya Narasimhan

Richard Golding, IBM Research

c 2004 Garth Goodson · ii Efficient, scalable consistency for highly fault-tolerant storage Keywords: survivable storage systems, Byzantine fault-tolerance, erasure codes To my parents, for their unwavering support.

· iv Efficient, scalable consistency for highly fault-tolerant storage Abstract Fault-tolerant storage systems spread data redundantly across a set of storage-nodes in an effort to preserve and provide access to data despite failures. One difficulty created by this architecture is the need for a consistent view, across storage-nodes, of the most recent update. Such consistency is made difficult by concurrent updates, partial updates made by clients that fail, and failures of storage-nodes.

This thesis demonstrates a novel approach to achieving scalable, highly fault-tolerant storage systems by leveraging a set of efficient and scalable, strong consistency protocols enabled by storage-node versioning. Versions maintained by storage-nodes can be used to provide consistency, without the need for central serialization, and despite concurrency.

Since versions are maintained for every update, even if a client fails part way through an update, concurrency exists during an update, the latest complete version of the data-item being accessed still exists in the system—it does not get destroyed by subsequent updates.

Additionally, versioning enables the use of optimistic protocols.

This thesis develops a set of consistency protocols appropriate for constructing blockbased storage and metadata services. The block-based storage protocol is made spaceefficient through the use of erasure codes and made scalable by offloading work from the storage-nodes to the clients. The metadata service is made scalable by avoiding the high costs associated with agreement algorithms and by utilizing threshold voting quorums.

Fault-tolerance is achieved by developing each protocol in a hybrid storage-node faultmodel (a mix of Byzantine and crash storage-nodes can be tolerated), capable of tolerating crash or Byzantine clients, and utilizing asynchronous communication.

· vi Efficient, scalable consistency for highly fault-tolerant storage Acknowledgements I would first like to thank my advisor, Greg Ganger,without whose support and insight this work would not be possible. Next, I would like to thank Mike Reiter, whose patience and wealth of distributed systems knowledge guided us through this project. I would also like to thank the rest of my thesis committee members, Richard Golding, and Priya Narasimhan for their valuable comments, feedback, and time.

From my group, at the PDL, my warmest thanks goes to Jay Wylie, my research partner and friend. Without him, I would still be filling white boards with meaningless scribble. Next, I would like to thank Mike Abd-El-Malek, our newest PASIS member. He came up to speed on the complicated code-base in an amazingly short amount of time and provided invaluable help in implementing the query/update protocol. Finally, from my group, I would like to thank those that made my time at CMU a very enjoyable experience. Especially those from previous projects: John Strunk and Craig Soules; as well

as Andy Klosterman, Jiri Schindler, Shuheng Zhou; and everyone else from my group:

Steve Schlosser, John Griffin, Adam Pennington, Eno Thereska, Brandon Salmon. A special thanks goes to the PDL staff, especially to Karen Lindenfelser and Linda Whipkey for their fantastic organization and caring, positive attitude.

I would like to thank my family, especially my parents for providing their unending support. Last, but not least, I would like to thank my wonderful wife Vianey. I thank her for her patience and the sacrifices she made as she waited for me to finish my thesis. But, most of all, I would like to thank her for her support, friendship, and undying love.

I would like to thank the members and companies of the PDL Consortium (includ· viii Efficient, scalable consistency for highly fault-tolerant storage ing EMC, Engenio, Hewlett-Packard, HGST, Hitachi, IBM, Intel, Microsoft, Network Appliance, Oracle, Panasas, Seagate, Sun, and Veritas) for their interest, insights, feedback, and support. This material is based on research sponsored in part by the National Science Foundation, via grant #CNS-0326453, by the Air Force Research Laboratory, under agreement number F49620–01–1–0433 and F30602–99–2–0539–AFRL, and by the Army Research Office, under agreement number DAAD19–02–1–0389.

Finally, I would like to thank Miguel Castro and Rodrigo Rodrigues for making the implementation of BFT publicly available.

Contents

–  –  –

1.1 Problem definition Fault-tolerant storage systems (e.g., Petal [Lee and Thekkath 1996], Myriad [Chang et al.





2002], SwiftRAID [Long et al. 1994], and Cheops [Amiri et al. 2000a]) spread data redundantly across a set of storage-nodes in an effort to preserve and provide access to data despite failures. Figure 1.1 illustrates the

Abstract

architecture of a fault-tolerant, or survivable, distributed storage system. In these types of systems it is common to break the system (at least logically) into two components or services: a metadata service; and a data service. To access or update a data-item, a client must obtain metadata about the data-item (e.g., where and how it is stored) before it is able to access the data-item itself. In order for a storage-system to tolerate failures, both the data and metadata must be duplicated across a set of storage-nodes. Thus, an update is only complete once it has completed successfully at a subset of the storage-nodes. While this scheme provides access to data-items and their metadata even when subsets of the storage-nodes have failed, it does create the difficulty of maintaining a consistent view, across the storage-nodes, of the most recent update. In decentralized systems, this is problem is exacerbated since, due to the lack of a central serialization point, updates may not be issued to the same subset of storage-nodes as are being read. Without consistency across the set of storage-nodes, data loss is possible or even likely.

Although protocols exist for achieving such consistency, they generally fall short in a number of areas including fault-tolerance, efficiency, and scalability. The easiest solution to this problem is to introduce a point of serialization. This is typically done by serializing · 2 Efficient, scalable consistency for highly fault-tolerant storage

–  –  –

Figure 1.1: High-level architecture for survivable storage.

Spreading data and metadata redundantly across storage-nodes improves its fault-tolerance. Clients write and (usually) read data from multiple storage-nodes and may contact multiple storage-nodes to perform metadata operations.

requests through a primary. However, this typically reduces the scalability of the system and requires additional protocols to tolerate the failure of the primary. Other more complicated protocols exist, however they generally require a significant amount of overhead in the common case of little or no concurrency. Most studies of distributed storage systems (e.g., [Baker et al. 1991; Noble and Satyanarayanan 1994]) indicate that concurrency is uncommon (i.e., writer-writer and writer-reader sharing occurs in well under 1% of operations). As well, many protocols do not scale in terms of messaging or protocol overhead, or storage-node CPU utilization as number of faults tolerated increases.

This thesis demonstrates a novel approach to achieving scalable, highly fault-tolerant (the ability to tolerate more than a single fault) storage systems by leveraging a set of efficient and scalable, strong consistency protocols enabled by storage-node versioning.

Versioning storage-nodes keep a version of every update they receive (for some period of time). These versions can be used to provide consistency, without the need for central serialization, and despite concurrency. Since versions are maintained for every update, even if a client fails part way through an update, or a reader performs a query during an update, the latest complete version of the data-item being accessed still exists in the system—it does not get destroyed by subsequent updates. The problem with versioning becomes one ·

1.2 Thesis statement 3

of the client locating the version it is interested in. The consistency protocols developed in this thesis all use logical time as a means of naming versions. Additionally, versioning enables the use of optimistic protocols. Since older versions are not overwritten by new updates, there is no need to lock the data-item before performing an update. However, in these protocols, concurrency may require the client to perform extra work to find the set of versions in which they are interested (e.g., those that comprise the latest complete update).

In particular, this thesis develops a set of consistency protocols appropriate to building block-based storage and metadata services. The block-based storage protocol is made space-efficient through the use of erasure codes and made scalable by offloading work from the storage-nodes to the clients. The metadata storage protocol is made scalable by avoiding the high costs associated with agreement algorithms and by utilizing threshold voting quorums. Fault-tolerance is achieved by developing each protocol in a hybrid storage-node fault-model (a mix of Byzantine and crashed storage-nodes can be tolerated), capable of tolerating crash or Byzantine clients, and utilizing asynchronous communication.

1.2 Thesis statement Versioning storage-nodes enable the design of a set of scalable, efficient consistency protocols that provide a foundation for constructing scalable, highly fault-tolerant, distributed storage systems.

1.2.1 Validation This work is validated through the design and evaluation of three consistency protocols

that have been enabled by versioning storage-nodes. More precisely:

(1) It develops and demonstrates a read/write block storage consistency protocol that enables highly fault-tolerant storage through the use of erasure coded data and versioning storage-nodes. Its correctness is shown through proof sketches.

· 4 Efficient, scalable consistency for highly fault-tolerant storage (2) It develops and demonstrates a read/conditional-write block protocol that allows for stronger read–modify–write consistency semantics. Additionally, tradeoffs between tolerating Byzantine clients and erasure coding, as well as tradeoffs between tolerating Byzantine storage-nodes and liveness are discussed.

(3) It extends the read/conditional-write block protocol to support operations on multiple, arbitrary objects and implements a scalable metadata service based upon this and the read/write protocol.

(4) It evaluates a distributed file system that utilizes the scalability and fault-tolerance of the developed consistency protocols in terms of the number of faults tolerated, the maximum throughput the system can sustain, and its performance in degraded operation modes (i.e., with concurrency and faults).

1.3 Overview 1.3.1 Consistency protocols First, the Read/Write protocol (R/W) is developed. It provides strong consistency and fault-tolerance for read/write block storage. Block storage-systems (e.g., SCSI, fibrechannel) provide the backbone for most current storage solutions.

The R/W protocol works roughly as follows. To perform a write, clients write timestamped fragments to at least a write threshold of storage-nodes. Storage-nodes keep all versions of fragments they are sent. To perform a read, clients fetch the latest fragment versions from a read threshold of storage-nodes. The client determines whether the fragments comprise a consistent, complete write, based on timestamp ordering; usually, they do. If they do not, additional fragments or historical fragments are fetched, or repair is performed, until a consistent, complete write is observed. Only in cases of failures (storage-node or client) or read-write concurrency is additional overhead incurred to maintain consistency.

The second protocol, the Read/Conditional Write protocol (R/CW), extends the R/W ·

1.3 Overview 5 protocol by only allowing write operations to complete if the object has not changed since the last time it was read. Thus, the R/CW protocol is able to provide stronger consistency semantics—similar to read–modify–write, rather than just read–write semantics.

Finally, a Query/Update protocol (Q/U) is developed. It is very similar to the R/CW protocol but adds a few optimizations and extensions. Most notably, it provides strict serializability of arbitrary operations through the use of replicated state machines.

These protocols are developed in detail, evaluated individually, and used as a basis for building a fault-tolerant, scalable storage-system.

1.3.2 Guiding assumptions

These consistency protocols achieve efficiency and scalability via a combination of optimistic operation, versioning, and quorum-style redundancy. As such, there are a number of assumptions that guide the use of versioning and optimism. As well, there are a number of assumptions in the system model for which these protocols are designed.

Optimism and versioning

The scalability features of quorums are well-known [Malkhi et al. 2000; Naor and Wool 1998], however the use of versioning and optimism is guided by three high-level assumptions.

First, we assume that client failures within the duration of an access protocol (on the order of milliseconds) should be rare. That is, while we design to tolerate client failures (indeed, arbitrary ones; see below), our protocols optimistically presume they will not occur, and exploit this assumption heavily in order to improve throughput when it holds.



Pages:   || 2 | 3 | 4 | 5 |   ...   | 18 |


Similar works:

«THE KENYON REVIEW Vol. XVII W I N T E R, 1955 No. 1 Walter Kaufman NIETZSCHE AND RILKE T HIS STUDY of Nietzsche and Rilke, and particularly of what they have in common, is meant to throw light on both and also on the relation of philosophy and poetry. To those who are used to comparisons of Nietzsche with Nero and of Rilke with St. Francis any insistence on common elements will come as a surprise; but it may also disabuse them of some misapprehensions. There are, of course, obvious differences...»

«Scribes and Singers: Latin Models of Authority and the Compilation of Troubadour Songbooks by Christopher J. Davis A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Comparative Literature) in The University of Michigan 2011 Doctoral Committee: Professor Peggy S. McCracken, Chair Professor Alison Cornish Professor Yopie Prins Professor Elizabeth L. Sears Associate Professor Catherine Brown Dedication For my parents, William Davis and...»

«Degrees of Causation Matthew Braham Faculty of Philosophy, University of Groningen, The Netherlands m.braham@rug.nl Martin van Hees Faculty of Philosophy, University of Groningen, The Netherlands Martin.van.Hees@rug.nl (April 15, 2008) Abstract The primary aim of this paper is to analyze the concept of degrees of causal contribution for actual events and examine the way in which it can be formally defined. This should go some way to filling out a gap in the legal and philosophical literature...»

«SBORNÍK PRACÍ FILOZOFICKÉ FAKULTY BRNĚNSKÉ UNIVERZITY STUDIA M1N0RA FACULTATIS PHILOSOPHICAE UNIVERSITATIS BRUNENS1S S 1 (1995) — BRNO STUDIES IN ENGLISH 21 JITKA VLČKOVÁ SOCIAL IDENTITY AND ITS REFLECTION IN COMMUNICATION Jimmie Blacksmith in Thomas Keneally's Novel The Chant of Jimmie Blacksmith The language people use in interactions is primarily determined by their culture; here the patterns of thought and patterns of behaviour are probably the most important (cf. Allwood, 1990)....»

«IN THE FACE OF BLINDNESS: NEGOTIATING IDENTITY AND RELATIONSHIP IN BLIND/SIGHTED INTERACTION A Dissertation submitted to the Faculty of the Graduate School of Arts and Sciences of Georgetown University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Linguistics By Elisa L. Everts, M.A. Washington, DC August 1, 2012 Copyright 2012 by Elisa L. Everts All Rights Reserved ii IN THE FACE OF BLINDNESS: NEGOTIATING IDENTITY AND RELATIONSHIP IN BLIND/SIGHTED...»

«AN INVESTIGATION OF NOVEL APPROACHES FOR OPTIMISING RETAIL SHELF SPACE ALLOCATION by Ruibin Bai, BEng, MSc Thesis Submitted to The University of Nottingham for the Degree of Doctor of Philosophy School of Computer Science & Information Technology The University of Nottingham, Nottingham, UK September 2005 CONTENTS Contents. List of Figures List of Tables Abstract. Acknowledgements Declaration. Chapter 1. Introduction 1.1 Background and Motivations 1.2 Scope and Aims 1.3 Overview of the Thesis...»

«University of Tartu Faculty of Philosophy Department of Philosophy Indrek Lõbus In Defence of Logical Omniscience MA Thesis Supervised by Daniel Cohnitz and Luis Estrada González Tartu 2016 TABLE OF CONTENTS Introduction 1. Logical Omniscience and the Problem of Deduction 1.1. The Pragmatic Picture 1.2. Logical Omniscience 1.2.1. The Argument from Intuitions? 1.2.2. Support from Formal Semantics 1.2.3. The Problem of Deduction 2. Stalnaker’s Two-Part Solution 2.1. The Metalinguistic Theory...»

«Chapter 2 The Tudor Dynasty: Perfecting Absolutism in the Era of Renaissance and Reformation, 1485–1603 2.1 The Renaissance and the Reformation As the sixteenth century dawned, extraordinary changes affected the whole of Europe. For some 200 years, the Renaissance had driven thought in Italy. Now, it flourished in a revival of the traditions of ancient Greece and Rome (Churchill 1956, p. 3). Literature, philosophy, and art opened up under classical inspiration and a new era of questioning and...»

«XV INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND OPERATIONS MANAGEMENT The Industrial Engineering and the Sustainable Development: Integrating Technology and Management. Salvador, BA, Brazil, 06 to 09 October 2009 TOWARDS A MODEL OF “PROCESSORIENTED” CORE COMPETENCE MANAGEMENT Fábio Walter (UFPB) fwalter.br@gmail.com Uwe Götze (TU Chemnitz) u.goetze@wirtschaft.tu-chemnitz.de The objective of this article is to outline a model of core competence management that is closely related...»

«PERMAFROST DISTRIBUTION MAPPING AND TEMPERATURE MODELING ALONG THE ALASKA HIGHWAY CORRIDOR, INTERIOR ALASKA By Santosh K. Panda RECOMMENDED: Advisory Committee Chair Chair, Department of Geology and Geophysics APPROVED: Dean, College of Natural Science and Mathematics Dean of the Graduate School Date PERMAFROST DISTRIBUTION MAPPING AND TEMPERATURE MODELING ALONG THE ALASKA HIGHWAY CORRIDOR, INTERIOR ALASKA A THESIS Presented to the Faculty of the University of Alaska Fairbanks in Partial...»

«Article The « Barber » Paradox Pierre H. Conway Laval théologique et philosophique, vol. 18, n° 2, 1962, p. 161-176.Pour citer cet article, utiliser l'information suivante : URI: http://id.erudit.org/iderudit/1020024ar DOI: 10.7202/1020024ar Note : les règles d'écriture des références bibliographiques peuvent varier selon les différents domaines du savoir. Ce document est protégé par la loi sur le droit d'auteur. L'utilisation des services d'Érudit (y compris la reproduction) est...»

«Humanicus, issue 9 / 2014 MANAGING CULTURAL PLURALISM IN AN EDUCATIONAL SETTING: INSIGHTS FROM GIANNI VATTIMO AND THOMAS KUHN MATTHEW EDWARD HARRIS Abstract. This article looks at implications of the ideas of postmodern philosopher Gianni Vattimo for how management deals with cultural conflict in an educational setting. Vattimo’s ideas point towards the development of less prescriptive, more performative, means of resolving conflict in a setting where truth claims are of reduced importance....»





 
<<  HOME   |    CONTACTS
2017 www.abstract.dislib.info - Abstracts, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.