
Efficient, scalable consistency for highly fault-tolerant storage. Garth Goodson, August 2004, CMU-CS-04-111, Dept. of Electrical and Computer ...


Alternatively, more space-efficient encoding schemes can be used. This section provides an overview of some well-known schemes, such as RAID, as well as more general, more space-efficient erasure coding schemes. With these schemes, reads require fragments from multiple servers. Moreover, to decode the data-item, the set of fragments read must correspond to the same write operation; thus write–write concurrency can be problematic.

2.2.1 RAID

In order to increase the performance of disk subsystems, data can be striped across a set of disks. However, as the number of disks in each stripe increases, so does the probability that a single disk fails and data is lost. In 1988, Patterson et al. [Patterson et al. 1988] designed Redundant Arrays of Inexpensive Disks (RAID) to overcome these reliability challenges. They solved the problem by storing redundant data, in the form of parity, on one or more of the disks in the array.

At present there are a number of different RAID levels. The most common are RAID 0, 1, and 5. Combinations of these levels also exist (e.g., RAID 10). RAID 0 is the simplest, increasing performance by striping data across a set of devices so that they can be read and written in parallel. However, RAID 0 provides no extra redundancy.

RAID 1 provides mirroring of data onto two devices. This scheme can tolerate a single device failure; however, it pays a large cost in storage capacity: only half of the space is usable to store data. RAID 5 uses parity with striping to improve space-efficiency. As in RAID 0, data is striped across a set of devices. A parity block is calculated simply by XORing the data blocks within each stripe, and is stored on a separate device. In RAID 5, the parity block is rotated among the set of storage devices.
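The stripe-wide XOR parity described above can be sketched in a few lines of Python (a toy illustration, not drawn from the thesis; the block contents and sizes are made up):

```python
# Toy RAID 5 parity: XOR the data blocks in a stripe to get the parity
# block; any single lost block can then be rebuilt by XORing the
# surviving blocks with the parity.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks in one stripe
parity = xor_blocks(stripe)            # stored on a separate device

# Simulate losing block 1: rebuild it from the parity and the survivors.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == stripe[1]
```

Because XOR is its own inverse, the same routine both generates the parity and reconstructs a missing block, which is why RAID 5 needs only one extra device per stripe.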

2.2.2 Erasure-coding

The use of erasure codes can greatly improve the space-efficiency of replicating data.

Erasure codes were originally developed for communication channels in the networking community and are sometimes known as forward-error correcting codes. Erasure codes encode a data-item into a set of fragments and have the property that any subset of a certain size can be used to reconstruct the original data-item. They have the nice property that they can tolerate m simultaneous failures with only m extra data-fragments. RAID schemes can typically only support m = 1, 2; i.e., they can only tolerate a single or double disk failure.

In this work the focus is on systematic threshold erasure codes in which any m of the n encoded data-fragments can decode the data-item. The first m data-fragments are stripes of the data-item. The remaining n − m code-fragments are generated using polynomial interpolation within a Galois field. As such, each fragment is 1/m of the total data size; thus, the total size-blowup is n/m. Replication can be thought of as a special case of these m-of-n erasure codes, as can the different RAID schemes; m = 1 for replication. Examples of such codes are Reed–Solomon codes [Berlekamp 1968], secret sharing [Shamir 1979], information dispersal (IDA) [Rabin 1989], short secret sharing [Krawczyk 1994], and “tornado” codes [Luby et al. 2001]. The tradeoff in using erasure codes over RAID-like schemes is the performance cost of generating the code fragments.
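A toy systematic m-of-n code along these lines can be sketched with Lagrange interpolation, here over the prime field GF(257) rather than the binary Galois fields used in practice (all names and values are illustrative, not from the thesis):

```python
# Toy systematic m-of-n erasure code over the prime field GF(257),
# one field element per fragment. The first m fragments are the data
# itself (systematic); the remaining n - m are evaluations of the
# degree-(m-1) polynomial through the data points, so ANY m fragments
# suffice to reconstruct. A sketch, not a production Reed-Solomon code.
P = 257

def lagrange_eval(points, x):
    """Evaluate, at x, the polynomial interpolating (xi, yi) pairs mod P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def encode(data, m, n):
    """data: list of m field elements -> n fragments as (x, value) pairs."""
    pts = list(enumerate(data))                 # systematic part: x = 0..m-1
    return [(x, lagrange_eval(pts, x)) for x in range(n)]

def decode(fragments, m):
    """Any m distinct fragments -> the original m data elements."""
    pts = fragments[:m]
    return [lagrange_eval(pts, x) for x in range(m)]

frags = encode([42, 7], m=2, n=5)               # a 2-of-5 code
assert decode([frags[3], frags[1]], m=2) == [42, 7]   # any 2 of 5 suffice
```

With m = 2 and n = 5, each fragment is half the data size and the total blowup is 5/2, matching the n/m formula above.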



[Figure 2.1: Example of a 2-of-5 erasure code.]

An example 2-of-5 erasure code scheme is shown in Figure 2.1. The original data-item is striped into 2 fragments, with 3 code fragments being generated. Each fragment is written to a storage-node. Any 2 fragments can be used to decode the original data-item.

There exists much prior work (e.g., [Agrawal and El Abbadi 1990; Herlihy and Tygar 1987; Mukkamala 1994]) that combines erasure-coded data with quorum systems to improve the confidentiality and/or integrity of data along with its availability. However, these systems do not provide consistency (i.e., a separate synchronization mechanism is required) and do not cope with Byzantine clients. Concurrently with our own work, Frølund et al. [Frølund et al. 2004] have developed a decentralized protocol for linearizable erasure-coded read/write registers that utilizes a variant of threshold-quorums.

2.3 Consistency semantics

To provide reasonable semantics, storage systems must guarantee that readers see consistent data-item values.

2.3.1 Linearizability

Linearizability is desirable for block-based read-write storage. Since linearizability is defined only for single-object operations, it is not suitable for describing the multi-object operations that are sometimes required for metadata updates. Linearizability is described by Herlihy and Wing in [Herlihy and Wing 1990]. Operations are linearizable if their return values are consistent with an execution in which each operation is performed instantaneously at a distinct point in time between its invocation and completion. Frølund et al. [Frølund et al. 2004] have recently developed a block-based protocol that provides a variant of linearizability they term strict linearizability [Aguilera and Frolund 2003], in which an operation that crashes either takes effect within some limited time frame or not at all.
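The definition above can be made concrete with a brute-force check for a single read/write register (an illustrative sketch of my own, not a protocol from the thesis; it enumerates all orderings, so it is exponential in the history length):

```python
# Brute-force linearizability check for one read/write register.
# Each operation is (invoke_time, return_time, kind, value). A history
# is linearizable if SOME total order (a) respects real-time precedence
# (op A before op B whenever A returns before B invokes) and (b) makes
# every read return the latest preceding write (initial value 0).
from itertools import permutations

def linearizable(history, init=0):
    for order in permutations(history):
        # (a) real-time precedence: a later-ordered op must not have
        # returned before an earlier-ordered op was even invoked.
        if any(order[j][1] < order[i][0]
               for i in range(len(order))
               for j in range(i + 1, len(order))):
            continue
        # (b) replay register semantics along this candidate order.
        val, valid = init, True
        for _, _, kind, v in order:
            if kind == "w":
                val = v
            elif v != val:          # a read saw a stale/wrong value
                valid = False
                break
        if valid:
            return True
    return False

h1 = [(0, 1, "w", 1), (2, 3, "r", 1)]   # write completes, then read sees it
h2 = [(0, 1, "w", 1), (2, 3, "r", 0)]   # stale read: not linearizable
assert linearizable(h1) and not linearizable(h2)
```

In h2 the read invokes after the write returns, so no valid ordering can place it before the write, which is exactly what the real-time constraint in the definition rules out.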

The R/W protocol, described in Chapter 3, tolerates Byzantine faults of any number of clients and a limited number of storage-nodes while implementing linearizable [Herlihy and Wing 1990] and wait-free [Herlihy 1991] read-write objects. As well, the R/CW protocol, described in Chapter 4, also implements linearizable objects. In this protocol, linearizability is adapted appropriately for Byzantine clients, and wait-freedom follows the model of Jayanti et al. [Jayanti et al. 1998]. Since operations performed by Byzantine clients have no clear start time, they are excluded from the set of linearizable operations.

2.3.2 Serializability

The consistency semantic of serializability can pertain to multi-object operations that are required for updating metadata objects atomically. Traditionally, serializability has been defined for transactions within database systems. A sequence of transactions is serializable if its outcome is equivalent to some sequential execution of the individual transactions [Papadimitriou 1979]. Strict serializability extends serializability by ensuring that transactions that have completed, and thus are already in the history in serial order, remain in that relative order. This provides a consistency semantic similar to that of linearizability.

A serializable execution satisfies the ACID properties [Haerder and Reuter 1983] (i.e., atomicity, consistency, isolation, and durability). The serializability of transactions can be ensured through a number of techniques: two-phase locking [Gray et al. 1976], in which locks are acquired in one phase and released in a separate phase; optimistic concurrency control [Kung and Robinson 1981], in which operations within a transaction are performed optimistically, with no locking, and serializability is validated at commit time; and timestamp ordering [Bernstein et al. 1980], in which timestamps are used to order operations. The query/update protocol described in Chapter 5 implements metadata objects that conform to strict serializability.
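The two-phase locking discipline can be sketched as follows (an illustrative example, not code from the thesis; the lock manager and the `transfer` transaction are assumptions):

```python
# Minimal two-phase locking sketch: each transaction acquires every
# lock it needs (growing phase) before releasing any (shrinking phase),
# which guarantees the interleaving is serializable.
import threading

class LockManager:
    """Hands out one lock per named object."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, obj):
        with self._guard:
            return self._locks.setdefault(obj, threading.Lock())

store = {"x": 0, "y": 0}
mgr = LockManager()

def transfer(amount):
    # Acquire in a fixed global order to avoid deadlock between
    # concurrent transactions.
    locks = [mgr.lock_for(k) for k in sorted(store)]
    for lk in locks:            # growing phase: acquire everything first
        lk.acquire()
    try:
        store["x"] -= amount    # the transaction body runs fully isolated
        store["y"] += amount
    finally:
        for lk in locks:        # shrinking phase: release only at the end
            lk.release()

threads = [threading.Thread(target=transfer, args=(1,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert store == {"x": -10, "y": 10}
```

Because no transaction releases a lock before acquiring all of its locks, any two conflicting transactions are fully ordered, which is the core of the 2PL serializability argument.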

2.3.3 Tolerating benign faults

To provide operation atomicity, concurrency and client failures must be tolerated. A challenge introduced by concurrency and client failures is partially completed write operations. Partial writes arise both from write operations still in progress and from write operations that never completed (e.g., due to a failed client).

Common approaches to dealing with partial writes in non-Byzantine-tolerant systems are two-phase commit [Gray 1978] and repair (write-back). Two-phase commit provides failure atomicity (although such protocols may block). Three-phase commit protocols [Skeen and Stonebraker 1983] provide failure atomicity without blocking by utilizing failure detectors and/or recovery mechanisms. Alternatively, many non-Byzantine-tolerant systems (e.g., Harp [Liskov et al. 1991] and Petal [Lee and Thekkath 1996]) serialize their actions through a primary storage-node, which becomes responsible for completing the update.
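A minimal sketch of the two-phase commit decision logic (the classes here are in-memory stand-ins of my own; real protocols must also log decisions durably and handle coordinator failure, which is where blocking arises):

```python
# Two-phase commit addresses the partial-write problem: the coordinator
# commits only if every participant votes yes in phase 1; otherwise all
# abort, so no write is left partially applied.

class Participant:
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes only if this participant can commit."""
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def finish(self, commit):
        """Phase 2: apply the coordinator's unanimous decision."""
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    decision = all(votes)                         # unanimity required
    for p in participants:                        # phase 2: broadcast it
        p.finish(decision)
    return decision

ok = two_phase_commit([Participant(), Participant()])
bad = two_phase_commit([Participant(), Participant(healthy=False)])
assert ok and not bad
```

One unhealthy vote forces every participant to abort, so the group ends in a uniform state either way; this all-or-nothing outcome is the failure atomicity the text describes.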

A common approach to dealing with concurrency is to suppress it, either via leases [Gray and Cheriton 1989] or optimistic concurrency control [Kung and Robinson 1981]. Ensuring operation atomicity in the face of Byzantine failures of clients requires additional work.

An alternate approach to handling both partial writes and concurrency is to have the data stored on storage-nodes be immutable [Reed and Svobodova 1980; Reed 1983]. By definition, this eliminates the difficulties of updates for existing data. In doing so, it shifts the problem up one level; an update now consists of creating a new data-item and modifying the relevant name to refer to it. Decoupling the data-item creation from its visibility simplifies both, but making the metadata service fault-tolerant often brings back the same issues.

For example, SWALLOW [Reed and Svobodova 1980] utilizes immutable object version logs (or histories) ordered by pseudo-time to guarantee strong consistency (i.e., serializability) of arbitrary sets of read/write operations performed on a set of objects. As well, the Amoeba File Server [Mullender 1985] utilizes immutable data versions to implement optimistic concurrency control, such that the file system is always kept in a consistent state. More recently, peer-to-peer systems (e.g., Past [Rowstron and Druschel 2001] and CFS [Dabek et al. 2001]), Farsite, and the archival portion of OceanStore [Kubiatowicz et al. 2000] use immutable versions of data to simplify serialization of access to data.

Other systems, such as Ivy [Muthitacharoen et al. 2002], use immutable version logs containing both data and metadata; however, Ivy does not implement strong consistency guarantees for its metadata (or data) in this fashion.

Frølund et al. [Frølund et al. 2004] recently developed a decentralized consistency protocol for erasure-coded data. Their algorithm relies on a quorum construction similar to threshold-quorums that they call “m-quorums” (any two quorums intersect in m processes). They utilize client-generated timestamps to totally order updates, and server-side logs to track outstanding requests. Also, as described earlier, their protocol provides a variant of linearizability they call strict linearizability [Aguilera and Frolund 2003]. However, their protocol allows read and write operations to abort; as such, they forgo strong liveness guarantees.

2.3.4 Tolerating Byzantine faults

Byzantine fault-tolerant protocols for implementing read-write objects using quorums are described in [Herlihy and Tygar 1987; Malkhi and Reiter 1997; Martin et al. 2002; Pierce 2001]. Of these related quorum systems, only Martin et al. [Martin et al. 2002] achieve linearizability in our fault model, and this work is also closest to ours in that it uses a type of versioning. In our protocol, a reader may retrieve fragments for several versions of the data-item in the course of identifying the return value of a read. Similarly, readers in [Martin et al. 2002] “listen” for updates (versions) from storage-nodes until a complete write is observed. Conceptually, our approach differs in that clients read past versions, rather than listening for future versions broadcast by servers. In our fault model, especially in consideration of faulty clients, our protocol has several advantages. First, our protocol works for erasure-coded data, whereas extending [Martin et al. 2002] to erasure-coded data appears nontrivial. Second, ours provides better message efficiency: [Martin et al. 2002] involves a Θ(N²) message exchange among the N servers per write (versus no server-to-server exchange in our case) over and above otherwise comparable (and linear in N) message costs. Third, ours requires less computation, in that [Martin et al. 2002] requires digital signatures by clients, which in practice are two orders of magnitude more costly than the cryptographic transforms we employ. Advantages of [Martin et al. 2002] are that it tolerates a higher fraction of faulty servers than our protocol and that it does not require servers to store a potentially unbounded number of data-item versions. Our prior analysis of versioning storage, however, suggests that the latter is a non-issue in practice [Strunk et al. 2000], and even under attack this can be managed using the garbage collection mechanism we describe in Section 3.6.

A metadata service, like any deterministic service, can be implemented in a survivable fashion using state machine replication [Schneider 1990], whereby all operations are processed by server replicas in the same order (atomic broadcast). While this approach supports a linearizable, Byzantine fault-tolerant implementation of any deterministic object, such an approach cannot be wait-free [Fischer et al. 1985; Herlihy 1991; Jayanti et al. 1998]. Instead, such systems achieve liveness only under stronger timing assumptions, such as synchrony (e.g., [Cristian et al. 1995; Pittelli and Garcia-Molina 1989; Shrivastava et al. 1992]) or partial synchrony [Dwork et al. 1988] (e.g., [Castro and Liskov 2002; Kihlstrom et al. 2001; Reiter and Birman 1994]), or probabilistically (e.g., [Cachin et al. 2001]). An alternative is Byzantine quorum systems [Malkhi and Reiter 1997], from which our protocol inherits techniques (i.e., our protocol can be considered a Byzantine quorum system that uses the threshold quorum construction). Protocols for supporting a linearizable implementation of any deterministic object using Byzantine quorums have been developed (e.g., [Malkhi et al. 2001]), but also necessarily forsake wait-freedom to do so. Additionally, most Byzantine quorum systems utilize digital signatures which are computationally expensive.


2.3.5 Metadata scalability


