
Efficient, scalable consistency for highly fault-tolerant storage. Garth Goodson, August 2004, CMU-CS-04-111, Dept. of Electrical and Computer ...


Byzantine storage-nodes can fabricate high timestamps that must be classified as incomplete by read operations. Worse, in each subsequent round of a read operation, Byzantine storage-nodes can fabricate more high timestamps that are just a bit smaller than the previous. In this manner, Byzantine storage-nodes can “attack” the performance of the read operation, but not its safety. To protect against such denial-of-service attacks, the read operation can consider all unique timestamps, up to a maximum of b + 1, present in a ResponseSet as candidates before soliciting another ResponseSet. In this manner, each “round” of the read operation is guaranteed to consider at least one candidate from a correct storage-node and no more than b candidates from Byzantine storage-nodes.
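The candidate-limiting rule above can be sketched as follows. This is an illustrative fragment, not the thesis's implementation; the function name and the representation of a ResponseSet as a list of timestamps are assumptions.

```python
# Hypothetical sketch of the rule described above: each read round
# considers at most b+1 of the highest unique timestamps from a
# ResponseSet, so at least one candidate comes from a correct
# storage-node and at most b from Byzantine ones.

def candidates_for_round(response_set, b):
    """Return up to b+1 unique timestamps, highest first."""
    unique_ts = sorted(set(response_set), reverse=True)
    return unique_ts[:b + 1]

# Example: with b = 2, fabricated high timestamps (907, 906, 905) fill
# this round's candidate budget, but the round is bounded and correct
# storage-nodes' timestamps must be reached in later rounds.
responses = [907, 906, 905, 101, 101, 100]   # timestamps from one ResponseSet
print(candidates_for_round(responses, 2))    # -> [907, 906, 905]
```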

3.7.3 Garbage collection


could then delete all previous complete writes. If this occurs, the read operation’s next round will observe an incomplete write with no previous history. Effectively, the read operation has “missed” the complete write operation that it would have classified as such.

When it discovers this fact, the read operation retries (i.e., restarts by requesting a new ResponseSet). Thus, in theory, a read operation faced with perpetual write concurrency and garbage collection may never complete. In practice, such perpetual interaction of garbage collection and read-write concurrency for a given data-item is not realistic.

3.8 Summary

This chapter has developed an efficient Byzantine-tolerant protocol for reading and writing blocks of data by leveraging the versioning capabilities of storage-nodes. This protocol provides read–write semantics of full data blocks. As such, it is suitable as the basis for the data storage component within a survivable storage system. The subsequent chapters develop protocols that can provide more powerful read–modify–write semantics which are more suitable for constructing metadata services.

The R/W protocol is made space-efficient through the use of erasure codes and made scalable (in terms of faults tolerated) by offloading work from the storage-nodes to the clients. The protocol is work-efficient, since additional overheads only occur in cases of failures or read-write concurrency. Experiments demonstrate that PASIS, a prototype block storage system that uses the R/W protocol, scales well in the number of faults tolerated, supports 60% greater write throughput than BFT, and requires significantly less server computation than BFT.

4 Read/Conditional Write Block Protocol

Unlike data blocks that support only read and write (R/W) operations, metadata objects (e.g., directories) require update operations that modify their existing contents, rather than operations that blindly overwrite their previous contents, in order to preserve their integrity.

For example, two concurrent insertions into a directory using write operations can result in one being overwritten by the other. To support such operations, this chapter develops a conditional write (CW) operation that performs a write to an object only if the value of the object has not changed since the client last read it. As such, we refer to these objects as read/conditional write (R/CW) objects (vs. R/W). Moreover, read and conditional write operations are linearizable, thus ensuring atomicity for those that succeed.

The focus in this chapter is on techniques we have employed in the design of R/CW objects. The protocol is developed in the context of reading and writing full objects, or blocks (as was the R/W protocol). Since storage system data rarely requires the consistency provided by R/CW objects, the next chapter develops a metadata service based on an extension to this protocol. The extension provides a more general operation-based interface that allows finer-granularity access to objects.

Our R/CW protocols are designed around a hybrid fault model, in which different tolerances for Byzantine and benign (crash-recovery) failures can be specified, so that the cost of the protocol can be tuned to the number of each type of failure anticipated. The R/CW protocol, like the R/W protocol, is extremely optimistic: it is optimized for the common file system workload in which concurrent sharing is low and failures are rare.

This optimism leads to a design for the R/CW protocols that, in the common case, requires just a single round of communication to perform a read operation and one additional round of communication for a conditional write (that can usually be optimized away). If client failures are encountered, or concurrency is observed, more rounds of communication may be necessary. As well, expensive cryptography, notably digital signatures, is avoided.

4.1 Overview

We describe the system in terms of N storage-nodes and an arbitrary number of clients and objects. Clients perform operations on objects. Storage-nodes host object replicas. Note that this differs from R/W objects, which can use erasure coding for space-efficiency. The impact of using erasure coding is discussed in Section 4.6.1.

The R/CW protocol consists of read operations and CW operations. Both read operations and CW operations issue requests to sets of storage-nodes. CW requests are executed by storage-nodes. As in the R/W protocol, logical timestamps are used to totally order all CW operations and to identify CW requests from the same CW operation across storage-nodes. Each storage-node maintains a replica history, denoted ReplicaHistory, for each object it hosts. The replica history contains the entire set of CW requests executed on the object replica, ordered by logical timestamp (pruning the replica history is discussed in Section 4.3).

4.1.1 R/CW semantics


In this operation, the register X is read, an operation f is performed on X, and the resulting transformation is stored back in X. It has been shown that many more powerful operations can be implemented as an RMW operation, e.g., test-and-set, fetch-and-add, etc. In a CW operation, an update of X is performed only if the value of X has not changed since it was previously read; thus, the write is conditioned on the previously read value not having changed.
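The contrast between RMW and CW semantics can be sketched on a single local register. This is an illustrative fragment only (a local register, not the distributed protocol); the class and method names are assumptions.

```python
# Illustrative sketch contrasting read-modify-write with the conditional
# write described above, on a plain local register.

class Register:
    def __init__(self, value=0):
        self.value = value

    def rmw(self, f):
        """Read X, apply f, store f(X) back; returns the old value."""
        old = self.value
        self.value = f(old)
        return old

    def conditional_write(self, expected, new):
        """Write only if the value is unchanged since it was read."""
        if self.value != expected:
            return False          # an update intervened; the caller retries
        self.value = new
        return True

x = Register(5)
x.rmw(lambda v: v + 1)             # fetch-and-add: X becomes 6
ok = x.conditional_write(6, 42)    # succeeds: X was still 6
stale = x.conditional_write(6, 7)  # fails: X is now 42, not 6
```

A CW is thus a compare-and-swap on the whole object value, which is why concurrent directory insertions cannot silently overwrite one another: the second insertion's condition fails and it must re-read.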

As in the R/W protocol, the protocol is optimistic; thus, an operation may be complete, incomplete, or repairable. Every CW operation is preceded by a read operation that identifies the latest complete candidate. A CW operation is conditioned on the latest complete candidate. R/CW semantics ensure that a single conditioned-on chain exists from any candidate back through all previous complete candidates to the initialized object. Therefore, no two complete writes can be conditioned on candidates such that the links formed between the complete writes and their conditioned-on candidates overlap. Figure 4.1 more clearly illustrates the conditioned-on chain. R/CW semantics ensure that the conditioned-on time of the current candidate “points” at a complete candidate or at a well-known initial value. As well, since a complete CW operation may be observed as only repairable, all repairable candidates must be repaired to maintain the integrity of the conditioned-on chain.

One of the main challenges of the R/CW protocol is to protect the integrity of the conditioned-on chain, especially from Byzantine clients. Byzantine clients cannot be trusted to follow the protocol, thus they may attempt to break the conditioned-on chain by conditioning on an incorrect value (e.g., an incomplete CW operation, or not the latest complete CW operation); see Figure 4.1(b). Briefly, to prevent these Byzantine attacks,


Figure 4.1: Examples illustrating the conditioned-on chain.

For each example, the complete object history and each version’s classification for a CW object is shown. As well, the logical timestamp for each version is given, as is the logical time of the version on which it conditions (shown as a subscript to the logical timestamp). Note, the CW operation at logical time 3 is incomplete. Since both the versions at logical times 2 and 3 condition on the version at time 1 (e.g., they may have been concurrent), only one version can complete successfully. Example (a) shows a valid conditioned-on chain. The version at logical time 3 is ignored. Example (b) shows an invalid conditioned-on chain. At logical time 4, a (Byzantine) client incorrectly conditions on the version at time 3 even though it is incomplete, thus corrupting the conditioned-on chain.

clients must send “proof” with each CW operation supporting their action. Each client sends an object history set for each CW object being updated. The object history set contains the replica history of each storage-node that replied during the read phase. As well, each of the replica histories is “signed” (digital signatures may be used, but for performance we use authenticators; see Section 4.2.2) and acts as proof that the client is acting correctly. The “signed” object history set can then be validated individually by each storage-node (this validation is discussed in detail in Section 4.3.3).
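The "signing" of replica histories might be sketched as follows. This is a rough illustration, not the thesis's construction: its authenticators are vectors of MACs computed with keys shared pairwise between storage-nodes, whereas here a single HMAC per node stands in for each vector slot, and the serialization and key handling are simplified assumptions.

```python
# Rough sketch of an authenticator over a replica history: one MAC per
# storage-node, so each storage-node can later verify its own slot.
# Key management and encoding are simplified for illustration.
import hmac, hashlib, pickle

def make_authenticator(replica_history, shared_keys):
    """Compute one MAC per storage-node over the serialized history."""
    blob = pickle.dumps(replica_history)
    return {node: hmac.new(key, blob, hashlib.sha256).hexdigest()
            for node, key in shared_keys.items()}

def verify_entry(replica_history, authenticator, node, key):
    """A storage-node checks only its own slot of the authenticator."""
    blob = pickle.dumps(replica_history)
    expected = hmac.new(key, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, authenticator[node])

keys = {"S1": b"k1", "S2": b"k2"}
history = [(1, 0, "v1")]                    # (LT, LT_conditioned, value)
auth = make_authenticator(history, keys)
assert verify_entry(history, auth, "S1", b"k1")        # genuine history
assert not verify_entry([(2, 1, "bad")], auth, "S1", b"k1")  # tampered
```

Because a Byzantine client cannot forge MACs under keys it does not hold, it cannot fabricate an object history set that validates at correct storage-nodes.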

4.1.2 Read operation overview


Storage-nodes return the replica history in response to a read request.

The client combines the replica histories returned by the storage-nodes into the object history set (denoted ObjectHistorySet). For example, ObjectHistorySet[S] contains the replica history returned from storage-node S. Classification is performed on the timestamps within the object history set. The purpose of classification is to determine the timestamp of the latest complete (successful) update. A CW operation is complete once a threshold number of benign storage-nodes have executed CW requests. This threshold permits the R/CW protocol to ensure that no subsequent operation can return (or condition-on) a previous object value; as well, it defines classification. Classification identifies a candidate—a candidate is either complete or repairable. If the candidate is complete, then the read operation returns the object value associated with the candidate.

If the candidate is repairable, the client performs a CW operation to repair the candidate.

Once the repairable candidate is complete, its value is returned.
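The classification step can be sketched as below. This is a simplified stand-in: the exact rules are given in Section 4.4, and the two threshold constants here are illustrative assumptions, not the protocol's actual bounds. An object history set is modeled as a map from storage-node to the set of timestamps it has executed.

```python
# Simplified classification sketch. Starting from the highest timestamp,
# the client counts how many storage-nodes report it: enough nodes make
# it complete, fewer make it repairable, and otherwise it is incomplete
# and the next-lower timestamp is examined.

COMPLETE_THRESHOLD = 4    # assumed: enough benign nodes to guarantee completeness
REPAIRABLE_THRESHOLD = 2  # assumed: enough that repair can finish the write

def classify(object_history_set):
    """Return (timestamp, status) for the latest classifiable candidate."""
    timestamps = sorted({ts for hist in object_history_set.values()
                         for ts in hist}, reverse=True)
    for ts in timestamps:
        count = sum(ts in hist for hist in object_history_set.values())
        if count >= COMPLETE_THRESHOLD:
            return ts, "complete"
        if count >= REPAIRABLE_THRESHOLD:
            return ts, "repairable"
        # otherwise incomplete: skip it and examine the next-lower timestamp
    return 0, "complete"     # well-known initial value

# Timestamp 6 appears at only one node (incomplete, skipped);
# timestamp 5 appears at all five nodes (complete).
ohs = {"S1": {5, 6}, "S2": {5}, "S3": {5}, "S4": {5}, "S5": {5}}
print(classify(ohs))   # -> (5, 'complete')
```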

Aspects of the read operation are illustrated in Figure 4.2. In response to read requests, storage-nodes return object histories to the clients. In the example, each client constructs a different object history set, since each client received responses from a different subset of storage-nodes. Classification is performed on the object history set. In the example, clients A and B classify the candidate with logical timestamp 5 as repairable and complete, respectively. Client B classified the candidate with logical timestamp 6 as incomplete prior to classifying 5 as complete. The exact rules for classification are given in Section 4.4.

4.1.3 CW operation overview

All CW operations are preceded by a read operation that identifies the candidate. Recall, a CW operation is conditioned on the latest complete candidate (actually, on the object history set for which classification yields the candidate). R/CW semantics ensure that a single conditioned-on chain exists from any candidate back through all previous complete candidates to the initialized object. Each entry in a replica history is a ⟨logical timestamp, conditioned-on logical timestamp, value⟩ tuple. The elements of this tuple are denoted ⟨LT, LTconditioned, Data⟩, and replica histories are initialized to ⟨0, 0, ⊥⟩.

The largest timestamp in the object history set is used by the client to create a timestamp for the CW operation. As discussed later, hashes of the object history set and the object value are also placed in the timestamp (these hashes ensure that all CW operations have a unique timestamp and protect against Byzantine entities). The client sends CW requests to all storage-nodes. The CW request contains the timestamp of the CW operation, the object history set constructed by the preceding read operation, the candidate (found from classification of the object history set), and the object value.
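Timestamp construction as just described might look like the following sketch. The field layout and serialization are assumptions for illustration; Section 4.2.1 gives the actual structure.

```python
# Sketch of building a CW timestamp: the time component exceeds the
# largest time in the object history set, and hashes of the object
# history set and the value are embedded so that every CW operation
# obtains a unique timestamp.
import hashlib, pickle

def make_timestamp(object_history_set, value):
    max_time = max((e[0] for hist in object_history_set.values() for e in hist),
                   default=0)
    verifier_ohs = hashlib.sha256(pickle.dumps(object_history_set)).digest()
    verifier_data = hashlib.sha256(pickle.dumps(value)).digest()
    return (max_time + 1, verifier_ohs, verifier_data)

ohs = {"S1": [(3, 2, "old")], "S2": [(3, 2, "old")]}
ts = make_timestamp(ohs, "new-value")
assert ts[0] == 4
# Different values yield different timestamps even at the same time:
assert make_timestamp(ohs, "other") != ts
```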

Correct storage-nodes execute a CW request only if the timestamp, value, and replica history can all be validated. Validation ensures failure atomicity, concurrency atomicity, and, as discussed below, protects against Byzantine entities. If sufficient storage-nodes execute CW requests, the CW operation completes; otherwise, it aborts. CW operations may abort due to concurrency. Since repair is a CW operation, and may be part of a read operation, read operations may also abort.

To ensure that only one CW operation conditioned on a given CW operation completes, a barrier may be written to ensure that outstanding concurrent CW operations cannot complete. Barriers mark a point in time without inserting a value into the system. Thus, a completed barrier written in front of an incomplete value prevents (or bars) the incomplete operation from ever completing; storage-node validation will fail, since the barrier’s timestamp is larger than the timestamp being conditioned on by the incomplete operation. Barriers allow CW operations to be issued that may complete in the face of concurrency or client failure.
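The barring effect can be reduced to a single check, sketched here in simplified form; the real storage-node validation (Section 4.3.3) examines the whole object history set, and the function name is an illustrative assumption.

```python
# Simplified validation check behind the barrier mechanism: a CW request
# must condition on the latest complete timestamp. Once a barrier
# completes at a later time, a stalled request conditioned on an older
# time can never pass this check, so the incomplete write is barred.

def validates(request_lt_conditioned, latest_complete_lt):
    """Accept a CW request only if it conditions on the latest complete LT."""
    return request_lt_conditioned == latest_complete_lt

latest_complete = 7      # a barrier completed at logical time 7
stalled_request = 5      # an incomplete CW conditioned on time 5
assert not validates(stalled_request, latest_complete)   # barred forever
assert validates(7, latest_complete)                     # conditions on the barrier
```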

4.2 Mechanisms

This section describes various mechanisms employed to guarantee safety within the R/CW protocol.

4.2.1 Validating timestamps

Logical timestamps are structured values with three members: the primary timestamp (Time), the object history set verifier (Verifier OHS), and the value verifier (Verifier Data).

The verifiers are collision-resistant hashes over the object history set and the object’s value, respectively. In comparing two logical timestamps to determine which is greater, the primary timestamps are first compared, and then the verifiers are compared. Since verifiers are guaranteed to be unique (for unique object values), all timestamps are guaranteed to be unique.
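This comparison order is exactly lexicographic comparison over the (Time, Verifier OHS, Verifier Data) structure; as a sketch, Python's built-in tuple ordering reproduces it directly (the byte values below are placeholders, not real hashes).

```python
# Comparing structured logical timestamps: the primary time is compared
# first, and the verifiers break ties. Tuple comparison is lexicographic,
# which matches the rule described above.
ts_a = (5, b"\x0a" * 32, b"\x01" * 32)   # (Time, Verifier_OHS, Verifier_Data)
ts_b = (5, b"\x0b" * 32, b"\x01" * 32)
ts_c = (6, b"\x00" * 32, b"\x00" * 32)

assert ts_b > ts_a   # equal primary times: the verifiers decide
assert ts_c > ts_b   # a larger primary time wins regardless of verifiers
```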


