FREE ELECTRONIC LIBRARY - Abstracts, online materials

Pages:     | 1 |   ...   | 10 | 11 || 13 | 14 |   ...   | 18 |

«Efficient, scalable consistency for highly fault-tolerant storage GARTH GOODSON August 2004 CMU-CS-04-111 Dept. of Electrical and Computer ...»

-- [ Page 12 ] --

For all of these systems, scalability and fault-tolerance of the metadata service are key challenges. The most common fault-tolerance solutions are agreement algorithms that perform state machine replication (e.g., using a protocol like [Bracha and Toueg 1985]). Unfortunately, such an approach does not scale as replicas are added. To make this approach scale, it is common to partition metadata across separate metadata servers (or replica sets). Unfortunately, unlike with data, this solution often comes with a visible change in semantics: loss of ability to perform atomic operations, such as rename, across directories stored on distinct metadata servers.

This chapter develops an alternate design for survivable, scalable, metadata services that maintains strong semantics. The architecture of this system is shown in Figure 5.1.

Our PASIS metadata service (PMD) is “survivable” in that it relies on few assumptions about the environment in which it runs: it is designed to withstand arbitrary (Byzantine [Lamport et al. 1982]) failures of clients and a limited number of metadata-nodes, and requires no timing (synchrony) assumptions for correctness. In addition, it is “scalable” in that the addition of new metadata-nodes yields improvements in the capacity, · 90 Efficient, scalable consistency for highly fault-tolerant storage

–  –  –

systems has been scaling read–write storage to improve throughput, capacity, or fault-tolerance.

The PASIS metadata service scales in a similar fashion: its throughput, capacity, or fault-tolerance can be improved by adding more metadata-nodes. Note that the metadata and storage processes can execute on the same hardware, even though the picture and most designs have them logically separated.

throughput or fault-tolerance of the service; we refer to this as “horizontal scalability”.

The PASIS metadata service is constructed of metadata objects that utilize the R/CW protocol described in Chapter 4. While the R/CW protocol focused on reading and writing entire objects, this chapter extends the R/CW protocol to allow for more general query and update operations. These operations provide access to objects at a finer-granularity (e.g., reading/inserting directory entries vs. reading/writing full directories) (Sections 5.2.1 and 5.2.2). In addition, atomic updates across multiple objects are required: e.g., moving a file from one directory to another requires that the removal and insertion be performed atomically on the source and destination directories. Thus, a single conditional write operation that can modify multiple objects and can be conditioned on a superset of these objects being unchanged is introduced in Section 5.2.3. However, these extensions do not fundamentally alter how the R/CW protocol behaves. The bounds in terms of N and the thresholds remain the same, as does the optimistic nature of the protocol and its ·

5.1 Overview 91 guarantees, all while avoiding expensive cryptography.

Moreover, by avoiding heavyweight agreement protocols, the PASIS metadata service offers horizontal scalability that has not yet been achieved for such a service. Our protocols derive from threshold voting protocols [Gifford 1979; Thomas 1979]. In such an approach, only subsets (i.e., a majority) of metadata-nodes need be accessed to complete an operation. As such, metadata-nodes can be added to improve capacity, throughput, or fault-tolerance. Moreover, the threshold voting approach employed can be extended to quorum systems that offer greater throughput scalability [Malkhi et al. 2000; Naor and Wool 1998].

5.1 Overview

Metadata objects are a type of R/CW object that provide metadata-specific interfaces.

Read operations of R/CW objects are extended to be query operations of metadata objects; read operations read the R/W object, whereas query operations return the result of a deterministic read-only function performed on the metadata object. CW operations of R/CW objects are extended to be update operations of metadata objects; CW operations send an object replica in each request, whereas update operations invoke a deterministic function on the metadata object.

Since each metadata node performs the operation on its replica, metadata objects provide replicated state machine [Schneider 1990] semantics. These semantics prevent Byzantine clients from corrupting the state of metadata objects, since all updates are veried by the metadata servers. For example, metadata-nodes can prevent a Byzantine client from inserting an existing name into a directory object, because the metadata-nodes can only be manipulated by the appropriate operations (and the results can be verified by the metadata-node).

Two optimizations have been implemented to improve the efficiency of metadata objects. First, operations can be performed on metadata objects optimistically by sending only the operation and object history set to metadata-nodes; entire objects need not be · 92 Efficient, scalable consistency for highly fault-tolerant storage transmitted across the network, thus reducing bandwidth. This is optimistic because, if the metadata-node does not host the candidate version on which the operation is to be performed, the metadata-node must re-sync its replica of the object (requiring an additional round-trip). Second, large metadata objects are broken into blocks, this aids in the reduction of bandwidth when syncing large replicas. When replica values must be fetched, only modified portions of the replica need be sent.

5.2 Metadata operations

To perform operations correctly, metadata-nodes must perform the operation on the version of the object replica that corresponds to the latest complete candidate. As in the R/CW protocol, the metadata-node requires the object history set to classify the complete candidate. As such, metadata operations build closely upon the R/CW protocol developed in the previous chapter. However, instead of shipping the data values (the results of client-side operations), the operations themselves are transmitted.

Metadata operations can be performed atomically on multiple objects. Since some operations span metadata objects, to provide failure atomicity it is necessary to perform these operations on multiple objects atomically. For example, rename removes a file from one directory object and adds it to another directory object.

This subsection describes two classes of metadata operations: query operations and update operations. As well, multi-object operations are discussed.

5.2.1 Query operations

–  –  –

QUERY(Operation) :

100: ObjectHistorySet, QueryResultSet := DO QUERY(Operation, QUERY LATEST, ⊥) 101: Candidate, Status := CLASSIFY(ObjectHistorySet) 102: /∗ Iterate through returned object history set ∗/ / 103: CandidateResultSet := 0 104: for all (Mi ∈ MetadataNodeSet) do 105: /∗ Add results from responses whose latest element matches the candidate ∗/ 106: if MAX[ObjectHistorySet[Mi ]] = Candidate then 107: CandidateResultSet := CandidateResultSet ∪ QueryResultSet[Mi ] 108: end if 109: end for 110: /∗ Perform voting on the set of matching data results, need b+1 matching responses ∗/ 111: Count, Data := VOTE(CandidateResultSet) 112: if (Count b + 1) then 113: /∗ If less than b+1 results match, redo the query at the candidate’s timestamp ∗/ 114: CandidateResultSet := DO QUERY(Operation, QUERY LTIME, Candidate.LT) 115: Count, Data := VOTE(CandidateResultSet) 116: end if 117: /∗ If classification yields a complete candidate, return any of the matching votes ∗/ 118: if (Status = CLASSIFIED COMPLETE) then 119: return (SUCCESS, Candidate.LT, Data ) 120: else 121: /∗ Status = CLASSIFIED REPAIRABLE, perform repair ∗/ 122: return (REPAIR OPERATION(Operation, Candidate, Candidate.LT conditioned, ObjectHistorySet)) 123: end if

Figure 5.2: Client query pseudo-code.

puted over the entire object, it can not be used to validate the data associated with a read response; instead, a voting scheme must be used.

The pseudo code for a read operation is shown in Figure 5.2. To perform a query operation, the metadata-node returns its replica history as well as the result of the query operation applied to the latest version of the object replica (cf. line 100). The client identifies the candidate by performing classification on the object history set, on line 101.

Once the candidate is classified as complete, the client must determine the result of the query operation. Since only results pertaining to the latest timestamp in a replica’s history are returned, the set of results corresponding to the candidate’s timestamp must be constructed (cf. line 107).

The client then counts the votes in this set of results, see line 111. Matching results from b + 1 metadata-nodes are sufficient “votes” for a client to use the result. Of course, more than b + 1 object histories are required to identify the latest complete candidate.

Since query operation results are returned optimistically based on the latest version hosted by the metadata-node, it is possible that no response attains a sufficient number of “votes”.

· 94 Efficient, scalable consistency for highly fault-tolerant storage If this is the case, the client performs the query metadata operation at a specific timestamp (the candidate’s timestamp); see line 114. Finally, if the candidate is classified as complete, the result of the “voting” can be returned. Otherwise, repair is needed. Since the client does not hold a full copy of the object, repair is more complicated than described in the R/CW protocol and will be discussed later in the context of multi-object operations.

As an optimization if the query operation results are large, some metadata-nodes can act as witnesses by returning a hash of the operation’s result. Voting can then be performed over the resultant set of hashes. Since the result of most query operations are small, the tradeoff between the computation time required to perform the hashing and the transmitting and comparison of the data value is in favor of the latter. (An exception may be the readdir operation as it returns a large number of directory entries).

5.2.2 Update operations

The CW operation of the R/CW protocol is extended for metadata objects to include update operations (e.g., setattr would update the attributes for an object). As in query operations, update operations do not transmit the object’s new data value; only the operation to be applied to the replica and the object history set is sent. Allowing the metadata-nodes to perform update operations locally ensures the validity of the update. As in the R/CW protocol, updates are conditioned on the latest complete candidate, which is determined through classification of the object history set. Similar to query operations, client count “votes” on the results returned from the update operations.

If a metadata-node does not host the candidate, then it cannot safely perform the operation. In this case, the metadata-node must synchronize its object replica by fetching the state associated with the latest candidate. Object replica synchronization is discussed in Section 5.3.2.

Since the client does not have a local copy of the metadata object to update, it cannot construct the timestamp of the update operation (specifically, the verifiers in the timestamp). However, the timestamp can be constructed deterministically from the object history set and the resulting value of the updated metadata object. Since metadata-nodes have ·

5.2 Metadata operations 95 all of this information, they can be relied upon to deterministically construct the timestamp of the update operation (and it can be returned as part of the result). The calculated timestamp is then inserted into the replica’s history.

5.2.3 Multi-object operations Since all metadata object replicas are stored on the same set of metadata-nodes, the metadata-nodes can locally lock the set of object replicas being operated upon. Thus, a metadata-node can perform validation for each object replica accessed by the operation, and then, only if validation passes for all objects, execute the operation. This approach of validation has similarities to the validation phase performed in optimistic concurrency control [Kung and Robinson 1981]. However, one extra step is required in the validation of multi-object update operations. To prevent malicious clients from executing different operations across different objects at different metadata-nodes, the hash of the operation (including it’s arguments and the set of objects the operation operates on) is included in the logical timestamp. This fixes the result of a multi-object update operation to a specific timestamp.

Multi-object operations complicate repair. Pseudo-code for the repair of multi-object operations is shown in Figure 5.3. If a repairable candidate is identified, then the client must request the operation that resulted in the repairable candidate (cf. line 200). Note that, since barrier-writes are always followed by an update operation, they need not be repaired. In response to the READ OPERATION query, a metadata-node returns the operation and the object replica histories for all objects updated by the operation at the specified logical timestamp. Recall, the QUERY operation requires b + 1 votes to return a result. The client constructs an object history set for each object updated by the operation by issuing a READ HISTORIES query operation containing the object history sets of interest (cf.

line 206).

Next, a check is made to verify if a barrier-write across the sets of objects is required (cf. line 207). If a barrier is required then the client performs a barrier-write conditionedon these object history sets (cf. line 208). If the barrier-write completes, the client re· 96 Efficient, scalable consistency for highly fault-tolerant storage


200: RepairOperation := QUERY( READ OPERATION, Object, LT ) 201: /∗ Iterate through all objects in the returned operation ∗/ / ObjectSet := 0 202:

Pages:     | 1 |   ...   | 10 | 11 || 13 | 14 |   ...   | 18 |

Similar works:

«International Journal of Advanced Multidisciplinary Research and Review Volume 1, No.:1, 2013 Winter Pages: 32 49 The Doctrinal Peculiarity of 19th Century Adventism: Teaching About The Trinity Professor Bernard Kozirog, PhD1 This article presents an approach to the Seventh-day Adventist doctrine of the Trinity. In the Adventist‟s philosophy this view evolved from antitrinitarianism to trinitarianism. In the nineteenth century, Seventh-day Adventists modeled on other religious denominations,...»

«1 Caspar Hare June 2009 Forthcoming in Philosophical Perspectives Perfectly Balanced Interests1 One major challenge in moral theory has been to account for some intuitively striking moral differences between decision problems that involve conflicts of interest, and decision problems that do not. These differences come out clearly in rescue cases. Consider: One Person, One Island While idly steaming through an almost-deserted South Sea archipelago, I receive a distress call. Agatha has recently...»

«WHAT DOES HOLISM HAVE TO DO WITH MORAL PARTICULARISM? Sean McKeever and Michael Ridge Abstract Moral particularists are united in their opposition to the codification of morality, and their work poses an important challenge to traditional ways of thinking about moral philosophy. Defenders of moral particularism have, with near unanimity, sought support from a doctrine they call “holism in the theory of reasons.” We argue that this is all a mistake. There are two ways in which holism in the...»

«Appeal to the Clergy by Leo Tolstoy Transcriber’s note – This is Tolstoy’s most biting and inflammatory criticism of the Church ant its clergy. While I do not agree completely with his evaluation of Christianity (although I do agree with much of it), and while this piece refers to nonviolence only tangentially, I have included it in this collection because it gives unique insight into his philosophy. I would ask those of you who would strongly disagree with Tolstoy’s claims: how would...»

«NEW PUZZLES ABOUT DIVINE ATTRIBUTES Forthcoming in European Journal for Philosophy of Religion MOTI MIZRAHI St. John's University, New York Abstract: According to traditional Western theism, God is maximally great (or perfect). More explicitly, God is said to have the following divine attributes: omnipotence, omniscience, and omnibenevolence. In this paper, I present three puzzles about this conception of a maximally great (or perfect) being. The first puzzle about omniscience shows that this...»

«The CONSPIRACY To Destroy All Existing Governments And Religions William Guy Carr Books by Commander C a r r By Guess and By God Hell's Angels of the Deep High and Dry Good Hunting Out of the Mists Checkmate in the North Brass Hats and Bell Bottomed Trousers Red Fog Over A m e r i c a Pawns in the Game The Conspiracy to Destroy all Existing Governments and Religions Final Work (title yet to be disclosed) Printed in the U n i t e d States of A m e r i c a In 1796 John Robison, Professor of Human...»

«AN INVESTIGATION OF NOVEL APPROACHES FOR OPTIMISING RETAIL SHELF SPACE ALLOCATION by Ruibin Bai, BEng, MSc Thesis Submitted to The University of Nottingham for the Degree of Doctor of Philosophy School of Computer Science & Information Technology The University of Nottingham, Nottingham, UK September 2005 CONTENTS Contents. List of Figures List of Tables Abstract. Acknowledgements Declaration. Chapter 1. Introduction 1.1 Background and Motivations 1.2 Scope and Aims 1.3 Overview of the Thesis...»

«A critical discussion of Jonathan Dancy’s Moral Particularism by Philipp Schwind Dissertation submitted for the degree of M.Phil. University of St. Andrews School of Philosophical, Anthropological and Film Studies December 2006 Declarations I, Philipp Schwind, hereby certify that this thesis, which is approximately 40,000 words in length, has been written by me, that it is the record of work carried out by me and that it has not been submitted in any previous application for a higher degree....»

«e Philosophical Theology for Mormons: Some Suggestions from an Outsider by Stephen T. Davis I I n this paper I want to ask what philosophical theology might be able to do for Latter-day Saints. I will begin with a discussion of what philosophy is and how it contrasts with theology. Then I will describe what philosophical theology is and what it typically can do for faith communities, especially Christian ones. Then I will suggest four issues in contemporary Mormon thought where, as it seems to...»

«Ó Springer 2007 Law and Philosophy (2007) 26: 31–65 DOI 10.1007/s10982-005-5917-2 TAMAR MEISELS COMBATANTS – LAWFUL AND UNLAWFUL (Accepted 27 December 2005) The September 11 attacks led many Americans to believe that Al-Qaeda had plunged the U.S. into a new type of war, already familiar to some of the country’s closest allies. Subsequent debates over modern terrorism often involve a sort of lamentation for the passing of old-fashioned wars.1 Paul Gilbert’s New Terror, New Wars suggests...»

«BRIEF, ONLINE INTERVENTIONS FOR PERFECTIONISTS By MIRELA A. ALDEA A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULLFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA © 2007 Mirela A. Aldea ACKNOWLEDGEMENTS I am appreciative and profoundly indebted to Dr. Ken Rice who shared his time and knowledge unselfishly and who encouraged and inspired me to aim high. I am grateful to my parents who instilled in me the belief...»

«Writing Philosophically T HE INTRODUCTIONS, SUMMARIES, AND QUESTIONS for reading and thought found throughout Traversing Philosophical Boundaries are intended to help you get more out of your reading and to aid in class discussion. They should also prove useful in studying for the exams that you will probably be required to take. Since most philosophy classes require papers in addition to exams, what follows is an overview of how to write philosophically. It is intended to help you with each of...»

<<  HOME   |    CONTACTS
2017 www.abstract.dislib.info - Abstracts, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.