FREE ELECTRONIC LIBRARY - Abstracts, online materials

Pages:     | 1 |   ...   | 11 | 12 || 14 | 15 |   ...   | 18 |

«Efficient, scalable consistency for highly fault-tolerant storage GARTH GOODSON August 2004 CMU-CS-04-111 Dept. of Electrical and Computer ...»

-- [ Page 13 ] --

203: for all (Oi ∈ RepairOperation) do 204: ObjectSet := ObjectSet ∪ Oi 205: end for 206: ObjectHistorySets := QUERY( READ HISTORIES, ObjectSet ) 207: if (BARRIER NEEDED(ObjectHistorySets) = TRUE) then 208: ObjectHistorySets, Status := UPDATE( BARRIER OPERATION, ObjectSet, ObjectHistorySets) 209: if (Status = FAIL) then 210: return FAIL 211: end if 212: end if 213: Candidate, Status := REPAIR NEEDED(ObjectHistorySets) 214: if (Status = FALSE) then 215: return (FAIL) 216: end if 217: Status := UPDATE(RepairOperation, ObjectHistorySets) 218: return (Status)

Figure 5.3: Client multi-obj repair pseudo-code.

constructs the object history sets and performs reclassification to ensure that repair is still required. A few cases exist in which repair is not required—the most obvious is when the operation being repaired has completed (or has been repaired by another client).

More subtle cases are discussed a little later. If the client determines that repair is still required, an update operation corresponding the operation that is to be repaired is performed, line 217.

As mentioned repair may not be necessary for a few reasons. Usually, it will be the case that the histories returned from the barrier-write indicate that the RepairOperation has completed (or been repaired by another client). However, it is possible for some objects involved in a multi-object operation to be classified as repairable and others to be classified as incomplete (depending on the client’s system view). If this is the case, it is not possible to repair all the candidates involved in the multi-object operation. The client can deduce, since some candidates involved in the multi-object operation are incomplete, that no candidate involved in the multi-object operation is complete (thus safely reclassifying the repairable candidate as incomplete). Likewise, by sending the set of object history sets for all objects updated by the multi-object operation to the metadata-nodes, the metadata-nodes can reach the same conclusion and allow such repairable candidates ·

5.2 Metadata operations 97

–  –  –

Figure 5.4: Example of multi-object repair.

For this setup: N = 4, COMPLETE = 3, INCOMPLETE = 2. (a) Initially an update operation is performed on Objects A and B. However, it fails part-way through. (b) A read history operation is issued to read the object history set associated with Object A. Logical timestamp 1 is classified as incomplete. (c) An update is performed, and completes, on Object A at logical time 2 (conditioned on timestamp 0). (d) A query operation is performed on Object B. Logical timestamp 1 is classified as repairable, thus repair must be performed. First, the operation resulting in the version at timestamp 1 must be read. It returns the original operation that updated Objects A and B. The history of Object A is then read and classified. Object A’s timestamp 2 is classified as complete. From this, the client can deduce that Object B’s version at time 1 could never have completed (otherwise Object A’s version at 2 would have conditioned on time 1).

to be over-written.

Similarly, it may be the case that some objects (in a multi-object operation) appear as complete, while others appear incomplete or repairable. Again, the client and metadatanodes can come to the same conclusion by examining the object history sets of each object involved in the repair. Figure 5.4 shows an example of how multi-object repair works.

5.2.4 Summary

This section as described a number of extensions to the R/CW protocol that enables query and update operations to be performed against metadata objects. Instead of clients transmitting entire objects, query and update operations can be used to perform operations that · 98 Efficient, scalable consistency for highly fault-tolerant storage

–  –  –

Figure 5.5: Example metadata operations.

Metadata objects support query and update operations. Both types of metadata operations may need to perform repair. The requests a client sends to metadata-nodes to perform metadata operations are shown. (a) Query operations may complete in a single round trip. If insufficient results are returned at the candidate’s timestamp, a second request is sent to collect results at the candidate’s timestamp. (b) For update operations, a client first requests replica histories to identify the candidate. Then, the client issues the update operation with an object history set constructed from replica histories returned by a recent query. (c) To perform repair a client requires the operation that resulted in the candidate. Since operations may span multiple objects, metadata-nodes potentially return many replica histories.

are executed by each metadata-node. When metadata-nodes perform these operations they are able verify the integrity of the request and the result. Query operations require an object history set constructed from the replica histories returned by a recent query operation.

To increase efficiency, object history sets can be cached by clients. Both query and update operations may require repair.

Figure 5.5 describes query, updates, and repair operations.

Figure 5.5(a) illustrates the requests and replies a client exchanges with a metadata-node to perform a query operation. Figure 5.5(b) illustrates the requests and responses a client exchanges with a metadata-node to perform an update operation. Figure 5.5(c) illustrates the requests and responses a client exchanges with a metadata-node to perform repair.


5.3 Improving efficiency 99

5.3 Improving efficiency Two additional considerations for metadata objects are presented within this section. The first is an optimization to handle large metadata objects. The second details an approach to efficiently synchronize object replicas.

5.3.1 Breaking large objects into blocks Metadata objects could be very large (e.g., a directory object with thousands of files). To efficiently handle large metadata objects, metadata object replicas can be broken into fixed sized blocks. Even though the metadata object replica is broken into blocks, metadata operations still occur atomically on the metadata object.

If metadata objects can be stored in a structured fashion, update operations can be implemented to be considerate of block boundaries (e.g., not allowing directory entries to span blocks). In so doing, the number of blocks modified by an update operation can be minimized. If the state of metadata objects cannot be stored in a structured fashion, then techniques like those used in the Low Bandwidth File System [Muthitacharoen et al.

2001] could be employed to minimize the number of “chunks” that a metadata update operation modifies.

The value verifier of the logical timestamp for a CW operation on a large metadata object is a collision-resistant hash of the list of the replica’s block hashes. The cost of generating the verifier is linear in the number of blocks that comprise the object. For extremely large objects, Merkle hash trees [Merkle 1987] should be considered.

5.3.2 Object synchronization

Since update operations only execute at a subset of metadata-nodes, it is possible for some metadata-nodes to become “out-of-sync” (i.e., to not host the most recent complete candidate). To perform an update operation, the metadata-node requires the version of the object at the candidate’s timestamp. A metadata-node can “sync” its object replica by fetching the value corresponding to the latest complete candidate directly from another · 100 Efficient, scalable consistency for highly fault-tolerant storage metadata-node. There is enough information in the object history set for the metadatanode to know which other metadata-nodes host the candidate. As well, the metadata-node can validate the correctness of the object value received with the verifier in the candidate’s timestamp.

To sync a large metadata object, a metadata-node requests the hash list (or tree) for the candidate object from another metadata-node that hosts the candidate. The value verifier in the timestamp validates the correctness of the hash list returned. Given the hash list for the candidate object version, the metadata-node can request only the out-of-date blocks.

The hashes in the hash list validate each block of the metadata object.

5.4 PASIS metadata objects

The PASIS metadata service (PMD service) exports a number of metadata objects. Each type of metadata object consists of internal state and provides a set of deterministic operations that can be performed on the object, as described in Section 5.3. Some operations span multiple objects—for example, a rename operation is performed on a pair of directory objects. Others may be read-only. This subsection describes the design of four types of metadata objects: directory objects, attribute objects, lock/lease objects, and authorization objects. Directory and attribute objects are fully implemented. Lock and authorization objects are designed but not yet implemented. Implementation details of directory and attribute objects are described within this section. The implementation of the PMD service, in the context of a distributed NFS framework, is described in Section 5.5.

The design of the PASIS metadata objects focuses on minimizing the access concurrency experienced by any one metadata object. Reducing the amount of update concurrency experienced by metadata objects improves the efficiency of the underlying R/CW protocol actions.


5.4 PASIS metadata objects 101

5.4.1 Attribute objects

An attribute object exists for each file stored in the PASIS storage service. The attribute object contains the per-file information expected by the clients (e.g., the NFS server and the NFS clients). In our implementation, these attributes map directly to typical UNIX file attributes (e.g., mode, link count, uid, gid, size, mtime, ctime, etc.).

There is a tradeoff between storing attributes in separate objects versus storing them within their parent directory entry. If stored within the directory, operations that access attributes and directories need only access a single metadata object. However, storing attributes within directory entries increases the false sharing of the directory object for any operation that operates on the attributes without operating on the directory (e.g., setattr and getattr). Since there may be many files managed by each directory object, the read and update traffic for these attributes could generate frequent concurrent accesses to the directory object. Additionally, hard links (i.e., multiple names for the same object) cannot be easily supported if attributes are stored within directory entries.

5.4.2 Directory objects

Directory objects store the names and access information for files and other directories.

Access information specifies how the named object can be accessed (e.g., where the objects are located and their encodings, not access control information). The access information for directories is PMD service specific. The access information for files is storage service specific. For example, if the R/W protocol is being used as the protocol underlying the storage-service, the access information will contain the protocol parameters (e.g., N, m, QC, etc.) and the encoding scheme being used (e.g., replication, IDA, etc.).

The attributes of a directory are stored in the directory object itself—a separate attribute object is not used. Since most operations that access a directory object also access the directory’s attributes, this design decision does not contravene the design goal of separating objects to minimize access concurrency. Indeed, directories maintaining their own attribute information allows for greater efficiency at the storage-node and over the net· 102 Efficient, scalable consistency for highly fault-tolerant storage work: object histories need only be maintained (and returned) for the directory objects.

On the other hand, since files can use a separate storage-service protocol, attributes and data must be updated independently.

The directory object, since it may be large, is stored as a collection of blocks (see Section 5.3.1). The directory object block size is 4 KB, in our implementation. A simple structure is used in the implementation of the directory objects; it is just a list of directory entries. Each directory entry is a name, access pair. The access information encodes the object’s ID, the set of node IDs that host the named object, and the scheme which describes the encoding of the object (e.g., the replication factor). The object encoding is specific to the service owning the named object (i.e., either the PMD or the PASIS storage-service).

To look up a name in the directory object, a linear search of its directory entries is performed. When entries are added to the directory object, care is taken to avoid splitting them across block boundaries. When entries are deleted, no compaction is performed, but the free space created may be used for future entry insertions. Standard improvements to directory implementations, such as using b-trees to avoid linear searching, could be applied.

5.4.3 Lock (lease) objects

Lock objects provide serialization points. Locks are not needed for metadata object consistency, since the Q/U protocol ensures that all metadata operations occur atomically.

However, lock objects may be desired by clients wishing to control access to data. As such the design and implementation of the lock objects is dependent on their use. By providing the ability to implement lock objects using the Q/U protocol, locks are guaranteed to have the same fault-tolerance, consistency, and scalability guarantees as the metadata.

If lock objects are implemented in this manner, the storage service must be able to validate locks presented to it. Recall, the storage service is implemented by a distinct set of storage-nodes. We envision three possible scenarios for lock validation. First, capabilities are generated as the result of lock operations; these capabilities can be verified by storage·

Pages:     | 1 |   ...   | 11 | 12 || 14 | 15 |   ...   | 18 |

Similar works:

«Chapter 2: Philosophical Foundations: Critical Realism 2.1 Introduction and Context Over the last four hundred years science has been incredibly successful in generating knowledge about the world and producing the vast array of technology that now shapes every minute of our lives. It is not surprising therefore that (natural) scientific knowledge came to be seen as the only valid form of knowledge. Modern-day science began in the days of the Enlightenment with the then novel idea that we could...»


«FACTA UNIVERSITATIS Series: Linguistics and Literature Vol. 2, No 10, 2003, pp. 403 414 BEYOND BINARY OPPOSITION: DE-GENDERING AND REDEFINING GENDER 1 UDC 81'276.3 Ljiljana Marković English Department, Faculty of Philosophy Niš Abstract. The paper gives a review of the recent linguistic research in the study of gender at the end of 20th century, particularly in the light of studying the performativity of gender (the concept introduced by Judith Butler in Gender Trouble: Feminism and the...»

«This is a preprint of an article whose final and definitive form will be published in the Australasian Journal of Philosophy 93(2): 389-394. Please cite the published version. Non-Agential Permissibility in Epistemology∗ Luis R. G. Oliveira University of Massachusetts, Amherst January 2015 Paul Silva has recently argued that doxastic justification does not have a basing requirement. An important part of his argument depends on the assumption that doxastic and moral permissibility have a...»

«JANICE MOULTON A PARADIGM OF PHILOSOPHY: THE ADVERSARY METHOD THE UNHAPPY CONFLATION OF AGGRESSION WITH SUCCESS It is frequently thought that there are attributes, or kinds of behavior, that it is good for one sex to have and bad for the other sex to have. Aggression is a particularly interesting example of such an attribute. This paper investigates and criticizes a model of philosophic methodology that accepts a positive view of aggressive behavior and uses it as a paradigm of philosophic...»

«Centre for Philosophy of Natural and Social Science ABSTRACTS OF PAPERS Note that in many instances the following abstracts summarise works that are in various stages of completion. Please do not quote without consulting the author(s). Claus Beisbart How to Make a Difference – Measures of Voting Power Revamped Claus Beisbart and Luc Bovens Voting power (i-power) measures the extent to which a vote can make a difference to the outcome of a collective decision. And a voter has the opportunity...»

«Реpublic of Serbiа OFFICE OF THE WAR CRIMES PROSECUTOR Belgrade, 9 November 2007 DISTRICT COURT IN BELGRADE War Crimes Chamber Belgrade Pursuant to my authority under articles 46 §2 (3) and 266 of the Criminal Procedure Act re articles 3 and 4 §2 of the Law on Organisation and Competence of State Authorities in War Crimes Procedure, I hereby bring the INDICTMENT Against Ilija JURIŠIĆ, from Tuzla, married and father of two majors; trained schoolteacher and philosophy graduate, now retired...»

«Hegel’s Philosophy of Right 1 Space for Notes ↓ Excerpts from Hegel’s Philosophy of Right (1821) concerning education. Introduction § 20 The reflection which is brought to bear upon impulses, placing them before itself, estimating them, comparing them with one another, and contrasting them with their means and consequences, and also with a whole of satisfaction, namely happiness, brings the formal universal to this material, and in an external way purifies it of its crudity and...»

«ADAPTATION OF THAI TRADITIONAL PLAYS IN A CONTEMPORARY CONTEXT Submitted by SAWITA DITEEYONT, to the University of Exeter as a thesis for the degree of Doctor of Philosophy in Performance Practice, February 2012. This thesis is available for Library use on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement. I certify that all material in this thesis which is not my own work has been identified and that no...»

«Myths and Meanings of Voting Power: Comments on a Symposium Dan S Felsenthal Mosh´ Machover e University of Haifa King’s College, London August 5, 2000 Please address all correspondence to: Mosh´ Machover e Department of Philosophy King’s College Strand London WC2R 2LS England Phone: +44 20 8969 5356 (home) Fax: +44 20 7848 2270 (office) E-mail: moshe.machover@kcl.ac.uk ABSTRACT These are comments on the Symposium Power Indices and the European Union in the July 1999 issue of this...»


«6th Advanced Seminar on Peirce’s Philosophy and Semiotics, 2012. http://estudospeirceanos.wordpress.com Comment on Vincent Colapietro’s Article: The Tones, Tints, and Textures of Temporality: Putting Peirce’s Categories to Work Ivo A. Ibri Center for Pragmatism Studies – PUC/SP It is always a pleasure to take part in this Seminar, for which I thank and congratulate Lucia and her organizing team. It is also a great opportunity to be with great scholars of Peirce and, particularly, to...»

<<  HOME   |    CONTACTS
2017 www.abstract.dislib.info - Abstracts, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.