
«Efficient, scalable consistency for highly fault-tolerant storage GARTH GOODSON August 2004 CMU-CS-04-111 Dept. of Electrical and Computer ...»


5.4 PASIS metadata objects

nodes (as in NASD [Gibson et al. 1998]). Second, all accesses to the storage service are serialized through the metadata service. Third, each storage-node verifies each access before performing it; i.e., each storage-node issues a query operation to the PMD service that returns the lock status.

There are two basic uses of lock/lease objects in distributed file systems: to maintain client cache consistency within the storage service and to provide application locking of data (i.e., file locking). To maintain client cache consistency, clients must be notified of changes to cached data. In such an approach, callbacks from the metadata service would be needed to notify holders of cached data that the data is stale. To maintain the fault-tolerance of the system, the application server ought to wait for b + 1 callbacks before acting; however, since caching is done for performance, not correctness, it is safe to invalidate cache entries based on a single callback.

Since fault-tolerant systems should not rely on potentially faulty clients to release locks, lock objects should provide lease semantics. Achieving lease semantics requires that locks time out. The R/CW protocol is developed in an asynchronous model of time, so that invalid timing assumptions cannot break the properties provided by the R/CW protocol. In practice, loosely synchronized clocks are common and, if used wisely, can be used to expire acquired locks.
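Lease semantics can be illustrated with a minimal sketch. The `LeaseLock` class, its method names, and the injectable clock are hypothetical and are not taken from the PASIS implementation; the sketch only shows that an unreleased lock stops blocking other clients once its lease expires.

```python
import time


class LeaseLock:
    """Hypothetical lock with lease semantics: a grant expires on its own."""

    def __init__(self, duration_s, clock=time.monotonic):
        self.duration_s = duration_s
        self.clock = clock          # injectable so expiry can be exercised
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, client_id):
        now = self.clock()
        # An expired lease counts as released, even if the holder never
        # released it (e.g., because the holder crashed or is faulty).
        held_by_other = (self.holder is not None
                         and self.holder != client_id
                         and now < self.expires_at)
        if held_by_other:
            return False
        self.holder = client_id
        self.expires_at = now + self.duration_s
        return True

    def is_held_by(self, client_id):
        return self.holder == client_id and self.clock() < self.expires_at
```

A client that still needs the lock would have to re-acquire it before the lease runs out; how often to do so depends on how loosely the clocks are synchronized.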

5.4.4 Authorization objects

Authorization objects manage the privileges associated with metadata objects. There are two standard approaches to managing privileges: access control lists (ACLs) and capabilities. ACLs manage privileges on a per-object basis whereas capabilities manage privileges on a per-client/user basis. Either approach to privilege management can be implemented with authorization objects. An authorization object can be associated with each metadata object, and operations on the metadata object will only be performed if authorized.
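The difference between the two approaches can be sketched as two ways of indexing the same privileges. The object names, user names, and helper functions here are hypothetical illustrations, not part of the PMD service.

```python
# Hypothetical ACL: privileges are recorded per object.
acl = {"dir:/home": {"alice": {"read", "write"}, "bob": {"read"}}}

# Hypothetical capability lists: privileges are recorded per client/user.
capabilities = {
    "alice": {("dir:/home", "read"), ("dir:/home", "write")},
    "bob": {("dir:/home", "read")},
}


def acl_allows(obj, user, right):
    """Look up the object first, then the user within that object's list."""
    return right in acl.get(obj, {}).get(user, set())


def capability_allows(obj, user, right):
    """Look up the user first, then check for a matching capability."""
    return (obj, right) in capabilities.get(user, set())
```

Both structures answer the same question; they differ in which side (object or client) stores the privilege.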

Authorization objects may be needed for the storage service as well. Validation of authorization objects can occur similarly to the validation of locks. For example, the storage service can perform a read of the authorization object before permitting data to be read or written. Or, the application server can provide a capability to the storage service to read or write specific data.

Figure 5.6: PASIS storage system. The PASIS storage system is split into two components: a client and a set of storage-nodes. The client implements an NFSv3 server. The NFS server consists of a PASIS metadata (PMD) component and a PASIS storage (PS) component. A single NFS server is able to support multiple concurrent NFS clients. Alternatively, the NFS server may be mounted via loop-back on the same machine as the NFS client.

5.5 Storage-system implementation


5.5.1 Metadata operations

Table 5.1 lists the set of metadata operations that are currently implemented by the PMD service. The operations are inspired by NFS, but are generic enough to support many file system instances. The Type field specifies whether the operation is an update or query operation. Example query operations include: getattr, lookup, readdir, and readlink.

The Object field specifies the number and types of metadata objects on which that operation operates. In the case of operations that span multiple objects, more than one metadata object is listed. For example, remove modifies the parent directory object and the link count attribute stored within the file’s attribute object.

As can be seen, many operations operate on directory objects. Many of these operations (e.g., create and remove) modify directory attributes as well as directory entries, thus justifying our design decision to encapsulate attributes within the directory object.

5.5.2 PMD metadata-nodes

The metadata-nodes use the Comprehensive Versioning File System (CVFS) [Soules et al. 2003] to store data objects and their versions. The query/update extensions to the R/CW protocol, as described in Section 5.3, have been implemented, as have object synchronization and multi-object repair. Additionally, each metadata operation described in Table 5.1 has been fully implemented.

CVFS objects

On the metadata-node, each metadata object is associated with three CVFS objects. One CVFS object is used to store the metadata object’s internal state (e.g., the directory structure). Attributes are stored within the extended attribute field of this CVFS object’s attributes. Another CVFS object stores the metadata object’s history, while the third stores a hash tree computed over the object’s internal state (to support large objects, see Section 5.3.1). The metadata object’s history and internal state are versioned on every update. These versions can be garbage collected once the metadata-node classifies a later update operation as complete (i.e., on the next successful update of the metadata object). Note that completed barrier operations do not result in this version history compaction.





Object histories

Along with the metadata object’s history, query operations optimistically return the result of the operation performed on the latest version of the metadata object’s internal state (as described in Section 5.2.1). A special query operation, readhist, is used to read only an object’s history. Batching of readhist results is supported (i.e., histories from multiple objects can be returned by a single call). In addition, every update operation returns the history associated with each object present in the operation. This history can be cached by clients to reduce the number of read history queries. Each metadata-node generates N authenticators over the object histories using HMACs based on pair-wise symmetric keys.

We use a publicly available implementation of MD5 for all hashes [Rivest 1992]. Each HMAC is 16 bytes long.
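As an illustration of per-peer authenticators, the following sketch computes one HMAC-MD5 tag per pair-wise key using Python's standard `hmac` and `hashlib` modules. The function names, the peer-name-to-key mapping, and the byte serialization of the history are assumptions made for the example; MD5 appears only because the text names it.

```python
import hashlib
import hmac


def make_authenticators(history_bytes, pairwise_keys):
    """Return one HMAC-MD5 tag per peer, keyed by that peer's shared key.

    pairwise_keys maps a (hypothetical) peer name to the symmetric key
    shared with that peer.
    """
    return {peer: hmac.new(key, history_bytes, hashlib.md5).digest()
            for peer, key in pairwise_keys.items()}


def verify_authenticator(history_bytes, key, tag):
    """Recompute the tag with the shared key and compare in constant time."""
    expected = hmac.new(key, history_bytes, hashlib.md5).digest()
    return hmac.compare_digest(expected, tag)
```

Each tag produced this way is 16 bytes, matching the HMAC length stated above, and only the holder of the matching pair-wise key can verify it.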

Object locking

–  –  –

operation ordering at that storage-node. This can help prevent unnecessary object syncing from occurring when operations are executed out of order, as is described in the following example.

Imagine the following sequence of operations pending at a single metadata-node at the same time: 1) create (a, /), 2) create (b, /), 3) setattr (b). Note that, if a correct client performed the operations, operations (2) and (3) can be pending concurrently only if operation (2) has completed successfully and operation (3) is conditioned on (2). This can occur on a slow storage-node, since only a subset of the updates need to complete for the operation to complete, but updates are transmitted everywhere. If only object locking is performed without preserving operation ordering:

operation (1) locks the '/' directory;
operation (2) blocks on the lock held on the '/' directory;
operation (3) attempts the setattr although the create has not yet completed on this metadata-node; in this case, object syncing would attempt the create.
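The example can be replayed with a small simulation. The function below is a hypothetical model, not the metadata-node's scheduler: it assumes each operation names the set of objects it touches, and it reports which pending operations could start immediately under each policy.

```python
def first_runnable(pending, preserve_order):
    """Return the names of operations that may start now.

    pending is a list of (name, objects) in arrival order. A running
    operation holds locks on all of its objects.
    """
    locked, claimed, runnable = set(), set(), []
    for name, objects in pending:
        blocked_on = claimed if preserve_order else locked
        if blocked_on.isdisjoint(objects):
            runnable.append(name)
            locked |= set(objects)
        # With ordering preserved, a waiting operation still claims its
        # objects, so later operations on them queue up behind it.
        claimed |= set(objects)
    return runnable


pending_ops = [
    ("create a", {"/", "a"}),
    ("create b", {"/", "b"}),
    ("setattr b", {"b"}),
]
```

With locking only, `setattr b` is free to start before `create b`; with ordering preserved, it waits behind the create it is conditioned on.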

Update operation validation

After each replica within the operation has been locked, each object history set is validated. Once validation has successfully completed (for all objects), the update operation is performed. Validation is the same as for the R/CW protocol, with two exceptions. First, since the conditioned-on timestamp is calculated from the object history set (passed in by the client), no validation is performed on the conditioned-on timestamp (lines 735 and 741 of Figure 4.8). Second, since update operations are transmitted, as opposed to full data objects in the R/CW protocol, there is no Verifier Data to validate. However, if repair is being performed, metadata-nodes must validate that the correct operation is being performed. To do this, the operation hash is compared to the repairable candidate’s operation hash; recall that the operation hash is stored within the timestamp.

If the operation completes successfully, a hash is generated over the replica’s updated contents and is added to the object’s hash tree. Each replica history is updated with the new timestamp computed from a hash of the object’s hash tree, the operation’s hash, and the hash of the object history set (which was used to validate the operation, as described in Section 5.2.2).
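A rough sketch of the three hashes involved, and of the repair check on the operation hash, is given here. The `Timestamp` layout is an assumption made for illustration: it holds only the three hashes named in the text and omits any other fields the real timestamp carries, and the byte serializations passed in are hypothetical.

```python
import hashlib
from typing import NamedTuple


class Timestamp(NamedTuple):
    tree_hash: bytes      # hash of the object's hash tree
    op_hash: bytes        # hash of the update operation
    history_hash: bytes   # hash of the object history set


def md5(data):
    return hashlib.md5(data).digest()


def new_timestamp(hash_tree_bytes, operation_bytes, history_set_bytes):
    """Build the (simplified) timestamp recorded after a successful update."""
    return Timestamp(md5(hash_tree_bytes), md5(operation_bytes),
                     md5(history_set_bytes))


def repair_matches(candidate, operation_bytes):
    """A repair is accepted only if it replays the candidate's operation."""
    return candidate.op_hash == md5(operation_bytes)
```

Storing the operation hash in the timestamp is what lets a metadata-node check a repair without having seen the original request.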

Object name uniqueness

Each object within the PMD service is given a unique object ID (OID). Likewise, each file stored by the storage service is also identified by an OID. Object IDs are stored within directory entries to uniquely identify the file or directory to which the entry is linked.

Within the PASIS storage system, OIDs are similar to the inode numbers used by traditional file systems (or filehandles used by NFS). However, unlike in traditional file systems, OIDs are not centrally assigned. This complicates the validation performed during object creation.

In the PASIS storage system, the client is responsible for generating the OID: it generates a 256-bit random number and uses it as the OID. The client then performs a read history query operation on the newly generated OID. If a metadata-node hosts the OID, it returns the replica history associated with the OID; if not, the metadata-node returns a special null replica history (a history with a single timestamp of 0). The history of the parent directory object is read as well.

When performing a create or a mkdir operation, the metadata-node validates the object history set to ensure that the create OID’s latest complete timestamp is 0. If a create operation succeeds (i.e., it receives successful responses from QC + b metadata-nodes), the client is assured that the OID it generated is globally unique. If a create operation fails (i.e., is classifiable as incomplete), the metadata-node is free to accept a create operation for the same OID from a different client, since the latest complete timestamp is still 0. The null history entry remains part of the replica’s history until it is pruned by a subsequent update operation that observes a completed create. Validation is similar for the repair of a create operation: 0 must be the latest complete timestamp, and the operation hash of the repair operation must match the operation hash stored within the repairable candidate’s timestamp.
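The client-side OID generation and the node-side check can be sketched as follows. This is a deliberately simplified model: the `MetadataNode` class is hypothetical, it collapses each replica history to a single latest timestamp, and it ignores quorums, classification, and repair.

```python
import secrets

NULL_TIMESTAMP = 0  # the single timestamp in a null replica history


def generate_oid():
    """Return a random 256-bit OID as 32 bytes."""
    return secrets.token_bytes(32)


class MetadataNode:
    """Hypothetical node tracking one latest timestamp per OID."""

    def __init__(self):
        self.latest = {}

    def readhist(self, oid):
        # An OID the node does not host yields the null history.
        return self.latest.get(oid, NULL_TIMESTAMP)

    def create(self, oid, observed_timestamp, new_timestamp):
        # Accept the create only if the client observed, and the node
        # still holds, a timestamp of 0 for this OID.
        if observed_timestamp != NULL_TIMESTAMP:
            return False
        if self.readhist(oid) != NULL_TIMESTAMP:
            return False
        self.latest[oid] = new_timestamp
        return True
```

A second create for an OID that is already in use is rejected, which is the property that lets a client treat a successful create as proof of uniqueness.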

To remove an OID (e.g., through an unlink or a rmdir operation), the replica history associated with the OID must be reset to the initial null value. Thus, the OID is only free once a remove operation has completed successfully.

5.5.3 PMD clients

A client library has been implemented to facilitate interfacing with the PMD service. The library’s interface consists of the set of metadata operation service calls (with the exception of readhist, which is not exported externally). The implementation of the query and update operations follows the presentation in Section 5.3.

NFS server

An NFSv3 server has been implemented that uses the client library. All NFS metadata operations have been mapped to PMD service operations. NFS data operations (file read/write) are mapped to calls within the storage service. There is a one-to-one mapping between NFS filehandles and PASIS OIDs.

Some NFS operations require multiple PMD operations. For example, there is a disconnect between the arguments of the NFS unlink operation and the PMD unlink operation. The NFS unlink operation takes a filename and a directory file handle as arguments, while the PMD unlink operation requires an additional argument, the filename’s OID. The filename’s OID maps to the attributes of the file, which may be updated by the unlink (e.g., the link count would be decremented). In order to perform this update operation, validation must be performed over the object’s history set. Thus, the OID of the filename’s attributes is required to construct its object history set. Therefore, a PMD lookup is performed prior to the unlink operation. Additionally, during the PMD unlink operation, the metadata-node validates that the filename matches the OID passed in.
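The lookup-then-unlink mapping can be sketched with an in-memory stand-in for the client library. `FakePMD`, its fields, and `nfs_unlink` are hypothetical names for illustration; the real library's signatures are not given in this section, and history-set validation is omitted.

```python
class FakePMD:
    """Hypothetical in-memory stand-in for the PMD client library."""

    def __init__(self):
        self.dirs = {"root": {"f.txt": "oid-1"}}   # dir OID -> {name: OID}
        self.link_count = {"oid-1": 1}

    def lookup(self, dir_oid, name):
        return self.dirs[dir_oid][name]

    def unlink(self, dir_oid, name, oid):
        # The metadata-node checks that the name really maps to this OID.
        if self.dirs[dir_oid].get(name) != oid:
            raise ValueError("filename does not match OID")
        del self.dirs[dir_oid][name]
        self.link_count[oid] -= 1


def nfs_unlink(pmd, dir_handle, name):
    """NFS supplies only (directory handle, name); look up the OID first."""
    oid = pmd.lookup(dir_handle, name)
    pmd.unlink(dir_handle, name, oid)
    return oid
```

The extra lookup costs one query operation per unlink, which is the price of giving the update operation both objects it modifies.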

Client history caching

To reduce the number of read history operations, object history sets are cached by the client. Every metadata operation request in the PMD service returns a replica history from each metadata-node executing the request. Histories are returned even if the request fails to execute. Since histories are cached, they may become out-of-date, or stale. A stale replica history will cause the request to fail validation at the metadata-node from which the replica history originated (see line 728 in Figure 4.8). An up-to-date replica history is returned by the metadata-node in response to receiving a stale history; thus, the client can update its cache and retry the operation.
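The cache-refresh-and-retry cycle can be sketched against a single hypothetical node. The `Node` class, the tuple representation of a replica history, and `update_with_cache` are assumptions for illustration; the real protocol involves a set of metadata-nodes and classification of their histories.

```python
class Node:
    """Hypothetical metadata-node holding one replica history per OID."""

    def __init__(self):
        self.history = {}

    def update(self, oid, client_history, op):
        current = self.history.get(oid, (0,))
        if client_history != current:
            return False, current          # stale: reject, return fresh history
        current = current + (op,)
        self.history[oid] = current
        return True, current               # histories come back on success too


def update_with_cache(node, cache, oid, op, max_attempts=2):
    """Try the update with the cached history, refreshing it from each reply."""
    for _ in range(max_attempts):
        ok, fresh = node.update(oid, cache.get(oid, (0,)), op)
        cache[oid] = fresh
        if ok:
            return True
    return False
```

Because every reply carries a history, a failed attempt costs no separate read history query before the retry.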

Retry and concurrency

Although the NFS server locks each filehandle associated with each operation at the PMD client, operations may still abort due to concurrency. Thus, operation retry is necessary. Upon retry, new object histories must be obtained and classified. The operation is based upon these new histories. Many different policies regarding backoff and retry may be implemented to avoid retrying operations concurrently. This is particularly relevant in the face of repair, since repairs issued concurrently may cause livelock if they execute at metadata-nodes in an interleaved order that prevents any repair from completing successfully. This work does not focus on backoff and retry policies; however, they are discussed further in the evaluation section.
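One possible policy is randomized exponential backoff, sketched here. It is an assumption chosen for illustration, not the policy used or evaluated in this work; the function name, parameters, and defaults are hypothetical.

```python
import random
import time


def retry_with_backoff(attempt_op, max_attempts=5, base_delay_s=0.01,
                       sleep=time.sleep, rng=random.random):
    """Retry attempt_op() until it returns True, backing off between tries.

    Randomized delays make it less likely that two clients keep retrying
    in lockstep and interleaving in the same order.
    """
    for attempt in range(max_attempts):
        if attempt_op():
            return True
        if attempt + 1 < max_attempts:
            # The delay ceiling doubles after each failed attempt.
            sleep(rng() * base_delay_s * (2 ** attempt))
    return False
```

In this setting, `attempt_op` would obtain and classify fresh object histories before reissuing the operation, as described above.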

5.5.4 Storage service


