Efficient, scalable consistency for highly fault-tolerant storage
Garth Goodson, August 2004, CMU-CS-04-111, Dept. of Electrical and Computer ...

The PMD service is one part of a complete system; the storage service and the application server complete it. In the case of a file server application, there is much flexibility in how the metadata objects are used to provide file services. For example, locks, access privileges, and client caching of stored files all involve both the PMD service and the storage service. This subsection briefly describes the selection of a storage service.

The interface and access protocol used by the storage service are independent of the protocol that underlies the metadata service. For example, the storage service may use either a block-based (e.g., iSCSI or Fibre Channel) or an object-based protocol to access storage. However, some coordination is required between the design of the metadata objects and the interfaces provided by the storage service. For example, if the storage service exports objects (i.e., files in a flat namespace rather than blocks), the metadata objects need not implement inodes or other structures to track block allocations.

Although we reject partitioning the namespace as a means to scale the metadata service, partitioning data for the storage service is reasonable. A storage system need not provide any guarantees about operations performed across multiple data objects; as such, partitioning is an appropriate technique for stored data. Partitioning allows different files to have different performance and reliability properties (e.g., /tmp need not be highly replicated).
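Per-file properties of this kind can be sketched as a prefix-based policy lookup; the prefixes, field names, and values below are hypothetical, not from the thesis.

```python
# Hypothetical policy table mapping namespace prefixes to storage
# parameters; names and values are illustrative only.
POLICIES = {
    "/tmp": {"replicas": 1, "erasure_coded": False},   # scratch data: cheap
    "/home": {"replicas": 3, "erasure_coded": True},   # user data: durable
}
DEFAULT = {"replicas": 2, "erasure_coded": True}


def policy_for(path: str) -> dict:
    """Return the storage policy for the longest matching prefix."""
    best = ""
    for prefix in POLICIES:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return POLICIES.get(best, DEFAULT)
```

A lookup like `policy_for("/tmp/scratch")` would select the cheap, unreplicated policy, while paths outside any known prefix fall back to the default.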


The PASIS read/write (R/W) protocol, described in Chapter 3, underlies our storage service. It provides block-granularity read/write access to objects. The R/W protocol provides strong consistency (linearizability of block read/write operations) and fault tolerance of erasure-coded data (e.g., data encoded with Rabin's information dispersal [Rabin 1989]). A PS service implemented using the R/W protocol cannot be relied upon to serialize all accesses to stored data. Such an approach is suitable for an application that controls concurrency itself. Alternatively, locks could be provided by the PMD service so that the application need not provide concurrency control. Our PS service implementation also uses CVFS as its backing store; storage-nodes may or may not be collocated with metadata-nodes.

Another option is to use the R/CW protocol. The R/CW protocol offers stronger consistency semantics, in that writes not based on the most current version are rejected. This has the nice property that the application server can implement very weak cache consistency, since writes based on stale reads will be rejected by the storage service. However, these semantics come at an increased cost in terms of N and space-efficiency.
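The rejection of writes based on stale versions behaves like a per-object compare-and-swap. A minimal sketch, with hypothetical class and method names:

```python
class StaleWrite(Exception):
    """Raised when a write is based on an out-of-date version."""


class VersionedObject:
    """Toy model of R/CW-style conditional-write semantics."""

    def __init__(self, data: bytes = b""):
        self.version = 0
        self.data = data

    def read(self):
        return self.version, self.data

    def cond_write(self, based_on: int, data: bytes) -> int:
        # Reject writes not based on the most current version; this is
        # what lets an application server run with weak cache consistency,
        # since any write derived from a stale read fails here.
        if based_on != self.version:
            raise StaleWrite(f"based on v{based_on}, current is v{self.version}")
        self.version += 1
        self.data = data
        return self.version
```

A client that reads version 0, then writes, succeeds; a second client whose write is still based on version 0 is rejected and must re-read before retrying.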

Instead, the versioning capability of the R/W protocol could be used to provide strong consistency across the entire file system through the use of immutable files. Clients that write files through the PS service could be required to update the associated metadata attribute object with the version of the latest completed write operation, thus ensuring consistency between the data and the attributes. However, this requires all data transfers to be serialized through the metadata.
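The immutable-file scheme can be sketched as follows; the class and function names are hypothetical, and a real client would issue these steps through the PS and PMD protocols rather than in-process calls.

```python
class ImmutableStore:
    """Toy PS service: every write creates a new immutable version."""

    def __init__(self):
        self.versions = {}       # version number -> bytes
        self.next_version = 1

    def write(self, data: bytes) -> int:
        v = self.next_version
        self.versions[v] = data  # never overwritten: versions are immutable
        self.next_version += 1
        return v


class AttributeObject:
    """Toy PMD attribute object recording the latest completed write."""

    def __init__(self):
        self.current_version = 0


def client_write(store, attrs, data: bytes) -> int:
    # 1. Write the data as a new immutable version.
    v = store.write(data)
    # 2. Only then publish it by updating the metadata attribute; readers
    #    that go through the attribute always see a version whose data
    #    write has already completed.
    attrs.current_version = v
    return v


def client_read(store, attrs) -> bytes:
    return store.versions[attrs.current_version]
```

The ordering in `client_write` is the point: because the attribute is updated only after the data write completes, data and attributes stay consistent, at the cost of serializing every data transfer through the metadata.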

5.6 Evaluation

5.6.1 Experimental setup

All experiments are performed on a rack of 30 Intel P4 2.66 GHz machines, each with 1 GB of memory. Each machine has two 33.6 GB Seagate Cheetah 10K RPM SCSI disk drives and an Intel gigabit Ethernet NIC. The machines are connected by a 24-port gigabit switch, and each runs Debian (testing) with Linux kernel 2.4.22.

Many experiments use NFS servers as clients to the PMD service, while others communicate directly through the PMD library interface. Multiple NFS servers are able to access the same PMD service simultaneously. The NFS servers are mounted via loopback on the same machines as the NFS clients. The NFS servers implement the NFSv3 protocol and use a 128 MB buffer cache. Unless otherwise specified, the buffer cache is write-through and data is expired after 10 seconds. No attributes or metadata are cached by the NFS servers.

The storage-nodes use CVFS as the backing data store. Each storage-node has a front-end that communicates with CVFS over IPC. Each CVFS instance uses a 512 MB buffer cache. All experiments show results using write-back caching at the storage-nodes, mimicking the availability of 16 MB of non-volatile RAM. This allows the experiments to focus on the overheads introduced by the protocol rather than those introduced by the disk subsystem.

5.6.2 Cryptography performance


Directory objects are implemented with 4 KB blocks (the large objects optimization).

On update operations, MD5 hashes of modified blocks are taken. Such a hash takes 24.4 µs to generate over a full block; however, the hash is taken over only the utilized portion of the block.
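Hashing only the utilized prefix of a block can be sketched as follows; the helper name and block layout are illustrative, not the thesis's implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # directory objects use 4 KB blocks


def block_hash(block: bytes, used: int) -> bytes:
    """MD5 over only the utilized portion of a block.

    Hashing the used prefix rather than the full 4 KB saves time on
    mostly-empty blocks; the full-block cost measured in the text is
    about 24.4 microseconds.
    """
    assert 0 <= used <= BLOCK_SIZE
    return hashlib.md5(block[:used]).digest()
```

Note that a sparsely filled block hashes differently depending on `used`, so the utilized length must be tracked alongside the hash.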

5.6.3 PMD micro-benchmarks

This subsection describes a number of micro-benchmarks performed against the PMD service. No file data is involved in any of these experiments; they exercise only the PMD service. The first set of experiments examines NFS micro-benchmarks, the second examines the impact of concurrency on PMD create operations, and the third examines the impact of a fault on the response-time distribution of a run of create operations.

PMD NFS micro-benchmarks

All the PMD operations described in Table 5.1 have been implemented. Most NFSv3 operations map to corresponding PMD service operations, although a few require multiple PMD operations. NFS micro-benchmarks were performed against some of the NFS metadata operations; the mean response times are listed in Table 5.2. The PMD service was configured to tolerate one Byzantine fault; therefore, six storage-nodes were used. In addition to the end-to-end response time for the PMD service, the response times observed by the PMD storage-nodes are also listed.

The response time for the create operation represents a create that occurs within a directory consisting of a single block. Due to the implementation of directory objects, a linear search is performed to ensure that the name being inserted into the directory does not already exist; thus, the larger the directory, the longer the create takes. Likewise, the performance of the readdir operation also depends on the size of the directory, since each directory block is transmitted back to the client. Thus, two results for readdir are shown: one for a small directory containing 5 entries and one for a large directory containing 500 entries.
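The linear duplicate-name scan that makes create cost grow with directory size can be sketched as follows; the structure and names are illustrative.

```python
class DirectoryBlock:
    """Toy directory object: a list of (name, inode) entries per block."""

    def __init__(self):
        self.entries = []


def create(blocks, name, inode):
    """Insert a new entry, scanning every existing entry first.

    The linear search over all blocks rejects duplicate names; its cost
    grows with directory size, which is why larger directories make
    create slower in the measurements above.
    """
    for block in blocks:
        for existing, _ in block.entries:
            if existing == name:
                raise FileExistsError(name)
    blocks[-1].entries.append((name, inode))
```

Readdir has the analogous property: every block must be walked (and transmitted), so its cost also scales with directory size.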



Concurrency

The impact of concurrency is examined in the context of PMD create operations. Three graphs show the results of performing PMD create operations with varying degrees of concurrency, with t = 1, b = 0, N = 4. Two clients simultaneously perform create operations within a set of shared directories. Recall that create is a multi-object operation; in this experimental setup, the parent directory is the source of concurrency. To increase the likelihood of concurrency, the number of shared directories is decreased between runs. In each run, each client randomly picks a directory to use as the parent of the create operation.

Each client has only one outstanding request at a time. Care was taken to fully overlap the execution of both clients.

The first graph, Figure 5.7(a), shows the mean response time and standard deviation as concurrency is increased. The “None” bar represents a run with sixteen directories and only one client (i.e., there is no concurrency). Unsurprisingly, as the amount of concurrency increases, both the response time and the standard deviation increase. As concurrency increases, repair and barrier operations become more common, as does the number of operations that must be retried due to stale object histories. Even at high concurrency levels (two clients sharing two directories), the mean response time and standard deviation are within a factor of two or three of those of a run with no concurrency.

The second graph, Figure 5.7(b), shows the total number of barrier and repair operations attempted at each concurrency level. These counts are normalized to the total number of create operations performed. Again, as concurrency is increased, the number of barrier and repair operations also increases. It is interesting to note that there is a large step increase from the very low concurrency configurations (“None” and “16”) to the higher concurrency levels (“4” and “2”).

Figure 5.7: Concurrency experiments. These three graphs show the results of performing create operations with varying degrees of concurrency. Two clients simultaneously perform create operations into a fixed set of directories. For each operation, each client randomly picks a parent directory from the directory set. To increase the degree of concurrency, the directory set size is decreased between each run. (a) shows the mean response time and its standard deviation as concurrency is increased. (b) shows the total number of barrier and repair operations performed at each concurrency level, normalized to the number of create operations issued by the client. (c) shows the total number of times an operation was rejected due to stale object histories, again normalized to the number of create operations.

The third graph, Figure 5.7(c), shows the total number of operations (including repairs and barriers) that are rejected due to stale object histories. Recall that the client caches replica histories returned by recent operations to construct the object history set for a subsequent update operation. In addition, every operation returns an updated replica history (even if that operation failed). The steep increase in stale object histories at the highest concurrency levels reflects a race between the two clients trying to repair the same parent directory: both clients try to write barriers, but neither quite succeeds in completing one, since each barrier is rejected by a subset of the nodes where the other client has just written its barrier. This observation suggests that designers must carefully consider back-off and repair policies when using optimistic protocols, which can otherwise result in livelock. More work is required in this area.
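One plausible back-off policy, which the text leaves open rather than specifying, is randomized exponential delay before retrying a rejected barrier or repair; the jitter desynchronizes competing clients so their barriers stop rejecting each other. The function and parameter values below are illustrative.

```python
import random


def backoff_delays(attempts: int, base: float = 0.001, cap: float = 0.1):
    """Randomized exponential back-off delays, in seconds.

    Each retry waits a uniformly random time up to an exponentially
    growing (and capped) limit.  The randomness is what breaks the
    symmetry between two clients racing to repair the same directory;
    base and cap here are illustrative, not tuned values.
    """
    delays = []
    for attempt in range(attempts):
        limit = min(cap, base * (2 ** attempt))
        delays.append(random.uniform(0.0, limit))
    return delays
```

A retry loop would sleep for each successive delay before re-reading the replica histories and reattempting the barrier.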


Tolerating faults

This experiment shows the impact of a fault occurring at a random point during a run of create operations. The system was configured with t = 1, b = 0, N = 4 and a single client that continuously performed unique create operations into one of sixteen directories (picked at random). The client had only a single request outstanding, so there is no concurrency. Each run lasted 10 seconds. At a random point during each fault-induced run, one of the storage-nodes was killed. Seven fault-induced runs were performed, and no correctness problems were observed in any of them. The remainder of this subsection quantifies the performance consequences of running with a failed server.

Figure 5.8 shows the mean response time and standard deviation of a single fault-free run and the accumulation of the response times from the seven fault-induced runs.

The mean response time for the fault-free run is 2.17 ms with a standard deviation of 0.34 ms. The mean response time across all fault-induced runs is 2.47 ms with a standard deviation of 0.35 ms. Although the standard deviations of the fault-free run and the set of fault-induced runs are similar, the standard deviation of each individual fault-induced run was between 0.52 ms and 0.66 ms. In general, fault-induced runs with a higher mean response time also had a higher standard deviation, and runs with higher mean response times generally had a larger number of outlier response times (above 3 ms).

In a fault-free run, the client waits for only the fastest N − t responses. Once a failure has occurred, the client still waits for N − t responses, but there are now only N − t servers, so it is waiting for all responses rather than the fastest subset. Thus, variations in individual storage-node response times are no longer masked. This accounts for the observed increases in the means and standard deviations of response times within the fault-induced runs.
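The loss of straggler masking can be demonstrated with a toy simulation: an operation's latency is the k-th smallest of the server latencies, where k is the number of responses required. The per-server latency distribution below (a 2 ms base plus an exponential tail) is an assumption for illustration, not measured data.

```python
import random


def op_latency(latencies, needed):
    """An operation completes when `needed` responses have arrived,
    so its latency is the needed-th smallest server latency."""
    return sorted(latencies)[needed - 1]


def simulate(n_servers, needed, runs=2000, seed=1):
    """Mean operation latency (ms) over `runs` simulated operations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        # Illustrative per-server latency: 2 ms base plus an
        # exponential tail with 0.3 ms mean.
        latencies = [2.0 + rng.expovariate(1 / 0.3) for _ in range(n_servers)]
        total += op_latency(latencies, needed)
    return total / runs


# Fault-free: N = 4, t = 1, so the client waits for the fastest 3 of 4
# and the slowest reply is masked.  After one crash, only 3 servers
# remain but 3 replies are still needed, so every straggler counts.
```

Comparing `simulate(4, 3)` with `simulate(3, 3)` shows a higher mean for the crashed configuration, mirroring the measured increase from 2.17 ms to 2.47 ms in direction, though not in magnitude, since the latency distribution here is invented.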

5.6.4 PMD service macro-benchmarks


Figure 5.8: Response time distributions of a fault-free and multiple fault-induced runs.

This figure shows the mean and standard deviation of response times for a fault-free run and multiple fault-induced runs of the create operation. At a random point during each fault-induced run, the PMD process on one of the storage-nodes is killed. As can be seen, the mean response time for the fault-induced runs is higher than in the fault-free run. Although the standard deviations of the fault-free run and the set of all fault-induced runs are almost the same, the standard deviation within a single fault-induced run is higher than that of the fault-free run.

The Postmark benchmark is used to gather information about the performance of the PMD service. Postmark was designed to measure the performance of a file system used for electronic mail, netnews, and web-based services. Postmark comprises two phases: (i) in the creation phase, it creates a large number of small, randomly sized files (between 512 B and 9 KB); and (ii) in the transaction phase, it performs a specified number of transactions. Each transaction consists of two sub-transactions, one being a create or delete and the other a read or append. Three Postmark configuration parameters are important to our experiments: files (determines the creation phase), transactions (determines the transaction phase), and directories (determines the degree of access contention). Results from Postmark experiments are given in transactions per second over both phases.
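A minimal Postmark command file exercising these three parameters might look like the following; the values are illustrative, and the command names assume the standard Postmark command-line tool rather than anything specific to this thesis.

```text
set location /mnt/pmd-test
set number 1000
set size 512 9216
set transactions 5000
set subdirectories 10
run
quit
```

Here `number` sets the file count for the creation phase, `size` bounds the random file sizes (512 B to 9 KB, matching the range above), `transactions` sets the transaction-phase length, and `subdirectories` controls the degree of access contention.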


PMD service base throughput

The first experiment determines the maximum throughput of the PMD service. The PMD
