WWW.ABSTRACT.DISLIB.INFO
FREE ELECTRONIC LIBRARY - Abstracts, online materials
 
<< HOME
CONTACTS



Pages:     | 1 |   ...   | 14 | 15 || 17 | 18 |

«Efficient, scalable consistency for highly fault-tolerant storage GARTH GOODSON August 2004 CMU-CS-04-111 Dept. of Electrical and Computer ...»

-- [ Page 16 ] --

service is configured to tolerate a single benign metadata-node fault and Byzantine clients:

t = 1, b = 0, and N = 4. Postmark is run on an increasing number of NFS servers to determine the maximum throughput of the PMD service in this configuration. To ensure that the PMD service is as loaded as possible, the NFS server uses local storage for the storage service. Postmark is designed to benchmark a single NFS server. However, it is being used to benchmark a decentralized service behind an NFS interface. As such, we “scale” the number of files, transactions, and directories for each Postmark/NFS server down, as we scale the number of Postmarks/NFS servers up. This is done to maintain a consistent working set across runs. Each NFS server runs Postmark in a different directory of the PMD service. The working set fits within the cache on the metadata-nodes.

Figure 5.9 shows the throughput of the PMD service with up to 16 distinct NFS servers.

For a single NFS server, Postmark is configured for 32768 transactions, 1024 files, and 64 directories. Each NFS server has a single Postmark benchmark run against it. The Postmark configuration is scaled down as the number of clients is scaled up, thus keeping the working set size the same. For example, with 16 NFS servers, Postmark scales down to 2048 transactions, 64 files, and 4 directories. The PMD service saturates just below 350 transactions per second.

Scaling fault-tolerance

–  –  –

Figure 5.9: Postmark throughput vs.

client load. This graph compares the total system throughput of a Postmark workload as the number of NFS servers (PMD clients) increase in fault-free operation. Each client runs a single instance of Postmark against a NFS server mounted via loopback on the same machine. The sets of bars represent a configuration with b = 0,t = 1, N = 4 and b = 1,t = 1, N = 6.

tolerated is scaled from b = t = 1 to b = t = 3. Figure 5.10 demonstrates that as the number of failures tolerated scales, the responsiveness of the PMD service is fairly flat for the benign configuration and degrades only moderately for the Byzantine configuration. This degradation is expected, for a number of reasons. First, more cryptography is being performed by metadata-nodes (e.g., for b = 3 authenticators are comprised of 16 entries, since N = 5b + 1). Second, all storage-nodes are being communicated with, thus the communication costs grow as N increases. Additionally, this experiment shows the performance cost of a fully Byzantine-tolerant system is not prohibitive (at least for low values of b).

· 120 Efficient, scalable consistency for highly fault-tolerant storage

–  –  –

Figure 5.10: Postmark run time vs.

total failures tolerated (t). This graph compares the runtime of a Postmark workload as the number of tolerated faults (t) is scaled upward. The two lines represent a wholly crash environment with b = 0, while the other represents a wholly Byzantine environment with b = t.

Scaling throughput using threshold quorums

Recall, Section 4.5 describes techniques that can be used to scale the system’s throughput by adding storage-nodes. For example, the smallest configuration with t = 1, b = 1 is N = 6, COMPLETE = 5, and INCOMPLETE = 3. It is possible to increase the bounds on the R/CW constraints in the following way: by adding 3∆ to N, 2∆ must be added to QC, COMPLETE, and INCOMPLETE. Thus, as ∆ increases the lower bound on the load of the system is 2.

–  –  –

Table 5.3: Threshold quorum experiment parameters (b = t = 1).

This table shows the derived parameters when using theshold quorums. The first column shows ∆. The second column shows the value of N. The third column shows the size of each threshold quorum (|Quorum| = QC + b).

The fourth column gives the quorum load of each storage-node (i.e., the fraction of operations for which a storage-node must execute a request). Lastly, the fifth column shows the calculated system throughput normalized to the throughput of ∆ = 0.

–  –  –

shows the expected system throughput normalized to ∆ = 0.

Figure 5.11 shows the throughput of the PMD service when using a threshold quorum construction, as ∆ increases.

The throughput is normalized to the throughput obtained at ∆ = 0. Two curves are plotted on the graph. The first line shows the calculated throughput (see column 5 in Table 5.3). The second line shows the maximum throughput attained by running a heavy weight synthetic update operation containing a 4 KB argument and a 10 ms storage-node think time.

To measure the throughput, many clients, each with multiple outstanding queries or updates are employed. The reported throughput measurements are for a saturated system (i.e., adding more clients does not increase throughput) We employ an access strategy based on a deterministic function of the object ID. Such an access strategy results in updates for a given object preferentially accessing the same quorum (i.e., the preferred · 122 Efficient, scalable consistency for highly fault-tolerant storage 1.2 1.15

–  –  –





Figure 5.11: Throughput of threshold quorum system vs.

∆. This graph shows the normalized system throughput as ∆ is increased. Throughput is normalized to that of ∆ = 0. See Table 5.3 for a description of the relationship between ∆, N, and throughput. Two curves are shown. The first line shows the calculated throughput as described in Table 5.3. The second line shows the throughput attained when running a 4KB update operation.

quorum). Accessing a service that is implemented by an ensemble of Q/U objects, via each objects preferred quorum, approximates a traditional quorum access strategy. As can be seen, the synthetic update line closely follows the calculated throughput curve.

An additional experiment was run using a recursive threshold construction [Malkhi et al. 1997]. For the recursive threshold construction (with t = b), N = (5b + 1)∆+1 and |Quorum| = (4b + 1)∆+1 (i.e., ∆ indicates recursion depth). With t = b = 1 (∆ = 1, N = 36, |Quorum| = 25), the achieved throughput, normalized to ∆ = 0, was 1.19 versus 1.20 for the normalized theoretical throughput.

5.6.5 PASIS file system: SSH-build

–  –  –

the PMD service operating in unison with the PS service. This benchmark consists of three phases: unpacking the OpenSSH archive, running configure, and compiling the OpenSSH binaries. The unpack phase stresses metadata operations on files of varying sizes by uncompressing and untaring the OpenSSH (v3.8p1) tar archive. The configure phase consists of the automatic generation of header files and Makefiles, which involves building various small programs that check the existing system configuration. The build phase compiles, links, and removes temporary files. This last phase is the most CPU intensive, but it also generates a large number of object files and a few executables. Figure 5.12 shows the runtime of the SSH-build benchmark for four configurations.

None of the NFS configurations use the synchronous mount option. However, all NFS configurations use a 128 MB write-back cache with no data expiration time. The first set of bars show a user-level NFSv3 server that stores files in the local ext3 file system.

The second set of bars show the performance of the same user-level NFSv3 server just described, however the data is stored across the network to a single CVFS storage-node.

This shows the overhead of using CVFS across an additional network link. However, · 124 Efficient, scalable consistency for highly fault-tolerant storage in this configuration metadata is treated as data (i.e., attributes and directory entries are stored within data blocks). The third set of bars show the cost of using the PASIS metadata service (with t = b = 1, N = 6) storing file data in the local file system. The fourth, and last, set of bars show the SSH-build benchmark run against the complete PASIS metadata and storage service. Both the metadata and storage service are configured with t = b = 1;

N = 6 for the metadata service and N = 5 for the storage service. As can be seen there is almost a 2x performance difference between an user-level NFS server with no faulttolerance and an NFS server backed by a Byzantine fault-tolerant metadata and storage service that provides strong consistency. The majority of the overhead is due to the extra communication required (11 storage-nodes in the PMD+PS case vs. 1 without), as well there is a non-zero cost in our CVFS and storage-node implementations.

5.7 Summary

This chapter describes the PASIS metadata (PMD) service. It uses a novel quorum-style query/update (Q/U) protocol to provide horizontal scalability for metadata, as is enjoyed for data in scalable storage systems. The PMD service extends the read/conditional write protocol, described in Chapter 4, to support more general query and update operations.

These operations provide access to objects at a finer-granularity than do block-based protocols (e.g., reading/inserting directory entries vs. reading/writing full directories). In addition, atomic updates across multiple objects are supported.

Similar to the other protocols developed thus far, the Q/U protocol uses optimism and versioning to achieve efficiency while tolerating asynchronous communications and Byzantine failures of clients and servers. Experiments with a decentralized NFS file service demonstrate feasibility and efficiency. As well, performance under concurrency and faults is examined. Experiments also show that threshold quorum constructions can be used to significantly increase throughput without requiring the partitioning of the metadata service.

6 Conclusions and Future Work

6.1 Conclusion

This thesis has demonstrated a novel approach to achieving scalable, highly fault-tolerant storage systems by leveraging a set of efficient and scalable, strong consistency protocols enabled by storage-node versioning. These consistency protocols achieve efficiency and scalability via a combination of optimistic operation, versioning, and quorum-style redundancy.

Three consistency protocols have been developed that offer varying semantics useful for building different components within a survivable, decentralized storage-system.

The first protocol, the read/write protocol (R/W), provides read–write semantics of full data blocks. This protocol is suitable as the basis for the data storage component within a survivable storage system, since most block based data services expect whole block updates.

The second protocol, the read/conditional-write (R/CW) protocol, provides read– modify–write semantics of full data blocks. While this protocol also assumes blocks (or data objects) are read and written as atomic units, it offers stronger consistency guarantees. These semantics guarantee that the data region has not been modified between a read and a successive write operation to the same data region.

The third protocol, the query/update (Q/U) protocol, extends the R/CW protocol to more fully support the semantics required by metdata. In order to preserve the consistency of metadata, metadata objects (e.g., directories) require update operations that modify existing contents (such as inserting a new directory entry), rather than overwriting their · 126 Efficient, scalable consistency for highly fault-tolerant storage previous contents. As well, metadata usually requires atomic update operations across multiple metadata objects (e.g., when performing a rename, or moving files). The Q/U protocol provides for the serializability of multiple, arbitrary operations through the use of replicated state machines.

These protocols were developed in detail, evaluated individually, and used as a basis for building a fault-tolerant, scalable storage-system. Results show that the PASIS file system configured to tolerate one Byzantine fault is within a factor of two in the response time unpacking, configuring, and building OpenSSH as compared to an unreplicated userlevel NFS server. The storage service component of the file system, using the R/W protocol, was shown to scale well in terms of both throughput and response time as number of faults tolerated is scaled up. As well, it performs well when compared to a Byzantine fault-tolerant agreement protocol (BFT) and by offloading work from storage-nodes to clients increases its scalability. Results also show that the PASIS metadata service, using the query/update protocol, scales with as the number clients is increased and reponse time increases slightly as the number of faults tolerated is scaled up. Additionally, the use of quorum thresholds enables the system’s throughput to scale close to its theoretical bounds and it is expected that other quorum constructions can further increase the system’s scalability.

6.2 Contributions This main contribution of this thesis is the design and evaluation of three consistency

protocols that have been enabled by versioning storage-nodes. These contributions are:

(1) The development and demonstration of a read/write block storage consistency protocol that enables highly fault-tolerant storage through the use of erasure coded data and versioning storage-nodes. Its correctness is shown through proof sketches.

–  –  –

offs between tolerating Byzantine clients and erasure coding, as well as tradeoffs between tolerating Byzantine storage-nodes and liveness have been discussed.

(3) The extention of the read/conditional-write block protocol (in the query/update protocol) to support operations on multiple, arbitrary objects and the implementation of a scalable metadata service based upon the query/update and the read/write protocol.

(4) The evaluation of a distributed file system that utilizes the scalability and faulttolerance of the developed consistency protocols in terms of the number of faults tolerated, the maximum throughput the system can sustain, and its performance in degraded operation modes (i.e., with concurrency and faults).

6.3 Future directions



Pages:     | 1 |   ...   | 14 | 15 || 17 | 18 |


Similar works:

«Journal of Reformed Theology 7 (2013) 181-203 brill.com/jrt Simply Impossible: A Case against Divine Simplicity R. T. Mullins University of Notre Dame, South Bend, IN, USA rtmullins@gmail.com Abstract Within contemporary philosophical theology the doctrine of divine simplicity has regained attention.1 The pertinent literature has increased by several new defenses of the doctrine.2 One of the more surprising, and troubling, aspects of the contemporary defenses amongst Christian philosophers and...»

«Degrees of Causation Matthew Braham Faculty of Philosophy, University of Groningen, The Netherlands m.braham@rug.nl Martin van Hees Faculty of Philosophy, University of Groningen, The Netherlands Martin.van.Hees@rug.nl (April 15, 2008) Abstract The primary aim of this paper is to analyze the concept of degrees of causal contribution for actual events and examine the way in which it can be formally defined. This should go some way to filling out a gap in the legal and philosophical literature...»

«th ANNUAL REPORT 11PART I UNIVERSITY OF DELHI 26 th September, 2012 PREFACE I am happy to present the 89th Annual Report of the University of Delhi for the year ending 31st March, 2012. The Report consists of two parts. Part-I contains the information about University’s general infrastructure including Faculties, Departments, Centres and Colleges while Part-II carries Information and Data. The Editorial Board, which prepared this Report, comprised of the following members:Prof. Malashri Lal...»

«Non-Agential Permissibility in Epistemology∗ Luis R. G. Oliveira This is a preprint of an article whose final and definitive form will be published in the Australasian Journal of Philosophy. The Australasian Journal of Philosophy is available online at: http://www.tandf.co.uk/journals/ Abstract Paul Silva has recently argued that doxastic justification does not have a basing requirement. An important part of his argument depends crucially on the assumption that doxastic and moral...»

«How the eastern aesthetics affect the British potters’ works Yixin Lin Course: MA Designer Maker Email: linyixin000000@gmail.com Abstract: In the start of twentieth-century, as the British ceramic precursor Bernard Leach introduced the Japanese aesthetics of pottery and the new ideas to appreciate beauty, there is an arising study of Eastern aesthetics and philosophies of the British potters. Specially, the „Zen Art‟ which provided „timeless and simplistic‟ aesthetic value. This essay...»

«THE KENYON REVIEW Vol. XVII W I N T E R, 1955 No. 1 Walter Kaufman NIETZSCHE AND RILKE T HIS STUDY of Nietzsche and Rilke, and particularly of what they have in common, is meant to throw light on both and also on the relation of philosophy and poetry. To those who are used to comparisons of Nietzsche with Nero and of Rilke with St. Francis any insistence on common elements will come as a surprise; but it may also disabuse them of some misapprehensions. There are, of course, obvious differences...»

«The Ethics of Integrity: Educational Values Beyond Postmodern Title Ethics Author(s) Mason, MB Citation Journal of Philosophy of Education, 2001, v. 35 n. 1, p. 47-69 Issued Date 2001 URL http://hdl.handle.net/10722/54293 Journal of Philosophy of Education. Copyright © Blackwell Rights Publishing Ltd. The Ethics of Integrity: Educational Values Beyond Postmodern Ethics Mark Mason ABSTRACT I address the problems of diminished moral responsibility and of moral relativism, typically associated...»

«AN APPROACH TO INTEGRATE LIGHTING CONCEPTS INTO INTERIOR DESIGN STUDIOS: A CONSTRUCTIVIST EDUCATIONAL FRAMEWORK A THESIS SUBMITTED TO THE DEPARTMENT OF INTERIOR ARCHITECTURE AND ENVIRONMENTAL DESIGN AND THE INSTITUTE OF FINE ARTS OF BİLKENT UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN ART, DESIGN, AND ARCHITECTURE By Mehmedalp Tural January 2006 I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in...»

«TRANSCRIPT OF DEVELOPMENT DRUMS [EPISODE 29 – TOBY ORD: GIVING WHAT WE CAN] Host: Owen Barder. Guest: Toby Ord Listen to the podcast: http://developmentdrums.org/484 Owen Barder Thanks for downloading Development Drums. My name is Owen Barder at the Center for Global Development. And my guest today is Toby Ord, British Academy Postdoctoral Fellow in Philosophy at Balliol College. He is also a Research Associate at two of Oxford’s research centers, the Uehiro Centre for Practical Ethics and...»

«J Value Inquiry DOI 10.1007/s10790-010-9240-2 BOOK REVIEW Thomas Scanlon, Moral Dimensions: Permissibility, Meaning, Blame Harvard University Press, 2008, 246 pp., $25.60 (hbk), ISBN 9780674031784 Kevin Vallier Ó Springer Science+Business Media B.V. 2010 In Moral Dimensions: Permissibility, Meaning, Blame, Thomas Scanlon challenges moral philosophers with a subtle analysis of how permissibility, meaning and blame are to be understood. Scanlon’s challenge is significant not only because he...»

«Hand, David John (1988) Regulation of curd initiation in the summer cauliflower. PhD thesis, University of Nottingham.Access from the University of Nottingham repository: http://eprints.nottingham.ac.uk/27674/1/384336.pdf Copyright and reuse: The Nottingham ePrints service makes this work by researchers of the University of Nottingham available open access under the following conditions. This article is made available under the University of Nottingham End User licence and may be reused...»

«Communication as Symbiogenesis On the Relationality of Mobile Phoning in Korea Namsook Park Thesis submitted for the Degree of Doctor of Philosophy Centre for Cultural Studies Goldsmiths College University of London Declaration I declare that the work presented in this thesis is entirely my own. Namsook Park London, August 2011 Abstract This study understands communication as parasitic and symbiogenetic. It recognizes an object or technology no less and no more important than a subject, and...»





 
<<  HOME   |    CONTACTS
2017 www.abstract.dislib.info - Abstracts, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.