
«FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world Graham E. Fagg and Jack J. Dongarra Department of Computer Science, ...»


Current MPI debuggers and visualization tools such as TotalView, Vampir and Upshot have no concept of how to monitor MPI jobs that change their communicators on the fly, nor do they know how to monitor a virtual machine. To help users understand these, the author has implemented two monitoring tools: Hostinfo, which displays the state of the virtual machine, and Cominfo, which displays processes and communicators in a colour-coded fashion so that users know the state of an application's processes and communicators. Both tools are currently built using the X11 libraries but will be rebuilt using the Java Swing system to aid portability. Example displays during a SHRINK communicator rebuild operation are shown in Figures 2 to 4.

Fig. 2. Cominfo display for a healthy three-process MPI application. The colours of the inner boxes indicate the state of the processes, and the outer box indicates the communicator state.

Fig. 3. Cominfo display for an application with an exited process. In this case the rank 1 process has exited. Note that the communicator is marked as having an error and that the number of processes and the size of the communicator are different.
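The state shown in Fig. 3 can be illustrated with a small toy model. This is only a sketch and not FT-MPI code: the names `ProcState`, `Communicator` and `shrink` are hypothetical, and it assumes that a SHRINK rebuild drops exited processes and renumbers the surviving ranks contiguously. The error condition modelled here is the one the caption describes, namely a live-process count that differs from the communicator size.

```python
from dataclasses import dataclass
from enum import Enum


class ProcState(Enum):
    """Hypothetical per-process states (the inner boxes in the display)."""
    OK = "ok"
    EXITED = "exited"


@dataclass
class Communicator:
    """Toy communicator: slot i of `procs` holds the state of rank i."""
    procs: list

    @property
    def size(self):
        # Number of rank slots, whether or not the process is still alive.
        return len(self.procs)

    @property
    def alive(self):
        # Number of processes that have not exited.
        return sum(1 for p in self.procs if p is ProcState.OK)

    @property
    def has_error(self):
        # Assumed error rule: live-process count differs from communicator size.
        return self.alive != self.size

    def shrink(self):
        """Assumed SHRINK semantics: drop exited ranks, renumber survivors.

        Returns the rebuilt communicator and an old-rank -> new-rank map.
        """
        survivors = [r for r, p in enumerate(self.procs) if p is ProcState.OK]
        rank_map = {old: new for new, old in enumerate(survivors)}
        return Communicator([ProcState.OK] * len(survivors)), rank_map


# Three healthy processes, then rank 1 exits (the situation in Fig. 3).
comm = Communicator([ProcState.OK] * 3)
comm.procs[1] = ProcState.EXITED

# Rebuild the communicator without the exited process.
shrunk, rank_map = comm.shrink()
```

Under these assumptions, `comm` reports an error with two live processes in a communicator of size three, while `shrunk` has size two and the old rank 2 becomes the new rank 1.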


7. Conclusions

FT-MPI is an attempt to give application programmers methods of dealing with failure within MPI applications other than checkpoint and restart. It is hoped that experimenting with FT-MPI will lead to new application methodologies and algorithms that provide both the high performance and the survivability required for the next generation of teraflop and beyond machines.

FT-MPI itself is already proving to be a useful vehicle for experimenting with self-tuning collective communications, distributed control algorithms and improved sparse data handling subsystems, as well as being the default MPI implementation for the HARNESS project.

