
The operating environment affects the charged particle flux. On the ground, in a benign environment, the most significant source of alpha particles is probably the small amount of radioactivity that is naturally present in the casing of the chip. In space applications, the particle flux can be much higher. The sun is the principal source of such particles: the solar wind regularly carries a large flux (the earth’s magnetic field shields the ground from most of this). The sun is, of course, a dynamic entity: it often generates solar flares, which may involve the ejection in short order of many tonnes of charged particles. When especially severe solar flares occur and where the earth’s magnetic field is not such a strong shield (near the poles), a significant increase in particle flux is seen even on the ground, affecting ground-based systems. For systems in space, the flux is far greater and the rate of transient failures goes up proportionately. Some relief can be found in shielding; however, even with such precautions, failure rates are high [Petersen, 1997].

One final piece of terminology is appropriate to mention here. In fault-tolerance, it is common to distinguish between a fault and an error. A fault is the underlying defect; an error is the external manifestation of that defect. Faults can exist for a long time before generating an error. In some cases, a transient fault may never generate an error. For example, a transient fault occurs in a memory cell when its value is changed; if the cell is updated before the next time it is read, that fault will never have generated an error.

2.4 Fault-Tolerant Structures

Hardware, time, information, and software redundancy are used for fault-tolerance [Koren and Krishna, 2007, Pradhan, 1996, Siewiorek and Swarz, 1999]. Of these types, time and hardware redundancy will be of relevance to this paper.

Of the many structures used in hardware fault-tolerance, two stand out. One is the Triple Modular Redundancy (TMR) structure; the other is the Primary/Backup (PB) approach. In TMR, three processors run redundant copies of the same workload and mask errors by voting on their outputs. Since it requires triplicated hardware, TMR is expensive and used only in the most critical core of fault-tolerant systems.
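The voting step at the heart of TMR can be sketched as follows (a minimal illustration; the function name and value types are ours, not from any particular system):

```python
def tmr_vote(outputs):
    """Return the majority value among three redundant outputs.

    Masks a single erroneous output; raises if no two copies agree,
    which a single fault cannot cause.
    """
    a, b, c = outputs
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: more than one copy failed")

# One faulty copy (here the second) is outvoted by the other two.
print(tmr_vote([42, 17, 42]))  # -> 42
```

Note that the voter itself becomes a single point of failure unless it, too, is replicated; practical TMR systems address this in hardware.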

In the PB approach, a processor executes code and its output is checked for correctness by an acceptance test. One common check is to see if the output is in the expected range. Another is to verify that it satisfies certain conditions: for example, to check that the square root of x has been correctly calculated, one can square the output to ensure that we get back a result close to x. (This is only practical because the squaring operation is of far lower complexity than the square-root operation.) Acceptance tests are not perfect: they may miss incorrect outputs, or falsely flag as erroneous a correct output. If the output fails the acceptance test, the execution of a backup copy of the failed task is initiated.
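The square-root acceptance test described above can be sketched as follows (the tolerance value is an illustrative assumption; a real system would choose it from its numerical error budget):

```python
import math

def sqrt_acceptance_test(x, output, rel_tol=1e-9):
    """Accept `output` as a computed sqrt(x) if squaring it recovers x.

    Squaring is far cheaper than recomputing the root, which is what
    makes this acceptance test practical.
    """
    return output >= 0 and math.isclose(output * output, x, rel_tol=rel_tol)

print(sqrt_acceptance_test(2.0, 1.41421356237))  # correct output passes
print(sqrt_acceptance_test(2.0, 1.5))            # erroneous output fails
```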

The principal difference between the TMR and PB approaches is that TMR entails forward error masking: multiple copies of the task are always executed. In PB, a backup copy is executed only if the acceptance test flags a failure. In principle, any number of backup copies can be provided in order to respond to multiple failures of the same task execution.

It is worth pointing out that in the PB approach, the backup copies are not always clones of the original task. Backups may instead be lighter copies of the original, producing output of lower (but still acceptable) quality. The justification for this is that the backups are only occasionally invoked; it imposes lighter constraints on the scheduling algorithm if backups take less time; also a more rudimentary implementation tends to be more reliable.

Fault-tolerant scheduling techniques generally assume the PB approach; if we use TMR, we have forward error masking and do not need to schedule backup copies.

The second type of redundancy that we need to consider is time redundancy. This is the slack that exists in a schedule between when tasks finish and their respective deadlines; this slack (during which a processor would normally be idle) can be exploited to provide a reserve of time during which to execute backup copies as necessary.
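Time redundancy can be made concrete with a small sketch (the task numbers are hypothetical): the slack between a task's finish time and its deadline is idle time that can absorb the execution of a backup copy.

```python
def slack(finish_time, deadline):
    """Idle time between a task's completion and its deadline."""
    return deadline - finish_time

def backup_fits(finish_time, deadline, backup_time):
    """True if a backup of length backup_time can still meet the deadline."""
    return finish_time + backup_time <= deadline

# Hypothetical task: finishes at t = 4, deadline at t = 10.
print(slack(4, 10))           # 6 units of time redundancy
print(backup_fits(4, 10, 5))  # True: a 5-unit backup still meets t = 10
print(backup_fits(4, 10, 7))  # False: the backup would miss the deadline
```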

A good example of fault-tolerant structures is the Time-Triggered Architecture [Kopetz, 1997, Kopetz and Bauer, 2003]. Events are scheduled in advance and an a priori communication schedule can therefore be established. Fault-tolerance is applied hierarchically. The system consists of a number of nodes, interconnected by means of a communication network. The network is redundant for fault-tolerance; for example, if a bus structure is used, multiple buses are provided. The nodes are connected to the network through a communications interface and their communication profile is monitored by means of guardians. These guardians look for evidence of malfunction and disconnect the node from the network as needed.

Each node is a fault-tolerant unit, typically consisting of replicated processors and other hardware to ensure fault-masking [Kopetz and Millinger, 1999]. The clocks are synchronized by an algorithm that can tolerate Byzantine faults [Kopetz and Ochsenreiter, 1987]. As long as individual unit failures do not overwhelm the fault-management capabilities of the node, they will be invisible to the rest of the system.

Replica determinism [Poledna, et al., 2000] is maintained hierarchically [Kopetz and Bauer, 2003]. The fault-tolerance at the nodes ensures this at the interface between the node and the network. The Time Triggered Protocol ensures replica determinism across the network. As mentioned earlier, this supports transmissions that are scheduled a priori: messages are pulled off the interface queue and transmitted according to this schedule. The identity of a message can be discovered by reference to the very same schedule. This clearly requires effective clock synchronization across the network. The clocks can, for example, be synchronized by an algorithm that can tolerate Byzantine faults [Kopetz and Ochsenreiter, 1987]. Since even the best synchronization mechanism cannot guarantee zero clock-skew, each transmission time slot is flanked on each side by brief intervals of silence. Clearly, as long as the intervals of silence are longer than the maximum clock skew, all transmissions will be guaranteed to be seen by each receiver as occurring in the same clock-tick.
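The guard-interval condition in the last two sentences can be sketched as a simple check (the numeric values are hypothetical): if the silence intervals flanking each slot exceed the maximum clock skew, no receiver can attribute a transmission to the wrong slot.

```python
def slots_unambiguous(silence_interval, max_clock_skew):
    """True if every receiver sees each transmission in the same slot.

    The silence interval on each side of a transmission slot must be
    longer than the worst-case skew between any two synchronized clocks.
    """
    return silence_interval > max_clock_skew

# Hypothetical values: 5 microseconds of silence, 2 microseconds of skew.
print(slots_unambiguous(5e-6, 2e-6))  # True: slot assignment is unambiguous
print(slots_unambiguous(1e-6, 2e-6))  # False: a skewed clock could misattribute
```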

When a system using a priori communication scheduling is used, enough slack must be provided in the schedule so that the system has the time to deal with a failure and still meet its communication schedule. Another option is to provide the system with alternative communication schedules which it can invoke whenever a fault has to be dealt with [Pop, et al., 2007].

2.5 Voltage Scaling

The time taken to execute a task depends on the clock rate of the processor. The clock rate, in turn, is constrained by the circuit delays. It is well known that circuit delays are, to a first approximation, inversely proportional to the supply voltage. As a result, by changing the supply voltage (within certain bounds), we can change the execution time of a task. The price we pay for this is power consumption, which is proportional to the square of the supply voltage and linearly proportional to the clock frequency; in the aggregate, if we scale the frequency up as allowed by the lower circuit delays that follow an increase in the voltage, the power consumption grows as the cube of the supply voltage. In real-time systems, the focus has been on how to reduce the supply voltage in order to minimize consumption while still meeting critical task deadlines [Unsal and Koren, 2003]; in fault-tolerant scheduling, briefly raising the supply voltage (within the bounds tolerable by the processor) may give the processor the boost of speed it needs to complete the backup. Keep in mind that failure is usually a comparatively rare event: occasionally spending additional energy to tide over a rare event is usually acceptable, even in power- or energy-constrained systems.
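The cubic dependence can be made explicit with the standard first-order CMOS model (this is the usual textbook derivation, not taken from the paper's own equations):

```latex
% Dynamic power of CMOS logic, with \alpha the activity factor and
% C the switched capacitance:
%   P = \alpha C V^2 f
% To first order, circuit delay scales as 1/V, so the maximum clock
% frequency scales linearly with the supply voltage: f \propto V.
% Substituting gives the cubic dependence quoted in the text:
\[
  P = \alpha C V^2 f, \qquad f \propto V
  \;\Longrightarrow\; P \propto V^3 .
\]
```

Thus a modest voltage boost buys speed at a steep, but briefly affordable, energy cost.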

3 Basics of Fault-Tolerant Scheduling

At their foundation, all fault-tolerant scheduling procedures are similar. They aim to use time or processor redundancy to ensure that, despite the (permanent or transient) failure of a processor, the hard deadlines of critical real-time tasks will continue to be met.

The fault-tolerant scheduling approach we describe here is meant to provide the system’s first, emergency, response to a processor failure. When a transient failure occurs, such an emergency response may be all that is needed: the failed processor may soon recover and resume its duties. If the processor failure is permanent, then if the scheduling is online, with tasks being either guaranteed or rejected as they come in, the system will take account of the reduced computational resources in guaranteeing any future task arrivals. If the failure is permanent and the scheduling is offline, then the system should generate another offline schedule (complete with backup tasks), targeted at the system with one fewer processor. Fault-tolerant scheduling algorithms provide the system with a means to continue meeting deadlines until this can be completed; their role is therefore to provide the system with some breathing room before it makes any longer-term adjustments to the task assignment and scheduling that may be appropriate.

While there are some variations from one approach to another, the general method of responding to a failure is as follows:

• Transient Failures: If the system is designed only to withstand transients that go away quickly, reexecution of the failed task or of a shorter, more basic, version of that task is carried out. The scheduling problem reduces to ensuring that there is always enough time to carry out such an execution before the deadline.

• Software Failure: Here, the failure is that of the software, not of the processor. Software diversity is used: backup software which is different from the failed software is invoked. Again, we have to make sure, in preparing for software faults, that there is enough time for the backup version to meet the original task deadline.

• Permanent Failure: Backup versions of the tasks assigned to the failed processor must be invoked. The steps are:

– Provide each task with a backup copy.

– Place the backups in the schedule, either prior to operation for offline scheduling or before guaranteeing the task for online scheduling.

– If a processor fails, activate one backup for each of the tasks that have been affected.
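The activation step above can be sketched as follows (the data structures and task-to-processor assignment are illustrative, not from any specific algorithm in the paper):

```python
# primary[t] = processor running the primary copy of task t
primary = {"T1": 0, "T2": 1, "T3": 0}
# backup[t] = processor holding the backup copy of task t
backup = {"T1": 1, "T2": 0, "T3": 2}

def activate_backups(failed_processor):
    """Return the backups to activate when `failed_processor` fails.

    One backup is activated for each task whose primary copy was
    assigned to the failed processor.
    """
    return {t: backup[t] for t, p in primary.items() if p == failed_processor}

# Processor 0 fails: the backups of T1 and T3 must run (on procs 1 and 2).
print(activate_backups(0))
```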

In the rest of this paper, we denote by πi and Bi the primary and backup copies, respectively, of task Ti.

There are three general principles with respect to scheduling backups when permanent failures may occur:

– Principle A1: A backup copy should be assigned to a different processor than its primary copy when trying to protect against permanent processor failures or transients of significant duration. For approaches whose sole aim is to protect against brief transient failures or software bugs, this does not apply.

– Principle A2: The backup copies are conditionally transparent in the schedule [Krishna and Shin, 1986]; i.e., they can be overloaded [Ghosh, et al., 1997]. By this we mean that, unlike primary copies, which cannot be allowed to overlap other tasks in their schedule (after all, the processor can only execute one thing at a time; note that for multicore chips, each of the cores counts as a processor), we can overlap backups in the schedule so long as their primary copies have not been scheduled on the same processor. This is because all we are trying to protect against is a single failure. (This is like having just one spare tyre in your car: you don’t know ahead of time which one of the four “primary” tyres is going to fail, but just one spare will be enough to replace any one of them.) Figure 1 illustrates this for a three-task system.

Backups B1, B2 whose primary copies have been scheduled on different processors cannot both need to be activated to react to the same processor failure.

– Principle A3: A backup copy may not overlap any primary copy in the schedule. The reason for this should be obvious.
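Principles A2 and A3 can be sketched as schedule checks (the interval representation and function names are ours, for illustration):

```python
def overlaps(a, b):
    """True if half-open time intervals a = (s1, e1) and b = (s2, e2) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def backups_may_overload(primary_proc_1, primary_proc_2):
    """Principle A2: two backups may share schedule time (be overloaded)
    only if their primaries sit on different processors, since a single
    processor failure can then never require both backups at once."""
    return primary_proc_1 != primary_proc_2

# Backups of T1 (primary on proc 0) and T2 (primary on proc 1) can be
# overloaded into the same slot on a third processor.
print(backups_may_overload(0, 1))  # True
# Principle A3: a backup in slot (2, 5) conflicts with a primary in (4, 7).
print(overlaps((2, 5), (4, 7)))    # True: this placement is forbidden
```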

Throughout, unless otherwise stated, we assume that when processors fail, this is detected; in particular, it is detected by the time any task currently running on that processor is scheduled to complete its execution. Further, we assume that, in a multiprocessor, one processor failure cannot induce another. Failure of the communication network is not considered, since the focus here is on rapid recovery from processor failures. Network failures can be guarded against by providing multiple paths connecting any pair of processors.

4 Transient Hardware Failure

4.1 Periodic Tasks: Static Approach

The failure model we start with is that of brief transients affecting a single processor running a workload comprised of periodic tasks [Burns, et al., 1996]. It is assumed that transient failures do not occur more than once every τF seconds. These tasks are independent of one another and have no precedence constraints, i.e., one does not provide input to another.

The Rate Monotonic (RM) scheduling policy is used [Liu and Layland, 1973]. RM is perhaps the best-studied of all real-time scheduling algorithms; it is a preemptive algorithm meant for a workload comprising only periodic tasks, where the priority of a task is inversely related to its period. That is, if we have two tasks T1, T2 with periods P1, P2 respectively such that P1 < P2, then T1 has higher priority than T2. If two tasks have identical periods, the tie can be broken arbitrarily. Since RM is preemptive, a task executes only when there is no higher-priority task waiting to execute. It is assumed that the cost of preemption is negligible. To begin with, we will assume that transients are of negligible duration, i.e., a processor crashes and comes back up again very quickly. Later, we will relax this assumption.
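RM priority assignment, together with the classical Liu-Layland schedulability bound, can be sketched as follows (the task set and utilizations are hypothetical):

```python
def rm_priorities(periods):
    """Order tasks highest-priority first: shortest period first."""
    return sorted(periods, key=periods.get)

def liu_layland_ok(utilizations):
    """Sufficient RM schedulability test: sum(U_i) <= n(2^(1/n) - 1)."""
    n = len(utilizations)
    return sum(utilizations) <= n * (2 ** (1.0 / n) - 1)

periods = {"T1": 10, "T2": 20, "T3": 50}  # hypothetical task periods
utils = [0.2, 0.25, 0.2]                  # C_i / P_i for each task

print(rm_priorities(periods))  # ['T1', 'T2', 'T3']
print(liu_layland_ok(utils))   # True: 0.65 <= 3*(2^(1/3) - 1) ~ 0.7798
```

The bound is sufficient but not necessary; task sets exceeding it may still be RM-schedulable and can be checked by exact response-time analysis.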

Under these assumptions, there is just one processor; if a task is affected by failure, it is reexecuted in the form of a backup.
