FREE ELECTRONIC LIBRARY - Abstracts, online materials

Pages:   || 2 | 3 | 4 | 5 |   ...   | 8 |

«Real-time systems are one of the most important applications of computers, both in commercial terms and in terms of social impact. Increasingly, ...»

-- [ Page 1 ] --

Fault-Tolerant Scheduling in Homogeneous Real-Time Systems

C. M. Krishna

University of Massachusetts at Amherst


Real-time systems are one of the most important applications of computers, both

in commercial terms and in terms of social impact. Increasingly, real-time computers are used to control life-critical applications and need to meet stringent reliability

conditions. Since the reliability of a real-time system is related to the probability

of meeting its hard deadlines, these reliability requirements translate to the need to meet critical task deadlines with a very high probability. We survey the problem of how to schedule tasks in such a way that deadlines continue to be met despite processor (permanent or transient) or software failure.

Categories and Subject Descriptors: A.1 [Introductory and Survey]; D.4.1 [Process Management]: scheduling; D4.5 [Reliability]: Fault-Tolerance; D4.7 [Organization and Design]: Real-time Systems; J.7 [Computers in other systems]:


General Terms: Algorithms, reliability.

Additional Key Words and Phrases: Cyber-physical systems, time redundancy, task assignment.

1 Introduction Real-time systems are an important application of computer systems. Increasingly, they are used in life-critical and complex applications, in the control of aircraft, medical equipment, automobiles, chemical plants, and power distribution systems.

The hallmark of real-time systems is that their tasks have deadlines and missing too many such deadlines in a row can result in catastrophic failure (such as an air crash or the death of a patient). As a result, much effort has been devoted in recent years to developing techniques by which to render such systems highly reliable. These efforts have generally involved the use of massive hardware redundancy, i.e., of using many more processors than are absolutely necessary to ensure that enough will remain alive, despite failures, to continue providing acceptable levels of service.

However, throwing a lot of redundant hardware at the problem is not always possible. Increasingly, real-time systems are used in cost-sensitive applications like cars, where a cost differential of even a hundred dollars can make a commercial difference. Also, there are constraints other than cost that must be taken into account, the most obvious of which is the availability of power: many applications have quite notable constraints on their total power or energy budget.

Author’s Address: Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003; email: krishna@ecs.umass.edu. c C.M. Krishna, 2010.

Section Failure Type

–  –  –

Further, the vast majority of processor failures are transient in nature, indicating that what is primarily needed is a mechanism to tide the system over temporary processor outages.

For all these reasons, researchers have considered techniques which can complement massive redundancy and to make such redundancy as is available count effectively towards improving reliability. This field is now mature enough that a survey of the more important techniques is worth carrying out.

This paper is organized as follows. In Section 2, we provide some technical background on real-time scheduling, processor failure models, the fault-tolerant structures that are used, and the use of voltage scaling to provide processors with an extra reserve of speed. Section 3 covers the foundations of fault-tolerant scheduling. Dealing with transient failures is covered in Section 4. Handling software failures1 is very similar, as we see in Section 5. Only single processors need to be considered in both Sections 4 and 5; even if the system has multiple processors, the reaction to the failure involves the affected processor only. After this, we turn in Section 6 to handling permanent hardware faults. This obviously requires a multiprocessor system, and then, in Section 7 to the use of voltage scaling. A discussion of some open problems follows in Section 8. A table of notation is provided in the Appendix.

To provide convenient guide to the rest of the paper, Table 1 lists the failure assumptions and scheduling algorithms included in this survey.

–  –  –

In this section, we outline some of the more salient issues in real-time task scheduling of relevance to our problem. The reader should consult one of the many texts on real-time systems for further information [Cheng, 2002, Krishna and Shin, 1997, Liu, 2000, Murthy and Manimaran, 2001, Nissanke, 1997].

2.1 Types of Real-Time Systems We start by considering the two categories into which real-time systems can be classified.

Timing overruns are also considered software failures, for our purposes.

Real-time systems can be classified into hard and soft. Hard real-time systems are those whose failure (triggered by missing too many hard deadlines) leads to catastrophe. For example, if the computer on a fly-by-wire aircraft fails completely, the aircraft crashes. If a robot carrying out remotely commanded surgery misses a deadline to stop cutting in a particular direction, or a cancer treatment irradiation machine delivers too high a dose of radiation (by not switching off the beam at the right time), the patient can die.

In a soft real-time system on the other hand, missing any number of deadlines may be a cause of user annoyance; however, the outcome is not catastrophic. A server telling you the latest score in a cricket match may cause some users great distress by freezing at the most exciting point of the match, but that is not a catastrophe. An airline reservation system may take a very long time to respond, driving its more impatient customers away, but that may still not be classified as a catastrophe (at least in the short term).

Thus, the classification of real-time systems into hard and soft depends on the application, not on the computer itself. The same type of computer, running similar software, may be classied in one context as soft while it is hard in another. For example, consider a video or audio-only conferencing application. When used for routine business communications, it can be considered a soft real-time system: if the system has outages, it is annoying, but the conversations can be rescheduled. If, on the other hand, it is used by police officers and firefighters to coordinate actions at the scene of a major fire, it is a hard real-time system. Or, take a real-time database system. When used to provide sports scores (as we have seen), it is soft; however, if it is used to provide stock market data, an outage can cause significant losses and may be regarded by its users as catastrophic. In this case, it would be considered a hard real-time system.

Perhaps a good rule of thumb is to say that the designers of hard real-time systems expend a significant fraction of the development and test time ensuring that task deadlines are met to a very high probability; by contrast, soft real-time systems are really of the “best effort” variety, in which the system makes an effort to meet deadlines, but no more than that.

Note that the same computer system may run both hard and soft tasks. In other words, some of its workload may be critical and some may not be.

Most of the applications for which one requires fault-tolerant scheduling require hard realtime computers. Such systems run two types of tasks: periodic and aperiodic. As the term implies, a periodic task, Ti, is issued once every period of Pi seconds. Typically (but not always), the deadline of a periodic task is equal to its period: most of the results in real-time scheduling are based on this assumption. An aperiodic task, on the other hand, can be released at any time; however, specifications may limit their rate of arrival to no more than one every τ seconds.

The majority of real-time systems are extremely simple, many built out of primitive eight-bit processors. It is not uncommon for such processors to have either no pipeline (meaning that they do not overlap instruction execution) or a very simple pipeline (lacking such features as out-of-order execution or branch prediction); most have no caches. Fault-tolerance features in many such applications are rudimentary at best.

In many important applications, however, the real-time system carries a heavy workload and is in the control of significant physical systems. Such systems have been used in aerospace applications for decades; more recently, they have begun to proliferate in other hard real-time applications as well. These include cars, industrial robots, chemical plants, and mechanisms for remote surgery. It is these types of application for which the fault-tolerant scheduling approaches that we survey here are meant.

We concentrate in this survey on homogeneous real-time systems. That is, we do not consider what happens if the processors in the system are of differing types, each with a different affinity to certain instruction mixes and different failure rates. We do briefly mention issues related to removing this restriction in Section 8.

2.2 Task Scheduling All of real-time task scheduling theory assumes that the worst-case execution time (WCET) of tasks is known in advance. This is not a trivial assumption. Modern processors have out-oforder and speculative execution as well as multiple levels of cache: all of these can render the problem of determining the WCET of a piece of software very difficult. Obtaining the WCET is, at present, a rather active area in real-time systems2, but is outside the scope of this paper.

We can broadly classify task scheduling into two categories: offline and online. An offline schedule is one in which the schedule is generated ahead of time and is subsequently followed by the system as it operates. Generally, most tasks run by such systems are periodic, with perhaps a small (but not unimportant) component of the workload being aperiodic, with their arrival times not known in advance. We limit the load that aperiodic tasks impose on the system by specifying a minimum time between two successive arrivals of an aperiodic task. Space is reserved in the schedule to allow for aperiodic tasks to be executed on time. Note that in an offline schedule, the task loading is bounded: barring failures, a successfully generated schedule guarantees that all tasks will meet their deadlines.

An online schedule, on the other hand, is generated on-the-fly as tasks arrive, and does not require us to know in advance the task loading bounds on the system. The price we pay for this is that we cannot guarantee in advance that all task deadlines will be met. Instead, when each task arrives to the system, the system determines whether there is enough time to execute it as well as the other tasks currently awaiting execution. If there is, the task is guaranteed. If there is not, the task is rejected, and it is then up to the user (or the operating system) to invoke some backup activity3. For example, if the computer in an intensive care unit is unable to meet all the real-time requirements for monitoring a given patient and injecting suitable drugs into their IV, a nurse may be designated to take over some of its functions [Ghosh, et al., 1997].

Tasks may either be independent of one another or have precedence constraints. Tasks with precedence constraints can only be started when their preceding tasks have been finished As a matter of practice, many designers simply gather experimental data about the run time of a task over a large number of runs and multiply the maximum observed run time by some safety factor to produce an estimate of the WCET. Such estimates are not provably correct, however. There is no guarantee that even a very large number of experimental runs will catch the worst case combination of inputs.

In most of the work in this area, the guarantee or rejection takes place almost instantly; however, allowing the decision to be deferred until some previously guaranteed tasks have successfully completed execution may improve the fraction of incoming tasks that can be guaranteed [Naedele, 1999].

(assuming that a task needs all its inputs when it starts executing). The vast majority of faulttolerant scheduling results apply for independent tasks; unless otherwise stated, tasks will be assumed to be independent.

Scheduling algorithms may be either preemptive or non-preemptive. A preemptive scheduler allows tasks to be interrupted partway through their execution and then resume from where they left off. In some cases, this may not be possible; then, non-preemptive rules apply. In general, non-preemptive scheduling is more difficult than preemptive scheduling. In a non-preemptive regime, when a task starts executing, we are reserving the processor for that task until it finishes, even if other much more urgent tasks arrive. Scheduling decisions for non-preemptive task sets therefore can have anomalous results, e.g., reducing the execution time of a task can result in actually lengthening the total time taken to complete a set of tasks [Graham 1969].

2.3 Failures All computer systems are subject to hardware and software failures. Hardware failures can be classified as permanent or transient [Johnson, 1989, Koren and Krishna, 2007, Pradhan, 1996,

Sieworek and Swarz, 1999]. Permanent failures indicate faults that do not go away with time:

they include (for the purposes of this paper) intermittent failures, where the fault cycles between being active and being passive. Transient failures, by contrast, go away after some time. The most common form of transient failure is the inducing in memory cells of spurious values, caused by charged particles (e.g., alpha particles) passing through them. The charge carried by such particles may induce the memory cell to flip its value. However, no permanent damage is done to the cell: the next time it is written into, the incorrect memory state will be wiped out. The vast majority of faults are transient; their rate depends on the VLSI technology being used and the operating environment of the system.

In general, the smaller the feature size, the more susceptible a memory cell is to changing its state in response to interactions with charged particles. As feature sizes drop ever further into the deep submicron range (45 nm is quite a mature technology today; designs using 32 nm feature sizes are on the anvil), we can expect the problem to increase in severity. Typically, designers use error-correcting codes to counter such effects.

Pages:   || 2 | 3 | 4 | 5 |   ...   | 8 |

Similar works:

«A Accounting T & Taxation VOLUME 6 NUMBER 1 2014 CONTENTS Determinants of the Bias and Inaccuracy of Management Earnings Forecasts 1 Andrew A. Anabila & Eun Young Whang When Do Companies Fund Their Defined Benefit Pension Plans? 13 Denise A. Jones Marginal Tax Rates around the Hawaii Itemized Deduction Cliff 25 Terrance Jalbert, Gary Fleischman & Mercedes Jalbert The Role of Derivatives in the Financial Crisis and Their Impact on Security Prices 39 Ronald A. Stunda The Impact of IFRS Adoption...»

«MAINE WORKFORCE 2025 MAINE 2025: An exploration of the future workforce requirements for the Maine State Government By Leading Futurists LLC and Green Consulting Group March 2015 LEADING FUTURISTS LLC i March 2015 MAINE WORKFORCE 2025 LEADING FUTURISTS LLC ii March 2015 MAINE WORKFORCE 2025 Preface Maine’s Bureau of Human Resources is seeking to answer critical questions about the future of the state workforce. The Bureau engaged foresight consultancy Leading Futurists LLC, working in...»

«Compliance checks series – CC/FS24 Tax avoidance schemes accelerated payments We have given you this factsheet because you have used a tax avoidance scheme, and we will soon ask you to make a payment of the amount that relates to your use of the scheme. Where this factsheet refers to tax, this includes Class 4 National Insurance contributions (NICs) that are collected through Self Assessment. From the tax year 6 April 2015 to 5 April 2016 it also includes most Class 2 NICs that are collected...»

«15 Timescale and cost implications This chapter provides an indication of how long it might be expected to take to establish a water safety plan. It also examines the likely cost implications; this is done through a series of examples drawn from supply experiences.15.1 TIMESCALE The time it will take to establish a water safety plan will depend upon a number of factors. These include: • the experience of the staff;• the amount of data available on the water supply; • the size and...»

«SLAC-PUB-14159 SHIELDING CALCULATIONS FOR THE HARD X-RAY GENERATED BY LCLS MEC LASER SYSTEM R. QIU, J. C. LIU, S. H. ROKNI AND A. A. PRINZ SLAC National Accelerator Laboratory: 2575 Sand Hill Road, Menlo Park, CA, 94025 rui@slac.stanford.edu LCLS Matter in Extreme Conditions (MEC) Instrument is an X-ray instrument that will be able to create and diagnose High Energy Density (HED) matter. The MEC laser system can generate hard X-ray due to the interaction of the laser and the plasma. This paper...»

«151 West 26th St. (212) 647-1100 New York, NY 10001 www.theatreworksusa.org FIRST IN FLIGHT: THE WRIGHT BROTHERS Study Guide There is destiny which makes us brothers;None goes his way alone. ~Edwin Markham ABOUT THE PLAY ~INTRODUCTION~ THEATREWORKS/USA is staging First in Flight: The Wright Brothers to help young people fully appreciate and celebrate the Brothers Wright the world's first pilots and the much anticipated arrival on December 17, 2003 of their Centennial of Flight. One hundred...»

«COUNTY OF MENDOCINO Steve Dunnicliff, Director Phone: 707-234-6650 DEPARTMENT OF PLANNING AND BUILDING SERVICES Fax: 707-463-5709 Ft. Bragg Phone: 707-964-5379 860 NORTH BUSH STREET ž UKIAH ž CALIFORNIA ž 95482 Ft. Bragg Fax: 707-961-2427 120 WEST FIR STREET ž FT. BRAGG ž CALIFORNIA ž 95437 pbs@co.mendocino.ca.us www.co.mendocino.ca.us/planning MEMORANDUM DATE: AUGUST 4, 2015 (AMENDED) MENDOCINO TOWN LCP AD HOC COMMITTEE TO: FROM: JULIANA CHERRY, PLANNER III RE: POINT OF VIEW ESTATES, SEC...»

«SOLUTIONS Sharing Opportunities for Low carbon Urban TransportatION D1.5 Report on the EU-China Green Urban Mobility workshop SOLUTIONS 1 SOLUTIONS Grant agreement no: 604714 Title Report on the EU-China Green Urban Mobility Workshop Deliverable D 1.5 Work Package 1 Transferability analysis, methods and tools Authors Oliver Lah (WI), Heather Allen (TRL) and Robin King (WRI) Reviewers Bernd Decker (RC), Elisabete Arsenio (LNEC) Status Final Public: PU /Restricted: PP PP Draft submitted to the...»

«Investment Opportunities and the Sources of Lifetime Inequality∗ Kartik Athreya† Felicia Ionescu‡ Urvi Neelakantan§ FRB Richmond Federal Reserve Board FRB Richmond Ivan Vidangos Federal Reserve Board February 15, 2016 Very preliminary and incomplete. Please do not cite. Abstract How much of the dispersion in lifetime earnings, wealth, consumption, and ultimately, utility or well-being is resolved early in life (prior to working life) vs. later? This critical question has received much...»

«1. KGAOLO YA PELE 1.1 MATSENO Puleng balwa gare ga bangwadi ba bohlokwa ba dingwalo tSa Sepedi. 0 ngwadile dipuku tSa direto Ie dipapadi. Tseo e Iego tSa dire to ke tSe di latelago: Ditlalemeso (1980), Seipone sa Madimabe (1981); Malopo a Boreti (1983); Kgaa Kgati tsa Khwiti ya noka yeso (1983). Ke yo mongwe wa bangwadi ba MatSwela (1984) ya go rulaganywa ke M.S. Serudu; Direti tse nne (1984) yeo morulaganyi e lego J.R. Maibelo Ie Sesegotheto (1989) yeo Ie yona e rulagantSwego ke M.S. Serudu....»

«Caring for the Senior Cat. Chronic Renal Insufficiency Chronic kidney disease (also known as renal insufficiency) is one of the most common diseases seen in older cats. Just like people, cats have 2 kidneys. The kidneys are responsible for filtering the blood and removing some of the toxins, as well as assisting in maintaining hydration and producing urine. As cats become older, their kidney function gradually declines. More than 90% of cats over the age of 10 will have some level of kidney...»

«What happened to the St Pauli passengers www.lynly.gen.nz/index.htm CHARLES HENRY MARTIN NEWSPAPER ARTICLES ABOUT MOUTERE Appendix M1 (Weblink SP Appendix M1) Alexander Turnbull Library Wellington Reference 96/11 “The Colonist” newspaper Nelson Thursday June 1st 1876 page 4 “The Early Days of Nelson;” The Moutere Thirty Years Ago and Now: The Trials and Experiences of Early Settlers [By C H M.] Above thirty years have passed away since the first adventurers slowly wound their way up the...»

<<  HOME   |    CONTACTS
2017 www.abstract.dislib.info - Abstracts, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.