
Another scheduler is the dynamic priority scheduler, which builds on parallel scheduling; that is, it applies any of the previous methods together with a priority condition. This condition differs from one algorithm to another, but most priority algorithms use the job deadline, pricing, or other thresholds [11].

In 2012, Facebook introduced Corona [12], a new scheduling technique. Corona uses two management levels: (1) a cluster manager, a single component that tracks the cluster resources and manages the free slots; and (2) job trackers, with a dedicated job tracker assigned to each job. Corona also relies on the requesting client to process small jobs itself, which increases cluster performance.

The authors of [13] note that jobs with small weights outnumber jobs with large weights. Their algorithm therefore schedules jobs by weight using a round-robin scheme: the job queue is partitioned into sub-queues, and each sub-queue is marked according to its weight. Improved Weighted Round Robin starts by running jobs from an unweighted queue in the first round. After the first task of each job has run, the scheduler classifies the jobs into weighted sub-queues according to their observed weights.

To maintain fairness, the scheduler allocates resources to the sub-queues in equal percentages, so small-weighted sub-queues receive the same share as large-weighted ones. Experimental results show that the scheduler significantly improves load balancing among the sub-queues, which benefits the Hadoop platform as a whole.
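As a rough illustration of this sub-queue mechanism, the following sketch classifies jobs into weight classes after their first task runs and hands each sub-queue an equal share of free slots per round. The class names, weight-class buckets, and share computation are our own assumptions, not details taken from [13]:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative sketch only: jobs start in an unweighted queue, are classified into
// weighted sub-queues after their first task has run, and each scheduling round gives
// every sub-queue an equal share of the free slots. All names here are assumptions.
class WeightedRoundRobinSketch {
    static class Job {
        final String id;
        int remainingTasks;
        Job(String id, int tasks) { this.id = id; this.remainingTasks = tasks; }
    }

    private final Queue<Job> unweighted = new ArrayDeque<>();      // first-round queue
    private final List<Queue<Job>> subQueues = new ArrayList<>();  // one queue per weight class

    WeightedRoundRobinSketch(int weightClasses) {
        for (int i = 0; i < weightClasses; i++) subQueues.add(new ArrayDeque<>());
    }

    void submit(Job job) { unweighted.add(job); }

    // First round: run one task of every newly submitted job so its weight can be observed.
    List<Job> firstRound() {
        List<Job> started = new ArrayList<>(unweighted);
        unweighted.clear();
        for (Job job : started) job.remainingTasks--;
        return started;   // caller observes each job's weight and calls classify(...)
    }

    // After a job's first task finishes, move it to the sub-queue of its observed weight class.
    void classify(Job job, int weightClass) { subQueues.get(weightClass).add(job); }

    // One scheduling round: every sub-queue receives the same share of the free slots.
    void scheduleRound(int freeSlots) {
        int share = Math.max(1, freeSlots / subQueues.size());
        for (Queue<Job> queue : subQueues) {
            for (int used = 0; used < share && !queue.isEmpty(); used++) {
                Job job = queue.poll();
                job.remainingTasks--;                        // launch one task of this job
                if (job.remainingTasks > 0) queue.add(job);  // unfinished jobs stay queued
            }
        }
    }
}
```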

The authors of [14] observe that the order in which jobs are executed affects the total execution time of all jobs, so they concentrate on ordering jobs to achieve the best system performance.

They proposed the Balanced Pool algorithm, which applies the capacity-scheduler principle to the Johnson algorithm. Balanced Pool partitions the jobs into two queues and partitions the resources into two pools, each holding enough resources to execute its jobs; it then applies the Johnson algorithm within each pool. The approach saves about 20% of the total job execution time.
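Johnson's rule, which Balanced Pool applies within each pool, orders jobs for a two-stage pipeline. The sketch below treats estimated map time and reduce time as the two stages; the field names and the use of time estimates are our assumptions, not the authors' implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Johnson's rule for a two-stage pipeline: jobs whose first-stage (map) time is the smaller
// of the two go first, in ascending map time; the remaining jobs go last, in descending
// reduce time. This ordering minimizes the makespan of a two-machine flow shop.
class JohnsonOrdering {
    static class Job {
        final String id;
        final double mapTime;     // estimated map-phase time (assumed available)
        final double reduceTime;  // estimated reduce-phase time (assumed available)
        Job(String id, double m, double r) { this.id = id; this.mapTime = m; this.reduceTime = r; }
    }

    static List<Job> order(List<Job> jobs) {
        List<Job> head = new ArrayList<>();  // mapTime <= reduceTime
        List<Job> tail = new ArrayList<>();  // mapTime >  reduceTime
        for (Job j : jobs) (j.mapTime <= j.reduceTime ? head : tail).add(j);
        head.sort(Comparator.comparingDouble(j -> j.mapTime));                          // ascending
        tail.sort(Comparator.comparingDouble((Job j) -> j.reduceTime).reversed());      // descending
        head.addAll(tail);
        return head;
    }
}
```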

The authors of [15] proposed a constraint-based scheduler that takes the job deadline as input. For each submitted job, it calculates the minimum number of tasks that would be adequate to finish processing before the deadline, and then decides whether to continue the job or reject it, avoiding wasted resources on jobs that cannot complete.
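One plausible reading of this admission check is sketched below: estimate the minimum number of parallel map tasks needed to finish the remaining input before the deadline and reject the job if that many slots cannot be provided. The throughput-based formula and all names are our assumptions, not the exact computation in [15]:

```java
// Hypothetical sketch of a deadline admission check: estimate the minimum number of
// parallel map tasks needed to finish the remaining input before the deadline, and
// reject the job if the cluster cannot provide that many slots.
class DeadlineCheckSketch {
    static int minTasksNeeded(long remainingBytes, double bytesPerTaskPerSec, long secondsToDeadline) {
        double perTaskCapacity = bytesPerTaskPerSec * secondsToDeadline; // work one task can do in time
        return (int) Math.ceil(remainingBytes / perTaskCapacity);
    }

    static boolean admit(long remainingBytes, double bytesPerTaskPerSec,
                         long secondsToDeadline, int availableSlots) {
        return minTasksNeeded(remainingBytes, bytesPerTaskPerSec, secondsToDeadline) <= availableSlots;
    }
}
```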

The authors of [16] proposed a new technique, Purlieus, to improve data locality in MapReduce frameworks that run on clusters such as cloud servers. Purlieus provisions virtual machines (VMs) in a locality-aware manner driven by job characteristics: it classifies jobs into three types, and each type has its own procedure for provisioning data and VMs.

The authors of [17] mathematically analyzed the impact of factors such as the number of tasks and nodes on data locality. They use the term ‘goodness’ for the percentage of local map tasks. Their proposed optimal scheduler builds a cost matrix that assigns zero if a task is local to a node and one otherwise. It then formulates scheduling as a linear sum assignment problem (LSAP), which assigns N items to N workers. In some cases the algorithm pads the matrix to make it square, which simplifies applying LSAP to build the optimal scheduler.
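The cost-matrix construction might look like the following sketch, which pads the matrix with zero-cost dummy rows or columns to make it square before handing it to an LSAP solver (for example the Hungarian algorithm, not implemented here). The data structures are our own assumptions:

```java
import java.util.Map;
import java.util.Set;

// Illustrative construction of the 0/1 cost matrix described in [17]: entry (t, n) is 0 when
// task t has a local replica on node n and 1 otherwise. Padding with zero-cost dummy rows or
// columns makes the matrix square so it can be passed to an LSAP solver.
class LocalityCostMatrix {
    static int[][] build(Map<Integer, Set<Integer>> localNodesOfTask, int numTasks, int numNodes) {
        int n = Math.max(numTasks, numNodes);
        int[][] cost = new int[n][n];
        for (int t = 0; t < n; t++) {
            for (int node = 0; node < n; node++) {
                if (t >= numTasks || node >= numNodes) {
                    cost[t][node] = 0;   // dummy row/column: free to assign
                } else {
                    boolean local = localNodesOfTask.getOrDefault(t, Set.of()).contains(node);
                    cost[t][node] = local ? 0 : 1;
                }
            }
        }
        return cost;     // feed to an LSAP solver to obtain the locality-optimal assignment
    }
}
```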


The authors of [18] proposed a new locality-aware reduce task scheduling algorithm. This practical scheduler improves MapReduce performance by scheduling the reduce phase: it waits until all mappers have finished, then defines a sweet spot that combines the information about the sizes of the partitions produced by the mappers. Based on this information, the scheduler trades off between shuffling early and gathering more information.

The authors of [19] proposed a new scheduler that estimates execution time and prefetches input data before assigning new tasks to computing nodes. The algorithm also introduces a new shuffle scheme to improve system performance. The predicate scheduler (another name for the prefetching scheduler) relies on predicting which data blocks are needed and forwarding them to the tasks that need them. The prediction module predicts three types of information: (1) the finish time of the current tasks, (2) the waiting tasks to be assigned to each slave node, and (3) the waiting time of those tasks. This algorithm shows good results, especially in reducing input data transfers.
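As a stand-in for the finish-time part of such a prediction module, the sketch below uses the familiar progress-rate extrapolation (remaining time ≈ elapsed time × (1 − progress) / progress); this is only an assumed estimator, not necessarily the one used in [19]:

```java
// Hypothetical finish-time estimate: extrapolate from the task's observed progress rate.
// remainingSeconds = elapsedSeconds * (1 - progress) / progress
class FinishTimeEstimate {
    static double estimatedSecondsLeft(double progress, double elapsedSeconds) {
        if (progress <= 0.0) return Double.POSITIVE_INFINITY;  // no progress reported yet
        return elapsedSeconds * (1.0 - progress) / progress;
    }
}
```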

The authors of [20] proposed a new scheduler adapted for MapReduce clusters with multiple users. The scheduler relies on real-time prediction to adjust the data-locality weight of unprocessed tasks. It starts by predicting the resource consumption of unprocessed tasks from estimates of the currently processing tasks, and then calculates a data-locality weight for each job to avoid the small-job problem. In this way the algorithm tries to make better use of the MapReduce cluster resources, and the experiments show promising results compared with the traditional MapReduce scheduler.

The LiPS system, introduced in [21], promises cost-efficient data and task co-scheduling for MapReduce in a cloud environment using a linear programming technique.





A hierarchical MapReduce scheduler for hybrid data centers is presented in [22]. It handles task scheduling in integrated virtualized and non-virtualized environments.

3. MAPREDUCE SCHEDULING ALGORITHMS

The MapReduce framework is one of the major big data management systems, introduced by Google in 2004. It is based on parallel processing over a computing infrastructure, exploiting the massive computing resources available to handle many of the major issues raised by the rising use of big data. The MapReduce framework provides a set of features such as user-defined functions, automatic parallelization and distribution, fault tolerance, and high availability through data replication. Hadoop is an Apache open-source implementation of the MapReduce parallel framework. It hides the details of parallel processing and supports fault tolerance by replicating data across multiple nodes. The MapReduce framework follows a master/slave architecture in which the job tracker is responsible for scheduling, monitoring, and re-executing failed tasks on slave machines, while the task tracker is responsible for processing the tasks assigned to it by the job tracker.
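For concreteness, the canonical word-count example below shows the two user-defined functions that the framework parallelizes, written against the standard Hadoop MapReduce API; it is included purely as an illustration of the programming model:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The classic word-count example: the map function emits (word, 1) pairs and the reduce
// function sums the counts for each word. The framework handles input splitting, task
// scheduling, shuffling, and fault tolerance around these two user-defined functions.
public class WordCount {
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);       // emit (word, 1)
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));   // emit (word, total count)
        }
    }
}
```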

As the name suggests, MapReduce handles big data management in two phases: the map phase and the reduce phase. Because all the processing that involves the scheduling work considered here takes place in the map phase, and because this paper evaluates a set of schedulers that operate in the map phase, we limit our discussion to the map phase. Figure 1 gives a high-level overview of the MapReduce methodology.

In this paper, a set of scheduling algorithms is considered and their performance is tested in a virtualized environment; the results are then compared according to several criteria. The set includes the FIFO scheduler, the MM scheduler, the Delay scheduler, and our proposed scheduler, the MTL. These schedulers are chosen to span well-known, widely used schedulers as well as schedulers with different architectures. The following briefly describes how each of them works.

Figure 1. MapReduce framework (Adapted from MapReduce Systems [24]).

3.1. First in first out scheduler

The first in first out (FIFO) scheduler, the default in Hadoop, attempts to increase data locality [7, 23]. The FIFO scheduler works with a regular FIFO queue: each submitted job is decomposed into a set of tasks, which are loaded into the queue one by one. The scheduler processes the first submitted task and assigns it a free slot as soon as one becomes available. After handling the first task, it handles the next task in the queue and keeps doing so until the last task is finished, in a first come, first served fashion. Hence, jobs are processed in their order of arrival.

Internally, when a slave node with free map slots sends a heartbeat, the scheduler checks the first job in the queue for map tasks whose input data is local to that node and assigns them to the free slots. If the slave node has no local data for the first job, the scheduler assigns only one non-local map task to that node during that heartbeat interval.
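A minimal sketch of this heartbeat handling is given below; the class and method names are our own assumptions and do not reflect Hadoop's internal JobTracker code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Sketch of FIFO heartbeat handling: on each heartbeat the scheduler looks only at the job
// at the head of the queue, fills the node's free map slots with that job's node-local
// tasks, and assigns at most one non-local task per heartbeat interval.
class FifoSchedulerSketch {
    static class Task {
        final String id;
        final Set<String> nodesWithLocalData;  // nodes holding a replica of this task's input
        Task(String id, Set<String> nodes) { this.id = id; this.nodesWithLocalData = nodes; }
    }

    static class Job {
        final Queue<Task> pendingTasks = new ArrayDeque<>();
    }

    private final Queue<Job> jobQueue = new ArrayDeque<>();

    // Called when a slave node reports free map slots in its heartbeat.
    List<Task> onHeartbeat(String node, int freeSlots) {
        List<Task> assigned = new ArrayList<>();
        Job head = jobQueue.peek();
        if (head == null) return assigned;

        // First, assign as many node-local tasks of the head job as the free slots allow.
        Iterator<Task> it = head.pendingTasks.iterator();
        while (it.hasNext() && assigned.size() < freeSlots) {
            Task t = it.next();
            if (t.nodesWithLocalData.contains(node)) {
                assigned.add(t);
                it.remove();
            }
        }
        // If no local task was found, assign at most one non-local task in this heartbeat.
        if (assigned.isEmpty() && !head.pendingTasks.isEmpty() && freeSlots > 0) {
            assigned.add(head.pendingTasks.poll());
        }
        if (head.pendingTasks.isEmpty()) jobQueue.poll();  // all of the job's map tasks scheduled
        return assigned;
    }
}
```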

The FIFO scheduler has several drawbacks worth discussing. Although the shared cluster appears to give scheduled tasks a fair share, the tasks scheduled on a node can only come from the job at the head of the queue; in other words, a node does not get a fair chance to process the data it stores locally. Another drawback is that running small ad hoc jobs on large clusters leads to low data locality. The following example illustrates this second disadvantage. Suppose a job of 10 map tasks runs on a cluster of 200 nodes. Each map task needs one input block, so the job needs 10 input blocks. With each block stored on one node and replicated on two additional nodes, at most 30 nodes hold any of the job's input data. The remaining 170 nodes have no local input for this job, so when they process its tasks, data locality is low.

3.2. Matchmaking scheduler

The matchmaking (MM) algorithm [23] gives every slave node a chance to process local tasks. This scheduler gives every job a fair share of the cluster over a predefined time slot; predefining the map slot helps prevent greedy jobs from reserving resources for long periods, which could starve other jobs. Unlike the FIFO scheduler, the MM scheduler allows nodes to process local tasks from other jobs in the queue, thereby solving the main problem the FIFO scheduler suffers from. Looking at the internal details: when the scheduler does not find a local task in the first job, it searches for local map tasks for that node in the next job. If no local task is found at this heartbeat, no non-local task is assigned either, preserving a fair share among all tasks. In the second heartbeat cycle, if the node still finds no local tasks for its free slots, then to avoid starvation the MM scheduler assigns exactly one non-local map task to that node per heartbeat interval. When a new job arrives, the MM algorithm clears all markers on the nodes and starts over, because the new job's tasks may have local data on marked nodes.
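The following sketch illustrates this policy; the node markers, data structures, and names are our own assumptions, not the original implementation:

```java
import java.util.*;

// Sketch of the matchmaking idea: search every queued job (not only the head job) for a map
// task whose input is local to the reporting node; if none exists, mark the node and assign
// nothing, and only on a later heartbeat (marker still set) assign a single non-local task
// to avoid starvation. Markers are cleared whenever a new job arrives.
class MatchmakingSketch {
    static class Task { final Set<String> localNodes; Task(Set<String> n) { localNodes = n; } }
    static class Job  { final Deque<Task> pending = new ArrayDeque<>(); }

    private final List<Job> jobQueue = new ArrayList<>();
    private final Set<String> markedNodes = new HashSet<>(); // nodes that found no local task last time

    void submit(Job job) {
        jobQueue.add(job);
        markedNodes.clear();   // the new job may have local tasks on previously marked nodes
    }

    // Returns the single task assigned for this heartbeat, or null if nothing is assigned.
    Task onHeartbeat(String node) {
        for (Job job : jobQueue) {                        // scan jobs in queue order
            for (Iterator<Task> it = job.pending.iterator(); it.hasNext(); ) {
                Task t = it.next();
                if (t.localNodes.contains(node)) {        // local task found
                    it.remove();
                    markedNodes.remove(node);
                    return t;
                }
            }
        }
        if (markedNodes.contains(node)) {                 // second locality miss in a row:
            for (Job job : jobQueue)                      // assign one non-local task
                if (!job.pending.isEmpty()) { markedNodes.remove(node); return job.pending.poll(); }
        } else {
            markedNodes.add(node);                        // first miss: wait for the next heartbeat
        }
        return null;
    }
}
```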

The MM scheduling algorithm solves the problems that the FIFO scheduling algorithm suffers from: as mentioned earlier, it gives all scheduled tasks a fair share while improving data locality.

However, the MM scheduler has a shortcoming in its execution time: it does not assign non-local tasks until a new heartbeat arrives, which costs more time on large clusters. Another shortcoming appears when new jobs arrive, since the MM scheduler clears all markers and starts from the beginning; this is costly in workloads where many jobs arrive continuously, such as Facebook's.

3.3. Delay scheduler

The Delay algorithm [23] is proposed to solve some of the problems of the FIFO scheduler. Like MM, the Delay scheduler gives every job a fair share of the cluster over a predefined time slot; predefining the map slot helps prevent greedy jobs from reserving resources for long periods, which could lead to job starvation. The main idea of the Delay algorithm is that when a node sends a heartbeat to request a map task and the first job cannot supply a local task, the scheduler delays that job and considers the next one. To ensure fairness among all scheduled jobs and avoid starvation, a maximum delay time limits how long a job can be delayed.
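A minimal sketch of this idea follows; the delay bookkeeping and names are our own assumptions rather than the published implementation:

```java
import java.util.*;

// Sketch of delay scheduling: when a node's heartbeat arrives and the job at the head of the
// queue has no local task for that node, skip (delay) the job and try the next one; a job
// that has waited longer than the maximum delay is allowed to launch a non-local task so it
// cannot starve.
class DelaySchedulerSketch {
    static class Task { final Set<String> localNodes; Task(Set<String> n) { localNodes = n; } }
    static class Job {
        final Deque<Task> pending = new ArrayDeque<>();
        long waitingSinceMillis = -1;                     // when the job was first skipped
    }

    private final List<Job> jobQueue = new ArrayList<>();
    private final long maxDelayMillis;

    DelaySchedulerSketch(long maxDelayMillis) { this.maxDelayMillis = maxDelayMillis; }

    Task onHeartbeat(String node, long nowMillis) {
        for (Job job : jobQueue) {
            if (job.pending.isEmpty()) continue;
            // Prefer a task whose input data is local to this node.
            for (Iterator<Task> it = job.pending.iterator(); it.hasNext(); ) {
                Task t = it.next();
                if (t.localNodes.contains(node)) { it.remove(); job.waitingSinceMillis = -1; return t; }
            }
            // No local task: start (or continue) delaying this job.
            if (job.waitingSinceMillis < 0) job.waitingSinceMillis = nowMillis;
            if (nowMillis - job.waitingSinceMillis >= maxDelayMillis) {
                job.waitingSinceMillis = -1;
                return job.pending.poll();                // delay expired: run non-locally
            }
            // Otherwise skip this job and give the next job in the queue a chance.
        }
        return null;
    }
}
```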

This scheduler solves problems that appear in FIFO, but it also creates new ones. The first issue concerns efficiency: the Delay scheduler does not work well when map slots are freed slowly, that is, when map tasks take a long time to process. The second issue is that when no local task is found for a node, the scheduler assigns multiple non-local tasks to that node if it has multiple free slots in that heartbeat. Consequently, the Delay scheduler in some cases performs worse than the default FIFO.

3.4. Multithreading locality scheduler

The multithreading locality (MTL) scheduler uses a multithreaded architecture to perform the scheduling, with each thread assigned to a predetermined block of cluster nodes. The MTL scheduler differs from the previous schedulers in how it synchronizes these blocks, which is achieved through multithreaded scheduling. It schedules over the cluster in a single direction, from the wait job queue to the cluster nodes (unlike the other schedulers, which work from the nodes to the job queue). This makes the approach more appropriate and feasible because the MTL scheduler tries to find the best node for a given job. It therefore has an advantage over the existing schedulers in solving data locality problems by searching in parallel with multithreading techniques [7].

The MTL starts by dividing the cluster into N blocks, each containing a number of commodity machines that process and store input data. Each block is handled by a dedicated thread that schedules the jobs in the wait queue. When a new job arrives at the cluster, the MapReduce scheduler contacts the Namenode to determine the rack that holds the largest proportion of data-local tasks for this job.

When a job is to be processed, each thread takes the information about the current task and searches the nodes of its own block for a node that can run the task locally.
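The sketch below is our own illustrative reconstruction of this parallel search, not the authors' code: the cluster is split into N blocks with one worker thread per block, every thread searches its block for a node holding the task's input, and the first local node found wins.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch of the MTL-style parallel locality search: one worker thread per block
// of nodes; for a task taken from the wait queue, all threads search their blocks in parallel
// for a node that holds a replica of the task's input.
class MtlSketch {
    private final List<List<String>> blocks;              // block index -> nodes in that block
    private final Map<String, Set<String>> replicaNodes;  // task id -> nodes holding its input
    private final ExecutorService pool;

    MtlSketch(List<List<String>> blocks, Map<String, Set<String>> replicaNodes) {
        this.blocks = blocks;
        this.replicaNodes = replicaNodes;
        this.pool = Executors.newFixedThreadPool(blocks.size());
    }

    // Search all blocks in parallel; return the first node with a local replica, or null.
    String findLocalNode(String taskId) throws InterruptedException {
        Set<String> holders = replicaNodes.getOrDefault(taskId, Set.of());
        List<Callable<String>> searches = new ArrayList<>();
        for (List<String> block : blocks) {
            searches.add(() -> {
                for (String node : block)
                    if (holders.contains(node)) return node;            // local node found in this block
                throw new NoSuchElementException("no local node in this block");
            });
        }
        try {
            return pool.invokeAny(searches);              // first successful search wins
        } catch (ExecutionException allFailed) {
            return null;                                  // no block has a local replica
        }
    }
}
```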


