Capstone Project Plan
Name:
Parth Panchal
Student ID:
11569876
Title:
Emerging Technology and Innovation Topic
Task Scheduling Methods in Cluster Computing Environments for High Performance
Weekly Progress Reports Plan (In class, Discussion Board or Project Blog entries)
Week 1-3
Rationale
Parallel computing executes tasks concurrently on distributed nodes. A large application is split into tasks that run on a number of nodes to achieve high-performance computing. A cluster environment is composed of heterogeneous devices and software components capable of cost-effective, high-performance execution of parallel applications. In a heterogeneous cluster environment, proper scheduling of tasks and their allocation to nodes is important for high performance.
Much work on task scheduling algorithms has already been done in this field, but further work is required to use heterogeneous cluster computers more cost-effectively and to increase performance with new techniques.
Problem Formulation:
As mentioned in the section above, heterogeneous nodes have varied execution power, and the work assigned to them is dynamic in nature. High utilization of resources is achieved by scheduling techniques, and proper utilization of nodes improves computing performance. The problem is to control the flow of scheduling and the allocation of tasks to nodes while minimizing communication. To formulate the problem, we use Equation 1 for the utilization of resources on heterogeneous nodes.
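Equation 1 itself is not reproduced in this excerpt. As a hypothetical illustration only, not the equation from the plan, utilization over heterogeneous nodes is often written as the busy fraction averaged across nodes:

```latex
% Hypothetical stand-in for Equation 1 (the original is not shown here):
% U is the mean busy fraction over n heterogeneous nodes, where b_i is the
% busy time of node i and T is the total elapsed (makespan) time.
U = \frac{1}{n} \sum_{i=1}^{n} \frac{b_i}{T}, \qquad 0 \le U \le 1
```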
Data collection or systems design methods
A. Static Task Scheduling
Static task scheduling is performed at compile time. Tasks are partitioned and allocated to nodes based on pre-estimates of task execution time, node computation power, task dependencies, and communication between nodes. It requires large memory storage for the compile-time estimation. Because it performs no runtime reallocation, it leads to node waiting time. Static task scheduling in a heterogeneous cluster reduces communication between nodes but leads to node waiting time. As shown in Equation 2, system utilization $U$ is maximized when the number of currently executing tasks $t$ is at its maximum and node waiting time is at its minimum.
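As an illustration of this approach, the following is a minimal sketch (not the plan's algorithm) of compile-time static assignment: each task's estimated cost is divided by each node's relative computation power, and the task is placed on the node with the earliest estimated finish time. All task costs and node speeds below are hypothetical pre-estimates.

```python
# Minimal sketch of static (compile-time) task scheduling on heterogeneous
# nodes. No runtime reallocation happens, which is why idle (waiting) nodes
# can appear if the estimates turn out to be wrong.

def static_schedule(task_costs, node_speeds):
    """Assign each task to the node with the earliest estimated finish time."""
    finish = [0.0] * len(node_speeds)        # estimated busy-until time per node
    assignment = {}
    # Scheduling longer tasks first tends to balance load better.
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        runtimes = [cost / s for s in node_speeds]
        best = min(range(len(node_speeds)), key=lambda i: finish[i] + runtimes[i])
        finish[best] += runtimes[best]
        assignment[task] = best
    return assignment, finish

tasks = {"t1": 8.0, "t2": 4.0, "t3": 6.0, "t4": 2.0}   # estimated costs
speeds = [1.0, 2.0, 0.5]                                # heterogeneous node power
plan, busy = static_schedule(tasks, speeds)
print(plan)   # {'t1': 1, 't3': 0, 't2': 1, 't4': 2}
print(busy)   # per-node estimated finish times; gaps indicate waiting time
```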
Clusters are arranged to improve speed and reliability. Scheduling means matching a task to the available resources at a particular time. Google uses the term cluster management, which includes the allocation of resources to different applications.
The first step is to build a computer cluster using 10 nodes plus the master. Each node is a desktop computer; the nodes will all be linked together and then connected to one master computer. The master computer delegates tasks to the different nodes, and each node then sends its data back to the master. This is the same process that computers currently use when they contain more than one processor, and it is referred to as parallel processing. After the cluster has been built, its efficiency will be evaluated by measuring physical dimensions, processing power, and power consumption. This will then be compared to an average modern computer. Finally, an overview of the process will be written in the form of instructions for someone else to build a similar cluster, along with the pros and cons of using a cluster compared to other computing approaches. A minimal sketch of the master/worker delegation pattern follows.
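To make the master/worker delegation concrete, here is a minimal single-machine emulation using Python's standard multiprocessing module; the task function and the worker count are hypothetical stand-ins for the 10-node cluster, since a process pool only imitates the pattern on one computer.

```python
# Single-machine emulation of the master/worker pattern described above.
# In the real cluster each worker would be a separate desktop node; here we
# stand in with processes. The task (summing squares) is purely illustrative.
from multiprocessing import Pool

def work(chunk):
    """Hypothetical task a node would run on its piece of the data."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1000))
    chunks = [data[i::10] for i in range(10)]      # master splits work for 10 nodes
    with Pool(processes=10) as pool:               # the "nodes"
        partials = pool.map(work, chunks)          # master delegates, nodes reply
    print(sum(partials))                           # master combines the results
```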
4. Applications where a number of machines can each be assigned a task, e.g., each one processing a single file.
Cloud computing is considered a new computing paradigm in which applications, data, and Information Technology services are provided over the internet. A very important factor in cloud computing research is task management, which plays a key role in ensuring an efficient system. Task scheduling problems are premier considerations because they relate directly to the efficiency of the overall system.
Furthermore, grid computing needs a sophisticated infrastructure: small servers, fast connections between the servers, and finally, to maximize the potential of that infrastructure, quality tools, software systems, and skilled technicians to manage the grid. In other words, once all of these elements are put together, it is obvious that this technology is costly and complex to put in place.
Computational Grids provide straightforward access to large-scale distributed computational resources. Through their size and geographic distribution, they help to create large computing centers. Several simulation studies have been performed to evaluate mechanisms for allocating resources in a Grid. There are three such mechanisms. The first is the non-cooperative sealed-bid method, in which tasks are auctioned off to the highest bidder. The second is the semi-cooperative n-round bid method, in which each site delegates its work to others if it cannot perform the work itself. The last is the cooperative method, in which all sites cooperatively perform all tasks as efficiently as possible. The simulation model uses a hierarchical Grid structure in which machines are built around larger computing centers called "federations" [1]. A sketch of the first mechanism appears below.
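To illustrate the first mechanism, here is a minimal sketch of a non-cooperative sealed-bid auction; the bid rule (each site bids its spare capacity) is a hypothetical choice for illustration, not the model from [1].

```python
# Minimal sketch of the non-cooperative sealed-bid mechanism: each site submits
# one sealed bid per task, and the task goes to the highest bidder. The bid
# rule (spare capacity) is a hypothetical stand-in for whatever sites optimize.

def sealed_bid_auction(tasks, sites):
    allocation = {}
    for task in tasks:
        # Each site bids independently, without seeing other bids ("sealed").
        bids = {name: max(capacity - load, 0)
                for name, (capacity, load) in sites.items()}
        winner = max(bids, key=bids.get)
        allocation[task] = winner
        cap, load = sites[winner]
        sites[winner] = (cap, load + 1)      # the winner's load grows
    return allocation

sites = {"A": (10, 2), "B": (8, 1), "C": (12, 9)}  # (capacity, current load)
print(sealed_bid_auction(["t1", "t2", "t3", "t4"], dict(sites)))
```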
Chapter 13 is titled “Scheduling Operations” and it is mainly about scheduling decisions for batch operations and how they deal with the allocation of scarce resources to jobs, activities, tasks, or customers. “Scheduling results in a time-phased plan, or schedule, of activities. The schedule indicates what is to be done, when, by whom, and with what equipment. Scheduling should be clearly differentiated from aggregate planning” (Schroeder, pg. 293).
In conventional datacenters, there were two networks. The first, a local area network built on Ethernet, was used by users to access applications running on servers. The second, often built on Fibre Channel, connected the servers to the storage modules where mountains of data are stored. Both networks required huge capital investment, each requiring specialized hardware, and each had vastly different management tools, which required staff with different skill sets to build, maintain, and manage them. With the proliferation of datacenters, equipment density and power consumption became more critical than ever, and thus the cost of maintenance and the total cost of ownership began to increase.
Parallel programming, the utilisation of many small tasks to complete a larger one, has become far more prevalent in recent times as problems call for systems with higher performance, faster turnover times, easy access, and lower costs. While this has previously been cost-prohibitive, given that one would have had to purchase a large number of physical machines to work on, the development of cloud computing systems has largely answered this call, providing resources and computing power as a service to users, rather than a product. The addition of hardware virtualisation has further increased the availability of massively-parallel collections of computers as flexible networked platforms for computing large-scale problems.
Scheduling in the Linux operating system is priority based. The scheduling policies are built into the core of Linux, called the kernel, to multi-task processes. There are two different scheduling classes, real-time and normal, for handling large data processing while balancing performance and sharing the CPU equally in the system. In the kernel's scheduling, each process has a priority value ranging from 1 to 139, where 1 is the highest priority level and 139 is the lowest; the smaller the priority value, the higher the priority. The real-time priorities range from 1 to 99 and the normal priorities from 100 to 139, so all real-time programs have a higher priority than normal programs in the system. In Linux, scheduling is implemented by a class named sched_class (Seeker, 2013).
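As a small illustration of the two scheduling classes, the sketch below uses Python's os module (Linux only) to query the real-time priority range and, hypothetically, to move the current process into the real-time SCHED_FIFO class; raising priority this way normally requires root, so the call is wrapped in a try block.

```python
# Illustration of Linux's two scheduling classes via the os module (Linux only).
# Real-time policies (SCHED_FIFO/SCHED_RR) outrank every normal (SCHED_OTHER)
# process; moving a process into them usually needs root privileges.
import os

# Kernel-visible bounds of the real-time priority range (typically 1..99).
print("FIFO priority range:",
      os.sched_get_priority_min(os.SCHED_FIFO),
      "to", os.sched_get_priority_max(os.SCHED_FIFO))

print("current policy:", os.sched_getscheduler(0))   # 0 = this process

try:
    # Hypothetical example: request real-time FIFO scheduling at priority 10.
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(10))
    print("now real-time, policy:", os.sched_getscheduler(0))
except PermissionError:
    print("need root (CAP_SYS_NICE) to enter a real-time class")
```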
On-demand provisioning and a pay-as-you-go model make it reliable to run applications on the cloud. In a cloud environment, running scientific applications are modeled as a workflow graph in which the completion of one task enables the start of another, forming a Directed Acyclic Graph (DAG). One of the key challenges in this setting is scheduling such a workflow efficiently.
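To show how such a DAG drives scheduling, here is a minimal sketch that releases a task only when all of its predecessors have finished; the four-task workflow is made up for illustration.

```python
# Minimal sketch of DAG-driven workflow execution: a task becomes ready only
# once every task it depends on has completed (Kahn-style topological order).
from collections import deque

deps = {            # task -> set of tasks it depends on (hypothetical workflow)
    "prepare": set(),
    "simulate": {"prepare"},
    "analyze": {"prepare"},
    "report": {"simulate", "analyze"},
}

indegree = {t: len(d) for t, d in deps.items()}
children = {t: [c for c, d in deps.items() if t in d] for t in deps}

ready = deque(t for t, n in indegree.items() if n == 0)
order = []
while ready:
    task = ready.popleft()
    order.append(task)               # here a real scheduler would dispatch it
    for child in children[task]:
        indegree[child] -= 1
        if indegree[child] == 0:     # all predecessors done: child is ready
            ready.append(child)

print(order)   # ['prepare', 'simulate', 'analyze', 'report']
```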
The importance of cloud computing is increasing, and it is receiving growing attention in scientific and industrial communities. Cloud computing is among the top 10 most important technologies and stands a greater chance of adoption by companies and organizations in the coming years. Cloud computing enables ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort.
CPU scheduling is the basis of multiprogrammed operating systems. By switching the CPU among processes, the operating system can make the computer more productive.
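As a toy illustration of this switching, the following sketch simulates round-robin time slicing (one policy among many; the choice and the three processes are purely illustrative):

```python
# Toy round-robin simulation: the "CPU" is handed to each process for a fixed
# time quantum, which is how several processes make progress concurrently.
from collections import deque

QUANTUM = 3
processes = deque([("P1", 7), ("P2", 4), ("P3", 5)])  # (name, remaining time)

clock = 0
while processes:
    name, remaining = processes.popleft()
    ran = min(QUANTUM, remaining)
    clock += ran
    if remaining - ran > 0:
        processes.append((name, remaining - ran))     # preempted, back of queue
    else:
        print(f"{name} finished at t={clock}")
```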
Therefore, the execution rate of all tasks is the same across all processors. Hence, the scheduling problem reduces to deciding which processor runs each task and in what order.
Cloud computing is developing into a definitive alternative to supercomputers for some high-performance computing (HPC) applications. Cloud computing provides the benefits of virtualization, good utilization of resources, and minimized, or in several cases eliminated, upfront infrastructure costs.