This paper is based on CUDA, a parallel computing platform and programming model that utilizes the resources of the Graphics Processing Unit (GPU) to increase the computing performance of a system, creating a robust parallel computing unit. In this paper, we introduce a brief history of CUDA, its execution flow, and its architecture for handling processor-intensive tasks. We also highlight some of its real-life applications and the difference in performance compared with CPU-only architectures. Since most CUDA applications are written in C/C++, we also explore how CUDA provides a programmable interface in these languages. Finally, we include the current research activities …
So, in 2007, NVIDIA released CUDA, which provided a parallel architecture to support general-purpose use of GPUs. It was designed to work with programming languages such as C/C++ and Fortran, which allowed specialists in parallel programming to use CUDA rather than having to learn other advanced GPU programming skills [10]. The GPU computing model uses a CPU and GPU together in a heterogeneous co-processing model [3]. The framework is designed so that the sequential part of the application runs on the CPU while the computationally intensive part is accelerated by the GPU. From the user's point of view, the application is faster because it uses the higher throughput of the GPU to improve its own performance.

Figure 1: Core comparison between CPU and GPU

3. Architecture

GPUs have a large number of resources, with hundreds of cores, thousands of threads, and a very high number of arithmetic and logic units; hence they provide a huge parallel architectural framework to work with. Here is a block diagram that generally describes CUDA's architecture.

Figure 2: Block diagram for CUDA architecture [4]

Basic Units of CUDA

Figure 2 showed the top-level block diagram of the overall architecture of CUDA. Exploring the details further, we now discuss the basic units of CUDA.

Figure 3: CUDA-supported GPU structure [11]
Processor: A computer processor, otherwise known as the CPU (Central Processing Unit), is the part of the computer that receives input and decides what the output will be. Many modern CPUs are capable of performing billions of calculations per second. The speed of a processor is measured by how many clock cycles it completes in one second, expressed in MHz or GHz (megahertz or gigahertz). A processor with a speed of 1 MHz completes 1,000,000 cycles per second, and a 1 GHz processor completes 1,000,000,000 cycles per second. This is known as the clock speed.
The processor (CPU) is like the brain of a computer: it carries out the tasks you give it. Better CPUs can perform more tasks at once and perform them faster. Not every workload takes full advantage of a processor's speed, so high-end processors are only really needed for intensive tasks like gaming or video editing. The CPU is also one of the most expensive parts of a computer, so if you aren't doing these kinds of tasks, a mid-range processor is usually sufficient.
Moreover, GPUs use a fundamentally different architecture; an application must be programmed specifically for a GPU for it to work, and fundamentally different techniques are required to program GPUs. These techniques include new programming languages, modifications to existing languages, and new programming paradigms that are better suited to expressing a computation as a parallel operation performed by many stream processors.
3. The Graphics Processing Unit (GPU) is a hardware component capable of quickly rendering items to the screen.
A video card is a piece of PC hardware that installs into the motherboard and handles video work. Because it has its own memory and its own cooling, it can process video while the rest of the PC does other work, which speeds up the computer by offloading the CPU and system RAM.
Description and relevant performance metrics: digital computers with 2688 Intel Itanium processors and 384 MIPS processors distributed among 10 single-image NUMA-based clusters. Individual clusters have a compute capability in excess of 190 million MTOPS.
A graphics card is a component of a computer system that helps to generate the image on the screen. It sits between the CPU and the screen in order to provide the user with a clear image. A graphics card has its own RAM and its own graphics processor, which means it does not rely on the CPU or on system RAM alone; because the card takes care of most of the graphics processing, the computer's RAM is free to do other jobs more efficiently. The process begins in the CPU, which sends information about the image to the graphics card; the graphics card then decides how to use the pixels to create the image on the screen.

There are different types of graphics card available, and some are better than others. A good-quality graphics card produces smoother video and graphics playback and sharper pictures. It can also mean a faster refresh rate, so when working with photos you can display them more quickly and with better colour resolution. A graphics card can produce high-definition gameplay, enable dual displays, and even drive a television as a monitor through an HDMI cable. High-end graphics cards can cost upwards of five thousand pounds, whereas lower-end cards can be as low as fifty pounds. For a
Multiprocessor (having more than one processor) refers to a system with two or more processors or CPUs. Multiprocessing (supporting multiple processes) refers to a system that can work on more than one task at a time.
This type of computing also makes use of other resources such as SANs, network equipment, and security devices. It can likewise support applications that are accessible through the Internet; these applications make use of large data centers and powerful servers that host Web applications and Web services.
In concurrent and parallel processing, HP must focus on modifying the manufacturing process, in essence a modification of the product design. This helps to combine steps in a manufacturing process, thus reducing lead time.
CUDA is a programming model created by NVIDIA that gives the developer access to GPU computing resources through an Application Programming Interface (API). In standard CUDA terminology, the GPU is the device and the CPU is the host, and the programming language is an extension of C/C++. The GPU programming model is different from the normal CPU model.
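The host/device split described above can be illustrated with a minimal CUDA vector-addition sketch. This is our own illustration, not code from the paper; it assumes an NVIDIA GPU and the nvcc compiler, and error handling is omitted for brevity.

```cuda
// Minimal CUDA sketch: the host (CPU) allocates and copies data,
// the device (GPU) runs the kernel in parallel, one thread per element.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes),
          *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;                          // device buffers
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);   // launch on the device
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);                 // 3.0 if the kernel ran
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Note how the sequential setup runs on the host while the data-parallel addition is expressed once, as a kernel, and executed by thousands of device threads.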
4. Performance Comparison of Dual Core Processors Using Multiprogrammed and Multithreaded Benchmarks
4.1 Overview
4.2 Methodology
4.3 Multiprogrammed Workload Measurements
4.4 Multithreaded Program Behavior
5. Related Work
6. Conclusion
Using OpenCL in programming is relatively straightforward. First, write a kernel function in C; this is the code that will run on the device. Next, identify the platform the code is running on, and the device within that platform that it will execute on. Then instantiate a context from the device and create an instance of the kernel (this is the program). Next, send the kernel to the
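The steps above can be sketched as C host code against the standard OpenCL API. This is our own illustration under stated assumptions: the kernel source and its name `scale` are hypothetical, error handling is omitted, and an OpenCL runtime is required to build and run it.

```c
/* Sketch of the OpenCL setup steps: platform -> device -> context ->
   program -> kernel -> command queue. Error checks omitted for brevity. */
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void scale(__global float *x) {"
    "    size_t i = get_global_id(0);"
    "    x[i] *= 2.0f;"
    "}";

int main(void) {
    cl_platform_id platform;                       /* 1. pick a platform */
    clGetPlatformIDs(1, &platform, NULL);
    cl_device_id device;                           /* 2. pick a device */
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
    cl_context ctx =                               /* 3. create a context */
        clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_program prog =                              /* 4. build the program */
        clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "scale", NULL);
    /* 5. the command queue is what the kernel is finally sent to */
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);
    /* ... set kernel args, enqueue with clEnqueueNDRangeKernel, read back ... */
    printf("kernel created: %p\n", (void *)kernel);
    clReleaseCommandQueue(q); clReleaseKernel(kernel);
    clReleaseProgram(prog); clReleaseContext(ctx);
    return 0;
}
```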
Abstract: Video compression requires high computational power, and the real-time requirements of multimedia applications demand faster computation. To meet these requirements, a parallel computational algorithm can be implemented. In this thesis, the parallelism of multi-core architectures was exploited using a block matching algorithm. A test implementation of the algorithm was done in MATLAB; the parallel model was designed using OpenMP, and its implementation was tested on multi-core architectures.
3. Presents the software description. It explains the implementation of the project using the PIC C Compiler software.