
Essay On Processing Elements


A processing element (PE) is essentially a virtual scalar processor; it is an abstraction that the OpenCL standard uses mainly to illustrate concepts. A compute unit (CU) is composed of one or more processing elements and local memory. A device is a collection of compute units. A multi-core CPU, or multiple CPUs in a multi-socket machine, constitutes a single device, and the separate cores are its compute units. A command queue is attached to a single device and submits work to it. Command queues are created within the scope of a context, and several queues may be attached to the same device. Applications enqueue kernel execution instances; queuing always happens in order, while execution may happen in order or out of order. A kernel is a function declared in a …
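The hierarchy described above (processing elements inside compute units, compute units inside a device, queues created in a context and attached to one device) can be sketched as a small toy model. This is only an illustration in plain Python, not the OpenCL API: every class and function name here (`ComputeUnit`, `Device`, `Context`, `CommandQueue`, `make_cpu_device`) is hypothetical, the queue models in-order execution only, and treating each CPU core as a compute unit with a single processing element is an assumption made for simplicity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ComputeUnit:
    num_processing_elements: int   # virtual scalar processors in this unit
    local_memory_bytes: int        # local memory shared by the unit's PEs


@dataclass
class Device:
    name: str
    compute_units: list            # a device is a collection of compute units


@dataclass
class Context:
    devices: list                  # command queues are created within a context


class CommandQueue:
    """Toy in-order queue attached to exactly one device of its context."""

    def __init__(self, context, device):
        if device not in context.devices:
            raise ValueError("device does not belong to this context")
        self.device = device
        self._pending = deque()

    def enqueue(self, kernel, *args):
        # Queuing always happens in submission order.
        self._pending.append((kernel, args))

    def finish(self):
        # Assumption: in-order execution; a real queue may also run out of order.
        results = []
        while self._pending:
            kernel, args = self._pending.popleft()
            results.append(kernel(*args))
        return results


def make_cpu_device(cores, local_memory_bytes=32 * 1024):
    # Assumption: one compute unit per core, one processing element per unit.
    units = [ComputeUnit(1, local_memory_bytes) for _ in range(cores)]
    return Device("cpu", units)
```

With this model, a four-core CPU becomes one `Device` holding four `ComputeUnit` objects, and two `CommandQueue` instances can be attached to that same device within one `Context`.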

When a kernel is executed, the work item IDs are organized in an N-dimensional computation domain, where N is 1, 2 or 3; the size of this domain dictates the total number of work items that execute in parallel. A work group is a collection of related work items that is scheduled on a single compute unit. The work items in a group execute the same kernel, share local memory, and can synchronize efficiently through work group barriers and memory fences. Work group instances are executed in parallel across multiple compute units or concurrently on the same compute unit. The dimensions determine how kernels operate on the input in parallel, and the application usually chooses them based on the size of the input. OpenCL executes kernel functions on the device, while the host coordinates the execution and provides the arguments and execution parameters needed to launch the kernel. The argument list is identical for all invocations. When launching a kernel, the host code defines the grid dimensions, also called the global work size, which play the role of the number of iterations to perform. The host code can also define how the grid is partitioned into work groups, or leave that to the implementation. During execution, the implementation runs a single work item for each point on the grid and groups the execution on compute units according to the work group size. The local work group size is the number of work items in a work group, while the global work size is the …
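To make the relationship between the grid, the work groups and the individual work items concrete, the following is a minimal sequential sketch in plain Python. It is not how an OpenCL implementation schedules work, and the names `run_ndrange` and `vector_add` are hypothetical. It assumes that each local size divides the corresponding global size, and it passes the IDs to the kernel as explicit arguments, whereas an OpenCL C kernel would query them with built-ins such as `get_global_id`.

```python
from itertools import product


def run_ndrange(kernel, global_size, local_size, *args):
    """Call kernel once per grid point, visiting the grid one work group at a time."""
    if len(global_size) != len(local_size) or not 1 <= len(global_size) <= 3:
        raise ValueError("global and local sizes must both have 1, 2 or 3 dimensions")
    if any(g % l for g, l in zip(global_size, local_size)):
        raise ValueError("this sketch requires local sizes that divide global sizes")

    # Number of work groups along each dimension.
    groups_per_dim = [g // l for g, l in zip(global_size, local_size)]

    for group_id in product(*(range(n) for n in groups_per_dim)):
        for local_id in product(*(range(l) for l in local_size)):
            # global ID = group ID * local size + local ID, per dimension
            global_id = tuple(
                g * l + i for g, l, i in zip(group_id, local_size, local_id)
            )
            # Every invocation receives the same argument list.
            kernel(global_id, local_id, group_id, *args)
    return groups_per_dim


def vector_add(global_id, local_id, group_id, a, b, out):
    # One work item handles one element, like one iteration of a loop.
    i = global_id[0]
    out[i] = a[i] + b[i]
```

For example, calling `run_ndrange(vector_add, (8,), (4,), a, b, out)` on three lists of length 8 runs eight work items in two work groups of four, one work item per element of `out`.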
