Channel: Parallel Computing in C++ and Native Code forum

Any ways to optimize memory transfer in AMP?


Recently we did some research on image processing using GPGPU, and we tried both OpenCL and AMP. We found that AMP performs worse than OpenCL, taking more time for the same workload, and that the bottleneck is memory transfer between host and GPU. So we wrote a test program that uses AMP or OpenCL just to copy the values of one vector to another. It has three phases: copying data from host to GPU, execution on the GPU, and copying data back from GPU to host. We recorded the time spent in each phase:

 

Platform: Ivy Bridge, Core i5-3450, with HD 2500 GPU

Data length: 3264*2448*sizeof(int) bytes

OpenCL config: global_size = 3264*2448, local_size = 1; the copies are done with clEnqueueWriteBuffer and clEnqueueReadBuffer.

AMP config: array<int, 1> a(3264*2448) with no tiling; the copies are done with the copy() function.

 

 

            Copy to GPU    Execution in GPU    Copy back to CPU
OpenCL      7.45 ms        100.57 ms           7.68 ms
AMP         98.64 ms       39.4 ms             89.11 ms

 

As the numbers show, AMP spends much more time on memory transfer between host and GPU, but less time on GPU execution. Our guess is that AMP copies the data into a memory buffer closer to the GPU than OpenCL does, so that in OpenCL the kernel spends much more time fetching the data during execution.

 

However, increasing the local_size in OpenCL shortens the GPU execution time:

Local_size                 1            4           8           16          32          64
GPU Execution in OpenCL    100.57 ms    35.34 ms    18.13 ms    10.37 ms    10.15 ms    10.21 ms

 

The time spent on the memory copies stays almost the same, since it does not depend on local_size.

 

But in AMP the bottleneck is the memory transfer, not the GPU execution, and changing the tile size doesn’t help (in fact, it increased the GPU execution time in our test). So once the local_size is increased, OpenCL outperforms AMP by a wide margin.

 

Has anyone come across similar issues? If so, what was your solution? Any advice would be appreciated.

Thanks a lot.

