Channel: Parallel Computing in C++ and Native Code forum

How do I test the time it takes to send a request to my graphics card and return a simple result?

I'm trying to measure graphics card latency (the time taken to send a trivial piece of data to the GPU and get it back). To do this I've designed a test that sends an index to the graphics card and returns the nth value of an array:

#include <amp.h>
#include <chrono>
#include <cstdlib> // for system()
#include <iostream>
#include <memory>

int main()
{
	// build an array of the first n square numbers on the CPU
	const int n = 10000;
	std::unique_ptr<float[]> squareSequence(new float[n]);
	for (int i = 0; i != n; ++i)
	{
		squareSequence[i] = static_cast<float>(i) * static_cast<float>(i);
	}

	// send them to the GPU
	Concurrency::array<float, 1> squareSequenceGPU (n, squareSequence.get());

	// an array_view with no data source to receive the nth square number
	Concurrency::array_view<float, 1> nthSquareNumber(1);
	nthSquareNumber.discard_data();

	std::chrono::nanoseconds totalTimeTaken (0);

	// ask for the nth square number n times
	for (int i = 0; i != n; ++i)
	{
		// start timing
		std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();

		// request the nth square number from the GPU
		parallel_for_each(
			nthSquareNumber.extent,
			[&, nthSquareNumber, n](Concurrency::index<1> idx) restrict(amp)
		{
			// the array holds n elements, so the nth value lives at index n - 1
			nthSquareNumber[idx] = squareSequenceGPU[n - 1];
		});

		// get the value from the GPU
		nthSquareNumber.synchronize();
		
		// end timer
		std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();

		// get timing
		std::chrono::nanoseconds time_span = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1);
		std::cout << ".";

		totalTimeTaken += time_span;

		std::cout << "took " << (double)time_span.count() << "ns for number" << i << std::endl;

		// discard 1st since this will include kernel compilation time
		if (i == 0)
			totalTimeTaken = std::chrono::nanoseconds(0);
	}

	std::cout << "took " << (double)totalTimeTaken.count() / (double)(n-1) << "ns per number" << std::endl;

	system("pause");
}

However, when I run this, most of the timings come back as "0ns", with every hundredth or so reading about a million nanoseconds. Obviously AMP is optimising my code here. How can I stop it from doing that in VS2013 Express?
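
In case it's relevant, here is a separate check I'm planning to run to see whether high_resolution_clock can even resolve sub-millisecond intervals on my machine (I'm assuming the smallest step it reports is representative of its real resolution, which may not hold):

#include <chrono>
#include <iostream>

int main()
{
	// print the tick period the clock advertises (num/den seconds per tick)
	typedef std::chrono::high_resolution_clock hrc;
	std::cout << "advertised tick: " << hrc::period::num << "/" << hrc::period::den << " s" << std::endl;

	// spin until the clock visibly advances to find the smallest step it reports
	hrc::time_point start = hrc::now();
	hrc::time_point next = start;
	while (next == start)
		next = hrc::now();
	std::cout << "smallest observed step: "
		<< std::chrono::duration_cast<std::chrono::nanoseconds>(next - start).count()
		<< "ns" << std::endl;
}

If the smallest observed step turns out to be on the order of a millisecond, the pattern of 0ns readings with the occasional ~1,000,000ns value could simply be the clock's granularity rather than anything AMP is doing, but I'd like to confirm that.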

