Hi,
I've recently run into a weird problem when synchronizing array_views from GPU to CPU memory.
The short example below demonstrates this:
#include <vector>
#include <amp.h>
#include <algorithm>
#include <random>

int main()
{
    std::mersenne_twister<unsigned int, 32, 624, 397, 31, 0x9908b0df, 11, 7, 0x9d2c5680, 15, 0xefc60000, 18> rng;

    const unsigned int vect_size = 10000;
    std::vector<float> test(vect_size);
    std::generate(test.begin(), test.end(), rng);

    concurrency::extent<1> ext(vect_size);
    concurrency::array_view<const float, 1> test_av(ext, test);

    std::vector<unsigned int> output(1920 * 1080);
    concurrency::extent<1> res(output.size());
    concurrency::array_view<unsigned int, 1> output_av(res, output);
    output_av.discard_data();

    concurrency::parallel_for_each(res, [=] (concurrency::index<1> idx) restrict(amp)
    {
        for(int i = 0; i < test_av.get_extent().size(); ++i)
            output_av[idx] += 10 + test_av[i]; // might overflow
    });

    output_av.synchronize(); // <= ERROR HERE
    return 0;
}
(Note that the above code is just an example to simulate the error message I'm receiving)
The above code executes perfectly fine up to the point where the output_av array_view is about to synchronize back to CPU memory ( output_av.synchronize(); ). Once the program reaches this line, I receive an error message and my display driver stops responding for a few seconds.
My accelerator is AMD Radeon HD6970 with 2GB of memory, so I don't think the data size (10,000 floats and 2,073,600 unsigned ints) is the problem (I might be wrong).
Can anyone help me understand where the problem is coming from? Am I doing something wrong?
Thanks in advance for your time and help :)