I have been attempting to write a convolution kernel in C++ AMP to process the depth data from a Kinect, and I've run into some problems. I want the result of the kernel to simply be smaller than the original image, but I don't seem to be able to index into both images with a two-dimensional index. To diagnose the problem I've been trying to write a simple operation that copies a portion of the original image into the result array.
The CPU implementation is as follows:
void SimpleOperationCPU(int* result, int* image, int width, int height)
{
    // radius is a constant defined elsewhere; the result is 2 * radius smaller in each dimension
    int size = 2 * radius + 1;
    for (int r = 0; r <= height - size; r++)
    {
        for (int c = 0; c <= width - size; c++)
        {
            result[r * (width - 2 * radius) + c] = image[r * width + c];
        }
    }
}
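For context, a minimal driver along these lines exercises the CPU version (the 640x480 frame size matches the Kinect depth stream; the radius definition and the zero-filled dummy frame are just illustrative stand-ins for my real setup):

#include <vector>

// Illustrative only: radius duplicates the file-scope constant used above,
// and the dummy frame stands in for a real Kinect depth capture.
const int radius = 1;

void RunSimpleOperationCPU()
{
    const int width = 640, height = 480;                // Kinect depth resolution
    std::vector<int> image(width * height, 0);          // a depth frame would be copied in here
    std::vector<int> result((width - 2 * radius) * (height - 2 * radius));

    SimpleOperationCPU(result.data(), image.data(), width, height);
}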
Using strictly one-dimensional extents I get the result I wanted (if a bit slower than I'd like):
void SimpleOperationSINGLE(int* result, int* image, int width, int height)
{
    int size = 2 * radius + 1;

    extent<1> imageExtent(width * height);
    array_view<const int, 1> imageAMP(imageExtent, image);
    extent<1> resultExtent((width - 2) * (height - 2 * radius));   // (width - 2) matches width - 2 * radius when radius == 1
    array_view<int, 1> resultAMP(resultExtent, result);

    int myWidth = width;
    parallel_for_each(resultExtent,
        [=](index<1> index) restrict(amp)
        {
            // shift the flat source index by 2 for every full image width already covered
            resultAMP[index] = imageAMP[index[0] + 2 * fast_math::floor(index[0] / myWidth)];
        });

    resultAMP.synchronize();
}
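For what it's worth, a check along these lines (just a sketch, names illustrative) is enough to confirm the 1D version against the CPU one:

#include <vector>

// Sketch of a sanity check: run the CPU and AMP versions into separate buffers
// and compare them element by element. Names and usage are illustrative.
bool ResultsMatch(const std::vector<int>& cpuResult, const std::vector<int>& ampResult)
{
    if (cpuResult.size() != ampResult.size())
        return false;
    for (size_t i = 0; i < cpuResult.size(); ++i)
    {
        if (cpuResult[i] != ampResult[i])
            return false;   // any mismatch means the kernel's indexing is off
    }
    return true;
}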
[success image excluded due to image number restrictions]
However, if I try to index into the arrays with a two-dimensional extent, I get an off-by-one error:
void SimpleOperation(int* result, int* image, int width, int height)
{
    // Setup for AMP stuff
    extent<2> imageExtent(width, height);
    extent<2> resultExtent((width - 2 * radius), (height - 2 * radius));
    array_view<const int, 2> imageAMP(imageExtent, image);
    array_view<int, 2> resultAMP(resultExtent, result);
    resultAMP.discard_data();

    parallel_for_each(resultExtent,
        [=](index<2> idx) restrict(amp)
        {
            // copy each result element straight from the same 2D position in the image
            resultAMP(idx[0], idx[1]) = imageAMP(idx[0], idx[1]);
        });

    resultAMP.synchronize();
}
[result image showing the off-by-one error excluded]
When I try to work around this with a mixed one-dimensional/two-dimensional solution, it seems to cut off the edge:
void SimpleOperationMIXED(int* result, int* image, int width, int height)
{
    int myOtherWidth = width - 2 * radius;

    // Setup for AMP stuff
    extent<1> imageExtent(width * height);
    extent<1> resultExtent((width - 2 * radius) * (height - 2 * radius));
    array_view<const int, 1> imageAMP(imageExtent, image);
    array_view<int, 1> resultAMP(resultExtent, result);
    resultAMP.discard_data();

    int myWidth = width;
    extent<2> e2(width - 2 * radius, height - 2 * radius);
    parallel_for_each(e2,
        [=](index<2> idx) restrict(amp)
        {
            // flatten the 2D index back into the 1D views by hand
            resultAMP(idx[0] * myOtherWidth + idx[1]) = imageAMP(idx[0] * myWidth + idx[1]);
        });

    resultAMP.synchronize();
}
Do you think you could shed some light on these issues?