I'm hoping to tweak it to add a field called bias, which will allow you to choose the direction of the bias. So long as the algorithm iterates the field diagonally it is always the case that it can safely perform the convolution of the data and stick the answer to that in the corner that is never going to be used again.
So the entire thing seems to be pointless. If you put your matrix result point at the corner of the kernel, you can do the convolution with just a scanline. Literally a J/K loop, and never allocate another big block of memory.
Why didn't anybody point this out before. All the convolutions kicking around are mostly pointless. They are obsessed enough with keeping the pixel location consistent that they insist on odd ranged kernels and leave garbage at the edges (or more typically just don't apply it there). When really you could get the result in the same memory.