Hello forum,
I am trying to understand the meaning of the warp divergence rate metric in Streamline for the G-715 GPU.
Using the test case below, I was expecting the divergence rate metric to be around 50%, assuming the warp size on the GPU is 16, the local size is set to 16, and only 8 threads execute either the if or the else block. But Streamline shows a warp divergence of 98%, a full warp rate of 100%, and a number of fragment warps of 2. Any insight into why the divergence is close to 100% would be appreciated. (I am launching the test with a global size of X=16, Y=1, Z=1.)
#version 320 es

layout(std430, binding = 0) buffer OutputBuffer { float data[]; };
layout(local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

void main() {
    uint threadId = gl_LocalInvocationID.x;
    if (threadId > 8u) {
        for (int i = 0; i < 64; i++) {
            data[threadId] = data[threadId] + float(threadId) * 2.0;
        }
    } else {
        for (int i = 0; i < 64; i++) {
            data[threadId] = data[threadId] + float(threadId) * 5.0;
        }
    }
}
The warp size is 16 wide.
The if branch is taken by only part of the warp, so it is divergent.
The else branch is taken by the rest of the warp, so it is also divergent.
The if and the else each contain a lot of instructions because of the 64-iteration loop, so the divergent code dominates the small amount of initial 16-wide non-divergent code that tests the thread ID. 98% seems right to me.
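To put rough numbers on that (the instruction counts here are purely illustrative assumptions, not measured values): suppose the thread-ID test and branch compile to about 6 instruction issues per warp, and each loop iteration to about 4 issues. Both loop bodies are issued divergently, so:

    divergent issues ≈ 2 paths × 64 iterations × ~4 issues ≈ 512
    uniform issues   ≈ ~6 (thread-ID test and branch)
    divergence rate  ≈ 512 / (512 + 6) ≈ 99%

which is in the same ballpark as the 98% Streamline reports.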
*EDIT* Note that the divergence counter simply counts the number of instruction issues that have any level of divergence. It does not measure the amount of divergence within each instruction issue.
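As an illustration of that last point (a sketch I have not profiled on a G-715, so treat the expected numbers as an assumption): if you hoist the loop out of the branch so that only the multiplier selection can diverge, almost every instruction issue executes with the full warp converged, and I would expect the divergence rate to drop close to 0% for the same amount of per-lane work:

#version 320 es

layout(std430, binding = 0) buffer OutputBuffer { float data[]; };
layout(local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

void main() {
    uint threadId = gl_LocalInvocationID.x;
    // Pick the per-branch multiplier in a tiny (possibly branchless) region...
    float scale = (threadId > 8u) ? 2.0 : 5.0;
    // ...then run the expensive 64-iteration loop with all lanes converged,
    // so the bulk of the instruction issues are non-divergent.
    for (int i = 0; i < 64; i++) {
        data[threadId] = data[threadId] + float(threadId) * scale;
    }
}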