Uniform control-flow cycles reported by MALIOC

I have a Unity shader that uses a multi_compile keyword. I am trying to replace the keyword with uniform-based flow control in order to reduce the number of shader variants.
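
To make the comparison concrete, the two approaches might look roughly like the sketch below. The keyword FEATURE_ON, the uniform _FeatureOn, and the helper functions are hypothetical placeholders, not taken from the actual shader.

```hlsl
// Hypothetical sketch: FEATURE_ON, _FeatureOn, BasePath and FeaturePath
// are placeholder names, not the real shader's identifiers.

// Approach A: compile-time selection. Unity builds one variant per keyword state.
// #pragma multi_compile _ FEATURE_ON

// Approach B: run-time selection on a uniform. A single variant is built.
float _FeatureOn; // assumed to be set from script, e.g. with Material.SetFloat

half4 BasePath(half4 c)    { return c; }
half4 FeaturePath(half4 c) { return c * half4(1.0, 0.5, 0.5, 1.0); }

half4 ShadeWithKeyword(half4 c)
{
#if defined(FEATURE_ON)
    return FeaturePath(c);   // only this path exists in the FEATURE_ON variant
#else
    return BasePath(c);      // only this path exists in the default variant
#endif
}

half4 ShadeWithUniform(half4 c)
{
    half4 result = BasePath(c);
    // The condition has the same value for every thread in a draw call.
    if (_FeatureOn > 0.5)
    {
        result = FeaturePath(c);
    }
    return result;
}
```

With approach A the unused path is removed at compile time; with approach B both paths are present in the compiled program, and which one runs depends on the uniform.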

I have 4 questions.

Q1: I cannot understand the output of MALIOC (Mali-G71).

Arithmetic cycles of the fragment shader (in all cases Total Cycles == Shortest Path Cycles == Longest Path Cycles):

- Without the keyword: 7.50

- With the keyword: 7.65

- Uniform flow control: 7.50

It seems to me that MALIOC reports the cycles of the shader with uniform flow control by assuming a value for the uniform, and therefore only counts the cycles of one path.

If the instructions of both paths were executed, the cycle count should be much higher.

Q2: Is uniform flow control as bad as described here? https://developer.arm.com/documentation/101897/0200/shader-code/uniform-control-flow

Q3: May we assume that the driver optimises the shader on the fly based on the uniform value, so that only one branch is executed? (I guess not.)

Q4: Which GPU counters should I check in Streamline for potential problems with uniform flow control? In my experiments, the "Diverged instructions" counter is close to zero in all cases.

  • I use UNITY_BRANCH to force a branch instruction to be generated. It is still uniform-based flow control.

    Unity shows that the generated code uses an if statement instead of the ternary operator (the default). I cannot tell how exactly either form is translated at the lower level.

    According to my measurements with Streamline, the fragment cycles and executed instructions are almost the same for both implementations. If I understand correctly, even if a branch instruction is generated, both the then-path and the else-path must be executed by the shader core anyway. Is this correct?

    Is there no branch prediction in this case?
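
    For reference, the two forms being compared might look like the sketch below. UNITY_BRANCH and UNITY_FLATTEN are Unity's macros for the HLSL [branch] and [flatten] attributes; the uniform _UseTint and the function names are hypothetical, and how a given compiler honours these hints is not something the sketch can show.

    ```hlsl
    // Hypothetical sketch: _UseTint, TintBranch and TintFlatten are placeholder names.
    // Assumes Unity's built-in includes, which define UNITY_BRANCH and UNITY_FLATTEN.
    float _UseTint; // uniform: same value for all threads in a draw call

    half4 TintBranch(half4 c, half4 tint)
    {
        half4 result = c;
        // Requests a real branch: only the taken side should be evaluated.
        UNITY_BRANCH
        if (_UseTint > 0.5)
        {
            result = c * tint;
        }
        return result;
    }

    half4 TintFlatten(half4 c, half4 tint)
    {
        half4 result = c;
        // Requests flattening: both sides are evaluated and one result is selected,
        // which is similar to writing the ternary form (_UseTint > 0.5) ? c * tint : c.
        UNITY_FLATTEN
        if (_UseTint > 0.5)
        {
            result = c * tint;
        }
        return result;
    }
    ```

    These attributes are hints in the HLSL source; the code that finally runs on the GPU is decided by the cross-compiler and the driver's compiler.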
