
Why does reordering uniforms affect arithmetic cycles?

Hello.


I've recently started using Mali Offline Compiler to get insight into our shaders, and I'm getting confusing results from it that I can't really explain.

I have one quite big shader.

It has a block of uniforms, quite a large one because it's an uber shader.
I noticed that if I reorder the uniforms, I get different results from the Mali compiler.

#if HLSLCC_ENABLE_UNIFORM_BUFFERS
UNITY_BINDING(0) uniform UnityPerMaterial {
#endif
	UNITY_UNIFORM vec4 _MainTex_ST;
	UNITY_UNIFORM float _MainTexUVSet2;
	UNITY_UNIFORM vec4 _SecondaryTex_ST;
	UNITY_UNIFORM mediump vec4 _SecondaryColor;
	UNITY_UNIFORM float _SecondaryTexUVSet2;
	UNITY_UNIFORM vec4 _MaskTex_ST;
	UNITY_UNIFORM float _MaskTexUVSet2;
	UNITY_UNIFORM vec4 _DissolveTex_ST;
	UNITY_UNIFORM float _DissolveTexUVSet2;
	UNITY_UNIFORM mediump vec3 _MainColorBright;
	UNITY_UNIFORM mediump vec3 _MainColorMid;
	UNITY_UNIFORM mediump vec3 _MainColorDark;
	UNITY_UNIFORM mediump vec4 _MainColor;
	UNITY_UNIFORM vec2 _MainTexScrollSpeed;
	UNITY_UNIFORM vec2 _SecondaryTexScrollSpeed;
	UNITY_UNIFORM vec2 _DissolveTexScrollSpeed;
	UNITY_UNIFORM mediump float _Intensity;
	UNITY_UNIFORM mediump float _PSDriven;
	UNITY_UNIFORM mediump float _DissolveAmount;
	UNITY_UNIFORM mediump float _DissolveSoftness;
	UNITY_UNIFORM int _ScrollMainTex;
	UNITY_UNIFORM int _ScrollSecondaryTex;
	UNITY_UNIFORM int _ScrollDissolveTex;
	UNITY_UNIFORM int _MultiplyWithVertexColor;
	UNITY_UNIFORM int _MultiplyWithVertexAlpha;
	UNITY_UNIFORM int _UseGradientMap;
	UNITY_UNIFORM int _UseStepMasking;
	UNITY_UNIFORM float _Curvature;
	UNITY_UNIFORM mediump float _StepBorder;
	UNITY_UNIFORM mediump float _UseRForSecondaryTex;
	UNITY_UNIFORM mediump float _UseRForMask;
	UNITY_UNIFORM mediump float _MaskSecondTexWithFirst;
	UNITY_UNIFORM mediump float _UseRAsAlpha;
#if HLSLCC_ENABLE_UNIFORM_BUFFERS
};
#endif
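
One thing that reordering members can change is where each member lands in memory. The sketch below is only an illustration under assumptions: it assumes the block is laid out with std140 rules (that `UNITY_BINDING` selects std140 is an assumption, and the block may not be a uniform buffer at all when `HLSLCC_ENABLE_UNIFORM_BUFFERS` is off), it covers only the scalar and vector types that appear above, and the helper name `std140_offsets` is hypothetical. It does not claim that layout is the cause of the cycle difference; it only shows that the same members in a different order get different byte offsets.

```python
# Minimal std140 offset sketch for scalar/vector members only.
# (base alignment, size) in bytes; precision qualifiers such as
# mediump do not change these values under std140.
STD140 = {
    "float": (4, 4),
    "int": (4, 4),
    "vec2": (8, 8),
    "vec3": (16, 12),
    "vec4": (16, 16),
}


def std140_offsets(members):
    """Return ({name: byte offset}, end offset) for (name, type) pairs."""
    offsets = {}
    cursor = 0
    for name, glsl_type in members:
        align, size = STD140[glsl_type]
        # Round the cursor up to the member's base alignment.
        cursor = (cursor + align - 1) // align * align
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


# A subset of the members above, in an interleaved order (illustration only).
interleaved = [
    ("_MainTex_ST", "vec4"),
    ("_MainTexUVSet2", "float"),
    ("_SecondaryTex_ST", "vec4"),
    ("_SecondaryTexUVSet2", "float"),
]

# The same members with the vec4s grouped first (hypothetical reordering).
grouped = [
    ("_MainTex_ST", "vec4"),
    ("_SecondaryTex_ST", "vec4"),
    ("_MainTexUVSet2", "float"),
    ("_SecondaryTexUVSet2", "float"),
]

interleaved_offsets, interleaved_end = std140_offsets(interleaved)
grouped_offsets, grouped_end = std140_offsets(grouped)
```

In the interleaved order each float is followed by padding up to the next 16-byte boundary, while in the grouped order the two floats sit next to each other.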
 

For example, if I take the _Curvature uniform and move it so that it comes before every other half/int variable, the results change.
Here are the results for the fragment shader:

Mali Offline Compiler v7.4.0 (Build 330167)
Copyright 2007-2021 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-T720 r1p1
Architecture: Midgard
Driver: r23p0-00rel0
Shader type: OpenGL ES Fragment

Main shader
===========

Work registers: 4
Uniform registers: 0
Stack spilling: false

                                A      LS       T    Bound
Total instruction cycles:   16.00    9.00    4.00        A
Shortest path cycles:       10.00    9.00    3.00        A
Longest path cycles:        10.25    9.00    3.00        A

A = Arithmetic, LS = Load/Store, T = Texture

And then they become:

Mali Offline Compiler v7.4.0 (Build 330167)
Copyright 2007-2021 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-T720 r1p1
Architecture: Midgard
Driver: r23p0-00rel0
Shader type: OpenGL ES Fragment

Main shader
===========

Work registers: 4
Uniform registers: 0
Stack spilling: false

                                A      LS       T    Bound
Total instruction cycles:   16.00    9.00    4.00        A
Shortest path cycles:        9.50    9.00    3.00        A
Longest path cycles:         9.75    9.00    3.00        A

A = Arithmetic, LS = Load/Store, T = Texture


This uniform is only used in the vertex shader, but somehow it also affects the fragment shader results.

Why are the arithmetic cycles different now?

Right now I have no idea what affects it, how to optimize this in the best possible way, or whether I should even bother.
But when a shader executes in, let's say, 10 cycles and reordering fields can make it execute in 9 or even 8 cycles, that is 10-20% of performance to be gained, so I would like to understand what's going on under the hood.
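
To put a number on the difference between the two reports quoted above, here is a small sketch that parses the cycle tables and computes the relative change in the arithmetic (A) column. The table text is copied from the reports; the helper names `parse_cycles` and `arithmetic_change` are hypothetical, and the regular expression assumes the exact row labels shown in this output format.

```python
import re

# Cycle tables copied from the two reports, in the order they appear.
FIRST_REPORT = """\
                                A      LS       T    Bound
Total instruction cycles:   16.00    9.00    4.00        A
Shortest path cycles:       10.00    9.00    3.00        A
Longest path cycles:        10.25    9.00    3.00        A
"""

SECOND_REPORT = """\
                                A      LS       T    Bound
Total instruction cycles:   16.00    9.00    4.00        A
Shortest path cycles:        9.50    9.00    3.00        A
Longest path cycles:         9.75    9.00    3.00        A
"""

# One table row: label, three cycle counts, and the bounding unit.
ROW = re.compile(
    r"^(Total instruction|Shortest path|Longest path) cycles:"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(\S+)\s*$",
    re.MULTILINE,
)


def parse_cycles(report):
    """Map each row label to its A/LS/T cycle counts and bound unit."""
    rows = {}
    for name, a, ls, t, bound in ROW.findall(report):
        rows[name] = {"A": float(a), "LS": float(ls), "T": float(t), "Bound": bound}
    return rows


def arithmetic_change(first, second):
    """Return {row: (first A, second A, percent reduction)}."""
    f, s = parse_cycles(first), parse_cycles(second)
    return {
        name: (f[name]["A"], s[name]["A"],
               100.0 * (f[name]["A"] - s[name]["A"]) / f[name]["A"])
        for name in f
    }


changes = arithmetic_change(FIRST_REPORT, SECOND_REPORT)
for name, (old, new, pct) in changes.items():
    print(f"{name}: {old:.2f} -> {new:.2f} ({pct:.1f}% fewer arithmetic cycles)")
```

For these two reports the total instruction cycles are unchanged, and the shortest and longest path arithmetic cycles each drop by 0.5 cycles, which is roughly 5% rather than 10-20%.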

Is there a way to get a disassembly from the Mali compiler?
Right now it is a black box to me.

I am attaching both shaders and the output from the Mali compiler in case someone wants to take a look.

mali.zip

  • I'm glad you're finding the tools useful =)

    For shader optimization, if you want to target the entry-level lowest common denominator, I think there are really three major classes of interesting device that give different results:

    • Mali-T720 (SIMD, but without the uniform constant register optimization later GPUs have).
    • Mali-T820 (SIMD, but with the uniform constant register optimization)
    • Mali-G52 (Scalar instruction set).

    There were a lot of Mali-T720-based devices sold, but it's an old product now (first released 9 years ago) so I'd agree with your position that it's not worth worrying too much about. 

    Mali-T820 is Midgard (SIMD) which is a few years newer than Mali-T720, but still relatively old (first released 7 years ago). There are still a lot of Midgard devices kicking around, so it's probably still worth checking but I wouldn't totally rewrite your shaders for it, especially if those changes are detrimental to Mali-G52. 

    All (?) modern GPUs use scalar warp instruction sets (including both Mali and GPUs from other vendors), so the Mali-G52 results should be more indicative of what you will see on any hardware released in the last 5 years. (Mali-G31 is a more restrictive target, but it is mostly found in embedded devices, so I wouldn't worry about that one unless you know you have users using it.)

    Cheers, 
    Pete
