In "Arm Cortex-A77 Core Software Optimization Guide", it says:
As I understand it, predicting multiple branch instructions within a single 32-byte aligned region of instruction memory would be very difficult, since the directions of the earlier branches can influence those of the later ones. Could you give me any more information? Thanks.
The Cortex-A77 software optimization guide only recommends what to avoid if you want better performance.
It does not mean that you must place 4 BTI instructions in a 32-byte range.
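If you want to check your own code against a recommendation like this, here is a rough sketch of how you might count branches per 32-byte aligned region. A64 instructions are 4 bytes each, so one such region holds 8 instruction slots. The limit of 4 is an assumption taken from the number mentioned in this thread, not a figure I have verified against the guide, and the function names and sample addresses are made up for illustration.

```python
from collections import Counter

REGION_BYTES = 32   # aligned region size discussed in this thread
BRANCH_LIMIT = 4    # ASSUMPTION: threshold taken from the thread, not verified


def branches_per_region(branch_addrs):
    """Count branches in each 32-byte aligned region, keyed by region base address."""
    # Clearing the low 5 bits rounds an address down to its 32-byte boundary.
    return Counter(addr & ~(REGION_BYTES - 1) for addr in branch_addrs)


def crowded_regions(branch_addrs, limit=BRANCH_LIMIT):
    """Return the base addresses of regions that hold more than `limit` branches."""
    counts = branches_per_region(branch_addrs)
    return sorted(base for base, n in counts.items() if n > limit)


# Hypothetical branch addresses, e.g. collected from a disassembly listing.
example = [0x1000, 0x1004, 0x1008, 0x100C, 0x1010, 0x1020, 0x1024]

for base in crowded_regions(example):
    print(f"region at {base:#x} has more than {BRANCH_LIMIT} branches")
```

A flagged region could then be thinned out by padding or reordering code, but whether that actually helps on a given core is something to measure rather than assume.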