Why is ROM consumption bigger with double-precision than with single-precision floating point?

Hello everyone,

I'm analyzing two otherwise identical programs on an ARM Cortex-M4F (Armv7E-M).
The only difference is that one uses double-precision floating point and the other uses single precision.
I noticed that the double-precision version consumes more ROM than the single-precision one.

My guess is that, since the Floating-Point Unit (FPU) of the ARM Cortex-M4F natively supports only 32-bit floating point, the compiler generates extra code to work around this limitation when the program uses 64-bit values.

Does anyone know why?
Can anyone point to a source or material that supports the explanation?

Respectfully,
Tiago

+1 WestfW over 2 years ago

If all your CPU has is a 32-bit floating-point unit, any double-precision calculations will be done using software-only 64-bit library routines (at a cost of ROM space, and probably more performance loss than you expected). Compare the float add and the double add in this disassembly:

; double z, z2;

; float x, y;

x += y;
    4300:       4b0d            ldr     r3, [pc, #52]
    4302:       4a0e            ldr     r2, [pc, #56]
    4304:       edd3 7a00       vldr    s15, [r3]
    4308:       ed92 7a00       vldr    s14, [r2]
    430c:       ee77 7a87       vadd.f32        s15, s15, s14   ;; floating point add instruction
void loop() {
    4310:       b510            push    {r4, lr}
z += z2;
    4312:       4c0b            ldr     r4, [pc, #44]
x += y;
    4314:       edc3 7a00       vstr    s15, [r3]
z += z2;
    4318:       4b0a            ldr     r3, [pc, #40]
    431a:       e9d4 0100       ldrd    r0, r1, [r4]
    431e:       e9d3 2300       ldrd    r2, r3, [r3]
    4322:       f002 f87b       bl      641c <__adddf3>     ;; call to floating point add function.
    4326:       e9c4 0100       strd    r0, r1, [r4]

(This may depend on the compiler. I don't know of any compiler that uses a single-precision FPU to help with doubles, and it looks "hard", but I guess it's possible.)
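If you want to check what your own toolchain does, a minimal sketch like the one below should be enough. The function names add_f and add_d are made up for illustration, and the command line in the comment assumes the GNU Arm Embedded toolchain (arm-none-eabi-gcc); other compilers will use different options and may name their helper routines differently.

```c
/* Hypothetical test file, e.g. fadd.c. With GCC, something like
 *   arm-none-eabi-gcc -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard -O2 -S fadd.c
 * writes the generated assembly to fadd.s so the two functions can be compared. */

/* Single precision: expected to become a vadd.f32 on the M4F's FPU. */
float add_f(float a, float b)
{
    return a + b;
}

/* Double precision: with a single-precision-only FPU this is expected to
 * become a call to a library helper (__adddf3 in the listing above). */
double add_d(double a, double b)
{
    return a + b;
}
```

The helper routine only gets linked in once something actually calls it, which is why the ROM increase shows up as soon as the first double operation appears in the program.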

(I believe the M4F has no option for double-precision hardware. That doesn't show up until the M7, where it is optional.)
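One related thing to watch for, if single precision is accurate enough for you: doubles can creep into code that only declares float variables, because an unsuffixed constant like 0.5 is a double in C. The sketch below is only an illustration with hypothetical function names; it assumes nothing beyond standard C.

```c
/* Both constants are doubles, so x is converted to double, the multiply
 * and add are done in double precision (library calls on an M4F), and the
 * result is converted back to float. */
float scale_promoted(float x)
{
    return x * 0.5 + 1.0;
}

/* The f suffix keeps every operand a float, so the whole expression can
 * stay on the single-precision FPU. */
float scale_single(float x)
{
    return x * 0.5f + 1.0f;
}
```

With GCC, the -Wdouble-promotion warning flags implicit float-to-double conversions like the one in scale_promoted.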

See also: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/10-useful-tips-to-using-the-floating-point-unit-on-the-arm-cortex--m4-processor