风过空庭,字句正徐来。
关于关于本站关于我给我点钱
更多时间线友链文件服务wiki
联系写留言发邮件GitHub
© 2024-2026 yono. | RSS 订阅 | 站点地图 | | Stay hungry. Stay foolish.
Powered by Mix Space&
白い
.
| 粤 ICP 备2024284785号-1 |
正在被0人看爆
纸白微明,未成篇章。

关于局部变量的栈行为——由循环语句内定义循环变量引申

(已编辑)
/
108
AI·GEN

关键洞察

这篇文章上次修改于,可能部分内容已经不适用,如有疑问可询问作者。

关于局部变量的栈行为——由循环语句内定义循环变量引申

  • Loading...
  • Loading...
  • Loading...
  • Loading...
  • Loading...
  • 结论

    结论写在前面

    1.

    在 for 循环语句内定义局部循环变量,使用 AC6 编译器或者 GCC 编译器,都不会因此产生多次栈操作,而是会使用相同的两个堆栈偏址。如果开优化,当二者逻辑功能没有实际差异时,汇编将完全一样。

    1. 事实上在 for 循环的同时定义循环变量是一个优秀的操作。对于将所有局部变量的定义全部提前到函数头部,会产生事实上的负优化或无优化(依据优化等级和编译器不同)。

    2. 由此引申,如果追求极致的性能,应当仅在使用到局部变量的分支中声明该局部变量。

    以下测试均以 stm32H7为目标编译

    用如下的写法讨论局部变量的栈操作

    for(int i = 0; i < 50; i++)
    {
        for(int j = 0; j < 50; j++)
        {
            HAL_Delay(1);
        }
    }
    

    感性来看,第一个循环每进行一次,都会声明一个局部变量 j,那么是否会由此产生多次的栈申请操作呢?

    这部分的反汇编如下

    ASM
    0x0000001e:       LDR      r0,[sp,#0]
    0x00000020:       STR      r0,[sp,#8]
    0x00000022:       B        {pc}+0x2 ; 0x24
    0x00000024:       LDR      r0,[sp,#8]
    0x00000026:       CMP      r0,#0x31
    0x00000028:       BGT      {pc}+0x2c ; 0x54
    0x0000002a:       B        {pc}+0x2 ; 0x2c
    0x0000002c:       MOVS     r0,#0
    0x0000002e:       STR      r0,[sp,#4]
    0x00000030:       B        {pc}+0x2 ; 0x32
    0x00000032:       LDR      r0,[sp,#4]
    0x00000034:       CMP      r0,#0x31
    0x00000036:       BGT      {pc}+0x14 ; 0x4a
    0x00000038:       B        {pc}+0x2 ; 0x3a
    0x0000003a:       MOVS     r0,#1
    0x0000003c:       BL       HAL_Delay
    0x00000040:       B        {pc}+0x2 ; 0x42
    0x00000042:       LDR      r0,[sp,#4]
    0x00000044:       ADDS     r0,#1
    0x00000046:       STR      r0,[sp,#4]
    0x00000048:       B        {pc}-0x16 ; 0x32
    0x0000004a:       B        {pc}+0x2 ; 0x4c
    0x0000004c:       LDR      r0,[sp,#8]
    0x0000004e:       ADDS     r0,#1
    0x00000050:       STR      r0,[sp,#8]
    0x00000052:       B        {pc}-0x2e ; 0x24
    

    外层循环

    不是我们的主要讨论对象,反正就是会使用跳转将内层循环执行50次

    ASM
    0x0000001e:       LDR      r0,[sp,#0]
    0x00000020:       STR      r0,[sp,#8]
    0x00000022:       B        {pc}+0x2 ; 0x24
    0x00000024:       LDR      r0,[sp,#8]
    0x00000026:       CMP      r0,#0x31
    0x00000028:       BGT      {pc}+0x2c ; 0x54
    0x0000002a:       B        {pc}+0x2 ; 0x2c
    ; .....内层循环
    0x0000004a:       B        {pc}+0x2 ; 0x4c
    0x0000004c:       LDR      r0,[sp,#8]
    0x0000004e:       ADDS     r0,#1
    0x00000050:       STR      r0,[sp,#8]
    0x00000052:       B        {pc}-0x2e ; 0x24
    

    内层循环

    CodeBlock Loading...

    2c、2e 两句 将 sp+4处堆栈的值置0

    然后利用增1和跳转,执行50次循环

    也就是每次执行外层循环都会有这套针对 sp+4处堆栈的操作逻辑,外层循环每次都是针对 sp+8处堆栈的操作逻辑

    如果是预先定义局部变量呢?

    改为如下写法

    CodeBlock Loading...

    这部分的反汇编如下

    CodeBlock Loading...

    可以看出,其循环部分(26-56)与此前的写法(22-52)部分没有差异,反而多将(sp+4)与(sp+8)置零的两条语句,产生了负优化。

    将循环复杂化是否会不一样

    如下代码,与反汇编,占用的(sp+8)与(sp+12),依然不会产生过多的堆栈操作

    CodeBlock Loading...
    CodeBlock Loading...

    如下代码,将声明提前,依然产生负优化了

    CodeBlock Loading...
    CodeBlock Loading...

    使用优化

    O1

    仍然是上面的循环复杂化

    for 内声明

    CodeBlock Loading...

    将声明提前,二者完全一致

    CodeBlock Loading...

    O2

    仍然是上面的循环复杂化

    for 内声明

    CodeBlock Loading...

    将声明提前,二者完全一致

    CodeBlock Loading...

    O3

    O3没有讨论价值,完全展开循环。

    GCC 环境下的情况

    将局部变量提前定义,同样是负优化

    局部变量定义在 for 内,20条

    局部变量先定义,24条

    ASM
    0x0000002c:       MOVS     r0,#0
    0x0000002e:       STR      r0,[sp,#4]
    0x00000030:       B        {pc}+0x2 ; 0x32
    0x00000032:       LDR      r0,[sp,#4]
    0x00000034:       CMP      r0,#0x31
    0x00000036:       BGT      {pc}+0x14 ; 0x4a
    0x00000038:       B        {pc}+0x2 ; 0x3a
    0x0000003a:       MOVS     r0,#1
    0x0000003c:       BL       HAL_Delay
    0x00000040:       B        {pc}+0x2 ; 0x42
    0x00000042:       LDR      r0,[sp,#4]
    0x00000044:       ADDS     r0,#1
    0x00000046:       STR      r0,[sp,#4]
    0x00000048:       B        {pc}-0x16 ; 0x32
    
    int i = 0;
    int j = 0;
    for(i = 0; i < 50; i++)
    {
        for(j = 0; j < 50; j++)
        {
            HAL_Delay(1);
        }
    }
    
    ASM
    0x0000001e:       LDR      r0,[sp,#0]
    0x00000020:       STR      r0,[sp,#8]
    0x00000022:       STR      r0,[sp,#4]
    0x00000024:       STR      r0,[sp,#8]
    0x00000026:       B        {pc}+0x2 ; 0x28
    0x00000028:       LDR      r0,[sp,#8]
    0x0000002a:       CMP      r0,#0x31
    0x0000002c:       BGT      {pc}+0x2c ; 0x58
    0x0000002e:       B        {pc}+0x2 ; 0x30
    0x00000030:       MOVS     r0,#0
    0x00000032:       STR      r0,[sp,#4]
    0x00000034:       B        {pc}+0x2 ; 0x36
    0x00000036:       LDR      r0,[sp,#4]
    0x00000038:       CMP      r0,#0x31
    0x0000003a:       BGT      {pc}+0x14 ; 0x4e
    0x0000003c:       B        {pc}+0x2 ; 0x3e
    0x0000003e:       MOVS     r0,#1
    0x00000040:       BL       HAL_Delay
    0x00000044:       B        {pc}+0x2 ; 0x46
    0x00000046:       LDR      r0,[sp,#4]
    0x00000048:       ADDS     r0,#1
    0x0000004a:       STR      r0,[sp,#4]
    0x0000004c:       B        {pc}-0x16 ; 0x36
    0x0000004e:       B        {pc}+0x2 ; 0x50
    0x00000050:       LDR      r0,[sp,#8]
    0x00000052:       ADDS     r0,#1
    0x00000054:       STR      r0,[sp,#8]
    0x00000056:       B        {pc}-0x2e ; 0x28
    
    int test = 0;
    for(int i = 0; i < 50; i++)
    {
        for(int j = 0; j < 50; j++)
        {
            if((test & 0x01) == 0)
                HAL_Delay(1);
            else
                HAL_Delay(2);
        }
        test++;
    }
    
    ASM
    0x0000001e:    9801        ..      LDR      r0,[sp,#4]
    0x00000020:    9004        ..      STR      r0,[sp,#0x10]
    0x00000022:    9003        ..      STR      r0,[sp,#0xc]
    0x00000024:    e7ff        ..      B        {pc}+0x2 ; 0x26
    0x00000026:    9803        ..      LDR      r0,[sp,#0xc]
    0x00000028:    2831        1(      CMP      r0,#0x31
    0x0000002a:    dc21        !.      BGT      {pc}+0x46 ; 0x70
    0x0000002c:    e7ff        ..      B        {pc}+0x2 ; 0x2e
    0x0000002e:    2000        .       MOVS     r0,#0
    0x00000030:    9002        ..      STR      r0,[sp,#8]
    0x00000032:    e7ff        ..      B        {pc}+0x2 ; 0x34
    0x00000034:    9802        ..      LDR      r0,[sp,#8]
    0x00000036:    2831        1(      CMP      r0,#0x31
    0x00000038:    dc12        ..      BGT      {pc}+0x28 ; 0x60
    0x0000003a:    e7ff        ..      B        {pc}+0x2 ; 0x3c
    0x0000003c:    f89d0010    ....    LDRB     r0,[sp,#0x10]
    0x00000040:    07c0        ..      LSLS     r0,r0,#31
    0x00000042:    b920         .      CBNZ     r0,{pc}+0xc ; 0x4e
    0x00000044:    e7ff        ..      B        {pc}+0x2 ; 0x46
    0x00000046:    2001        .       MOVS     r0,#1
    0x00000048:    f7fffffe    ....    BL       HAL_Delay
    0x0000004c:    e003        ..      B        {pc}+0xa ; 0x56
    0x0000004e:    2002        .       MOVS     r0,#2
    0x00000050:    f7fffffe    ....    BL       HAL_Delay
    0x00000054:    e7ff        ..      B        {pc}+0x2 ; 0x56
    0x00000056:    e7ff        ..      B        {pc}+0x2 ; 0x58
    0x00000058:    9802        ..      LDR      r0,[sp,#8]
    0x0000005a:    3001        .0      ADDS     r0,#1
    0x0000005c:    9002        ..      STR      r0,[sp,#8]
    0x0000005e:    e7e9        ..      B        {pc}-0x2a ; 0x34
    0x00000060:    9804        ..      LDR      r0,[sp,#0x10]
    0x00000062:    3001        .0      ADDS     r0,#1
    0x00000064:    9004        ..      STR      r0,[sp,#0x10]
    0x00000066:    e7ff        ..      B        {pc}+0x2 ; 0x68
    0x00000068:    9803        ..      LDR      r0,[sp,#0xc]
    0x0000006a:    3001        .0      ADDS     r0,#1
    0x0000006c:    9003        ..      STR      r0,[sp,#0xc]
    0x0000006e:    e7da        ..      B        {pc}-0x48 ; 0x26
    
    int test = 0;
    int i    = 0;
    int j    = 0;
    for(i = 0; i < 50; i++)
    {
        for(j = 0; j < 50; j++)
        {
            if((test & 0x01) == 0)
                HAL_Delay(1);
            else
                HAL_Delay(2);
        }
        test++;
    }
    
    ASM
    0x0000001e:    9801        ..      LDR      r0,[sp,#4]
    0x00000020:    9004        ..      STR      r0,[sp,#0x10]
    0x00000022:    9003        ..      STR      r0,[sp,#0xc]
    0x00000024:    9002        ..      STR      r0,[sp,#8]
    0x00000026:    9003        ..      STR      r0,[sp,#0xc]
    0x00000028:    e7ff        ..      B        {pc}+0x2 ; 0x2a
    0x0000002a:    9803        ..      LDR      r0,[sp,#0xc]
    0x0000002c:    2831        1(      CMP      r0,#0x31
    0x0000002e:    dc21        !.      BGT      {pc}+0x46 ; 0x74
    0x00000030:    e7ff        ..      B        {pc}+0x2 ; 0x32
    0x00000032:    2000        .       MOVS     r0,#0
    0x00000034:    9002        ..      STR      r0,[sp,#8]
    0x00000036:    e7ff        ..      B        {pc}+0x2 ; 0x38
    0x00000038:    9802        ..      LDR      r0,[sp,#8]
    0x0000003a:    2831        1(      CMP      r0,#0x31
    0x0000003c:    dc12        ..      BGT      {pc}+0x28 ; 0x64
    0x0000003e:    e7ff        ..      B        {pc}+0x2 ; 0x40
    0x00000040:    f89d0010    ....    LDRB     r0,[sp,#0x10]
    0x00000044:    07c0        ..      LSLS     r0,r0,#31
    0x00000046:    b920         .      CBNZ     r0,{pc}+0xc ; 0x52
    0x00000048:    e7ff        ..      B        {pc}+0x2 ; 0x4a
    0x0000004a:    2001        .       MOVS     r0,#1
    0x0000004c:    f7fffffe    ....    BL       HAL_Delay
    0x00000050:    e003        ..      B        {pc}+0xa ; 0x5a
    0x00000052:    2002        .       MOVS     r0,#2
    0x00000054:    f7fffffe    ....    BL       HAL_Delay
    0x00000058:    e7ff        ..      B        {pc}+0x2 ; 0x5a
    0x0000005a:    e7ff        ..      B        {pc}+0x2 ; 0x5c
    0x0000005c:    9802        ..      LDR      r0,[sp,#8]
    0x0000005e:    3001        .0      ADDS     r0,#1
    0x00000060:    9002        ..      STR      r0,[sp,#8]
    0x00000062:    e7e9        ..      B        {pc}-0x2a ; 0x38
    0x00000064:    9804        ..      LDR      r0,[sp,#0x10]
    0x00000066:    3001        .0      ADDS     r0,#1
    0x00000068:    9004        ..      STR      r0,[sp,#0x10]
    0x0000006a:    e7ff        ..      B        {pc}+0x2 ; 0x6c
    0x0000006c:    9803        ..      LDR      r0,[sp,#0xc]
    0x0000006e:    3001        .0      ADDS     r0,#1
    0x00000070:    9003        ..      STR      r0,[sp,#0xc]
    0x00000072:    e7da        ..      B        {pc}-0x48 ; 0x2a
    
    ASM
    0x00000014:    2400        .$      MOVS     r4,#0
    0x00000016:    bf00        ..      NOP      
    0x00000018:    f0040501    ....    AND      r5,r4,#1
    0x0000001c:    2632        2&      MOVS     r6,#0x32
    0x0000001e:    bf00        ..      NOP      
    0x00000020:    2002        .       MOVS     r0,#2
    0x00000022:    2d00        .-      CMP      r5,#0
    0x00000024:    bf08        ..      IT       EQ
    0x00000026:    2001        .       MOVEQ    r0,#1
    0x00000028:    f7fffffe    ....    BL       HAL_Delay
    0x0000002c:    3e01        .>      SUBS     r6,#1
    0x0000002e:    d1f7        ..      BNE      {pc}-0xe ; 0x20
    0x00000030:    3401        .4      ADDS     r4,#1
    0x00000032:    2c32        2,      CMP      r4,#0x32
    0x00000034:    d1f0        ..      BNE      {pc}-0x1c ; 0x18
    
    ASM
    0x00000014:    2400        .$      MOVS     r4,#0
    0x00000016:    bf00        ..      NOP      
    0x00000018:    f0040501    ....    AND      r5,r4,#1
    0x0000001c:    2632        2&      MOVS     r6,#0x32
    0x0000001e:    bf00        ..      NOP      
    0x00000020:    2002        .       MOVS     r0,#2
    0x00000022:    2d00        .-      CMP      r5,#0
    0x00000024:    bf08        ..      IT       EQ
    0x00000026:    2001        .       MOVEQ    r0,#1
    0x00000028:    f7fffffe    ....    BL       HAL_Delay
    0x0000002c:    3e01        .>      SUBS     r6,#1
    0x0000002e:    d1f7        ..      BNE      {pc}-0xe ; 0x20
    0x00000030:    3401        .4      ADDS     r4,#1
    0x00000032:    2c32        2,      CMP      r4,#0x32
    0x00000034:    d1f0        ..      BNE      {pc}-0x1c ; 0x18
    
    ASM
    0x00000014:    2500        .%      MOVS     r5,#0
    0x00000016:    bf00        ..      NOP      
    0x00000018:    2402        .$      MOVS     r4,#2
    0x0000001a:    2632        2&      MOVS     r6,#0x32
    0x0000001c:    07e8        ..      LSLS     r0,r5,#31
    0x0000001e:    bf08        ..      IT       EQ
    0x00000020:    2401        .$      MOVEQ    r4,#1
    0x00000022:    bf00        ..      NOP      
    0x00000024:    4620         F      MOV      r0,r4
    0x00000026:    f7fffffe    ....    BL       HAL_Delay
    0x0000002a:    3e01        .>      SUBS     r6,#1
    0x0000002c:    d1fa        ..      BNE      {pc}-0x8 ; 0x24
    0x0000002e:    3501        .5      ADDS     r5,#1
    0x00000030:    2d32        2-      CMP      r5,#0x32
    0x00000032:    d1f1        ..      BNE      {pc}-0x1a ; 0x18
    
    ASM
    0x00000014:    2500        .%      MOVS     r5,#0
    0x00000016:    bf00        ..      NOP      
    0x00000018:    2402        .$      MOVS     r4,#2
    0x0000001a:    2632        2&      MOVS     r6,#0x32
    0x0000001c:    07e8        ..      LSLS     r0,r5,#31
    0x0000001e:    bf08        ..      IT       EQ
    0x00000020:    2401        .$      MOVEQ    r4,#1
    0x00000022:    bf00        ..      NOP      
    0x00000024:    4620         F      MOV      r0,r4
    0x00000026:    f7fffffe    ....    BL       HAL_Delay
    0x0000002a:    3e01        .>      SUBS     r6,#1
    0x0000002c:    d1fa        ..      BNE      {pc}-0x8 ; 0x24
    0x0000002e:    3501        .5      ADDS     r5,#1
    0x00000030:    2d32        2-      CMP      r5,#0x32
    0x00000032:    d1f1        ..      BNE      {pc}-0x1a ; 0x18