
fstrict-aliasing

news  Source: original  2024/4/29 6:42:43

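As noted in the "optimization blocks" article, pointers may refer to related (overlapping) memory regions, so the compiler refrains from overly aggressive optimization...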

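I went looking for which optimizations the compiler enables by default at each optimization level, which led to these notes (they are fairly long, even though they only cover fstrict-aliasing...)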

The following draws on: http://dbp-consulting.com/tutorials/StrictAliasing.html

http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule

gcc -v
        gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5)
arm-none-linux-gnueabi-gcc -v
        gcc version 4.3.2 (Sourcery G++ Lite 2008q3-72)

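First, look at gcc's default optimization flags at different optimization levels (the focus here is whether fstrict-aliasing is enabled
at O0, the default optimization level, and at O2, the level in common use)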
gcc -Q -O0 --help=optimizers
    -fstrict-aliasing                   [disabled]
arm-gcc -Q -O0 --help=optimizers
    -fstrict-aliasing                   [disabled]
    
gcc -Q -O2 --help=optimizers
    -fstrict-aliasing                   [enabled]
    
arm-gcc -Q -O2 --help=optimizers
    -fstrict-aliasing                   [enabled]

At O0, fstrict-aliasing is disabled whether or not -fstrict-aliasing is given explicitly.
At O2, fstrict-aliasing is enabled by default; pass -fno-strict-aliasing explicitly to disable it.

First, the program and its results (on ARM and x86)

Example file1.c:
#include <stdio.h>
unsigned int swap_words(unsigned int arg)
{
    unsigned short int* const sp = (unsigned short int*)&arg;
    unsigned short int lo = sp[0];
    unsigned short int hi = sp[1];

    sp[0] = hi;
    sp[1] = lo;
    return arg;
}

int main(void)
{
    int x = 0x12345678;
    x = swap_words(x);
    printf("%08x\n", x);
    return 0;
}
gcc -m32 file1.c -o mytest : 56781234
swap_words:
    pushl   %ebp
    movl    %esp,      %ebp
    subl    $16,       %esp
    leal    8(%ebp),   %eax
    movl    %eax,      -8(%ebp)   ;sp
    movl    -8(%ebp),  %eax
    movzwl  (%eax),    %eax       ;arg    
    movw    %ax,       -2(%ebp)   ;arg(l) 
    movl    -8(%ebp),  %eax
    addl    $2,        %eax
    movzwl  (%eax),    %eax
    movw    %ax,       -4(%ebp)   ;arg(h) 
    movl    -8(%ebp),  %eax
    movzwl  -4(%ebp),  %edx
    movw    %dx,       (%eax)     ;arg: arg(h)arg(h)
    movl    -8(%ebp),  %eax
    leal    2(%eax),   %edx
    movzwl  -2(%ebp),  %eax      
    movw    %ax,       (%edx)     ;arg: arg(l)arg(h)
    movl    8(%ebp),   %eax
    leave   
    ret

gcc -m32 -O2 file1.c -o mytest: 12345678
swap_words:
    pushl   %ebp
    movl    %esp,      %ebp
    movzwl  8(%ebp),   %eax      ;arg    
    movzwl  10(%ebp),  %edx      ;arg(h) 
    movw    %ax,       10(%ebp)  ;arg(l)arg(l)
    movl    8(%ebp),   %eax      ;arg(l)arg(l)
    movw    %dx,       8(%ebp)   ;arg(l)arg(h)
    popl    %ebp    
    ret
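Although swap_words correctly modifies the value in the caller's stack slot, main does not call swap_words at all,
and the value left in eax is not what we intended either.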

gcc -m32 -fno-strict-aliasing -O2 file1.c -o mytest :56781234
swap_words:
    pushl   %ebp
    movl    %esp,      %ebp
    movzwl  8(%ebp),   %eax
    movzwl  10(%ebp),  %edx
    movw    %ax,       10(%ebp)
    movw    %dx,       8(%ebp) 
    movl    8(%ebp),   %eax
    popl    %ebp    
    ret
The return value is computed correctly.

arm-gcc file1.c -o mytest: 56781234
swap_words:
    str     fp, [sp, #-4]!
    add     fp, sp, #0
    sub     sp, sp, #20 
    str     r0, [fp, #-16]      ;arg
    sub     r3, fp, #16        
    str     r3, [fp, #-12]      ;sp  
    ldr     r3, [fp, #-12]
    ldrh    r3, [r3, #0]        
    strh    r3, [fp, #-8]       ;arg[l]
    ldr     r3, [fp, #-12]
    add     r3, r3, #2
    ldrh    r3, [r3, #0]     
    strh    r3, [fp, #-6]       ;arg[h]
    ldr     r2, [fp, #-12]
    ldrh    r3, [fp, #-6]    
    strh    r3, [r2, #0]        ;arg:arg(l)arg(l)
    ldr     r3, [fp, #-12]
    add     r2, r3, #2
    ldrh    r3, [fp, #-8]    
    strh    r3, [r2, #0]        ;arg:arg(l)arg(h)
    ldr     r3, [fp, #-16]
    mov     r0, r3  
    add     sp, fp, #0
    ldmfd   sp!, {fp}
    bx      lr 

arm-gcc file1.c -O2 -o mytest: 12345678
swap_words:
    sub sp, sp, #8
    str r0, [sp, #4]
    add sp, sp, #8
    bx  lr  
Because the fstrict-aliasing optimization is applied, swap_words is optimized so heavily that it performs no computation at all and simply returns r0.

arm-gcc file1.c -O2 -fno-strict-aliasing -o mytest: 56781234
swap_words:
    sub sp, sp, #8
    str r0, [sp, #4]
    ldrh    r3, [sp, #4]
    ldrh    r2, [sp, #6]
    strh    r3, [sp, #6]    @ movhi   
    ldr r0, [sp, #4]
    add sp, sp, #8
    bx  lr  
With -fno-strict-aliasing, swap_words computes the return value correctly, even though file1.c violates the strict-aliasing rule.

Let us use the example above to explain some concepts.

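The comparison above shows that when fstrict-aliasing is in effect the compiler optimizes aggressively. Our program breaks because it violates the strict-aliasing rule; that is the programmer's mistake, not a fault of the compiler's optimizer. So what if we want the compiler to optimize aggressively and also warn about code that violates the strict-aliasing rule? Add the -Wstrict-aliasing compile option.
For example:
arm-none-linux-gnueabi-gcc -O2 -Wstrict-aliasing file1.c -o mytest produces the following output:
warning: likely type-punning may break strict-aliasing rules
Take this kind of output seriously. (The alternative is to build with -fno-strict-aliasing, as the kernel does; with -fno-strict-aliasing some optimizations are simply skipped, but the compiled program will not behave contrary to what we intended.)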
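Besides silencing the optimizer, the code itself can be rewritten so that it no longer violates the rule. The following is a minimal sketch, not taken from the referenced articles: the function name swap_words_memcpy is hypothetical, and it assumes that unsigned int is exactly twice the size of unsigned short, as on the x86 and ARM targets used here. It copies the bytes of arg into a separate array with memcpy instead of reading arg through an unsigned short pointer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant of swap_words from file1.c.
 * Swaps the two 16-bit halves of arg without ever accessing arg
 * through an lvalue of an incompatible type. */
unsigned int swap_words_memcpy(unsigned int arg)
{
    unsigned short half[2];
    unsigned short tmp;

    /* Assumption: unsigned int is exactly two unsigned shorts wide. */
    assert(sizeof arg == sizeof half);

    memcpy(half, &arg, sizeof half);   /* copy the bytes of arg out */

    tmp = half[0];
    half[0] = half[1];
    half[1] = tmp;

    memcpy(&arg, half, sizeof arg);    /* copy the swapped halves back */
    return arg;
}
```

Calling swap_words_memcpy(0x12345678) should give 0x56781234 at any optimization level, because no object is read or written through a pointer of the wrong type.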

Notes on strict aliasing in C/C++ (with the emphasis on C)
aliasing is when more than one lvalue refers to the same memory location.

strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location. Enable this option in GCC with -fstrict-aliasing flag. Be sure that all code can safely run with this rule enabled. Enable strict aliasing related warnings with -Wstrict-aliasing, but do not expect to be warned in all cases. In order to discover aliasing problems as quickly as possible, -fstrict-aliasing should always be included in the compilation flags for GCC. Otherwise problems may only be visible at the highest optimization levels where it is the most difficult to debug (if we use -fno-strict-aliasing, some optimizations are not applied, and the compiler will not report some potential conversion errors)

The compiler writers know what the strict aliasing rules are for. They are written to let compiler writers know when they can safely assume that a change made through one variable won't affect the value of another variable.
Conversely, the compiler must act very conservatively when accessing memory if it cannot assume that two objects are not aliased.
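To make the two cases concrete, here is a small sketch (the function names are made up for illustration). In the first function both pointers have the same type, so the compiler has to assume they may alias and must re-read *a. In the second, the pointers have different types, so with -fstrict-aliasing the compiler is allowed to assume the store through f leaves *a untouched.

```c
#include <assert.h>

/* Same type: a and b may alias, so *a must be re-read after the
 * store to *b. */
int write_both_int(int *a, int *b)
{
    assert(a != 0 && b != 0);
    *a = 1;
    *b = 2;
    return *a;      /* 2 when a == b, otherwise 1 */
}

/* Different types: under -fstrict-aliasing the compiler may assume the
 * store through f cannot change *a and may return the constant 1.
 * Passing a float* that really points at the same int would be
 * undefined behavior. */
int write_int_and_float(int *a, float *f)
{
    assert(a != 0 && f != 0);
    *a = 1;
    *f = 2.0f;
    return *a;
}
```

Comparing the -O2 assembly of the two functions should show a reload of *a in the first and, typically, a constant return value in the second.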

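The advantage of adding -fstrict-aliasing to the compile options is that it tells the compiler that lvalues of different types refer to unrelated memory regions, so the compiler can optimize heavily. The kernel's CFLAGS include -fno-strict-aliasing, which tells the compiler that lvalues of different types may refer to related memory regions, so the compiler avoids aggressive optimizations that would be unsafe (the kernel is built with -O2, and -O2 enables -fstrict-aliasing by default, so -fno-strict-aliasing has to be specified explicitly)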

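1. Compiler writers know that certain lvalues may refer to related memory regions, so the compiler does not perform unsafe optimizations on them (even when -fstrict-aliasing is given explicitly, because lvalues of these types may refer to related memory);
2. Passing -fstrict-aliasing means the programmer is telling the compiler explicitly that lvalues of different types will not refer to related memory regions, so the compiler may optimize heavily; any bad consequences are the programmer's responsibility, not the compiler writers';
   With -fstrict-aliasing, the following basic rules describe which lvalues of different types are assumed not to refer to related memory: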
   (1)pointers to different built in types do not alias
   (2)pointers to aggregate or union types with differing tags do not alias
   (3)pointers to aggregate or union types which differ only by name may alias

Point 1 says that some lvalues may refer to related memory regions. Which ones? They are listed below:
    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
    (1)— a type compatible with the effective type of the object,
    (2)— a qualified version of a type compatible with the effective type of the object,
    (3)— a type that is the signed or unsigned type corresponding to the effective type of the object,
    (4)— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    (5)— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    (6)— a character type.
That is:
    (1)Things that are compatible types or differ only by the addition of any combination of signed, unsigned, or volatile. For most purposes compatible type just means the same type. If you want more details you can read the specs. (Example: If you get a pointer to long, and a pointer to const unsigned long they could point to the same thing.)
    (2)An aggregate (struct or class) or union type can alias types contained inside them. (Example: If a function gets passed a pointer to an int, and a pointer to a struct or union containing an int, or possibly containing another struct or union containing an int, or containing...ad infinitum, it's possible that the int* points to an int contained inside the struct or union pointed at by the other pointer.)
    (3)A character type. A char*, signed char*, or unsigned char* is specifically allowed by the specs to point to anything. That means it can alias anything in memory.
    (4)For C++ only, a possibly CV (const and/or volatile) qualified base class type of a dynamic type can alias the child type. (Example: if class dog has class animal for a base class, pointers or references to class dog and class animal can alias.)
    (5)Any lvalue has to be assumed to possibly alias to another lvalue if these rules say that they can alias. An aliasing issue is just as likely to come up with values passed by reference as it is with values passed as pointer to values. Additionally any combination of pointers and references have a possibility of aliasing.
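Two of the permitted cases above can be sketched in a few lines. This is only an illustration with hypothetical function names: the first function inspects an arbitrary object through unsigned char (the character type case), the second reads an int through its unsigned counterpart (the signed/unsigned case).

```c
#include <assert.h>
#include <stddef.h>

/* Character type: any object may be examined byte by byte through
 * unsigned char, whatever its effective type is. */
size_t count_nonzero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t n = 0;
    size_t i;

    assert(obj != NULL);
    for (i = 0; i < size; i++)
    {
        if (bytes[i] != 0)
            n++;
    }
    return n;
}

/* Signed/unsigned counterpart: an int object may be read through an
 * lvalue of type unsigned int. */
unsigned int read_as_unsigned(const int *p)
{
    assert(p != NULL);
    return *(const unsigned int *)p;
}
```

Neither function should trigger -Wstrict-aliasing, since both accesses fall under the listed exceptions.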

With the above in mind, let us test it:

Example file2.c, which verifies point (2) above

typedef struct
{
    unsigned int a;
    unsigned int b;
    unsigned int c;
}Sample;

void test(unsigned int* values, 
          Sample* uniform, 
          unsigned int count)
{
    unsigned int i = 0;
    for(i = 0; i < count; i++)
    {
        values[i] += (unsigned int)uniform->b;
    }
}

gcc file2.c -o mytest

test:
    pushl   %ebp
    movl    %esp,         %ebp
    subl    $16,          %esp
    movl    $0,           -4(%ebp)       ;i      
    movl    $0,           -4(%ebp)
    jmp .L2 
.L3:
    movl    -4(%ebp),     %eax     ;i
    sall    $2,           %eax
    addl    8(%ebp),      %eax     ;values + 4 * i 
    movl    -4(%ebp),     %edx     ;i
    sall    $2,           %edx
    addl    8(%ebp),      %edx     ;values + 4 * i 
    movl    (%edx),       %ecx
    movl    12(%ebp),     %edx
    movl    4(%edx),      %edx     ;uniform->b is reloaded on every iteration
    leal    (%ecx,%edx),  %edx    
    movl    %edx,         (%eax)
    addl    $1,           -4(%ebp)
.L2:
    movl    -4(%ebp),     %eax     ;i      
    cmpl    16(%ebp),     %eax     ;i-count
    jb  .L3 
    leave   
    ret


gcc file2.c -O2 -o mytest
test:
    pushl   %ebp
    movl    %esp,        %ebp
    pushl   %esi
    movl    8(%ebp),     %edx
    pushl   %ebx
    movl    16(%ebp),    %ebx
    movl    12(%ebp),    %esi
    testl   %ebx,        %ebx
    je  .L4 
    xorl    %eax,        %eax
.L3:
    movl    4(%esi),     %ecx   ;uniform->b is reloaded on every iteration
    addl    %ecx, (%edx,%eax,4)
    addl    $1,          %eax
    cmpl    %eax,        %ebx
    ja  .L3 
.L4:
    popl    %ebx    
    popl    %esi    
    popl    %ebp    
    ret


arm-gcc file2.c -O2 -o mytest:
test:
    stmfd   sp!, {r4, r5}  
    subs    r5, r2, #0
    mov     r4, r1
    beq     .L4 
    mov     r1, #0
    mov     ip, r1
.L3:
    ldr     r3, [r0, ip]
    ldr     r2, [r4, #4]   ;uniform->b is reloaded on every iteration
    add     r1, r1, #1
    add     r3, r3, r2
    cmp     r5, r1
    str     r3, [r0, ip]
    add     ip, ip, #4
    bhi     .L3 
.L4:
    ldmfd   sp!, {r4, r5}
    bx      lr
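The listings reload uniform->b on every iteration because values[i] and uniform->b are both unsigned int and may therefore alias. If the caller can guarantee that values does not overlap *uniform, the load can be hoisted by hand. The sketch below is an assumption-laden variant, not part of the original test: test_hoisted is a hypothetical name, and the loop uses the accumulate form (+=) that matches the add instructions visible in the listings.

```c
#include <assert.h>

typedef struct
{
    unsigned int a;
    unsigned int b;
    unsigned int c;
} Sample;

/* Hypothetical variant of test() that reads uniform->b exactly once.
 * Only equivalent to the original loop when values[] does not overlap
 * *uniform; the caller has to guarantee that. */
void test_hoisted(unsigned int *values,
                  const Sample *uniform,
                  unsigned int count)
{
    unsigned int i;
    unsigned int b;

    assert(uniform != 0);
    b = uniform->b;              /* loaded once, outside the loop */
    for (i = 0; i < count; i++)
    {
        values[i] += b;
    }
}
```

With the local copy, the compiler no longer has to consider that a store to values[i] might change uniform->b, so the load should disappear from the loop body at -O2.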

=========================================================================

The most commonly accepted method of converting one type of object to another is by
using a union type;
Example file3.c:
typedef union
{
        unsigned int u32;
        unsigned short int u16[2]; 
}U32;

unsigned int swap_words(unsigned int arg)
{
    U32 in; 
    unsigned short int lo; 
    unsigned short int hi; 

    in.u32 = arg;
    lo = in.u16[0];
    hi = in.u16[1];

    in.u16[0] = hi;
    in.u16[1] = lo;

    return in.u32;
}

gcc file3.c -S -o file3.s
swap_words:
pushl   %ebp
movl    %esp,      %ebp
subl    $16,       %esp
movl    8(%ebp),   %eax       ;arg
movl    %eax,      -8(%ebp)
movzwl  -8(%ebp),  %eax       ;arg
movw    %ax,       -2(%ebp)   ;arg(l) 
movzwl  -6(%ebp),  %eax       ;arg(h) 
movw    %ax,       -4(%ebp)   ;arg(h) 
movzwl  -4(%ebp),  %eax
movw    %ax,       -8(%ebp)   ;arg(h)arg(h)
movzwl  -2(%ebp),  %eax
movw    %ax,       -6(%ebp)   ;arg(l)arg(h)
movl    -8(%ebp),  %eax
leave   
ret


gcc file3.c -O2 -S -o file3.s
Optimized and still correct, which verifies point (2) above
swap_words:
    pushl   %ebp
    movl    %esp,    %ebp
    movl    8(%ebp), %eax
    popl    %ebp    
    roll    $16,     %eax
    ret

arm-gcc file3.c -S -o file3.s
swap_words:
    str     fp, [sp, #-4]!
    add     fp, sp, #0
    sub     sp, sp, #20 
    str     r0, [fp, #-16]      ;arg
    ldr     r3, [fp, #-16]      ;arg
    str     r3, [fp, #-12]      ;arg
    ldrh    r3, [fp, #-12]      ;arg(l)
    strh    r3, [fp, #-8]       ;arg(l)
    ldrh    r3, [fp, #-10]      ;arg(h)
    strh    r3, [fp, #-6]       ;arg(h)
    ldrh    r3, [fp, #-6]       ;arg(h)
    strh    r3, [fp, #-12]      ;arg(h)arg(h)
    ldrh    r3, [fp, #-8]       ;arg(l)
    strh    r3, [fp, #-10]      ;arg(l)arg(h)
    ldr     r3, [fp, #-12]      ;arg(l)arg(h)
    mov     r0, r3
    add     sp, fp, #0
    ldmfd   sp!, {fp}
    bx      lr  

arm-gcc file3.c -O2 -S -o file3.s
// Implemented directly with a rotate right. Nicely optimized!
swap_words:
    mov r0, r0, ror #16 
    bx  lr
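The same union technique applies to other pairs of types. As a further sketch (FloatBits and float_to_bits are hypothetical names, and the code assumes a 32-bit IEEE-754 float), the bit pattern of a float can be read back as an integer. GCC documents this kind of type-punning as allowed even with -fstrict-aliasing, provided the memory is accessed through the union type; in C++ the language standard does not give the same guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical union pairing a float with an integer of the same size. */
typedef union
{
    float    f;
    uint32_t u;
} FloatBits;

/* Return the object representation of value as a 32-bit integer.
 * Assumption: float is 32 bits wide (IEEE-754 single precision). */
uint32_t float_to_bits(float value)
{
    FloatBits pun;

    assert(sizeof(float) == sizeof(uint32_t));
    pun.f = value;      /* write one member ... */
    return pun.u;       /* ... read the other through the union */
}
```

On an IEEE-754 target, float_to_bits(1.0f) yields 0x3f800000.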

=============================================================================

Casting proper may be done between a pointer to a type and a pointer to an aggregate or union type which contains a member of a compatible type;


Example file4.c
unsigned int swap_words(unsigned int arg)
{
    U32* in = (U32*)&arg;
    unsigned short int lo = in->u16[0];
    unsigned short int hi = in->u16[1];

    in->u16[0] = hi;
    in->u16[1] = lo;
    return (in->u32);
}

gcc file4.c -S -o file4.s
swap_words:
    pushl   %ebp
    movl    %esp,     %ebp    
    subl    $16,      %esp    
    leal    8(%ebp),  %eax       
    movl    %eax,     -8(%ebp)   ;in
    movl    -8(%ebp), %eax
    movzwl  (%eax),   %eax       ;arg    
    movw    %ax,      -2(%ebp)   ;arg(l)
    movl    -8(%ebp), %eax
    movzwl  2(%eax),  %eax       ;arg(h) 
    movw    %ax,      -4(%ebp)   ;arg(h) 
    movl    -8(%ebp), %eax       ;in     
    movzwl  -4(%ebp), %edx       ;arg(h) 
    movw    %dx,      (%eax)     ;arg(h)arg(h)
    movl    -8(%ebp), %eax       ;in     
    movzwl  -2(%ebp), %edx       ;arg(l) 
    movw    %dx,      2(%eax)    ;arg(l)arg(h)
    movl    -8(%ebp), %eax
    movl    (%eax),   %eax       ;arg(l)arg(h)
    leave   
    ret

gcc file4.c -S -O2 -o file4.s
swap_words:
    pushl   %ebp
    movl    %esp,     %ebp    
    movzwl  8(%ebp),  %eax
    movzwl  10(%ebp), %edx
    movw    %ax,      10(%ebp)
    movw    %dx,      8(%ebp)
    movl    8(%ebp),  %eax
    popl    %ebp    
    ret


arm-gcc file4.c -S -o file4.s
swap_words:
    str     fp, [sp, #-4]!
    add     fp, sp, #0
    sub     sp, sp, #20 
    str     r0, [fp, #-16]  ;arg
    sub     r3, fp, #16 
    str     r3, [fp, #-12]  ;in
    ldr     r3, [fp, #-12]
    ldrh    r3, [r3, #0]    ;arg(l)
    strh    r3, [fp, #-8]   ;arg(l)
    ldr     r3, [fp, #-12]   
    ldrh    r3, [r3, #2]    ;arg(h)
    strh    r3, [fp, #-6]   ;arg(h)
    ldr     r2, [fp, #-12]
    ldrh    r3, [fp, #-6]   ;arg(h)
    strh    r3, [r2, #0]    ;arg(h)arg(h)
    ldr     r2, [fp, #-12]
    ldrh    r3, [fp, #-8]   ;arg(l)
    strh    r3, [r2, #2]    ;arg(l)arg(h)
    ldr     r3, [fp, #-12]
    ldr     r3, [r3, #0]    ;arg(l)arg(h)
    mov     r0, r3  
    add     sp, fp, #0
    ldmfd   sp!, {fp}
    bx      lr      

arm-gcc file4.c -O2 -S -o file4.s
swap_words:
    sub     sp, sp, #8
    add     r3, sp, #8
    str     r0, [r3, #-4]!  ;arg
    ldrh    r2, [sp, #4]    ;arg(l)
    ldrh    r1, [r3, #2]    ;arg(h)
    strh    r1, [sp, #4]    ;arg(h)arg(h)
    strh    r2, [r3, #2]    ;arg(l)arg(h)
    ldr     r0, [sp, #4]    ;arg(l)arg(h)
    add     sp, sp, #8
    bx      lr

============================================================================

Example file5.c (some structures are large, so the argument is passed by pointer)
void swap_words(unsigned int *arg)
{
    U32* combined = (U32*)arg;
    unsigned int start = combined->u32;
    unsigned int lo = start >> 16;
    unsigned int hi = start << 16;

    unsigned int final = lo | hi;

    combined->u32 = final;
}

gcc file5.c -S -o file5.s
swap_words:
    pushl   %ebp
    movl    %esp,       %ebp    
    subl    $32,        %esp    
    movl    8(%ebp),    %eax    
    movl    %eax,       -4(%ebp)  ;combined 
    movl    -4(%ebp),   %eax
    movl    (%eax),     %eax    
    movl    %eax,       -8(%ebp)  ;start  
    movl    -8(%ebp),   %eax
    shrl    $16,        %eax    
    movl    %eax,       -12(%ebp) ;lo
    movl    -8(%ebp),   %eax
    sall    $16,        %eax    
    movl    %eax,       -16(%ebp) ;hi
    movl    -16(%ebp),  %eax      ;hi     
    movl    -12(%ebp),  %edx      ;lo     
    orl     %edx,       %eax      ;hi | lo
    movl    %eax,       -20(%ebp) ;final
    movl    -4(%ebp),   %eax
    movl    -20(%ebp),  %edx
    movl    %edx,       (%eax)    ;combined->u32 = final
    leave   
    ret

gcc file5.c -O2 -S -o file5.s  
swap_words:
    pushl   %ebp
    movl    %esp,    %ebp
    movl    8(%ebp), %eax
    roll    $16,     (%eax)
    popl    %ebp    
    ret

arm-gcc file5.c -S -o file5.s 
swap_words:
    str fp, [sp, #-4]!
    add fp, sp, #0
    sub sp, sp, #36 
    str r0, [fp, #-32]
    ldr r3, [fp, #-32]   
    str r3, [fp, #-24]   ;combined
    ldr r3, [fp, #-24]
    ldr r3, [r3, #0]     
    str r3, [fp, #-20]   ;start
    ldr r3, [fp, #-20]
    mov r3, r3, lsr #16 
    str r3, [fp, #-16]   ;lo
    ldr r3, [fp, #-20]
    mov r3, r3, asl #16 
    str r3, [fp, #-12]   ;hi
    ldr r2, [fp, #-16]
    ldr r3, [fp, #-12]
    orr r3, r2, r3
    str r3, [fp, #-8]    ;final
    ldr r2, [fp, #-24]
    ldr r3, [fp, #-8]
    str r3, [r2, #0]     ;combined->u32 = final
    add sp, fp, #0
    ldmfd   sp!, {fp}
    bx  lr  

arm-gcc file5.c -O2 -S -o file5.s
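// Note the difference from the file4.c result built with the same options (file5.c is more efficient;
for small structures, file3.c built with the same options is more efficient still)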
swap_words:
    ldr r3, [r0, #0]
    mov r3, r3, ror #16
    str r3, [r0, #0]
    bx  lr

==============================================================================

Example file6.c
Occasionally a programmer may encounter the following INVALID method for creating an alias with 
a pointer of a different type: (note that point (2) above does not cover the following case)
#include <stdio.h>
typedef union
{
    unsigned short int* sp;
    unsigned int* wp;
}U32P;

unsigned int swap_words(unsigned int arg)
{
    U32P in = {.wp = &arg};
    const unsigned int hi = in.sp[1];
    const unsigned int lo = in.sp[0];

    in.sp[0] = hi;
    in.sp[1] = lo;

    return arg;
}

int main(void)
{
    int x = 0x12345678;
    int y = swap_words(x);

    printf("x:%08x  y:%08x\n", x, y);
    return 0;
}

gcc file6.c -o mytest: 12345678   56781234
swap_words:
    pushl   %ebp
    movl    %esp,      %ebp    
    subl    $16,       %esp    
    movl    $0,        -4(%ebp)
    leal    8(%ebp),   %eax
    movl    %eax,      -4(%ebp)  ;wp
    movl    -4(%ebp),  %eax      ;sp     
    addl    $2,        %eax      ;sp + 2  
    movzwl  (%eax),    %eax      
    movzwl  %ax,       %eax    
    movl    %eax,      -8(%ebp)  ;hi
    movl    -4(%ebp),  %eax
    movzwl  (%eax),    %eax    
    movzwl  %ax,       %eax    
    movl    %eax,      -12(%ebp) ;lo
    movl    -4(%ebp),  %eax
    movl    -8(%ebp),  %edx      ;hi     
    movw    %dx,       (%eax)    ;sp[0]  = hi
    movl    -4(%ebp),  %eax
    leal    2(%eax),   %edx
    movl    -12(%ebp), %eax
    movw    %ax,       (%edx)    ;sp[1] = lo
    movl    8(%ebp),   %eax      ;lo:hi
    leave   
    ret

gcc file6.c -O2 -o mytest: 12345678 12345678
swap_words:
    pushl   %ebp
    movl    %esp,     %ebp    
    movzwl  8(%ebp),  %eax      ;arg(l) 
    movzwl  10(%ebp), %edx      ;arg(h) 
    movw    %ax,      10(%ebp)  ;arg(l)arg(l)
    movl    8(%ebp),  %eax      ;arg(l)arg(l)
    movw    %dx,      8(%ebp)   ;arg(l)arg(h)
    popl    %ebp    
    ret
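Although swap_words correctly modifies the value in the caller's stack slot, main does not call swap_words at all,
and the value left in eax is not what we intended either.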

arm-gcc file6.c -o mytest : 12345678 56781234
swap_words:
    str fp, [sp, #-4]!     
    add fp, sp, #0
    sub sp, sp, #28 
    str r0, [fp, #-24]     ;arg  
    mov r3, #0
    str r3, [fp, #-16]
    sub r3, fp, #24 
    str r3, [fp, #-16]    ;wp
    ldr r3, [fp, #-16]    ;sp
    add r3, r3, #2
    ldrh    r3, [r3, #0]  
    str r3, [fp, #-12]    ;hi
    ldr r3, [fp, #-16]    ;sp
    ldrh    r3, [r3, #0]
    str r3, [fp, #-8]     ;lo
    ldr r2, [fp, #-16]    ;sp
    ldr r3, [fp, #-12]    ;hi
    mov r3, r3, asl #16 
    mov r3, r3, lsr #16 
    strh    r3, [r2, #0]  ;arg(h)arg(h)
    ldr r3, [fp, #-16]
    add r2, r3, #2
    ldr r3, [fp, #-8]
    mov r3, r3, asl #16 
    mov r3, r3, lsr #16 
    strh    r3, [r2, #0]  ;arg(l)arg(h)
    ldr r3, [fp, #-24]    ;arg(l)arg(h)
    mov r0, r3
    add sp, fp, #0
    ldmfd   sp!, {fp}
    bx  lr  

arm-gcc file6.c -O2 -o mytest : 12345678 12345678
swap_words:
    sub sp, sp, #8
    str r0, [sp, #4]
    add sp, sp, #8
    bx  lr
Because the fstrict-aliasing optimization is applied, swap_words is optimized so heavily that it performs no computation at all and simply returns r0.

The problem with this method is although U32P does in fact say that sp is an alias for wp, 
it does not say anything about the relationship between the values pointed to by sp and wp. 
This differs in a critical way from the experiment in "file3.c & file4.c"  which both define 
aliases for the values being pointed to, not the pointers themselves.

=========================================================================

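Notes on case (3) above:
It is always presumed that a char* may refer to an alias of any object. It is therefore quite safe, if perhaps a bit unoptimal (for architecture with wide loads and stores) to cast any pointer of any type to a char* type. (That is, because a char* may alias an lvalue of any type, the compiler does not optimize aggressively around it; with or without -fstrict-aliasing it optimizes only within safe bounds. The cost of using char* is that it may require multiple loads/stores, which can reduce performance.)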

The following code behaves the same whether compiled with -O0 or -O2; both builds swap the bytes within the word (see case (3))
unsigned int swap_words(unsigned int arg)
{
    char* const cp = (char*)&arg;
    const char c0 = cp[0];
    const char c1 = cp[1];
    const char c2 = cp[2];
    const char c3 = cp[3];

    cp[0] = c2;
    cp[1] = c3;
    cp[2] = c0;
    cp[3] = c1;

    return arg;
}

For case (3), the converse does not necessarily hold, that is:
The converse is not true. Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in violation of the strict aliasing rule.
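A common place where this comes up is pulling an integer out of a byte buffer. The sketch below (load_uint and store_uint are hypothetical helper names) avoids the char*-to-unsigned-int* cast by copying the bytes with memcpy, which also sidesteps any alignment requirement on the buffer offset.

```c
#include <assert.h>
#include <string.h>

/* Read an unsigned int stored at an arbitrary offset in a byte buffer.
 * Writing *(const unsigned int *)(buf + offset) instead would access
 * char objects through an unsigned int lvalue, and the address might
 * also be misaligned. */
unsigned int load_uint(const unsigned char *buf, size_t offset)
{
    unsigned int value;

    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Store an unsigned int at an arbitrary offset in a byte buffer. */
void store_uint(unsigned char *buf, size_t offset, unsigned int value)
{
    assert(buf != NULL);
    memcpy(buf + offset, &value, sizeof value);
}
```

The caller is responsible for making sure offset + sizeof(unsigned int) stays within the buffer; the helpers do not check bounds.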

These notes are still a bit rough; I will go over them again later.