Aligned thread-locals initialized to incorrect value

Posted: Wed Jun 15, 2022 2:10 pm
by ian-h-chamberlain
Hello! I have been debugging an issue for a while in which the initial value of __thread variables is wrong under certain circumstances (building for the 3DS). The following code reproduces the issue. Notably, either of these changes seems to resolve it:
  • using ALIGN(8) or less for BUF_16
  • building with -O1 or higher

Code:

#include <3ds.h>
#include <stdio.h>
#include <string.h>

typedef ALIGN(4) struct {
    u8 inner[3];
} Align4;

typedef ALIGN(16) struct {
    u8 inner[3];
} Align16;

static __thread Align4 BUF_4 = {.inner = {2, 2, 2}};
static __thread Align16 BUF_16 = {.inner = {1, 1, 1}};

int
main(int argc, char** argv)
{
    gfxInitDefault();
    consoleInit(GFX_TOP, NULL);

    BUF_16.inner[0] = 0;

    bool reproduced = false;

    printf("[");
    for (int i = 0; i < 3; i++) {
        if (BUF_4.inner[i] != 2) {
            reproduced = true;
        }
        printf("%d, ", BUF_4.inner[i]);
    }
    printf("]\n");

    if (reproduced) {
        printf("reproduced!\n");
    }
    else {
        printf("nope\n");
    }

    // Main loop
    while (aptMainLoop()) {
        gspWaitForVBlank();
        hidScanInput();

        u32 kDown = hidKeysDown();
        if (kDown & KEY_START)
            break;  // break in order to return to hbmenu

        // Flush and swap framebuffers
        gfxFlushBuffers();
        gfxSwapBuffers();
    }

    gfxExit();
    return 0;
}
Looking at objdump output, what I believe is happening (although I am no expert) is that the linker is generating incorrect offsets for the thread-local variables. The constant pool at the end of main looks like this:

Code:

  100d2c:	eb0013c3 	bl	105c40 <__aeabi_read_tp>
  100d30:	e1a03000 	mov	r3, r0
  100d34:	e59f2114 	ldr	r2, [pc, #276]	; 100e50 <main+0x148> ; example use of the offsets, in this case BUF_16
  100d38:	e3a01000 	mov	r1, #0
  100d3c:	e7c31002 	strb	r1, [r3, r2] ; BUF_16.inner[0] = 0
...
  100e4c:	e8bd8800 	pop	{fp, pc}
  100e50:	00000024 	.word	0x00000024 ; offset for BUF_16
  100e54:	00000014 	.word	0x00000014 ; offset for BUF_4
  100e58:	00121000 	.word	0x00121000
  100e5c:	00121008 	.word	0x00121008
  100e60:	0012100c 	.word	0x0012100c
  100e64:	00121018 	.word	0x00121018
The thread-local initializer data, however, looks like the dump below. If I understand correctly, it indicates that the offsets should be 0x1C for BUF_16 and 0xC for BUF_4 (each variable's position within .tdata plus the 0x8 ARM thread-local offset). Hex-editing the binary to use 0x1C and 0xC results in the expected behavior.

Code:

Contents of section .tdata:
 12af0c 00000000 02020200 00000000 00000000  ................
 12af1c 00000000 01010100                    ........        
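As a rough cross-check of the numbers above, here is a small host-side sketch of the offset arithmetic. It takes the positions of the initializers within the .tdata dump (0x04 for BUF_4, 0x14 for BUF_16) and computes the thread-pointer offset two ways: with a fixed 0x8 header, and with the header rounded up to the TLS alignment. The second rule is only an assumption about what the linker might be doing; it happens to reproduce the 0x14 and 0x24 values in the constant pool, but nothing in this thread confirms it. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Round value up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Offset from the thread pointer if the TLS data always starts 8 bytes
 * past it (the "0x8 ARM thread-local offset" mentioned in the post). */
size_t offset_fixed_header(size_t tdata_offset)
{
    return 8 + tdata_offset;
}

/* ASSUMPTION: offset from the thread pointer if the 8-byte header is first
 * rounded up to the alignment of the TLS data. This is a guess at the
 * linker's rule, not something confirmed in the thread. */
size_t offset_aligned_header(size_t tdata_offset, size_t tls_align)
{
    return align_up(8, tls_align) + tdata_offset;
}

/* Print both candidate offsets for the two variables in the repro. */
void print_offsets(void)
{
    const size_t buf4_pos = 0x04;   /* position of 02 02 02 in the .tdata dump */
    const size_t buf16_pos = 0x14;  /* position of 01 01 01 in the .tdata dump */

    printf("BUF_4:  fixed=0x%zX aligned(16)=0x%zX\n",
           offset_fixed_header(buf4_pos), offset_aligned_header(buf4_pos, 16));
    printf("BUF_16: fixed=0x%zX aligned(16)=0x%zX\n",
           offset_fixed_header(buf16_pos), offset_aligned_header(buf16_pos, 16));
}
```

Calling print_offsets() from a main() on any host compiler shows the two rules side by side. Under the assumed rounding rule, an alignment of 8 or less leaves the header at 8 bytes, so both calculations agree, which would be consistent with ALIGN(8) making the problem disappear.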
The reason I suspect the linker is that the object file itself has all-zero offsets for these two entries (I presume they get filled in when the object file is relocated during linking):

Code:

 140:	e24bd004 	sub	sp, fp, #4
 144:	e8bd8800 	pop	{fp, pc}
 148:	00000000 	.word	0x00000000 ; BUF_16
 14c:	00000000 	.word	0x00000000 ; BUF_4
 150:	00000000 	.word	0x00000000
 154:	00000008 	.word	0x00000008
 158:	0000000c 	.word	0x0000000c
 15c:	00000018 	.word	0x00000018
I'm hoping for any ideas about what might be causing this (am I accidentally creating UB or something?), or for workarounds that emit the proper offsets for thread-locals with alignment like this. I tried changing some ALIGN() directives in the 3dsx.ld linker script but got inconsistent results, and I'm a bit out of my depth when it comes to writing linker scripts, so I'm hoping someone here can point me in the right direction.

Re: Aligned thread-locals initialized to incorrect value

Posted: Thu Apr 20, 2023 3:00 pm
by WinterMute
Apologies for not approving this post sooner.

This issue was fixed with https://github.com/devkitPro/libctru/pull/504

Thanks for the PR