Problem with mathematic functions (pow)

Eke · Post by **Eke** » Wed Sep 15, 2010 10:09 pm

Hi,

I've been facing a random bug in Genesis Plus GX, and I finally figured out that it is caused by a corrupted table initialisation in my code, which surprisingly does not behave deterministically (as it should) and sometimes ends up computing a wrong value.

This particular code:

Code:

  for (x=0; x<256; x++)
  {
    m = (1<<16) / pow(2, (x+1) * (ENV_STEP/4.0) / 8.0);
    m = floor(m);

    /* we never reach (1<<16) here due to the (x+1) */
    /* result fits within 16 bits at maximum */

    n = (int)m; /* 16 bits here */
    n >>= 4;    /* 12 bits here */
    if (n&1)    /* round to nearest */
      n = (n>>1)+1;
    else
      n = n>>1;
                /* 11 bits here (rounded) */
    n <<= 2;    /* 13 bits here (as in real chip) */

    /* 14 bits (with sign bit) */
    tl_tab[ x*2 + 0 ] = n;
    tl_tab[ x*2 + 1 ] = -tl_tab[ x*2 + 0 ];

  }

is called during initialization to compute the tl_tab array, and I have observed that sometimes (randomly and quite rarely, but still) one of the entries (also random) holds a completely wrong value (for example 0x10000000 instead of 0x1824).

Analysing more deeply, I figured out that the pow function from the math library sometimes returns a totally wrong value under some unknown conditions. The strangest thing is that adding some kind of delay in the loop (in this case, an "fprintf" to debug the computed values) seems to fix the issue (or maybe it was just a coincidence and I didn't give it enough tries to reproduce it, since it's random).

Does anyone have an idea about what might be happening?

shagkur · Post by **shagkur** » Thu Sep 16, 2010 9:59 am

Hi,

Do you disable interrupts somewhere around the initialization of this table ? I remember having some issues with that in conjunction with another thread doing some FP calculations.

Eke · Post by **Eke** » Thu Sep 16, 2010 12:32 pm

No, I didn't. Disabling interrupts during this loop indeed seems to fix the random incorrect values.
Does it mean the FP registers are somehow not saved/restored properly when the pow() function is interrupted? Or maybe something within the FPU?

In any case, my issue appears to be fixed by this workaround, so thanks a lot!

shagkur · Post by **shagkur** » Thu Sep 16, 2010 1:26 pm

Well, tbh, i would have expected disabling the interrupts to corrupt the FP registers, not to fix them.
There's an issue marcan found: when you have 2 threads, for example, that both do FP calculations, and each thread disables the interrupts before its calculation and re-enables them afterwards, the FP registers seem to get corrupted. Until now i couldn't figure out what the correlation is between disabling the interrupts and the FP bit in the MSR.
So i'm a bit surprised you got yours working by just disabling the interrupts.

FP save/restore is done in a lazy way (sort of). This means the registers only get saved/restored when a thread actually touches an FP register. On every thread switch the FP bit in the MSR gets masked; it is unmasked again when the FP exception handler is triggered to do the save/restore. This approach was chosen because saving/restoring the FP registers on every context switch is very expensive and not necessarily needed for every thread.

Anyway, i'm glad this workaround has fixed your problem.
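The lazy scheme can be modelled in plain C to make the bookkeeping visible. This is a toy model under my own assumptions, not libogc code: cpu_model, thread_ctx, switch_to, fp_unavailable and fp_reg are all hypothetical names, the "hardware" registers are just an array, and the trap is simulated by a check in fp_reg.

```c
#include <stddef.h>
#include <string.h>

#define NUM_FPR 32

typedef struct {
    double fpr[NUM_FPR];      /* saved FP registers of this thread */
} thread_ctx;

typedef struct {
    double      fpr[NUM_FPR]; /* the "hardware" FP registers */
    int         fp_enabled;   /* models the FP bit in the MSR */
    thread_ctx *running;      /* thread currently executing */
    thread_ctx *fp_owner;     /* thread whose values are live in fpr[] */
    int         fp_switches;  /* number of FP contexts loaded so far */
} cpu_model;

/* Thread switch: cheap, it only masks the FP bit. */
static void switch_to(cpu_model *cpu, thread_ctx *next)
{
    cpu->running = next;
    cpu->fp_enabled = 0;
}

/* "FP unavailable" handler: do the expensive copy only if the owner changed. */
static void fp_unavailable(cpu_model *cpu)
{
    if (cpu->fp_owner != cpu->running) {
        if (cpu->fp_owner)
            memcpy(cpu->fp_owner->fpr, cpu->fpr, sizeof cpu->fpr);
        memcpy(cpu->fpr, cpu->running->fpr, sizeof cpu->fpr);
        cpu->fp_owner = cpu->running;
        cpu->fp_switches++;
    }
    cpu->fp_enabled = 1;
}

/* Any FP register access traps first when the FP bit is masked. */
static double *fp_reg(cpu_model *cpu, int n)
{
    if (!cpu->fp_enabled)
        fp_unavailable(cpu);
    return &cpu->fpr[n];
}
```

In this model a thread that never touches an FP register never costs a save/restore, and a thread that is switched away and back without anyone else using the FPU finds its registers untouched.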

Eke · Post by **Eke** » Thu Sep 16, 2010 11:26 pm

My situation is a little bit different since I only use one thread (the main application thread).

Now, from what I've read about PPC exceptions, the FPU is generally disabled during the IRQ handler and restored when returning from the exception, by saving the initial MSR bits in SRR1 and then clearing the FP bit (bit 18) in the MSR before executing the interrupt routine; the return-from-interrupt instruction then restores the MSR bits from SRR1.

In the libogc IRQ handler, however, it seems the FP bit is not cleared when entering the handler and is instead cleared on exception return:

in irq_handler.S, irq_exceptionhandler after c_irqdispatcher

Code:

	EXCEPTION_EPILOG

	mfmsr	r4
	rlwinm	r4,r4,0,19,17           // bit 18 is cleared
	rlwinm	r4,r4,0,31,29
	mtmsr	r4                           // this clears FP bit in MSR (why here ?)
	isync

	lwz		r0,GPR0_OFFSET(sp)
	lwz		toc,GPR2_OFFSET(sp)

	lwz		r4,SRR0_OFFSET(sp)     
	mtsrr0	r4
	lwz		r4,SRR1_OFFSET(sp)     // retrieve stacked SRR1
	rlwinm	r4,r4,0,19,17        // bit 18 is cleared           (should not be done to keep the saved value of FP ?)
	mtsrr1	r4                         // this clears FP bit in SRR1 

	lwz		r4,GPR4_OFFSET(sp)
	lwz		r3,GPR3_OFFSET(sp)
	addi	sp,sp,EXCEPTION_FRAME_END
	rfi                                        // return from interrupt, MSR bits 16-31 are copied with SRR1 bits 16-31

Maybe I am reading the assembly code wrongly, but it seems to me it leaves the FPU disabled when returning from the interrupt?
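The two rlwinm masks in the listing can be checked on a host machine. The sketch below emulates rlwinm's mask generation with PowerPC bit numbering (bit 0 is the most significant bit), including the wrap-around case where MB is greater than ME. The helper names rlwinm_mask and rlwinm are mine; MSR_FP and MSR_RI are the conventional values for MSR bits 18 and 30.

```c
#include <stdint.h>

#define MSR_FP 0x00002000u   /* MSR bit 18 (FP available) */
#define MSR_RI 0x00000002u   /* MSR bit 30 (recoverable interrupt) */

/* Mask with PowerPC bits mb..me set; wraps around when mb > me. */
static uint32_t rlwinm_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me);  /* bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm rA,rS,SH,MB,ME: rotate left by sh, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t rot = sh ? (rs << sh) | (rs >> (32 - sh)) : rs;
    return rot & rlwinm_mask(mb, me);
}
```

With these helpers, rlwinm_mask(19, 17) evaluates to 0xFFFFDFFF, i.e. everything except MSR_FP, and rlwinm_mask(31, 29) evaluates to 0xFFFFFFFD, i.e. everything except MSR_RI. That confirms which bits the listing clears; it says nothing about whether clearing them at that point is correct.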

If I understand correctly, the FPU is re-enabled when the FPU exception occurs, which I imagine happens on the first FPU instruction executed with the FPU disabled. Maybe this very first instruction is not executed correctly?

EDIT: I just removed the line that clears SRR1 bit 18 in irq_exceptionhandler, recompiled libogc, and it fixed my issue as well (without the need for IRQ_Disable/IRQ_Restore).
I'm not sure whether having the FPU enabled on exception exit has any unknown side effects in libogc though (for the multi-threading implementation maybe?)

EDIT2: I've been studying libogc some more and it seems that the FPU is disabled by default in thread contexts (see __lwp_thread_loadenv, called by __lwp_thread_start in lwp_threads.c). While it doesn't matter when using only one thread (the FPU is initially enabled), it would however disable the FPU when switching threads and restoring a thread context. According to the PowerPC 7xx user manual, when the FPU is disabled and an FPU instruction is dispatched, an FPU exception occurs (which, in libogc's implementation, re-enables the FPU) but the instruction that caused the exception is not executed, which can obviously lead to wrong results in floating point operations.

I believe this is what happens: if an external interrupt (irq_exceptionhandler), a decrementer exception (dec_exceptionhandler) or a thread switch (_cpu_context_switch, _cpu_context_switch_ex, _cpu_context_restore) occurs in the middle of such an operation, the FPU is in all cases forced off and left disabled, and the first floating point instruction dispatched when the preempted thread resumes is not executed properly.

On that note, in the case of thread switching, it seems to me there is actually no point in restoring the MSR, since this is called in an exception callback (__thread_dispatch in the decrementer exception) and it will be overridden with the SRR1 content when returning from the exception (rfi instruction). This SRR1 register is reset at the beginning of the exception handler with the current MSR bits (which is the recommended exception implementation afaik), thus it holds the dispatched thread's MSR. For an unknown reason, the FP bit is forced to 1 in SRR1 after that (exceptionhandler_start), then generally cleared (except for the FPU and default exception handlers) just before calling the rfi instruction, leaving the FPU disabled when returning from the exception. I might be wrong, but it seems to me there is something weird in libogc regarding the exception implementation and MSR/SRR1 register handling.

shagkur · Post by **shagkur** » Fri Sep 17, 2010 9:10 am

The point of disabling the FP bit in the MSR in the irq_exception handler is to achieve lazy FP context switching. Having this bit enabled in every thread context would mean doing an FP context switch on every thread context switch, which is a very expensive task. Remember we not only have the normal FP registers, we also have the paired-singles registers, which have to be saved/restored all the time too. This means we'd have to save/restore 2x32 registers of 64 bits each.

My understanding of exception processing and dispatching is that SRR0 stores the address of the instruction which triggered the exception. It's true the instruction which caused the exception isn't executed at this point. This is especially true for the FP unavailable exception, but since the address of the excepting instruction is stored in SRR0, we return to it at the end of the exception handler. So if i'm not totally wrong this instruction should get executed then.

So to make it short, leaving the FP bit enabled will work for one thread but not multithreaded. Especially if other threads will do FP work too.

Nevertheless i'll take a closer look at this, perhaps i've overlooked or misunderstood something.

Well, the fact is we're actually not in an exception handler anymore. At the point where the decrementer exception handler or interrupt exception handler is called, we have already left the exception handler via an rfi.
Subsequent handlers may do an rfi as well, though. When entering one of the subsequent handlers, address translation and other bits are turned on and SRR0 is set by the default exception handler for the selected exception.

Furthermore, regarding thread_dispatch, this function may get called not only preemptively but also cooperatively. That's why i have to save/restore the MSR bit as well.
As i described above, we've already left the exception. If thread_dispatch switches to a thread which was previously switched out cooperatively (e.g. by a mutex lock), the newly executing thread context returns to the calling function when leaving thread_dispatch and _not_ back to the exception handler. This works because, as i said above, we've already left the exception.

Perhaps for the FP context switch i disable the FP bit one time too much, so to say. I really need to take a closer look at this.

Eke · Post by **Eke** » Fri Sep 17, 2010 4:01 pm

shagkur wrote:So if i'm not totally wrong this instruction should get executed then.

I just verified in the official documentation and you are of course right: the value saved in SRR0 by the CPU when the FPU unavailable exception occurs is the address of the floating-point instruction causing the exception, so it is indeed executed once the FP bit has been set back.

OK, I think I got your idea about thread context switching: basically you use the FPU unavailable exception to know when the FPU should be re-enabled and the FPU registers saved/restored. When you are returning from an IRQ, the same thread keeps running, so __thread_dispatch_fp would simply save then restore the same set of FPU register values, which in theory leaves the FPU state intact for the interrupted operation to compute correctly.
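The retry behaviour can be illustrated with a toy interpreter. It is only a model of the rule just described (SRR0 points at the faulting instruction, so returning from the handler runs that instruction again); the machine struct, the two opcodes and run() are hypothetical and have nothing to do with real PowerPC encodings.

```c
#include <stddef.h>

typedef enum { OP_INT_ADD, OP_FP_ADD } opcode;

typedef struct {
    size_t pc;          /* index of the next instruction */
    size_t srr0;        /* saved return address on exception entry */
    int    fp_enabled;  /* models the FP bit in the MSR */
    int    traps;       /* number of "FP unavailable" exceptions taken */
    int    iacc;        /* integer accumulator */
    double facc;        /* floating-point accumulator */
} machine;

static void run(machine *m, const opcode *prog, size_t len)
{
    while (m->pc < len) {
        if (prog[m->pc] == OP_FP_ADD && !m->fp_enabled) {
            m->srr0 = m->pc;     /* exception entry: SRR0 = faulting instruction */
            m->traps++;
            m->fp_enabled = 1;   /* handler re-enables the FPU */
            m->pc = m->srr0;     /* rfi: resume at SRR0, so the instruction is retried */
            continue;
        }
        if (prog[m->pc] == OP_FP_ADD)
            m->facc += 1.0;
        else
            m->iacc += 1;
        m->pc++;
    }
}
```

Running a program with two OP_FP_ADD instructions from a state with fp_enabled == 0 takes exactly one trap and still performs both additions, so in this model the first FP instruction after an interrupt is delayed but not lost.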

With that in mind, I can now see two reasons that could explain the wrong result I got:

(1) the FPU is not restored properly by _cpu_context_restore_fp: seems very unlikely.
(2) the IRQ callback makes use of the FPU (it indeed remains enabled during exception processing) and/or alters some FPU registers, which of course corrupts the result of the interrupted operation. Could it be that some IRQ callbacks operate on 64-bit integers, which gcc optimizes to use FPU registers? I've seen this discussed here.

About (2): it would explain why disabling IRQs fixes my issue, but it would not explain why simply leaving the FPU enabled at the end of IRQ processing (with interrupts still enabled) also fixes it, which seems to indicate that something else is going on...

shagkur · Post by **shagkur** » Fri Sep 17, 2010 4:37 pm

A short note to the FP context switch: In case there was no thread context switch, ie same thread keeps running after exception, there won't be a FP context switch. In thread_dispatch_fp (lwp_threads.c) i check the executing context against the currently used FP context. And only if thread context has changed it'll do the save/restore.

About (2): There are indeed IRQ callbacks doing 64-bit operations: the decrementer exception and its callback, and also the video interrupt callback. So perhaps, or probably, this is the real issue, because gcc optimizes 64-bit operations to make use of a double. Although i hope this isn't really happening.

But if this is the case then i'll have to either find a way to prevent gcc from optimizing it this way or rework the kernel. But a rework of the kernel would either be too dangerous or slow the whole thing down as hell, because it would force me to do the FP context switch every time the thread context switches, or even whenever an exception occurs. Old versions of gcc obviously didn't optimize that way...

It's unfortunately nearly impossible to debug this to find the location where FP corruption occurs.

I've just taken a look at the disassembly of the above mentioned handlers, which use 64-bit operations, and they both use GPRs for them. No use of FP registers for the loads/stores whatsoever.

Eke · Post by **Eke** » Fri Sep 17, 2010 5:19 pm

shagkur wrote:A short note to the FP context switch: In case there was no thread context switch, ie same thread keeps running after exception, there won't be a FP context switch. In thread_dispatch_fp (lwp_threads.c) i check the executing context against the currently used FP context. And only if thread context has changed it'll do the save/restore.

yeah, I've seen this. It's indeed handled by this code:

Code:

if(!__lwp_thread_isallocatedfp(exec)) {
 	if(_thr_allocated_fp) _cpu_context_save_fp(&_thr_allocated_fp->context);
 	_cpu_context_restore_fp(&exec->context);
 	_thr_allocated_fp = exec;
}

I have a question however: what happens when _thr_allocated_fp is NULL, which I think is the case on init or after calling __lwp_thread_deallocatefp? It seems to me the current FPU registers won't be saved anywhere and would be restored from a previously saved context, which might have nothing to do with the current thread context. Isn't that risky? I'm probably missing something here though.

EDIT: OK, I found it: there is the "STATE" bit in the context preventing this.

shagkur wrote:About (2): There are indeed IRQ callbacks doing 64-bit operations: the decrementer exception and its callback, and also the video interrupt callback. So perhaps, or probably, this is the real issue, because gcc optimizes 64-bit operations to make use of a double. Although i hope this isn't really happening. But if this is the case then i'll have to either find a way to prevent gcc from optimizing it this way or rework the kernel. But a rework of the kernel would either be too dangerous or slow the whole thing down as hell, because it would force me to do the FP context switch every time the thread context switches, or even whenever an exception occurs. Old versions of gcc obviously didn't optimize that way...

It's unfortunately nearly impossible to debug this to find the location where FP corruption occurs.

I agree: if this is indeed what is happening, there is nothing you can do on your side, since the 64-bit optimization is necessary. I guess the best way is to disable IRQs when the probability of a floating point operation being interrupted gets high, which is the case when calling the pow() function in a loop: you just increase your chance of getting interrupted in the middle of the function's implementation.

EDIT:

shagkur wrote:I've just taken a look at the disassembly of the above mentioned handlers, which use 64-bit operations, and they both use GPRs for them. No use of FP registers for the loads/stores whatsoever.

Damn, it must be something else then.

Post by **Izhido** » Fri Sep 17, 2010 10:21 pm

Well, it looks like we don't really have a choice in the matter. We should have a

Code:

void FPU_Init(void);

function that we'll need to call at the start of every thread that will use floating-point operations.

Not sure if we'll need another one to signal the end of it... that one's up to you guys, who actually know what you're talking about.
