mem1 & mem2 management questions

Post by **Eke** » Tue Jan 21, 2014 8:59 am

Yes, it's actually done in the current sbrk implementation when Arena2Low is equal to mem2_start.

Anyway, I tried a few different things but nothing worked, even when patching and recompiling libogc by hand, so I guess there is something else going on. Seems like the only way to go back to mem1 is to free objects in the reverse order they were allocated.

Post by **WinterMute** » Mon Mar 17, 2014 1:44 pm

tueidj wrote: So much for "it's very rare to find any scenario where a custom memory manager has any significant benefit over the newlib allocator."

Snarky comments aren't really that helpful tbh. I had this discussion several times during commercial dev work as well and we never managed to find a situation where a custom allocator actually had a significant benefit. I used a custom allocator once to avoid the overhead of newlib in a GBA based multiboot project & it turned out that I'd significantly overestimated the overhead of using malloc.

Obviously it would be much better if mem1 wasn't effectively locked out once sbrk traverses regions and sbrk took account of trim but, as several people have found out when attempting to address the issue, it's really not as simple as it sounds. Of course, having said that, once it's figured out no doubt it will look simple.

BTW: the difference in speed between mem1 and mem2 is not due to being "shared with the arm7 processor and subject to bus arbitration."

The starlet code is running in mem2 which does make mem2 access from the powerpc measurably slower. It's faster if Starlet is restricted to its own exclusive ram, much like the DS. It's also an arm9.

Mem1 is also shared by both CPUs and other hardware, the difference is it's simply a faster type of RAM (1T-SRAM vs. GDDR3). Due to the nature of GDDR3, if your app is performing a lot of CPU<->MEM2 work (perhaps while using mem1 mainly for static textures) you can get a good performance improvement by calling L2Enhance() at the start of your app to activate 64-byte fetches for the L2 cache.

If I remember right, 64-byte fetches were enabled by default for Wii.

Eke wrote: Seems like the only way to go back to mem1 is to free objects in the reverse order they were allocated

That seems rather odd tbh - dlmalloc is supposed to coalesce adjacent blocks so the order of deallocation shouldn't matter, trim should still get called at some point.

This is one of the things I always meant to investigate properly before adding the mem2 support in libogc. Unfortunately there was a massive disagreement over whether or not it happened and whether or not the block at the end of mem1 should be left in limbo.

dlmalloc is also supposed to be able to handle non-contiguous RAM and I never really got around to investigating fully. In theory trim could be called for a block in mem1 even if mem2 has been used so sbrk should probably be taking account of that as well.

Post by **WinterMute** » Tue Mar 18, 2014 5:53 pm

WinterMute wrote: dlmalloc is also supposed to be able to handle non-contiguous RAM and I never really got around to investigating fully. In theory trim could be called for a block in mem1 even if mem2 has been used so sbrk should probably be taking account of that as well.

Actually, that's nonsense, sbrk is called with an offset & doesn't know the address of the block being returned to the heap. Oh well.
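
To make the point above concrete, here is a toy model of an sbrk that spans two regions. It is only a sketch under stated assumptions, not libogc's actual code: the address ranges, `toy_sbrk` and `mem1_saved` are all made up for illustration. Because sbrk only ever receives an increment, it can only give memory back from the current break, so a single live block at the top of mem2 keeps mem1 out of reach.

```c
#include <assert.h>
#include <stddef.h>

/* Toy address ranges (assumptions), NOT the real Wii memory map. */
#define MEM1_LO 0x1000L
#define MEM1_HI 0x2000L
#define MEM2_LO 0x8000L
#define MEM2_HI 0xA000L

static long cur = MEM1_LO;     /* current break */
static long mem1_saved = -1;   /* mem1 break at the moment we crossed into mem2 */

/* Hypothetical sbrk: returns the previous break, or -1 on failure. */
static long toy_sbrk(long incr)
{
    long prev = cur;
    int in_mem2 = (mem1_saved >= 0);

    if (incr >= 0) {
        long hi = in_mem2 ? MEM2_HI : MEM1_HI;
        if (cur + incr <= hi) { cur += incr; return prev; }
        if (in_mem2 || incr > MEM2_HI - MEM2_LO) return -1;
        /* Request does not fit in mem1: abandon its tail, move to mem2. */
        mem1_saved = cur;
        cur = MEM2_LO + incr;
        return MEM2_LO;
    }

    /* Shrinking: only the top of the current region can be released. */
    long lo = in_mem2 ? MEM2_LO : MEM1_LO;
    if (cur + incr < lo) return -1;
    cur += incr;
    /* Only once mem2 is completely empty can the break go back to mem1. */
    if (in_mem2 && cur == MEM2_LO) { cur = mem1_saved; mem1_saved = -1; }
    return prev;
}
```

In this model, after `toy_sbrk(0xF00)` and `toy_sbrk(0x400)` the break sits in mem2, and it only returns to mem1 once every byte obtained from mem2 has been trimmed back, top first.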

Post by **WinterMute** » Wed Mar 19, 2014 2:31 am

So, after a bit of digging around it seems that something somewhere is allocating 4K immediately after the transition to mem2. Going to have to figure out where that's coming from.

I chucked the test code on github if anyone wants to have a play around. It's likely to need looking at in GDB to get anywhere though.

https://github.com/WinterMute/libogc_sbrk_test

Post by **tueidj** » Mon Mar 31, 2014 4:05 pm

WinterMute wrote: Snarky comments aren't really that helpful tbh. I had this discussion several times during commercial dev work as well and we never managed to find a situation where a custom allocator actually had a significant benefit. I used a custom allocator once to avoid the overhead of newlib in a GBA based multiboot project & it turned out that I'd significantly overestimated the overhead of using malloc.

This problem has been around ever since mem2 support was added and it's taken this long to even get acknowledged, let alone fixed (and I'm sure this will be followed up by the usual "patches are welcome" BS. Sure they are, because it gives you an excuse to ask for donations without actually doing any work).
To say that a custom allocator can't do a better job than an implementation that potentially discards nearly all of MEM1 is ridiculous (plus you obviously never worked on an embedded project where dlmalloc's >20KB code size was prohibitive, that's low hanging fruit right there).

WinterMute wrote: Obviously it would be much better if mem1 wasn't effectively locked out once sbrk traverses regions and sbrk took account of trim but, as several people have found out when attempting to address the issue, it's really not as simple as it sounds. Of course, having said that, once it's figured out no doubt it will look simple.

It is simple:
- discard the broken sbrk implementation
- create an mspace with base region of unused MEM1 (end of program data to top of MEM1), unlocked so it can be expanded
- make malloc() and friends use this mspace
- create an mmap implementation that handles 16KB pages of MEM2, using a bit array for tracking (~48MB / 16KB = 3072 pages, at one bit per page = only 384 bytes).
- configure dlmalloc to use mmap for getting more core memory
- fix iosCreateHeap to use mmap (and add all the missing code to free heaps properly)
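
The bit-array page tracking from the list above could look roughly like this. It is a minimal sketch, not a drop-in for libogc or dlmalloc: `pages_alloc`, `pages_free` and `page_map` are hypothetical names, the 48MB pool size is taken from the rough figure in the post, and the functions return page indices rather than real MEM2 addresses (a real mmap replacement would add the MEM2 base address and hook into dlmalloc's configuration).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE (16u * 1024u)           /* 16KB pages, as in the post */
#define POOL_SIZE (48u * 1024u * 1024u)   /* ~48MB of MEM2 (assumed figure) */
#define NUM_PAGES (POOL_SIZE / PAGE_SIZE) /* 3072 pages */

/* One bit per page: 3072 bits = 384 bytes. */
static uint8_t page_map[NUM_PAGES / 8];

static int page_used(size_t i)
{
    return (page_map[i / 8] >> (i % 8)) & 1;
}

static void page_set(size_t i, int used)
{
    if (used) page_map[i / 8] |= (uint8_t)(1u << (i % 8));
    else      page_map[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* First-fit search for a run of free pages covering `bytes`.
 * Returns the index of the first page, or -1 if nothing fits. */
static long pages_alloc(size_t bytes)
{
    size_t need = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    size_t run = 0;

    if (need == 0 || need > NUM_PAGES) return -1;
    for (size_t i = 0; i < NUM_PAGES; i++) {
        run = page_used(i) ? 0 : run + 1;
        if (run == need) {
            size_t first = i + 1 - need;
            for (size_t j = first; j <= i; j++) page_set(j, 1);
            return (long)first;
        }
    }
    return -1;
}

/* Release the pages covering `bytes`, starting at page `first`.
 * Like munmap, the caller supplies the length. */
static void pages_free(long first, size_t bytes)
{
    size_t need = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;

    assert(first >= 0 && (size_t)first + need <= NUM_PAGES);
    for (size_t j = 0; j < need; j++) page_set((size_t)first + j, 0);
}
```

Unlike the sbrk approach, pages can be released in any order here, so a hole in the middle of MEM2 becomes reusable as soon as it is freed.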

WinterMute wrote: The starlet code is running in mem2 which does make mem2 access from the powerpc measurably slower. It's faster if Starlet is restricted to its own exclusive ram, much like the DS. It's also an arm9.

I'd like to see those measurements. 99.9% of the time starlet/IOS is sitting in an idle loop, not touching the bus at all.
Starlet doesn't actually have any "exclusive ram" (I assume you mean the 128KB SRAM); it sits on the same bus as MEM2 and can be directly accessed by the powerpc when the correct bit is set appropriately.

WinterMute wrote: If I remember right, 64 byte fetches were enabled by default for Wii

You remember wrong. It's actually advisable to not enable it unless you really know what you're doing, since all SDK and homebrew apps contain startup code to configure L2 access for 32 bytes and you can't switch down without a hard reset (courtesy of IOS via a title relaunch). So if your app launches another app, calling L2Enhance() will not end well.
