Doom64 DS

Discostew · Post by **Discostew** » Mon Apr 30, 2012 6:29 pm

You could go and scale them down yourself via nearest-neighbor interpolation. That is, load the texture into RAM uncompressed, then make a new texture that is smaller in dimension and manually copy over the rows/columns of the original texture based on just how much scaling you want on it. A factor of 2 (from 128 to 64) is simply skipping over every other row/column when copying.
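As a rough illustration of the row/column copying described above, here is a minimal C sketch. It assumes an 8-bit paletted texture stored row-major in RAM; the function name `downscale_nearest` is made up for this example and is not part of libnds.

```c
#include <stddef.h>
#include <stdint.h>

/* Nearest-neighbor downscale of an 8-bit paletted texture.
 * src is src_w x src_h texels, dst is dst_w x dst_h texels, both row-major.
 * Hypothetical helper for illustration only. */
void downscale_nearest(const uint8_t *src, int src_w, int src_h,
                       uint8_t *dst, int dst_w, int dst_h)
{
    for (int y = 0; y < dst_h; y++) {
        int sy = y * src_h / dst_h;      /* source row for this output row */
        for (int x = 0; x < dst_w; x++) {
            int sx = x * src_w / dst_w;  /* source column for this output column */
            dst[(size_t)y * dst_w + x] = src[(size_t)sy * src_w + sx];
        }
    }
}
```

With a factor of 2 (say 128 to 64), `sy` and `sx` come out as 0, 2, 4, ..., which is exactly the "skip every other row/column" case.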

A compressed texture bigger than uncompressed? Assuming a 128x128 8-bit paletted texture using all 256 indices, that would equate to 16,896 bytes total (16,384 bytes of texel data plus a 512-byte palette). With compressed textures of the same size, the block and index data would only take up 6,144 bytes (1,024 blocks at 4 bytes of block data and 2 bytes of index data each), so if it were larger, more than 10,752 bytes (5,376 colors) are used for the palette? It's understandable that the palette is the killer when it comes to the NDS's compressed texture format, but with a good algorithm to arrange the palette so multiple texel blocks can use the same color group, I'm sure that number can be reduced significantly. It would be a matter of examining the colors used in each texel block, comparing them to the colors in all the other blocks, and then arranging/editing the palette so only the unique groups of colors (from 2 to 4 colors each) remain.
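To make the palette-sharing idea a bit more concrete, the sketch below handles only the simplest case: blocks whose color groups are bit-for-bit identical end up sharing one group in the palette. The function name, the flat input layout, and the assumption that every block carries a full 4-color group are all made up for illustration; a real converter would also need to merge near-identical groups and handle blocks that store fewer colors.

```c
#include <stdint.h>
#include <string.h>

#define GROUP_COLORS 4  /* assumption: every block uses a full 4-color group */

/* block_colors holds num_blocks * GROUP_COLORS 16-bit colors, one group per
 * texel block. Each group is appended to `palette` unless an identical group
 * is already there; block_group[b] receives the group index used by block b.
 * `palette` needs room for num_blocks * GROUP_COLORS entries in the worst
 * case. Returns the number of unique groups kept. Hypothetical helper. */
int dedup_color_groups(const uint16_t *block_colors, int num_blocks,
                       uint16_t *palette, int *block_group)
{
    const size_t group_bytes = GROUP_COLORS * sizeof(uint16_t);
    int num_groups = 0;

    for (int b = 0; b < num_blocks; b++) {
        const uint16_t *colors = &block_colors[b * GROUP_COLORS];
        int g;

        /* Linear search for an identical group that was stored earlier. */
        for (g = 0; g < num_groups; g++) {
            if (memcmp(&palette[g * GROUP_COLORS], colors, group_bytes) == 0)
                break;
        }
        if (g == num_groups) {
            /* Not seen before: append it as a new group. */
            memcpy(&palette[g * GROUP_COLORS], colors, group_bytes);
            num_groups++;
        }
        block_group[b] = g;
    }
    return num_groups;
}
```

The palette cost is then `num_groups * GROUP_COLORS * 2` bytes, which is the number to keep an eye on.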

However, since these textures are considered animated and paletted, you wouldn't be using the exact same palette each time, so each palette would most likely be unique to that frame of animation. Still, if you can keep the total amount of data for each frame as a compressed texture less than 16384 bytes (with a 128x128 8-bit texture), then you're saving space. Also be aware that with compressed textures, because the palette is the main factor here, you'll have to deal with the limited palette space of up to 96KB (VRAM E, F, G).

Using compressed textures does come at a cost. The data and index have to be specifically aligned within the VRAM banks for them to work. A location in Slot 0 has a direct position associated with the lower 64KB of Slot 1, and a location in Slot 2 has the same in the upper 64KB of Slot 1. Slot 3 cannot be used at all. The VRAM allocation work I did on the videoGL library some time ago handles this arranging automatically, but you must be aware that any mixing of non-compressed and compressed textures may result in pockets being formed. Non-compressed textures of the appropriate size that are loaded after this mixing can fill these spots, so all is not lost.
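The slot pairing above can be sketched as a small address calculation. This is only my reading of the format, not code from videoGL: offsets are byte offsets into texture memory with 128KB per slot, and the divide by 2 assumes 2 bytes of index data for every 4 bytes of block data. The function name is hypothetical.

```c
#include <stdint.h>

#define SLOT_SIZE   0x20000u           /* each texture slot is 128KB */
#define SLOT1_BASE  (1u * SLOT_SIZE)
#define SLOT2_BASE  (2u * SLOT_SIZE)

/* Given the byte offset of compressed block data inside texture memory,
 * returns the byte offset of the matching index data in Slot 1, or
 * UINT32_MAX if the block data is not in Slot 0 or Slot 2.
 * Hypothetical helper for illustration only. */
uint32_t compressed_index_offset(uint32_t block_offset)
{
    if (block_offset < SLOT_SIZE) {
        /* Slot 0 pairs with the lower 64KB of Slot 1. */
        return SLOT1_BASE + block_offset / 2;
    }
    if (block_offset >= SLOT2_BASE && block_offset < SLOT2_BASE + SLOT_SIZE) {
        /* Slot 2 pairs with the upper 64KB of Slot 1. */
        return SLOT1_BASE + 0x10000u + (block_offset - SLOT2_BASE) / 2;
    }
    return UINT32_MAX;                 /* Slot 1 and Slot 3 hold no block data */
}
```

This also shows where the pockets come from: any non-compressed texture placed in Slot 1 takes away index space that some range of Slot 0 or Slot 2 depends on.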

elhobbs · Post by **elhobbs** » Mon Apr 30, 2012 6:30 pm

I do not think you need a particularly sophisticated scale algorithm for the ds. the hardware is using point sampling to render the textures.

I tried a few different approaches for heretic (uses doom engine in case you are not familiar) and the sprites are the main issue. the available vram is so small and the data set of visible sprites is changing so quickly that it is too much work to move stuff in and out of vram - not to mention lost space due to fragmentation. building a per frame list (meaning texture data loaded sequentially into a buffer) of all textures in main ram (a 256k buffer if you are splitting AB and CD per frame) and dma copying it to vram was the most efficient approach I came up with. the dma copy is extremely quick - even for 2 vram banks.
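A minimal sketch of the per-frame list idea, assuming a single 256k buffer and textures that are simply appended one after another. The names are made up for this example, and the 8-byte rounding is an assumption based on the hardware addressing textures in 8-byte units.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_BUF_SIZE 0x40000u  /* 256k: enough for two 128k vram banks */

static uint8_t  frame_buf[FRAME_BUF_SIZE];
static uint32_t frame_used;

/* Start a new frame: forget everything packed so far. */
void frame_buf_reset(void)
{
    frame_used = 0;
}

/* Append one texture to the frame buffer. Returns its byte offset inside
 * the buffer (the same offset it will have in vram after the copy),
 * or -1 if it does not fit. Hypothetical helper for illustration only. */
int32_t frame_buf_add(const void *data, uint32_t size)
{
    uint32_t aligned;
    uint32_t offset;

    if (size > FRAME_BUF_SIZE)
        return -1;
    aligned = (size + 7u) & ~7u;  /* keep every texture on an 8-byte boundary */
    if (aligned > FRAME_BUF_SIZE - frame_used)
        return -1;

    offset = frame_used;
    memcpy(frame_buf + offset, data, size);
    frame_used += aligned;
    return (int32_t)offset;
}
```

Because the buffer is rebuilt from scratch every frame, there is nothing to free and nothing to fragment; the whole thing goes to vram in one copy.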

Kaiser · Post by **Kaiser** » Mon Apr 30, 2012 7:40 pm

elhobbs wrote: I do not think you need a particularly sophisticated scale algorithm for the ds. the hardware is using point sampling to render the textures.

I tried a few different approaches for heretic (uses doom engine in case you are not familiar) and the sprites are the main issue. the available vram is so small and the data set of visible sprites is changing so quickly that it is too much work to move stuff in and out of vram - not to mention lost space due to fragmentation. building a per frame list (meaning texture data loaded sequentially into a buffer) of all textures in main ram (a 256k buffer if you are splitting AB and CD per frame) and dma copying it to vram was the most efficient approach I came up with. the dma copy is extremely quick - even for 2 vram banks.

I tried what you suggested a few days ago, and as a result I got horrendous VBlank tearing every time I dma the buffer to the vram (is it because I am drawing geometry before I dma the texture buffer?). Calling swiWaitForVBlank right before I call DC_FlushAll prevents the tearing from happening, but as a result I get about 5 FPS when fighting more than 5 unique monsters at once. I might try a different approach to this and see what happens.

sverx · Post by **sverx** » Mon Apr 30, 2012 8:55 pm

Kaiser wrote:
sverx wrote: a 128x128 pixel 256 colors texture is 16KB, so you can put 8 of them into a 128KB bank.[...]
In some cases, even 8 is not enough. The final level, for example, has you fighting against a large wave of monsters, and they are usually the ones with 128x128 sprites... in addition to unique frames and rotations.

In that case you could temporarily 'downsample' the textures vertically or horizontally (or, eventually, both). Or use a sort of texture mipmapping on distant objects only... or mix both.

elhobbs · Post by **elhobbs** » Mon Apr 30, 2012 9:26 pm

I am actually using a modified version of the iphone doom renderer (updated for ds of course). the render pipeline is:
translate view
generate list of walls, flats, sprites by walking bsp and clipping against occlusion buffer
render walls
render flats
render sprites
render sky
load vram buffer to vram
glFlush

the iphone render is nice for the ds. it uses the segs to walk the bsp and determine what is visible, but it draws walls using the whole lines.

you do not need to wait for vblank using this approach as the vram you want to write to is not being used for the current frame being rendered. only unlock the two banks you plan to write to, then dma copy, then lock again.

I am also scaling the textures and keeping them cached in memory (letting z_malloc handle the caching). I constrain all of the textures to a few block sizes:
256x128
128x64
64x64
32x32
8x8
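One way the block-size constraint could be applied is to pick the smallest listed block that still contains the source texture. This is only a guess at a selection rule, since the post does not say how textures are assigned to blocks; `block_size` and `pick_block` are hypothetical names.

```c
typedef struct {
    int w;
    int h;
} block_size;

/* The block sizes listed above, smallest first. */
static const block_size k_blocks[] = {
    {8, 8}, {32, 32}, {64, 64}, {128, 64}, {256, 128}
};

#define NUM_BLOCKS ((int)(sizeof k_blocks / sizeof k_blocks[0]))

/* Returns the smallest block that holds a w x h texture. If none is large
 * enough, returns the largest block, meaning the texture has to be scaled
 * down to fit. Hypothetical helper for illustration only. */
block_size pick_block(int w, int h)
{
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (w <= k_blocks[i].w && h <= k_blocks[i].h)
            return k_blocks[i];
    }
    return k_blocks[NUM_BLOCKS - 1];
}
```

A small fixed set of sizes like this also keeps the cached copies uniform, so freeing one texture leaves a hole that another texture of the same class can reuse.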

the ds screen is so little that you are never going to see full scale textures unless you are right on top of something.

I do want to add low res/mipmap textures for far away sprites, but I have not gotten around to it.

Kaiser · Post by **Kaiser** » Mon Apr 30, 2012 10:44 pm

Mine is almost the same as well, except I am drawing segs instead of lines and I have a function that's called before rendering that resets the viewport, depth and fog values. That could be responsible for the tearing... need to investigate that when I get home.

Kaiser · Post by **Kaiser** » Tue May 01, 2012 6:20 am

Okay, I am getting extremely frustrated with this. I am even going through the code line by line against how you have it in CHeretic and I am still getting very thick lines going across the screen. If this is correct, I am basically doing the following steps:

* Clear depth values, set viewport size and fog
* Setup view frustum
* Traverse BSP
* Draw lines/subsectors/sprites
* During line/subsector/sprite drawing, call Memcpy32 to copy the texture data into a texture buffer in which the texture buffer's array size is (256*256*2)
* Call DC_FlushAll once done
* Dma the texture buffer into destination (VRAM_A/B)
* Call glFlush
* Switch destination to (VRAM_C/D)
* Repeat

Not sure what I am doing wrong but it seems like if I don't call swiWaitForVBlank before dma'ing the texture buffer to the VRAM destination, I'll continue seeing these thick lines that cover up almost the entire middle portion of the screen. Even if I do, the framerate becomes completely unacceptable.

elhobbs · Post by **elhobbs** » Tue May 01, 2012 12:32 pm

Hard to tell without something to look at, but are you compensating for the texture addresses starting at vram_a vs vram_c on alternating frames? In regards to framerate, are you scaling the textures each frame or loading from disk each frame? 5 fps is really slow; adding a vblank wait would not make you drop from 60, 30 or even 20 fps down to 5.
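To spell out what compensating for the start address could look like: the sketch below assumes vram_a/b are mapped to texture slots 0/1 and vram_c/d to slots 2/3, and that the hardware takes the texture address as an offset into texture memory in 8-byte units. The function name and the frame counter parameter are hypothetical.

```c
#include <stdint.h>

/* Offsets into texture memory where each bank pair starts, assuming
 * A/B sit in slots 0/1 and C/D in slots 2/3 (128k per slot). */
#define PAIR_AB_OFFSET 0x00000u
#define PAIR_CD_OFFSET 0x40000u

/* Returns the 16-bit address field for a texture that sits buf_offset
 * bytes into this frame's buffer. Even frames go to A/B, odd frames
 * to C/D. Hypothetical helper for illustration only. */
uint16_t tex_addr_field(unsigned frame, uint32_t buf_offset)
{
    uint32_t base = (frame & 1u) ? PAIR_CD_OFFSET : PAIR_AB_OFFSET;
    return (uint16_t)((base + buf_offset) >> 3);
}
```

The point is that the same offset in the main ram buffer has to produce a different texture address on odd and even frames, and the base used while drawing has to match the bank pair the buffer is later copied to.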

mtheall · Post by **mtheall** » Thu May 03, 2012 4:10 am

Kaiser wrote: * During line/subsector/sprite drawing, call Memcpy32 to copy the texture data into a texture buffer in which the texture buffer's array size is (256*256*2)
* Call DC_FlushAll once done
* Dma the texture buffer into destination (VRAM_A/B)

Why are you copying twice? Wouldn't it be more efficient to just memcpy32 straight to vram? You wouldn't need the dma or the cache flush at all, and it could even be faster than the dma itself.

Kaiser · Post by **Kaiser** » Thu May 03, 2012 4:40 am

elhobbs wrote: Hard to tell without something to look at, but are you compensating for the texture addresses starting at vram_a vs vram_c on alternating frames? In regards to framerate, are you scaling the textures each frame or loading from disk each frame? 5 fps is really slow; adding a vblank wait would not make you drop from 60, 30 or even 20 fps down to 5.

Hey, sorry for the delay...

Here's the code I used. It's somewhat messy but I cleaned it up so hopefully you can understand it. I simplified it so only the stuff in question is shown here. Note that this is from a local version and not from the SVN, and I am not drawing sprites yet for this experiment, just textures, which are always RGB16 and usually 64x64:

For the most part, textures are displayed correctly, it's just the problem of the lines and the lag during the DMA copy.

Code:


static int gfx_tex_stride = 0;
static int gfx_texpal_stride = 0;
static byte gfx_tex_buffer[0x40000];
static byte gfx_pal_buffer[0x10000];
static uint32* gfx_base;

void memcpy32(void *dst, const void *src, uint wdcount) ITCM_CODE;

//
// R_LoadTexture
//

void R_LoadTexture(dtexture texture)
{

    /* ... GETTING TEXTURE DATA (elided) ... */

        ts = ((width * height) >> 1); // always RGB16
        memcpy32(gfx_tex_buffer + gfx_tex_stride, data, ts);

        gfxtextures.textures[texture] =
            GFX_TEXTURE(
            (TEXGEN_OFF | GL_TEXTURE_WRAP_S | GL_TEXTURE_WRAP_T),
            TEXTURE_SIZE_64,   // temp
            TEXTURE_SIZE_64,   // temp
            GL_RGB16,
            ((uint32*)gfx_base + (gfx_tex_stride >> 2)));

    /* ... GETTING PALETTE DATA (elided) ... */

            ps = (16 << 1); // always 16 color palette
            memcpy32(gfx_pal_buffer + gfx_texpal_stride, paldata, ps);
            gfxtextures.palettes[texture] = GFX_VRAM_OFFSET((VRAM_E + (gfx_texpal_stride >> 2)));
            gfx_texpal_stride += ps;
        }

        gfx_tex_stride += ts;
    }

    GFX_TEX_FORMAT = gfxtextures.textures[texture];
    GFX_PAL_FORMAT = gfxtextures.palettes[texture];
}


//
// R_DrawFrame
//

void R_DrawFrame(void)
{

    /* ... SETUP VIEW FRUSTUM STUFF (elided) ... */

    gfx_base = (gametic & 1) ? (uint32*)VRAM_C : (uint32*)VRAM_A;

    /* ... DRAW GEOMETRY (elided) ... */

    if(gfx_tex_stride > 0)
    {

        // here, calling swiWaitForVBlank before DC_FlushAll will prevent thick lines from covering the screen but also results in horrible lag

        DC_FlushAll();

        if(gametic & 1)
        {
            VRAM_C_CR   = VRAM_ENABLE;
            VRAM_D_CR   = VRAM_ENABLE;
        }
        else
        {
            VRAM_A_CR   = VRAM_ENABLE;
            VRAM_B_CR   = VRAM_ENABLE;
        }

        VRAM_E_CR   = VRAM_ENABLE;

        dmaCopyWords(0, (uint32*)gfx_tex_buffer, (uint32*)gfx_base, gfx_tex_stride);
        dmaCopyWords(0, (uint32*)gfx_pal_buffer, (uint32*)VRAM_E, gfx_texpal_stride);

        if(gametic & 1)
        {
            VRAM_C_CR   = VRAM_ENABLE | VRAM_C_TEXTURE;
            VRAM_D_CR   = VRAM_ENABLE | VRAM_D_TEXTURE;
        }
        else
        {
            VRAM_A_CR   = VRAM_ENABLE | VRAM_A_TEXTURE;
            VRAM_B_CR   = VRAM_ENABLE | VRAM_B_TEXTURE;
        }

        VRAM_E_CR   = VRAM_ENABLE | VRAM_E_TEX_PALETTE;
    }

    /* ... POP MATRIX STUFF (elided) ... */

    GFX_FLUSH = 0;
}
