 Heavy calculations on nds original 

Joined: Thu Sep 22, 2011 5:47 pm
Posts: 13
Hello,
I'm new here and I'm not sure where to post :oops: ... (Move the topic if I made a mistake)

I once made a mandelbrot viewer for my pc and wanted to port it to my nds.
I have already made some simple nds programs and have years of c++ experience.
My problem is the time it takes to calculate on the nds.
I was thinking:
"Well... the arm9 takes a lot of time, but how about the arm7? It sits mostly idle."
So, is it possible (without overheating the arm7) to make an adapted arm7 binary with a copy of the calculation algorithm?
I've read it's possible to use the FIFO to communicate between the two.
And I think it should be possible for the arm7 to do half of the calculations to speed things up; the faster arm9 then has more time to put things on screen and to sync them.
Can I do this without breaking/overheating the arm7?

The scheme gives a symbolic (no programming knowledge required) representation of the idea:

Scheme:
Code:
arm9:
--point1
now at pixel (x,y).
send(x,y)->arm7
x+1
y+1
--point 2
now at pixel(x,y)
calculate(x,y)
plot x,y to screen
x+1
y+1
check for reply from arm7, yes: plot to screen and goto point 1
else: goto point 2

arm7:
--point 1
message from arm9?
yes:
 -retrieve x,y
 -calculate
 -send x,y,value
do arm7 stuff
goto point 1


Help please.

_________________
We all ask nooby questions every once in a while.
Nobody's perfect, we all need help every once in a while.
Our questions might sound stupid to people but then, so do theirs to us.
So please, think before calling someone you don't know a noob.


Thu Sep 22, 2011 6:03 pm

Joined: Thu Feb 03, 2011 10:47 pm
Posts: 210
Splitting your algorithm across the two CPUs in this manner will probably be slower than just doing it directly on the ARM9, due to the overhead of inter-processor communication. I would first look into optimizing your code. There are two main things I would do first: compile in ARM mode (instead of Thumb), and set optimization to level 3 (-O3). So your Makefile would have the following changes:

Code:
ARCH  :=  -mthumb -mthumb-interwork -march=armv5te -mtune=arm946e-s

becomes
Code:
ARCH  :=  -marm -mthumb-interwork -march=armv5te -mtune=arm946e-s


Code:
CFLAGS  :=  -g -Wall -O2

becomes
Code:
CFLAGS  :=  -g -Wall -O3


These two things can give significant performance boosts. An alternative to the first Makefile change is to rename your source files from XXX.cpp to XXX.arm.cpp. This will cause only those cpp files to be compiled as ARM code.

After this, you might be interested in finding ways to improve your algorithm. It is generally not faster to split the workload across the CPUs; I don't think there's really a native way to enforce synchronization.


Thu Sep 22, 2011 6:44 pm

Joined: Wed Mar 31, 2010 6:05 pm
Posts: 212
the time to calculate one mandelbrot pixel may be more than the fifo synchronization time, especially if you use floating point maths. but since it seems you haven't even made your mandelbrot viewer on the arm9 yet, you have no idea how fast it is running, which makes this a premature optimization. "putting things on screen and syncing it" will probably take a trivial amount of time compared to the computations.

at the very least, you should issue larger blocks than 1px to the arm7.

you're not going to overheat it.


Thu Sep 22, 2011 7:45 pm

Joined: Thu Feb 03, 2011 10:47 pm
Posts: 210
I think he has already run the code on NDS and has seen that it has room for improvement. I think that using -O3 and -marm are good first steps to seeing if the results are more acceptable, and I was suggesting to make optimizations to the algorithm after having tested using these options.

If it really came down to it and you really, really insisted on using the ARM7, I would send larger workloads than 1 pixel (maybe a workload of several lines, or hell, send it the whole workload you expect it to do). Additionally, keep in mind that the ARM9 runs at twice the clock speed of the ARM7, so it may make more sense to give 2/3 of the work to the ARM9 and 1/3 to the ARM7. Of course, this also means that you have to write two separate binaries (look at the arm7 and user fifo examples). Also keep in mind that the ARM7 code may optimize differently than the ARM9 code.
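
A minimal sketch of that 2/3 to 1/3 split, assuming the work is divided by whole screen rows (192 rows on one DS screen). `RowRange`, `RowSplit` and `split_rows` are made-up names for illustration, not part of libnds, and the 2:1 weighting is only a starting guess based on clock speed.

```cpp
#include <cassert>

// Half-open range of screen rows [begin, end) assigned to one CPU.
struct RowRange {
    int begin;
    int end;
};

// Which rows each CPU would calculate.
struct RowSplit {
    RowRange arm9;
    RowRange arm7;
};

// Split `total_rows` between the two CPUs in proportion to the given
// weights. The ARM9 gets the top rows, the ARM7 gets whatever is left,
// so every row is covered exactly once even when the division is inexact.
RowSplit split_rows(int total_rows, int arm9_weight, int arm7_weight) {
    assert(total_rows >= 0 && arm9_weight > 0 && arm7_weight >= 0);
    int arm9_rows = total_rows * arm9_weight / (arm9_weight + arm7_weight);
    return RowSplit{{0, arm9_rows}, {arm9_rows, total_rows}};
}
```

With `split_rows(192, 2, 1)` the ARM9 would take rows 0 to 127 and the ARM7 rows 128 to 191. The weights would probably need adjusting after timing both sides, since the two CPUs differ in more than clock speed.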


Thu Sep 22, 2011 8:05 pm
Site Admin

Joined: Tue Aug 09, 2005 3:21 am
Posts: 1210
Location: UK
The first step in speeding up mandelbrot on the DS is using fixed point and possibly the hardware divider - floating point on the DS is done by software emulation so it's not really particularly fast. Building ARM code rather than thumb will help a lot for this kind of thing.
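
A rough sketch of what a fixed-point escape-time loop might look like, assuming a 20.12 format (12 fractional bits in a 32-bit integer). The names `fx`, `FX_SHIFT`, `to_fx`, `fx_mul` and `escape_time` are invented for this example, and it is plain C++ with no libnds calls, so treat it as a starting point rather than tuned DS code.

```cpp
#include <cstdint>

// Hypothetical 20.12 fixed-point type: 12 fractional bits in a 32-bit int.
using fx = std::int32_t;
constexpr int FX_SHIFT = 12;
constexpr fx FX_ONE = 1 << FX_SHIFT;

// Convert a double to fixed point (intended for setting up constants only).
constexpr fx to_fx(double v) { return static_cast<fx>(v * FX_ONE); }

// Multiply two fixed-point values through a 64-bit intermediate.
// Assumes >> on a negative value is an arithmetic shift (true for GCC).
inline fx fx_mul(fx a, fx b) {
    return static_cast<fx>((static_cast<std::int64_t>(a) * b) >> FX_SHIFT);
}

// Escape-time iteration for the point c = cr + ci*i.
// Returns how many iterations ran before |z|^2 exceeded 4,
// or max_iter if the point never escaped.
int escape_time(fx cr, fx ci, int max_iter) {
    fx zr = 0, zi = 0;
    for (int i = 0; i < max_iter; ++i) {
        fx zr2 = fx_mul(zr, zr);
        fx zi2 = fx_mul(zi, zi);
        if (zr2 + zi2 > 4 * FX_ONE) return i;   // |z|^2 > 4: escaped
        fx new_zi = 2 * fx_mul(zr, zi) + ci;    // imaginary part of z^2 + c
        zr = zr2 - zi2 + cr;                    // real part of z^2 + c
        zi = new_zi;
    }
    return max_iter;
}
```

Note that no division is needed inside the loop. With only 12 fractional bits the picture will turn blocky after a modest amount of zoom; using more fractional bits trades integer range for precision.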

Your algorithm for having the arm7 handle calculations is likely to slow things down - working pixel by pixel and waiting for results isn't good. The algorithm would need to be parallelised so the arm7 can work on part of the screen while the arm9 handles the rest but bear in mind that the arm7 only has 96K available for code, data and stack space. This sort of thing is fairly advanced and I'd prefer to see code that works on the arm9 before going down this road.

Nothing you do in code is going to overheat or break the arm7.

_________________
Help keep devkitPro toolchains free, Donate today

devkitPro IRC support
Personal Blog


Thu Sep 22, 2011 11:28 pm

Joined: Wed Mar 31, 2010 6:05 pm
Posts: 212
if it runs fast enough that offloading a share of it to the arm7 even makes sense as an optimization, then you're close to the goal anyway. just optimize it by a factor of 2 and you'll have nailed it.


Fri Sep 23, 2011 12:05 am

Joined: Thu Sep 22, 2011 5:47 pm
Posts: 13
I don't have much time so:
I use the escape time algorithm so it has to be done pixel by pixel
I'm already using -O3
And the arm9 doesn't wait for the arm7.
I just wanna know if this can block the IRQs and other stuff the arm7 does.



Fri Sep 23, 2011 5:58 am
Site Admin

Joined: Tue Aug 09, 2005 3:21 am
Posts: 1210
Location: UK
I don't think you quite understand what we're saying here.

This procedure you outlined in your first post describes offloading *all* the calculations to the arm7 and sending co-ordinates for each pixel, leaving the arm9 to just send/receive data and plot pixels.

Code:
arm9:
--point1
now at pixel (x,y).
send(x,y)->arm7
x+1
y+1
--point 2
now at pixel(x,y)
calculate(x,y)
plot x,y to screen
x+1
y+1
check for reply from arm7, yes: plot to screen and goto point 1
else: goto point 2

arm7:
--point 1
message from arm9?
yes:
 -retrieve x,y
 -calculate
 -send x,y,value
do arm7 stuff
goto point 1


In effect what you're doing here is just adding extra steps for each pixel and performing the calculations on a slower processor. Logically this will perform at less than half the speed of what you're doing now - the arm7 is clocked at half the speed of the arm9. Plotting pixels is a minuscule part of the time taken for this code - it makes no sense to leave the arm9 almost completely idle while the arm7 does all the heavy lifting.

After looking up the escape time algorithm I don't see anything that indicates a dependence on other calculated values so there's no reason to send anything for every single pixel. It should be relatively straightforward to break this down into a procedure where each processor takes half the screen so all you really need to tell the arm7 is the start co-ordinate and let it send color data for each pixel back to the arm9. The libnds FIFO API will let you install a callback handler on the arm9 which should just plot a pixel, leaving the arm9 free to get on with calculating the colors for the other half of the screen.
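
One way the colour data could travel back, sketched under the assumption that each result is packed into a single 32-bit word (libnds offers `fifoSendValue32` and `fifoSetValue32Handler` for 32-bit values). The bit layout and the names `PixelResult`, `pack_pixel` and `unpack_pixel` are hypothetical; only the packing is shown, in plain C++ with no libnds calls.

```cpp
#include <cstdint>

// Hypothetical layout for one pixel result in a single 32-bit word:
//   bits 31..24: x (0..255)
//   bits 23..16: y (0..191)
//   bits 15..0 : iteration count (or a colour index)
struct PixelResult {
    std::uint8_t x;
    std::uint8_t y;
    std::uint16_t iterations;
};

// Pack a result into one word (what the arm7 side would send).
constexpr std::uint32_t pack_pixel(PixelResult p) {
    return (static_cast<std::uint32_t>(p.x) << 24) |
           (static_cast<std::uint32_t>(p.y) << 16) |
           p.iterations;
}

// Unpack a word back into a result (what the arm9 handler would do
// before plotting the pixel).
constexpr PixelResult unpack_pixel(std::uint32_t word) {
    return PixelResult{
        static_cast<std::uint8_t>(word >> 24),
        static_cast<std::uint8_t>((word >> 16) & 0xFF),
        static_cast<std::uint16_t>(word & 0xFFFF)};
}
```

Since both x and y fit in 8 bits on a 256x192 screen, one word per pixel is enough here. Sending one word per finished row or block instead would cut the FIFO traffic much further.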

I'm really not sure how much more detail I can get into without just writing the code for you which kind of defeats the object of the exercise.



Fri Sep 23, 2011 2:26 pm

Joined: Thu Feb 03, 2011 10:47 pm
Posts: 210
WinterMute, I don't think you read his code correctly. In the ARM9 section: Point 1 is sent to the ARM7, and then Point 2 is calculated on the ARM9. Afterward, it checks for a response from the ARM7. Then, it plots both points.

roelforg: Just because the algorithm goes pixel-by-pixel doesn't mean you can't describe a larger job for the ARM7 to do. Instead of sending just a pixel to work on, you can send a bulk of pixels (e.g. half of the screen like WinterMute suggested) to be worked on and have the ARM7 dump the results into a buffer, which the ARM9 can later read from and plot onto the screen. Or go the FIFO callback route.
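
The buffer route could be sketched roughly like this, assuming one job message names a band of rows and the worker writes one value per pixel into a row-major buffer. `RowJob` and `run_job` are made-up names, `std::vector` stands in for whatever memory both CPUs can see, and cache handling on the ARM9 side is left out entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical job description: one message like this replaces
// one message per pixel.
struct RowJob {
    int width;      // pixels per row
    int row_begin;  // first row to calculate (inclusive)
    int row_end;    // one past the last row to calculate
};

// Fill the rows named by `job` in a row-major buffer.
// `calc(x, y)` is the per-pixel escape-time function; the algorithm is
// still pixel by pixel, only the job handed over is bigger.
template <typename Calc>
void run_job(const RowJob& job, std::vector<std::uint16_t>& buffer, Calc calc) {
    for (int y = job.row_begin; y < job.row_end; ++y) {
        for (int x = 0; x < job.width; ++x) {
            buffer[static_cast<std::size_t>(y) * job.width + x] = calc(x, y);
        }
    }
}
```

The ARM9 would run the same loop over its own rows and, once the ARM7 reports that it has finished, read the remaining rows out of the buffer and plot them.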


Fri Sep 23, 2011 3:42 pm

Joined: Thu Sep 22, 2011 5:47 pm
Posts: 13
Mtheall is half right,
He's right about the arm9 doing processing too,
But it doesn't wait on the arm7, no message from arm7 will just cause the arm9 to go calculating again and checking next time.

It's a slow calc anyways,
On my 3 GHz quad-core PC it takes 30s to calc a 256x256 surface, so I wanted to make use of the nds's second CPU's ability to be controlled separately.



Fri Sep 23, 2011 6:28 pm