20-Sep-2012, 21:45   #1
codedivine
Member

Join Date: Jan 2009
Posts: 229
Android benchmark. Looking for testers

The current tests for FP performance benchmarking on Android (such as Linpack for Android) are IMO not very accurate. Many such tests are written using Java, and are not properly tuned to use the hardware.

Thus, I have implemented my own small benchmark using the NDK. I have already tested it on a Snapdragon S3 based device, and need testers for other systems. The test is in its early stages and I am still experimenting. It is currently single-threaded, but I will be adding multi-threaded modes as well as more comprehensive tests.

If you are interested, let me know and I will provide you with a self-signed APK. It will run on any device running ICS or higher. If you are worried about the security of a non-market APK, note that the APK does not need internet access or SD card access. In fact, it does not ask for any special permissions at all. All you get is an application with a "Run" button. You run it, and it shows the result on screen.

Last edited by codedivine; 20-Sep-2012 at 22:44. Reason: mentioned NDK
20-Sep-2012, 22:11   #2
codedivine
Member

Join Date: Jan 2009
Posts: 229

Well, I already got enough testers on another forum within a few minutes.
So testing is closed for now. Thanks for reading!
22-Sep-2012, 02:51   #3
codedivine
Member

Join Date: Jan 2009
Posts: 229

Benchmark is now public. Check it at http://rgbench.com
Download and benchmark away
22-Sep-2012, 05:57   #4
codedivine
Member

Join Date: Jan 2009
Posts: 229

As a reference point, I got about 1750 megaflops on my Snapdragon S3 dual-core based device. Let me know if you have any questions.
22-Sep-2012, 07:32   #5
Florin
Merrily dodgy

Join Date: Aug 2003
Location: The colonies
Posts: 1,403

HTC One X International Tegra 3 AP33 ICS, 4 threads:
3376.0 MFlops

Asus Transformer Pad TF300TG Tegra 3 T30L ICS, 4 threads:
3374.0 MFlops
__________________
"A man generally has two reasons for doing a thing. One that sounds good, and a real one." - J.P. Morgan
22-Sep-2012, 21:32   #6
Nebuchadnezzar
Member

Join Date: Feb 2002
Location: Luxembourg
Posts: 442

4600 (4 threads) on a Galaxy S3 for what it's worth.
22-Sep-2012, 21:42   #7
codedivine
Member

Join Date: Jan 2009
Posts: 229

My code had a major bug. Correcting and uploading a new version.
22-Sep-2012, 22:22   #8
codedivine
Member

Join Date: Jan 2009
Posts: 229

Uploaded a new version. Check version 1.1 on http://rgbench.com
23-Sep-2012, 03:39   #9
codedivine
Member

Join Date: Jan 2009
Posts: 229

Some sample results:

Galaxy S2X (running Snapdragon S3 dual-core): 450 MFlops
Nexus 7: about 920 MFlops
23-Sep-2012, 09:26   #10
Exophase
Senior Member

Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,631

How about source code? And perhaps a disassembly listing of the binary so we can see why the compiler sucks?
23-Sep-2012, 10:58   #11
codedivine
Member

Join Date: Jan 2009
Posts: 229

Quote:
Originally Posted by Exophase
so we can see why the compiler sucks?
Why are you assuming the compiler sucks? Were you expecting to reach 100% of theoretical peak? The test is not doing only ALU computations. From my webpage:

Quote:
It performs a fp64 matrix multiply (hence MM). It is a fully multi-threaded benchmark written using the NDK in C++, and performs a tiled matrix multiply with multiple tile sizes and reports the best performance.
You are not going to get peak. And the search space I am searching over is not the optimal one (yet). My benchmark is not ideal and I know that. It was meant as a quick hack that is still a substantially better indicator than the benchmarks currently used by the blogosphere, such as "Linpack for Android".
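Since the source was not posted, here is a minimal single-threaded C++ sketch of what "a tiled matrix multiply with multiple tile sizes, reporting the best performance" could look like. It is not the benchmark's actual code: the function names, the i-k-j loop order, the fill values and the 2n³ flop count per multiply are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Cache-tiled C += A * B for n x n row-major matrices using t x t tiles.
// The std::min bounds handle n that is not a multiple of t.
void tiled_dgemm(int n, int t, const std::vector<double>& A,
                 const std::vector<double>& B, std::vector<double>& C) {
    assert(t > 0);
    for (int ii = 0; ii < n; ii += t)
        for (int kk = 0; kk < n; kk += t)
            for (int jj = 0; jj < n; jj += t) {
                const int iEnd = std::min(ii + t, n);
                const int kEnd = std::min(kk + t, n);
                const int jEnd = std::min(jj + t, n);
                for (int i = ii; i < iEnd; ++i)
                    for (int k = kk; k < kEnd; ++k) {
                        const double a = A[i * n + k];
                        for (int j = jj; j < jEnd; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

// Times one multiply at tile size t and returns MFlops, counting 2*n^3 flops
// (one multiply and one add per innermost iteration).
double mflops_for_tile(int n, int t) {
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
    const auto start = std::chrono::steady_clock::now();
    tiled_dgemm(n, t, A, B, C);
    const auto stop = std::chrono::steady_clock::now();
    volatile double sink = C[0];  // keep the result observable to the optimizer
    (void)sink;
    const double secs = std::chrono::duration<double>(stop - start).count();
    return 2.0 * n * n * n / secs / 1e6;
}

// Runs every candidate tile size and keeps the fastest, i.e. "try multiple
// tile sizes and report the best performance".
double best_mflops(int n, const std::vector<int>& tiles, int* best_tile) {
    double best = 0.0;
    for (int t : tiles) {
        const double m = mflops_for_tile(n, t);
        if (m > best) {
            best = m;
            *best_tile = t;
        }
    }
    return best;
}
```

For example, `best_mflops(512, {16, 32, 64, 128}, &tile)` would report the fastest of four tile sizes; the matrix size and tile sizes the app actually searches over are not known.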
23-Sep-2012, 11:19   #12
Exophase
Senior Member

Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,631

I'm not expecting you to reach peak, but a well-written algorithm with a half-decent compiler should be able to get pretty damn close in matrix multiplication on a CPU. Far closer than you're getting. Cortex-A9 should be able to issue a 64-bit FMAC every 2 cycles (and a 64-bit FADD every cycle), and you should have enough registers to cover its latency in a matrix multiplication kernel, particularly if you don't compile for the 16-register variant. No idea about Snapdragon, but I wasn't under the impression that it was any worse.

Nonetheless, why not post the source and disassembly listing so we can decide for ourselves how the compiler's doing (and how you're doing)? No one else who makes the benchmarks does such a thing, are you really going to deprive us of this rare opportunity?
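Taking the Cortex-A9 issue rates quoted above at face value (one fp64 FMAC, i.e. 2 flops, every 2 cycles; a pure FADD stream also gives 1 flop per cycle), a rough fp64 peak works out as below. The core count and clock plugged in are only an example (the quad-core 1.3 GHz Tegra 3 reported later in the thread), and the estimate ignores memory stalls and loop overhead.

```latex
\begin{aligned}
\text{FMAC rate} &= \frac{2\ \text{flops}}{2\ \text{cycles}} = 1\ \text{flop/cycle/core} \\
\text{Peak} &\approx N_{\text{cores}} \times f \times 1\ \text{flop/cycle} \\
            &= 4 \times 1.3\ \text{GHz} \times 1 = 5.2\ \text{GFlops (fp64)}
\end{aligned}
```

Against that figure, the 1881 MFlops posted for a TF201 at 1.3 GHz with 4 threads would be about 36% of peak.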
23-Sep-2012, 11:24   #13
Laurent06
Member

Join Date: Dec 2007
Posts: 425

A9 issues DP instructions every other cycle, including fadd.

EDIT: This was wrong; double fadd only needs one cycle. The TRM is correct.
__________________
Speaking for myself.

Last edited by Laurent06; 12-Oct-2012 at 10:34.
23-Sep-2012, 11:42   #14
Exophase
Senior Member

Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,631

Quote:
Originally Posted by Laurent06
A9 issues DP instructions every other cycle, including fadd.
If so ARM is lying on their TRMs again :/

http://infocenter.arm.com/help/topic...h02s03s02.html
23-Sep-2012, 12:28   #15
codedivine
Member

Join Date: Jan 2009
Posts: 229

Thanks Exophase and Laurent. I am looking into it. I am now testing a better version which is already giving more than 2x the performance of v1.1 reported here.
The issue was that I was not tiling properly for optimal usage of the register file.

Re source code: I do aim to push the source code out at some point, but not for now.
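"Tiling for the register file" usually means computing a small block of C entirely in local accumulators. Below is a hedged C++ sketch of a 4x4 register-blocked kernel; the names, the 4x4 shape and the row-major layout are assumptions, not the app's code. The block keeps roughly 21 doubles live (16 accumulators, 4 values of A, 1 of B), which fits in 32 double registers but would spill with only 16, which is presumably why Exophase mentions the 16-register variant.

```cpp
#include <cassert>

// Register-blocked micro-kernel: C[0:4][0:4] += A[0:4][0:k] * B[0:k][0:4]
// for row-major operands with leading dimensions lda, ldb, ldc.
// The 16 independent multiply-add chains give the FPU work to overlap with
// each operation's latency, and are few enough for a compiler to keep in registers.
void kernel_4x4(int k, const double* A, int lda, const double* B, int ldb,
                double* C, int ldc) {
    double acc[4][4] = {};
    for (int p = 0; p < k; ++p) {
        const double a0 = A[0 * lda + p], a1 = A[1 * lda + p];
        const double a2 = A[2 * lda + p], a3 = A[3 * lda + p];
        for (int j = 0; j < 4; ++j) {
            const double b = B[p * ldb + j];
            acc[0][j] += a0 * b;
            acc[1][j] += a1 * b;
            acc[2][j] += a2 * b;
            acc[3][j] += a3 * b;
        }
    }
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            C[i * ldc + j] += acc[i][j];
}

// Full n x n multiply built from 4x4 register tiles. This sketch requires n to
// be a multiple of 4; a real kernel would also handle the edges.
void blocked_dgemm(int n, const double* A, const double* B, double* C) {
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4)
        for (int j = 0; j < n; j += 4)
            kernel_4x4(n, A + i * n, n, B + j, n, C + i * n + j, n);
}
```

In practice this register block would sit inside cache-level tiles like the ones in a tiled multiply, so that the rows of A and columns of B it streams through stay in cache.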
23-Sep-2012, 12:43   #16
codedivine
Member

Join Date: Jan 2009
Posts: 229

Pushed v1.2 on the site. Getting about 1 gigaflop or so on my Snapdragon S3 now.
23-Sep-2012, 14:19   #17
OlegSH
Member

Join Date: Jan 2010
Posts: 119

TF201 Tegra3 1.3 GHz 4 threads - 1881 MFlops
TF201 Tegra3 1.6 GHz 4 threads - 2250 MFlops
GSIII Exynos 4412 1.4 GHz 4 threads - 2189 MFlops
23-Sep-2012, 19:02   #18
Arwin
Now Officially a Top 10 Poster

Join Date: May 2006
Location: Maastricht, The Netherlands
Posts: 13,234

How hard is it to run this? Wanted to try making my mother run it on her fairly cheap device, but she didn't get it. Something like seeing 1,2,4 and a green button next to one, that didn't seem to do anything ...
23-Sep-2012, 21:37   #19
codedivine
Member

Join Date: Jan 2009
Posts: 229

Quote:
Originally Posted by OlegSH
TF201 Tegra3 1.3 GHz 4 threads - 1881 MFlops
TF201 Tegra3 1.6 GHz 4 threads - 2250 MFlops
GSIII Exynos 4412 1.4 GHz 4 threads - 2189 MFlops
Thanks

Quote:
Originally Posted by Arwin
How hard is it to run this? Wanted to try making my mother run it on her fairly cheap device, but she didn't get it. Something like seeing 1,2,4 and a green button next to one, that didn't seem to do anything ...
Well, the "1, 2, 4" is a dialog for choosing number of threads. I was having trouble correctly detecting number of supported threads on every phone so I made it a user dialog.

After you select the number of threads, you press the "Run" button. These are the only two user inputs.

However, it does take a while to run. On low-end phones, you may need to wait around 10 minutes before it produces a result. Unfortunately, I currently don't display a progress bar, only a "Running" message for the entire run. That's probably why your mom thought the app was not doing anything.

Also, make sure the phone supports the ARMv7 ISA. The benchmark does not support, say, ARM11 processors (I could compile for them, but I wanted to avoid a few bugs on some phones related to loading the correct ISA code). The app might just crash or force close if you try to run it on an ARM11.
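A hedged C++ sketch of how a user-chosen thread count could drive the run, and how a shared counter could replace the static "Running" message with real progress. All names and the row-wise split are hypothetical; the app's threading code was not posted. `std::thread::hardware_concurrency()` is only a hint and may return 0, and on phones that power cores down when idle it may report fewer cores than exist, which is one plausible reason a user dialog is simpler.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Suggests a default for the thread-count dialog; falls back to 1 when the
// runtime cannot tell (hardware_concurrency() may return 0).
int suggested_threads() {
    const unsigned hc = std::thread::hardware_concurrency();
    return hc == 0 ? 1 : static_cast<int>(hc);
}

// C += A * B for n x n row-major matrices, with contiguous row ranges of C
// split across `threads` workers (each writes disjoint rows, so no locking).
// `rows_done` is bumped after every row so a UI thread could poll it.
void parallel_dgemm(int n, int threads, const std::vector<double>& A,
                    const std::vector<double>& B, std::vector<double>& C,
                    std::atomic<int>& rows_done) {
    assert(threads > 0);
    std::vector<std::thread> pool;
    const int chunk = (n + threads - 1) / threads;
    for (int t = 0; t < threads; ++t) {
        const int begin = t * chunk;
        const int end = std::min(n, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([&, begin, end] {
            for (int i = begin; i < end; ++i) {
                for (int k = 0; k < n; ++k) {
                    const double a = A[i * n + k];
                    for (int j = 0; j < n; ++j)
                        C[i * n + j] += a * B[k * n + j];
                }
                rows_done.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
}
```

A UI thread could read `rows_done` every second or so and display `rows_done * 100 / n` as a percentage.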
23-Sep-2012, 22:21   #20
codedivine
Member

Join Date: Jan 2009
Posts: 229

Found an interesting link. These folks compiled ATLAS on a Pandaboard (dual Cortex-A9 @ 1 GHz). They get 800 MFlops on DGEMM using 1 core, but only about 1200 MFlops using both cores. Some kind of memory bandwidth issue?

http://www.vesperix.com/arm/atlas-ar...-a9/index.html
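Working through the Pandaboard figures quoted above: the second core adds only half of what the first one delivers. If one also assumes the roughly 1 flop/cycle/core fp64 peak implied by the Cortex-A9 issue rates discussed earlier (an assumption, not something stated on the linked page), the single-core result is about 80% of a 1 GFlops peak and the dual-core result about 60% of 2 GFlops.

```latex
\begin{aligned}
\text{speedup}_{1 \to 2\ \text{cores}} &= \frac{1200}{800} = 1.5 \\
\text{parallel efficiency} &= \frac{1200}{2 \times 800} = 0.75 \\
\text{fraction of assumed peak} &= \frac{800}{1000} = 0.80 \ (\text{1 core}), \quad \frac{1200}{2000} = 0.60 \ (\text{2 cores})
\end{aligned}
```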
24-Sep-2012, 11:15   #21
codedivine
Member

Join Date: Jan 2009
Posts: 229

Any Snapdragon S4 users (such as US GS3, HTC One S etc) care to try the benchmark? Should be interesting.
24-Sep-2012, 12:10   #22
ToTTenTranz
Naughty Boy!

Join Date: Jul 2008
Posts: 2,255

AT&T One X here, with a Snapdragon S4.

1 thread: 730 MFlops
2 threads: 1460 MFlops
4 threads: 1333 MFlops (is it normal to get a performance drop when the number of threads > number of cores?)

CPU usage is around 98% during the test.
24-Sep-2012, 12:18   #23
codedivine
Member

Join Date: Jan 2009
Posts: 229

Quote:
Originally Posted by ToTTenTranz
AT&T One X here, with a Snapdragon S4.

1 thread: 730 MFlops
2 threads: 1460 MFlops
4 threads: 1333 MFlops (is it normal to get a performance drop when the number of threads > number of cores?)

CPU usage is around 98% during the test.
Excellent! Thanks a lot!
Interesting to see much higher performance compared to a similarly clocked 1.5 GHz Snapdragon S3 dual-core (which gives about 1040 MFlops).

Yes, some performance degradation was seen by users when threads > cores. Not sure if the almost 10% degradation you saw is normal, though.
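For reference, the ratios behind the "almost 10%" remark and the S4 vs. S3 comparison, computed from the numbers posted above and treating both chips as dual-core 1.5 GHz parts as in the reply:

```latex
\begin{aligned}
\frac{1460}{730} &= 2.00 && \text{(1 to 2 threads: linear scaling on 2 cores)} \\
\frac{1333 - 1460}{1460} &\approx -8.7\% && \text{(2 to 4 threads)} \\
\frac{1460}{1040} &\approx 1.40 && \text{(S4 vs. S3, both with 2 threads)}
\end{aligned}
```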
24-Sep-2012, 14:32   #24
ToTTenTranz
Naughty Boy!

Join Date: Jul 2008
Posts: 2,255

Quote:
Originally Posted by codedivine
Excellent! Thanks a lot!
Interesting to see much higher performance compared to a similarly clocked 1.5 GHz Snapdragon S3 dual-core (which gives about 1040 MFlops).

Yes, some performance degradation was seen by users when threads > cores. Not sure if the almost 10% degradation you saw is normal, though.
But you just said you got a 1750 MFlops score with your S3. It seemed like the score was pretty low to me.
24-Sep-2012, 16:41   #25
Rurouni
Member

Join Date: Sep 2008
Posts: 140

Quote:
Originally Posted by ToTTenTranz
But you just said you got a 1750 MFlops score with your S3. It seemed like the score was pretty low to me.
With v1.2 the score was about 1 GFlops at 1.5 GHz for the S3.

All times are GMT +1.