#1 |
|
Member
Join Date: Jan 2009
Posts: 229
|
The current tests for FP performance benchmarking on Android (such as Linpack for Android) are IMO not very accurate. Many such tests are written using Java, and are not properly tuned to use the hardware.
Thus, I have implemented my own small benchmark using the NDK. I have tested it on a Snapdragon S3 based device already, and need testers for other systems. The test is in its early stages and I am still experimenting. It is currently single-threaded, but I will be adding multi-threaded modes as well as more comprehensive tests.
If you are interested, let me know and I will provide you with a self-signed APK. It will run on any device running ICS or higher. If you are worried about the security of a non-market APK, note that the APK does not need internet access or SD card access. In fact, it does not ask for any special permissions at all. All you get is an application that has a "Run" button. You run it, and it shows the result on screen.
Last edited by codedivine; 20-Sep-2012 at 22:44. Reason: mentioned NDK |
|
|
|
|
|
#2 |
|
Member
Join Date: Jan 2009
Posts: 229
|
Well, I already got enough testers on another forum within a few minutes.
So testing is closed for now. Thanks for reading! |
|
|
|
|
|
#3 |
|
Member
Join Date: Jan 2009
Posts: 229
|
|
|
|
|
|
|
#4 |
|
Member
Join Date: Jan 2009
Posts: 229
|
As a reference point, I got about 1750 megaflops on my Snapdragon S3 dual-core based device. Let me know if you have any questions.
|
|
|
|
|
|
#5 |
|
Merrily dodgy
Join Date: Aug 2003
Location: The colonies
Posts: 1,403
|
HTC One X International Tegra 3 AP33 ICS, 4 threads: 3376.0 MFlops
Asus Transformer Pad TF300TG Tegra 3 T30L ICS, 4 threads: 3374.0 MFlops
__________________
"A man generally has two reasons for doing a thing. One that sounds good, and a real one." - J.P. Morgan |
|
|
|
|
|
#6 |
|
Member
Join Date: Feb 2002
Location: Luxembourg
Posts: 442
|
4600 (4 threads) on a Galaxy S3 for what it's worth.
|
|
|
|
|
|
#7 |
|
Member
Join Date: Jan 2009
Posts: 229
|
My code had a major bug. Correcting and uploading a new version.
|
|
|
|
|
|
#8 |
|
Member
Join Date: Jan 2009
Posts: 229
|
|
|
|
|
|
|
#9 |
|
Member
Join Date: Jan 2009
Posts: 229
|
Some sample results:
Galaxy S2X (running Snapdragon S3 dual-core): 450 MFlops
Nexus 7: About 920 MFlops |
|
|
|
|
|
#10 |
|
Senior Member
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,631
|
How about source code? And perhaps a disassembly listing of the binary so we can see why the compiler sucks?
|
|
|
|
|
|
#11 | |
|
Member
Join Date: Jan 2009
Posts: 229
|
Why are you assuming the compiler sucks? Were you expecting to reach 100% theoretical peak? The test is not doing only ALU computations. From my webpage:
Quote:
|
|
|
|
|
|
|
#12 |
|
Senior Member
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,631
|
I'm not expecting you to reach peak, but a well written algorithm with a half decent compiler should be able to reach pretty damn close in matrix multiplication on a CPU. Far closer than you're reaching. Cortex-A9 should be able to issue a 64-bit FMAC in 2 cycles (and 64-bit FADD in 1 cycle), and you should have enough registers to cover its latency for a matrix multiplication kernel. Particularly if you don't compile it for the 16 register variant. No idea about Snapdragon, but I wasn't under the impression that it was any worse.
Nonetheless, why not post the source and disassembly listing so we can decide for ourselves how the compiler's doing (and how you're doing)? No one else who makes benchmarks does such a thing. Are you really going to deprive us of this rare opportunity? |
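For readers who want to put a number on "far closer than you're reaching", here is a rough back-of-the-envelope sketch. It uses only the issue rate quoted in the post above (one 64-bit FMAC every 2 cycles on Cortex-A9, counted as 2 flops, so 1 flop per cycle per core). It assumes the benchmark is double precision and that every core sustains that rate, neither of which the thread confirms. The function names are made up for illustration. The sample input is the TF201 Tegra 3 result reported elsewhere in this thread (1.3 GHz, 4 threads, 1881 MFlops).

```python
def peak_mflops(clock_mhz, cores, flops_per_cycle=1.0):
    """Theoretical peak in MFlops.

    The default of 1 flop/cycle/core is an assumption derived from
    one FMAC (2 flops) issued every 2 cycles.
    """
    return clock_mhz * cores * flops_per_cycle


def fraction_of_peak(measured_mflops, clock_mhz, cores, flops_per_cycle=1.0):
    """Measured throughput as a fraction of the theoretical peak."""
    return measured_mflops / peak_mflops(clock_mhz, cores, flops_per_cycle)


if __name__ == "__main__":
    # TF201 Tegra 3 (quad Cortex-A9) at 1.3 GHz, 4 threads, as reported in the thread.
    peak = peak_mflops(1300, 4)
    frac = fraction_of_peak(1881, 1300, 4)
    print(f"peak {peak:.0f} MFlops, measured is {frac:.0%} of peak")
```

Under these assumptions the quad-core peak works out to 5200 MFlops, so the reported result would be a bit over a third of peak. A different flops-per-cycle figure (for example for single precision or NEON) changes the ratio accordingly.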
|
|
|
|
|
#13 |
|
Member
Join Date: Dec 2007
Posts: 425
|
A9 issues DP instructions every other cycle, including fadd.
EDIT: This was wrong, double fadd only needs one cycle. The TRM is correct.
__________________
Speaking for myself.
Last edited by Laurent06; 12-Oct-2012 at 10:34. |
|
|
|
|
|
#14 |
|
Senior Member
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,631
|
|
|
|
|
|
|
#15 |
|
Member
Join Date: Jan 2009
Posts: 229
|
Thanks Exophase and Laurent. I am looking into it. I am now testing a better version which is already giving more than 2x the performance of v1.1 reported here.
The issue was that I was not tiling properly for optimal usage of the register file.
Re source code: I do aim to push the source code out at some point, but not for now. |
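Since the source is not posted, the following is only a generic sketch of what "tiling for the register file" usually means in a matrix multiplication kernel. It is not the benchmark's code. A real kernel would be written in C or NEON assembly through the NDK; Python is used here purely to show the loop structure. The idea: compute a small 2x2 block of the result at once, keeping its four accumulators in local variables (standing in for registers), so each loaded element of A and B is used twice instead of once.

```python
def matmul_naive(a, b, n):
    """Reference kernel: one accumulator, 2 loads per multiply-add."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def matmul_2x2_tiled(a, b, n):
    """2x2 register-tiled kernel: 4 loads feed 4 multiply-adds per k step."""
    if n % 2:
        raise ValueError("n must be even for this 2x2 sketch")
    c = [[0.0] * n for _ in range(n)]
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            # Four accumulators stay "in registers" for the whole k loop.
            c00 = c01 = c10 = c11 = 0.0
            for k in range(n):
                a0 = a[i][k]
                a1 = a[i + 1][k]
                b0 = b[k][j]
                b1 = b[k][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            c[i][j] = c00
            c[i][j + 1] = c01
            c[i + 1][j] = c10
            c[i + 1][j + 1] = c11
    return c


if __name__ == "__main__":
    n = 4
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    print(matmul_2x2_tiled(a, b, n) == matmul_naive(a, b, n))
```

Larger tiles (say 4x4) raise the ratio of arithmetic to loads further, up to the point where the accumulators no longer fit in the register file. Having several independent accumulators is also what covers the FMAC latency mentioned earlier in the thread.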
|
|
|
|
|
#16 |
|
Member
Join Date: Jan 2009
Posts: 229
|
Pushed v1.2 on the site. Getting about 1 gigaflop or so on my Snapdragon S3 now.
|
|
|
|
|
|
#17 |
|
Member
Join Date: Jan 2010
Posts: 119
|
TF201 Tegra3 1.3 GHz 4 threads - 1881 MFlops
TF201 Tegra3 1.6 GHz 4 threads - 2250 MFlops
GSIII Exynos 4412 1.4 GHz 4 threads - 2189 MFlops |
|
|
|
|
|
#18 |
|
Now Officially a Top 10 Poster
Join Date: May 2006
Location: Maastricht, The Netherlands
Posts: 13,234
|
How hard is it to run this? I wanted to try having my mother run it on her fairly cheap device, but she didn't get it. Something like seeing 1, 2, 4 and a green button next to one, which didn't seem to do anything...
|
|
|
|
|
|
#19 | ||
|
Member
Join Date: Jan 2009
Posts: 229
|
Quote:
Quote:
After you select the number of threads, you press the "Run" button. These are the only two user inputs. However, it does take a while to run. On low-end phones, you should wait for, say, 10 minutes before it produces a result. Unfortunately, I currently don't display a progress bar, only a message saying "Running" for the entire run. So that's probably why your mom thought the app was not doing anything after that.
Also, make sure the phone supports the ARMv7 ISA. The benchmark does not support, say, ARM11 processors (well, I can compile for them, but there were a few bugs on a few phones related to loading the correct ISA code that I wanted to avoid). The app might just crash or force close if you try to run it on ARM11. |
||
|
|
|
|
|
#20 |
|
Member
Join Date: Jan 2009
Posts: 229
|
Found an interesting link. These folks compiled ATLAS on a Pandaboard (Dual Cortex A9 @ 1 GHz). They get 800 MFlops on DGEMM using 1 core, but then only about 1200 MFlops using both cores. Some kind of memory bandwidth issue?
http://www.vesperix.com/arm/atlas-ar...-a9/index.html |
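To make the scaling gap concrete, here is a small arithmetic sketch using only the two figures quoted above (800 MFlops on 1 core, about 1200 MFlops on 2 cores). The function name is made up for illustration, and the calculation says nothing about the cause of the shortfall.

```python
def parallel_efficiency(single_mflops, multi_mflops, cores):
    """Return (speedup, efficiency) of a multi-core run over a single-core run."""
    speedup = multi_mflops / single_mflops
    return speedup, speedup / cores


if __name__ == "__main__":
    speedup, eff = parallel_efficiency(800, 1200, 2)
    print(f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

With these inputs the second core adds only half of its ideal contribution: a 1.5x speedup, or 75% parallel efficiency.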
|
|
|
|
|
#21 |
|
Member
Join Date: Jan 2009
Posts: 229
|
Any Snapdragon S4 users (such as US GS3, HTC One S etc) care to try the benchmark? Should be interesting.
|
|
|
|
|
|
#22 |
|
Naughty Boy!
Join Date: Jul 2008
Posts: 2,255
|
AT&T One X here, with a Snapdragon S4.
1 thread: 730 MFlops
2 threads: 1460 MFlops
4 threads: 1333 MFlops (is it normal to get a performance break when the number of threads > number of cores?)
CPU usage is around 98% during the test. |
|
|
|
|
|
#23 | |
|
Member
Join Date: Jan 2009
Posts: 229
|
Quote:
Interesting to see much higher performance compared to a similarly clocked 1.5 GHz Snapdragon S3 dual-core (which gives about 1040 MFlops).
Yes, some performance degradation was seen by users when threads > cores. Not sure if the almost 10% degradation you saw is normal though. |
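For reference, the percentages in this exchange can be checked with a few lines of arithmetic. The sketch below uses only numbers posted in the thread (1460 MFlops at 2 threads and 1333 MFlops at 4 threads on the Snapdragon S4, about 1040 MFlops on the Snapdragon S3). It assumes the S3 figure is also a 2-thread result, which the post does not state. The function names are made up for illustration.

```python
def relative_drop(best_mflops, observed_mflops):
    """Fractional loss of the observed result relative to the best result."""
    return (best_mflops - observed_mflops) / best_mflops


def relative_gain(new_mflops, old_mflops):
    """Fractional improvement of new over old."""
    return new_mflops / old_mflops - 1.0


if __name__ == "__main__":
    drop = relative_drop(1460, 1333)   # 4 threads vs 2 threads on the S4
    gain = relative_gain(1460, 1040)   # S4 vs S3, both assumed to be 2-thread runs
    print(f"oversubscription drop: {drop:.1%}")
    print(f"S4 over S3: {gain:.1%}")
```

The drop from 2 to 4 threads comes out just under 9%, consistent with the "almost 10%" mentioned above, and the S4 result is roughly 40% higher than the S3 figure under the stated assumption.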
|
|
|
|
|
|
#24 | |
|
Naughty Boy!
Join Date: Jul 2008
Posts: 2,255
|
Quote:
|
|
|
|
|
|
|
#25 | |
|
Member
Join Date: Sep 2008
Posts: 140
|
Quote:
|
|
|
|
|