Author | Message |
Claggy Volunteer tester
Send message Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0
|
mimo, Tom and Mark, please put the following app_config.xml in your Seti Beta project directories:
<app_config>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>armv6l</plan_class>
<cmdline>-verbose</cmdline>
</app_version>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>armv7l</plan_class>
<cmdline>-verbose</cmdline>
</app_version>
</app_config>
Claggy
ID: 56615 · |
|
MarkJ Volunteer tester
Send message Joined: 18 Oct 09 Posts: 48 Credit: 73,283 RAC: 0
|
app_config done.
I noticed the first of the 8.01 apps (armv7l) seems to have improved run time to 94,800 seconds. Not sure if this is because of new compile options or something else.
I have also attached one of my Parallella boards to see how it goes. It's this host
ID: 56616 · |
|
MarkJ Volunteer tester
Send message Joined: 18 Oct 09 Posts: 48 Credit: 73,283 RAC: 0
|
Parallella
swp half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
BTW, this device can be used as a GPU to put its massively parallel Epiphany part to work.
Try building the OpenCL app's sources for this device.
AFAIK no one has done it so far, and on my list it keeps getting delayed by other tasks.
Performance of its ARM part should be much lower than its massive-parallel chip.
From what I gather, the OpenCL for the Epiphany is a subset, so it doesn't implement a number of functions.
A better approach would be to do the FFT on the Epiphany; supposedly they have a library for that, or were working on one. Unfortunately the Epiphany doesn't have much local memory, so the Einstein guys couldn't get it to work.
ID: 56617 · |
|
Raistmer Volunteer tester
Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
|
A better approach would be to do the FFT on the Epiphany; supposedly they have a library for that, or were working on one. Unfortunately the Epiphany doesn't have much local memory, so the Einstein guys couldn't get it to work.
That would require transferring data arrays back and forth, which is the same reason Brook+ AstroPulse doesn't do anything from the main loop on the GPU. It could do the FFT on the GPU, but the cost of moving the data back and forth was too high to make that approach feasible.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 56620 · |
|
mimo Volunteer tester
Send message Joined: 4 Aug 08 Posts: 11 Credit: 1,437,079 RAC: 0
|
mimo, Tom and Mark, please put the following app_config.xml in your Seti Beta project directories:
<app_config>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>armv6l</plan_class>
<cmdline>-verbose</cmdline>
</app_version>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>armv7l</plan_class>
<cmdline>-verbose</cmdline>
</app_version>
</app_config>
Claggy
When I put this into app_info it gives errors ...
ID: 56626 · |
|
Tom Rinehart Volunteer tester
Send message Joined: 22 Jul 15 Posts: 21 Credit: 113,162 RAC: 0
|
You want to put it in app_config.xml, not app_info.xml.
ID: 56628 · |
|
Claggy Volunteer tester
Send message Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0
|
mimo, Tom and Mark, please put the following app_config.xml in your Seti Beta project directories:
<app_config>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>armv6l</plan_class>
<cmdline>-verbose</cmdline>
</app_version>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>armv7l</plan_class>
<cmdline>-verbose</cmdline>
</app_version>
</app_config>
Claggy
When I put this into app_info it gives errors ...
I said app_config.xml, NOT app_info.xml
Application configuration
This mechanism allows you to specify scheduling parameters for specific applications or app versions. It is available with 7.0.40+ client versions.
To do this, create an ASCII file app_config.xml in the project's directory, e.g. when the project is SETI@home: {BOINC Data directory}\projects\setiathome.berkeley.edu.
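Before restarting the client, the file can be parsed to confirm that it is well-formed and lists the intended app versions. The following is only a minimal sketch and not part of BOINC: the helper name `list_app_versions` is made up for illustration, and the XML is the snippet posted earlier in this thread.

```python
import xml.etree.ElementTree as ET

# The app_config.xml content posted earlier in this thread.
APP_CONFIG = """<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>armv6l</plan_class>
    <cmdline>-verbose</cmdline>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>armv7l</plan_class>
    <cmdline>-verbose</cmdline>
  </app_version>
</app_config>"""


def list_app_versions(xml_text):
    """Hypothetical helper: return (app_name, plan_class, cmdline) per <app_version>.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError("root element should be <app_config>")
    return [
        (
            av.findtext("app_name", default="").strip(),
            av.findtext("plan_class", default="").strip(),
            av.findtext("cmdline", default="").strip(),
        )
        for av in root.findall("app_version")
    ]


versions = list_app_versions(APP_CONFIG)
for name, plan_class, cmdline in versions:
    print(name, plan_class, cmdline)
```

To check a real file, read it from the project directory and pass its text to the same helper. The exact directory depends on where the BOINC data directory lives on the host.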
Claggy
ID: 56629 · |
|
Claggy Volunteer tester
Send message Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0
|
I've rebuilt the armv7l app with:
./configure CFLAGS="-O3 -mfloat-abi=hard -mfpu=neon -funsafe-math-optimizations" BOINCDIR=/home/pi/boinc --enable-client --enable-static --disable-shared --disable-server --enable-fast-math --with-float-abi=hard --with-fpu=neon
I will start testing it on one of my Pi2s tonight. If it works better, I will send it to Eric to post.
- Tom
How are you testing the speed and validity? Online, or offline via a Bench?
If you need the Bench program 'arm'erised, I have already done that and can supply it.
Claggy
ID: 56631 · |
|
mimo Volunteer tester
Send message Joined: 4 Aug 08 Posts: 11 Credit: 1,437,079 RAC: 0
|
mimo, Tom and Mark, please put the following app_config.xml in your Seti Beta project directories:
<app_config>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>armv6l</plan_class>
<cmdline>-verbose</cmdline>
</app_version>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>armv7l</plan_class>
<cmdline>-verbose</cmdline>
</app_version>
</app_config>
Claggy
When I put this into app_info it gives errors ...
I said app_config.xml, NOT app_info.xml
Application configuration
This mechanism allows you to specify scheduling parameters for specific applications or app versions. It is available with 7.0.40+ client versions.
To do this, create an ASCII file app_config.xml in the project's directory, e.g. when the project is SETI@home: {BOINC Data directory}\projects\setiathome.berkeley.edu.
Claggy
Now it's clear... it looks like I am blind, so I asked my wife to clean my glasses ...
ID: 56633 · |
|
Tom Rinehart Volunteer tester
Send message Joined: 22 Jul 15 Posts: 21 Credit: 113,162 RAC: 0
|
I've rebuilt the armv7l app with:
./configure CFLAGS="-O3 -mfloat-abi=hard -mfpu=neon -funsafe-math-optimizations" BOINCDIR=/home/pi/boinc --enable-client --enable-static --disable-shared --disable-server --enable-fast-math --with-float-abi=hard --with-fpu=neon
I will start testing it on one of my Pi2s tonight. If it works better, I will send it to Eric to post.
- Tom
How are you testing the speed and validity? Online, or offline via a Bench?
If you need the Bench program 'arm'erised, I have already done that and can supply it.
Claggy
Claggy -
I'm just running the apps on Beta WUs using an app_info.xml file. They are all about the same size, so my thinking was I'd notice a significant difference in performance. I'd be interested in Bench.
It looks like my current build with the above configuration is still running at about the same speed as the 8.01 version on Beta. It is not done yet, but looking at its progress it should finish in the same time as the 8.01 apps. It makes me think there is an issue with the code itself. I don't think the seti neon or vfp code is getting used. I'm going to start looking through the code to try to better understand what needs to be done to use it.
On another note, I have three Pi2s running on Beta (and one Pi1). Interestingly, the Pi2s are all running at different speeds. My fastest one is completing the WUs with the 8.01 armv6l app in about 94k seconds (it hasn't received any 8.01 armv7l ones, but I suspect they would be the same). It was taking 100k seconds with the 8.00 armv6l app and about 110k seconds with the 8.00 armv7l app. I have another Pi2 that is running the 8.01 apps in about 101k seconds, and a third Pi2 that is running the 8.01 apps at 124k seconds. All three are set up with different hardware (TFT display or not) and network connections (ethernet vs. two different wifi adaptors). There must be some extra overhead on the slower machines. They are both on wifi. It is just strange.
- Tom
ID: 56634 · |
|
Tom Rinehart Volunteer tester
Send message Joined: 22 Jul 15 Posts: 21 Credit: 113,162 RAC: 0
|
Found an old email from Josef W Segur regarding fftw3 and wisdom. This was for MB v7, so there may be additional FFT lengths used in v8. I am not sure whether the v8 app will check for a wisdom file or not. Maybe Tom or Claggy could let me know what directory and file name it expects, or whether it uses the default one (according to the fftw docs it's called /etc/fftw/wisdom).
sudo apt-get update
sudo apt-get install libfftw3-dev
cd /etc/fftw
sudo fftwf-wisdom -v -o wisdom.sah ko131072e10 cob131072 cob65536 cob16384 cob32768 cob8192 cob4096 cob2048 cob1024 cob512 cob256 cob128 cob64 cob32 cob16 cob8 cif32768
CPU features
Pi B+
half thumb fastmult vfp edsp java tls
Pi2
half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
Parallella
swp half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
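These feature lists look like the "Features" line that /proc/cpuinfo reports on ARM Linux. As a rough illustration of how they could be compared, the sketch below checks each posted list for the `vfp` and `neon` flags. The helper name `simd_support` is made up for this example, and the check is only a string match on the flags, not a statement about what the SETI app itself does.

```python
# Feature strings exactly as posted above.
CPU_FEATURES = {
    "Pi B+": "half thumb fastmult vfp edsp java tls",
    "Pi2": "half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt "
           "vfpd32 lpae evtstrm",
    "Parallella": "swp half thumb fastmult vfp edsp neon vfpv3 tls vfpd32",
}


def simd_support(feature_line):
    """Hypothetical helper: report whether the vfp and neon flags are present."""
    flags = set(feature_line.split())
    return {"vfp": "vfp" in flags, "neon": "neon" in flags}


support = {board: simd_support(line) for board, line in CPU_FEATURES.items()}
for board, flags in support.items():
    print(board, flags)
```

By this check the Pi B+ reports VFP only, while the Pi2 and the Parallella report both VFP and NEON.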
Mark -
I don't know whether the seti app would use a wisdom file, but it is worth looking into. It makes a significant difference with the Einstein app on a Pi2: something like 60k seconds with it vs. 80k without.
- Tom
ID: 56635 · |
|
MarkJ Volunteer tester
Send message Joined: 18 Oct 09 Posts: 48 Credit: 73,283 RAC: 0
|
Mark -
I don't know whether the seti app would use a wisdom file, but it is worth looking into. It makes a significant difference with the Einstein app on a Pi2: something like 60k seconds with it vs. 80k without.
- Tom
From the fftw documentation it could use any of the following ways of getting it:
int fftw_import_system_wisdom(void);
int fftw_import_wisdom_from_filename(const char *filename);
int fftw_import_wisdom_from_string(const char *input_string);
int fftw_import_wisdom(int (*read_char)(void *), void *data);
ID: 56638 · |
|
Claggy Volunteer tester
Send message Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0
|
It looks like my current build with the above configuration is still running at about the same speed as the 8.01 version on Beta. It is not done yet, but looking at its progress it should finish in the same time as the 8.01 apps. It makes me think there is an issue with the code itself. I don't think the seti neon or vfp code is getting used. I'm going to start looking through the code to try to better understand what needs to be done to use it.
The stock app code runs a function-choice test and then uses the fastest function, so if the VFP or NEON function is fastest it gets used. If a function isn't supported on the host, it doesn't get tested or used:
setiathome_v8 8.00 Revision: 3304 g++ (Raspbian 4.9.2-10) 4.9.2
libboinc: BOINC 7.7.0
Work Unit Info:
...............
WU true angle range is : 2.726845
features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.003427 0.00000 test
vfp_GetPowerSpectrum 0.001719 0.00000 test
neon_GetPowerSpectrum 0.002665 0.00000 test
vfp_GetPowerSpectrum 0.001719 0.00000 choice
v_ChirpData 0.209031 0.00000 test
fpu_ChirpData 0.177218 0.94721 test
fpu_opt_ChirpData 0.226306 0.00000 test
v_ChirpData 0.209031 0.00000 choice
v_Transpose 0.185392 0.00000 test
v_Transpose2 0.097511 0.00000 test
v_Transpose4 0.054585 0.00000 test
v_Transpose8 0.098157 0.00000 test
fftwf_transpose 0.032431 0.00000 test
v_pfTranspose2 0.085047 0.00000 test
v_pfTranspose4 0.048811 0.00000 test
v_pfTranspose8 0.080527 0.00000 test
v_vfpTranspose2 0.095259 0.00000 test
fftwf_transpose 0.032431 0.00000 choice
FPU opt folding 0.005680 0.00000 test
opt VFP folding 0.004789 0.17086 test
opt NEON folding 0.004191 0.00000 test
opt NEON folding 0.004191 0.00000 choice
Test duration 36.46 seconds
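The log above is consistent with a simple rule: among the candidates whose error is acceptably small, take the one with the lowest timing. Here is a minimal sketch of that rule, not the app's actual code. The function name `pick_fastest` and the `max_error` threshold are assumptions chosen for illustration; the rows are copied from the log.

```python
# (name, timing, error) rows copied from the function-choice log above.
POWER_SPECTRUM = [
    ("v_GetPowerSpectrum", 0.003427, 0.00000),
    ("vfp_GetPowerSpectrum", 0.001719, 0.00000),
    ("neon_GetPowerSpectrum", 0.002665, 0.00000),
]
CHIRP = [
    ("v_ChirpData", 0.209031, 0.00000),
    ("fpu_ChirpData", 0.177218, 0.94721),
    ("fpu_opt_ChirpData", 0.226306, 0.00000),
]
FOLDING = [
    ("FPU opt folding", 0.005680, 0.00000),
    ("opt VFP folding", 0.004789, 0.17086),
    ("opt NEON folding", 0.004191, 0.00000),
]


def pick_fastest(candidates, max_error=1e-5):
    """Return the name of the fastest candidate whose error is <= max_error.

    max_error is an assumed threshold; the real app's tolerance is not
    shown in this thread. Returns None if no candidate is accurate enough.
    """
    accurate = [c for c in candidates if c[2] <= max_error]
    if not accurate:
        return None
    return min(accurate, key=lambda c: c[1])[0]


for group in (POWER_SPECTRUM, CHIRP, FOLDING):
    print(pick_fastest(group))
```

Note that fpu_ChirpData has the lowest ChirpData timing but a large error, which would explain why the log shows v_ChirpData as the choice.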
There are two other ChirpData functions, vfp_ChirpData and neon_ChirpData, that I've had disabled for ARM on Linux because they didn't work (they are impossibly fast in the function-choice test, and then often don't find any signals).
I understand they don't work because they use the soft-float calling convention that Android uses, and not the hard-float calling convention that hard-float ARM Linux uses:
post 50921
I still haven't had time to make hard-float ABI builds work. The assembly files have function preambles that are soft-float specific.
The best bet is to trim off the preambles and put the assembly code inline in C++ files inside C or C++ functions.
Feel free to implement at your leisure.
The best speedup will probably come from fixing vfp_ChirpData, neon_ChirpData and probably analyzeFuncs_vfp.S and analyzeFuncs_neon.S, then reversing my analyzeFuncs_vector.cpp change:
https://setisvn.ssl.berkeley.edu/trac/changeset/2858
And from getting fftw 3.3.4 to detect and use Neon when available.
Claggy
ID: 56639 · |
|
Claggy Volunteer tester
Send message Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0
|
Here is my Bench run (from the beginning of January) of two apps, one using just CFLAGS="-O3" and the other using that plus --enable-fast-math. There is a small 1% to 5% speedup from using fast math:
KWSN-Linux-MBbench v2.1.08
Running on raspberrypi at Tue 05 Jan 2016 23:04:00 UTC
----------------------------------------------------------------
Starting benchmark run...
----------------------------------------------------------------
Listing wu-file(s) in /testWUs :
new_reference_work_unit.wu
PG0009_v8.wu
PG0395_v8.wu
PG0444_v8.wu
PG1327_v8.wu
Listing executable(s) in /APPS :
setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf
setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf
Listing executable in /REF_APPS :
setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf
----------------------------------------------------------------
Current WU: new_reference_work_unit.wu
----------------------------------------------------------------
Skipping default app setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf, displaying saved result(s)
Elapsed Time: ....................... 50449 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog 50035.85 sec 49714.69 sec 337.10 sec
Elapsed Time : ...................... 50035 seconds
Speed compared to default : ......... 100 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog 47781.10 sec 47463.23 sec 333.29 sec
Elapsed Time : ...................... 47781 seconds
Speed compared to default : ......... 105 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.87%
----------------------------------------------------------------
Done with new_reference_work_unit.wu
====================================================================
Current WU: PG0009_v8.wu
----------------------------------------------------------------
Running default app with command :... setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog 8075.48 sec 8024.92 sec 51.59 sec
Elapsed Time: ....................... 8076 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog 8108.60 sec 8058.81 sec 51.00 sec
Elapsed Time : ...................... 8108 seconds
Speed compared to default : ......... 99 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog 7938.20 sec 7888.43 sec 50.74 sec
Elapsed Time : ...................... 7938 seconds
Speed compared to default : ......... 101 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.98%
----------------------------------------------------------------
Done with PG0009_v8.wu
====================================================================
Current WU: PG0395_v8.wu
----------------------------------------------------------------
Running default app with command :... setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog 9424.71 sec 9373.88 sec 52.21 sec
Elapsed Time: ....................... 9425 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog 9457.92 sec 9407.01 sec 51.71 sec
Elapsed Time : ...................... 9458 seconds
Speed compared to default : ......... 99 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog 8953.89 sec 8904.34 sec 51.62 sec
Elapsed Time : ...................... 8954 seconds
Speed compared to default : ......... 105 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.90%
----------------------------------------------------------------
Done with PG0395_v8.wu
====================================================================
Current WU: PG0444_v8.wu
----------------------------------------------------------------
Running default app with command :... setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog 8874.02 sec 8822.46 sec 52.49 sec
Elapsed Time: ....................... 8874 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog 8931.91 sec 8876.90 sec 53.95 sec
Elapsed Time : ...................... 8932 seconds
Speed compared to default : ......... 99 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog 8434.35 sec 8379.83 sec 52.95 sec
Elapsed Time : ...................... 8435 seconds
Speed compared to default : ......... 105 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.97%
----------------------------------------------------------------
Done with PG0444_v8.wu
====================================================================
Current WU: PG1327_v8.wu
----------------------------------------------------------------
Running default app with command :... setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog 10414.17 sec 10280.96 sec 126.07 sec
Elapsed Time: ....................... 10414 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog 10421.65 sec 10298.74 sec 125.72 sec
Elapsed Time : ...................... 10422 seconds
Speed compared to default : ......... 99 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog 10263.50 sec 10139.68 sec 126.73 sec
Elapsed Time : ...................... 10264 seconds
Speed compared to default : ......... 101 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.95%
----------------------------------------------------------------
Done with PG1327_v8.wu
====================================================================
Hosts CPU data ...
model name : ARMv7 Processor rev 5 (v7l)
Done with Benchmark run! Removing temporary files!
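The "Speed compared to default" figures in the log can be reproduced from the elapsed times. The bench script's own formula is not shown in this thread, so the following is only a sketch under the assumption that it is the reference time divided by the app time, truncated to a whole percent. That assumption happens to match every value reported above. The name `speed_percent` is made up for illustration.

```python
def speed_percent(ref_seconds, app_seconds):
    """Assumed formula: reference time over app time, truncated to a whole percent."""
    return ref_seconds * 100 // app_seconds


# (work unit, build, reference seconds, app seconds, percent reported in the log)
RUNS = [
    ("new_reference_work_unit", "r3304", 50449, 50035, 100),
    ("new_reference_work_unit", "r3304_fast_math", 50449, 47781, 105),
    ("PG0009_v8", "r3304", 8076, 8108, 99),
    ("PG0009_v8", "r3304_fast_math", 8076, 7938, 101),
    ("PG0395_v8", "r3304_fast_math", 9425, 8954, 105),
    ("PG1327_v8", "r3304_fast_math", 10414, 10264, 101),
]

for wu, build, ref, app, reported in RUNS:
    print(wu, build, speed_percent(ref, app), "reported:", reported)
```

Because the percentage is truncated, a build that is only marginally slower than the reference shows up as 99 %, and one that is marginally faster still shows 100 %.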
Claggy
ID: 56640 · |
|
Claggy Volunteer tester
Send message Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0
|
I've rebuilt the armv7l app with:
./configure CFLAGS="-O3 -mfloat-abi=hard -mfpu=neon -funsafe-math-optimizations" BOINCDIR=/home/pi/boinc --enable-client --enable-static --disable-shared --disable-server --enable-fast-math --with-float-abi=hard --with-fpu=neon
I will start testing it on one of my Pi2s tonight. If it works better, I will send it to Eric to post.
- Tom
Add -march=armv7-a and -mtune=cortex-a7 for the Pi 2; it will be even better.
Here's a Bench run (on only the VLAR PGv8 WU) of my first two apps, and of two more built with your suggestions.
Tom's options didn't change the speed at all, while mimo's sped the app up by a small amount (~25 secs over a runtime of 8000 secs):
KWSN-Linux-MBbench v2.1.08
Running on raspberrypi at Wed 10 Feb 2016 23:34:32 UTC
----------------------------------------------------------------
Starting benchmark run...
----------------------------------------------------------------
Listing wu-file(s) in /testWUs :
PG0009_v8.wu
Listing executable(s) in /APPS :
setiathome-8.0r3304_abihard_fpu_neon_unsafe_math_fast_math.armv7l-unknown-linux-gnueabihf
setiathome-8.0r3304_abihard_fpu_neon_unsafe_math_fast_math_march_armv7-a_mtune_cortex-a7.armv7l-unknown-linux-gnueabihf
setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf
setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf
Listing executable in /REF_APPS :
setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf
----------------------------------------------------------------
Current WU: PG0009_v8.wu
----------------------------------------------------------------
Skipping default app setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf, displaying saved result(s)
Elapsed Time: ....................... 8076 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304_abihard_fpu_neon_unsafe_math_fast_math.armv7l-unknown-linux-gnueabihf
./setiathome-8.0r3304_abihard_fpu_neon_unsafe_math_fast_math.armv7l-unknown-linux-gnueabihf 8005.88 sec 7956.04 sec 50.78 sec
Elapsed Time : ...................... 8006 seconds
Speed compared to default : ......... 100 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.98%
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304_abihard_fpu_neon_unsafe_math_fast_math_march_armv7-a_mtune_cortex-a7.armv7l-unknown-linux-gnueabihf
./setiathome-8.0r3304_abihard_fpu_neon_unsafe_math_fast_math_march_armv7-a_mtune_cortex-a7.armv7l-unknown-linux-gnueabihf 7974.05 sec 7921.95 sec 51.84 sec
Elapsed Time : ...................... 7974 seconds
Speed compared to default : ......... 101 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.98%
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304.armv7l-unknown-linux-gnueabihf -st -verb -nog 8119.92 sec 8070.41 sec 50.93 sec
Elapsed Time : ...................... 8120 seconds
Speed compared to default : ......... 99 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3304_fast_math.armv7l-unknown-linux-gnueabihf -st -verb -nog 8003.22 sec 7953.26 sec 50.60 sec
Elapsed Time : ...................... 8003 seconds
Speed compared to default : ......... 100 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.98%
----------------------------------------------------------------
Done with PG0009_v8.wu
====================================================================
Hosts CPU data ...
model name : ARMv7 Processor rev 5 (v7l)
Done with Benchmark run! Removing temporary files!
Claggy
ID: 56641 · |
|
mimo Volunteer tester
Send message Joined: 4 Aug 08 Posts: 11 Credit: 1,437,079 RAC: 0
|
It looks like my current build with the above configuration is still running at about the same speed as the 8.01 version on Beta. It is not done yet, but looking at its progress it should finish in the same time as the 8.01 apps. It makes me think there is an issue with the code itself. I don't think the seti neon or vfp code is getting used. I'm going to start looking through the code to try to better understand what needs to be done to use it.
The stock app code runs a function-choice test and then uses the fastest function, so if the VFP or NEON function is fastest it gets used. If a function isn't supported on the host, it doesn't get tested or used:
setiathome_v8 8.00 Revision: 3304 g++ (Raspbian 4.9.2-10) 4.9.2
libboinc: BOINC 7.7.0
Work Unit Info:
...............
WU true angle range is : 2.726845
features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.003427 0.00000 test
vfp_GetPowerSpectrum 0.001719 0.00000 test
neon_GetPowerSpectrum 0.002665 0.00000 test
vfp_GetPowerSpectrum 0.001719 0.00000 choice
v_ChirpData 0.209031 0.00000 test
fpu_ChirpData 0.177218 0.94721 test
fpu_opt_ChirpData 0.226306 0.00000 test
v_ChirpData 0.209031 0.00000 choice
v_Transpose 0.185392 0.00000 test
v_Transpose2 0.097511 0.00000 test
v_Transpose4 0.054585 0.00000 test
v_Transpose8 0.098157 0.00000 test
fftwf_transpose 0.032431 0.00000 test
v_pfTranspose2 0.085047 0.00000 test
v_pfTranspose4 0.048811 0.00000 test
v_pfTranspose8 0.080527 0.00000 test
v_vfpTranspose2 0.095259 0.00000 test
fftwf_transpose 0.032431 0.00000 choice
FPU opt folding 0.005680 0.00000 test
opt VFP folding 0.004789 0.17086 test
opt NEON folding 0.004191 0.00000 test
opt NEON folding 0.004191 0.00000 choice
Test duration 36.46 seconds
There are two other ChirpData functions, vfp_ChirpData and neon_ChirpData, that I've had disabled for ARM on Linux because they didn't work (they are impossibly fast in the function-choice test, and then often don't find any signals).
I understand they don't work because they use the soft-float calling convention that Android uses, and not the hard-float calling convention that hard-float ARM Linux uses:
post 50921
I still haven't had time to make hard-float ABI builds work. The assembly files have function preambles that are soft-float specific.
The best bet is to trim off the preambles and put the assembly code inline in C++ files inside C or C++ functions.
Feel free to implement at your leisure.
The best speedup will probably come from fixing vfp_ChirpData, neon_ChirpData and probably analyzeFuncs_vfp.S and analyzeFuncs_neon.S, then reversing my analyzeFuncs_vector.cpp change:
https://setisvn.ssl.berkeley.edu/trac/changeset/2858
And from getting fftw 3.3.4 to detect and use Neon when available.
Claggy
The apps I am using now use fftw3 built with NEON, armv7 and Cortex options, plus fast and unsafe math.
ID: 56642 · |
|
MarkJ Volunteer tester
Send message Joined: 18 Oct 09 Posts: 48 Credit: 73,283 RAC: 0
|
The apps I am using now use fftw3 built with NEON, armv7 and Cortex options, plus fast and unsafe math.
Your average run time looks to be around 123,000 sec.
I'm running the stock 8.01 (it gets both armv6l and armv7l apps), coming in at around 97,000 sec. I have attached another Pi2 to compare against. The one with the 97k seconds has an fftw wisdom file. If the second Pi2 comes in slower, then we'll know it's using the wisdom file.
ID: 56738 · |
|
MarkJ Volunteer tester
Send message Joined: 18 Oct 09 Posts: 48 Credit: 73,283 RAC: 0
|
I'm running the stock 8.01 (it gets both armv6l and armv7l apps), coming in at around 97,000 sec. I have attached another Pi2 to compare against. The one with the 97k seconds has an fftw wisdom file. If the second Pi2 comes in slower, then we'll know it's using the wisdom file.
The one without a wisdom file has turned in 3 work units so far, and they have come in at around 95k-96k seconds. To me that sure looks like the app isn't using a wisdom file. It should be easy to fix: just add one line of code to read in the system wisdom file. I quoted the relevant functions from the fftw docs above.
Maybe Tom you could add it and get a new app to Eric please. The build options you used for the 8.01 app should be fine.
ID: 56802 · |
|
Tom Rinehart Volunteer tester
Send message Joined: 22 Jul 15 Posts: 21 Credit: 113,162 RAC: 0
|
ID: 56813 · |
|
Tom Rinehart Volunteer tester
Send message Joined: 22 Jul 15 Posts: 21 Credit: 113,162 RAC: 0
|
Eric Korpela helped me with the fix for building the app from the latest code. It is simple:
svn checkout https://setisvn.ssl.berkeley.edu/svn/seti_boinc seti_boinc
cd seti_boinc
nano client/vector/analyzeFuncs_vfp_aux.cpp
At the end of the local includes section, add:
#include "fp_arm.h"
Save the file and then build the apps as usual:
./_autosetup
./configure CFLAGS="-O3" CXXFLAGS="-O3" BOINCDIR=/home/pi/boinc --enable-client --enable-static --disable-shared --disable-server --enable-fast-math
make
I'm currently testing the apps and will send him the latest builds once they come back successful.
- Tom
ID: 56814 · |
|