SETI applications for NVIDIA GPU improvement - how you can help

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1802781 - Posted: 15 Jul 2016, 22:33:12 UTC - in response to Message 1802619.  


Also, on my rig running Cuda50, there are no GPU tasks reported in Times.txt
Don't know if that is intentional or not.

The script recognizes only older CUDA versions; zi is missing there. I'll add recognition of the new CUDA build in the next version.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1802781
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1802782 - Posted: 15 Jul 2016, 22:34:49 UTC - in response to Message 1802427.  

What would the perfect report look like for you?
I'm guessing you only want for: Result type = 9 (GPU?)
Do you only want for guppi? or Arecibo_VLARs also?
How many decimal points do you want for AR (Parameter), and times?
Do you want it sorted by AR?
Cheers, Rob :-)

Actually, this script is for your convenience, because you asked for a tool to look into the AR field en masse.
I usually post the kind of testing I currently need here or in the beta news thread. There will be a new build soon that requires testing.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1802782
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1802783 - Posted: 15 Jul 2016, 22:35:56 UTC - in response to Message 1802619.  
Last modified: 15 Jul 2016, 22:37:26 UTC


FYI, I saved a copy of my: client_state.xml & Times.txt

Send them to me.

P.S. Ah, and before doing that, make sure the missing tasks are not overflowed ones. The script recognizes and excludes overflows automatically, because they have distorted timings.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1802783
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1802809 - Posted: 16 Jul 2016, 0:00:18 UTC

Please test this build: https://cloud.mail.ru/public/9ifo/XQTdqmfJJ
Look for CPU usage vs r3486.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1802809
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1802888 - Posted: 16 Jul 2016, 5:25:56 UTC - in response to Message 1802780.  

I never run the script in BOINC's own dir. Copy the xml file into another dir and run it there.
Also, check whether a task missing from Times.txt is actually present in client_state.xml and has stderr content stored in that file.
I shut down the client and made a copy of client_state.xml.
5 tasks were in status "uploading" and only 3 showed up in Times.txt.
Both of the missing tasks are in client_state.xml, and I can see the AR value (i.e. ar=0.427524) in their stderr.

P.S. Ah, and before doing that, make sure the missing tasks are not overflowed ones. The script recognizes and excludes overflows automatically, because they have distorted timings.
If by overflows you mean:
SETI@Home Informational message -9 result_overflow
then both of the tasks missing from Times.txt have it in their stderr.
So it seems (to me) that it works as intended, except that cuda50 tasks are missing (on my 2nd rig, which I am using to compare results with your latest NV_SoG revision).
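
For anyone who wants to check their own tasks before reporting, here is a minimal Python sketch of the check described above. It is not Raistmer's script: the function names are hypothetical, and the only grounded parts are the three stderr strings quoted in this thread (the "WU true angle range is :" line, the "ar=" field, and "result_overflow"). The rule that a task is listed only when it has an AR value and did not overflow is an assumption based on the posts.

```python
import re

# Patterns taken from stderr snippets quoted in this thread.
AR_PATTERNS = [
    re.compile(r"WU true angle range is\s*:\s*([0-9]*\.?[0-9]+)"),
    re.compile(r"\bar=([0-9]*\.?[0-9]+)"),
]
OVERFLOW_MARKER = "result_overflow"


def classify_stderr(stderr_text):
    """Return (angle_range_or_None, is_overflow) for one task's stderr text."""
    angle_range = None
    for pattern in AR_PATTERNS:
        match = pattern.search(stderr_text)
        if match:
            angle_range = float(match.group(1))
            break
    return angle_range, OVERFLOW_MARKER in stderr_text


def would_be_listed(stderr_text):
    """Assumed rule: keep a task only if AR is known and it did not overflow."""
    angle_range, is_overflow = classify_stderr(stderr_text)
    return angle_range is not None and not is_overflow
```

Feeding it the stderr of a task that is missing from Times.txt shows which of the two reasons (no AR header, or an overflow) would explain the gap under these assumptions.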
ID: 1802888
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1802909 - Posted: 16 Jul 2016, 9:31:43 UTC - in response to Message 1802809.  

Please test this build: https://cloud.mail.ru/public/9ifo/XQTdqmfJJ
Look for CPU usage vs r3486.
Wow, that's a big improvement in completion time for non-VLARs compared to r3484 on a GTX 750 Ti!

NV_SoG_r3484: 15-16 mins
NV_SoG_r3486: 12.5 mins
with no command line for either.

It's even the first time that it's better than cuda50 (3 tasks = ~39 mins).
Keep up the great work!
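
The cuda50 comparison can be put on a per-task basis with a little arithmetic. This sketch assumes that "3 tasks = ~39 mins" means three cuda50 tasks running at once on the GPU and finishing in about 39 minutes, and that the SoG figure is for one task at a time; neither is stated outright in the post.

```python
def minutes_per_task(batch_minutes, tasks_in_batch):
    """Effective wall-clock minutes per task when tasks run concurrently."""
    return batch_minutes / tasks_in_batch


# Figures from the post; the concurrency interpretation is an assumption.
cuda50_per_task = minutes_per_task(39.0, 3)
sog_r3486_per_task = minutes_per_task(12.5, 1)

# Ratio above 1 means the SoG build gets through tasks faster.
throughput_ratio = cuda50_per_task / sog_r3486_per_task
print(f"cuda50: {cuda50_per_task:.1f} min/task, SoG r3486: {sog_r3486_per_task:.1f} min/task")
```

Under that reading the two are close: roughly 13 minutes per task for cuda50 against 12.5 for SoG.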
ID: 1802909
Harri Liljeroos
Joined: 29 May 99
Posts: 4868
Credit: 85,281,665
RAC: 126
Finland
Message 1802989 - Posted: 16 Jul 2016, 17:53:51 UTC - in response to Message 1802809.  
Last modified: 16 Jul 2016, 18:04:09 UTC

Please test this build: https://cloud.mail.ru/public/9ifo/XQTdqmfJJ
Look for CPU usage vs r3486.


http://setiathome.berkeley.edu/result.php?resultid=5042270028 Here's one that was crunched with this version. Stderr lacks the normal output of the parameters used but includes a lot of other information. The CPU use doesn't seem to be any different from the previous r3486 version. The parameters used were:
-sbs 384 -use_sleep -hp -instances_per_device 1 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32


It was run on my GTX970.

In general, all these SoG applications are very sensitive to CPU load. If it reaches 100%, a driver reset is bound to happen, even though I now have TdrDelay set to 8 seconds. So I have reserved 1.6 CPU cores per MB task, because this is also my daily driver and extra headroom is required. I will now set the non-BOINC usage limit to 18% to ease the effect of other running applications.
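
When comparing reports it helps to see exactly which switches differ between two runs. Here is a small Python sketch that splits a command line like the one above into flags and their values. The parsing rule (a token that starts with "-" followed by a letter opens a new flag) is an assumption about how these lines are written, not a description of how the app itself parses them, and both function names are hypothetical.

```python
def parse_cmdline(cmdline):
    """Group tokens into {flag: [values]}; a flag is a token like '-name'."""
    options = {}
    current = None
    for token in cmdline.split():
        if token.startswith("-") and token[1:2].isalpha():
            current = token[1:]
            options[current] = []
        elif current is not None:
            options[current].append(token)
    return options


def differing_flags(first, second):
    """Names of flags that are missing on one side or carry different values."""
    return sorted(
        flag for flag in set(first) | set(second)
        if first.get(flag) != second.get(flag)
    )


reported = parse_cmdline("-sbs 384 -use_sleep -hp -tune 1 64 1 4")
print(reported["tune"])
```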
ID: 1802989
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1803078 - Posted: 17 Jul 2016, 4:34:18 UTC - in response to Message 1802783.  

Send them to me.
P.S. Ah, and before doing that, make sure the missing tasks are not overflowed ones. The script recognizes and excludes overflows automatically, because they have distorted timings.
Here are the files for a situation where I have 8 GPU and 2 CPU tasks blocked from uploading.
Only 1 CPU task is in Times.txt
and there is no overflow in client_state.xml
https://www.dropbox.com/sh/5jbnvivo9mtcq5g/AADIpSRrHx_rnPzaDyecZEqsa?dl=0
I'm tired but I double-checked so I don't know if I forgot something.
Cheers,
Rob zzzz soon
ID: 1803078
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803178 - Posted: 17 Jul 2016, 20:33:20 UTC - in response to Message 1803078.  

Command lines are the same: no -tt 90 and no -use_sleep.

r3486Final

WU true angle range is : 0.008880
http://setiathome.berkeley.edu/result.php?resultid=5046873396
Run time 18 min 25 sec
CPU time 16 min 46 sec

WU true angle range is : 0.009471
http://setiathome.berkeley.edu/result.php?resultid=5046872611
Run time 21 min 1 sec
CPU time 20 min 48 sec

r3430
WU true angle range is : 0.008136
http://setiathome.berkeley.edu/result.php?resultid=5044042638
Run time 20 min 29 sec
CPU time 18 min 29 sec

WU true angle range is : 0.008136
http://setiathome.berkeley.edu/result.php?resultid=5044042824
Run time 20 min 58 sec
CPU time 17 min 56 sec

There is some variance, as they speed up when teamed up with non-guppi tasks. From what I can see, they are comparable. Non-guppi times are exactly the same in both versions. None of the r3486Final non-guppi tasks have validated yet. If you want those, we will need to wait for my wingmen.
ID: 1803178
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1803202 - Posted: 17 Jul 2016, 22:22:36 UTC - in response to Message 1803078.  

Task names to check?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1803202
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803217 - Posted: 18 Jul 2016, 2:17:06 UTC - in response to Message 1803178.  

Command lines: no -use_sleep and no -tt 90

r3486Final
WU true angle range is : 0.426888
http://setiathome.berkeley.edu/result.php?resultid=5046867280
Run time 12 min 24 sec
CPU time 8 min 29 sec

r3430
WU true angle range is : 0.423408
http://setiathome.berkeley.edu/result.php?resultid=5044010448
Run time 12 min 34 sec
CPU time 7 min 6 sec


Do you want -use_sleep and -tt 90?
ID: 1803217
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1803222 - Posted: 18 Jul 2016, 3:28:21 UTC - in response to Message 1803202.  

Task names to check?

All the tasks with a <stderr>, except task 28jn10ac.17896.6206.12.39.68
ID: 1803222
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803235 - Posted: 18 Jul 2016, 4:48:47 UTC - in response to Message 1803217.  

-use_sleep -tt 90

WU true angle range is : 0.426932
http://setiathome.berkeley.edu/result.php?resultid=5046873026
Run time 12 min 56 sec
CPU time 6 min 18 sec

commandline plus -tt 90
WU true angle range is : 0.426891
http://setiathome.berkeley.edu/result.php?resultid=5046873467
Run time 12 min 54 sec
CPU time 6 min 31 sec


WU true angle range is : 0.006725
http://setiathome.berkeley.edu/result.php?resultid=5047342941
Run time 21 min 8 sec
CPU time 20 min 57 sec

r3430
WU true angle range is : 0.007948
http://setiathome.berkeley.edu/result.php?resultid=5043925269
Run time 20 min 47 sec
CPU time 17 min 54 sec

OK, that should be it. I will keep running this version for now.
ID: 1803235
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1803476 - Posted: 19 Jul 2016, 10:19:55 UTC
Last modified: 19 Jul 2016, 10:22:29 UTC

Another test for volunteers:

On beta (or in anonymous platform mode on main), take the 8.16 OpenCL app or later and add the -tt F parameter to the command line.
Increase the F value (the default is 15, which means a 15 ms target time for the partial PulseFind kernel) and watch host usability (that is, GUI lags, missing letters when typing, and so on). At what F value do lags appear?
From a performance point of view it's better to have longer kernels, but this can result in GUI lags. This testing is needed to establish the best possible default value for unattended running.

EDIT: please describe your host config (preferably with a link to the host on beta) along with your report.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
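
For testers who want to keep notes during this sweep, a minimal Python sketch of the bookkeeping follows. Only the -tt switch and its default of 15 ms come from the post above; the step size, the upper bound, the function names and the sample observations are all made up for illustration.

```python
def candidate_command_lines(start=15, stop=120, step=15):
    """Build '-tt F' strings for increasing F; the range is an arbitrary example."""
    return [f"-tt {f}" for f in range(start, stop + 1, step)]


def first_laggy_value(observations):
    """observations maps F (ms) to True if GUI lag was noticed at that value."""
    laggy = sorted(f for f, lagged in observations.items() if lagged)
    return laggy[0] if laggy else None


# Hypothetical notes from one host, not real measurements.
notes = {15: False, 30: False, 60: True, 90: True}
print(candidate_command_lines(15, 60, 15))
print(first_laggy_value(notes))
```

The value returned by first_laggy_value is the number to include in a report, together with the host description requested above.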
ID: 1803476
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1803510 - Posted: 19 Jul 2016, 14:21:50 UTC

Please test in single task per GPU mode with and w/o -use_sleep for comparison:
https://cloud.mail.ru/public/6wp7/cgfuAXmnc
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1803510
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803523 - Posted: 19 Jul 2016, 21:39:38 UTC - in response to Message 1803510.  

Please test in single task per GPU mode with and w/o -use_sleep for comparison:
https://cloud.mail.ru/public/6wp7/cgfuAXmnc


Is this a new version or still part of the older r3486Final?
ID: 1803523
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1803533 - Posted: 19 Jul 2016, 22:04:05 UTC - in response to Message 1803523.  

Please test in single task per GPU mode with and w/o -use_sleep for comparison:
https://cloud.mail.ru/public/6wp7/cgfuAXmnc


Is this a new version or still part of the older r3486Final?


The revision is the same, but the binary is new.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1803533
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803577 - Posted: 20 Jul 2016, 1:16:19 UTC

Ok, I downloaded your new app and have it running

Single instance per card

No commandline

WU true angle range is : 1.155181
http://setiathome.berkeley.edu/result.php?resultid=5049630122
Run time 2 min 35 sec
CPU time 39 sec


WU true angle range is : 2.527829
http://setiathome.berkeley.edu/result.php?resultid=5049227148
Run time 2 min 30 sec
CPU time 30 sec


WU true angle range is : 0.427635
http://setiathome.berkeley.edu/result.php?resultid=5049222197
Run time 4 min 24 sec
CPU time 4 min 22 sec


WU true angle range is : 0.228564
http://setiathome.berkeley.edu/result.php?resultid=5049610671
Run time 7 min 58 sec
CPU time 7 min 56 sec

WU true angle range is : 0.007531
http://setiathome.berkeley.edu/result.php?resultid=5049222194
Run time 10 min 15 sec
CPU time 10 min 12 sec


WU true angle range is : 0.006300
http://setiathome.berkeley.edu/result.php?resultid=5049227551
Run time 10 min 58 sec
CPU time 10 min 51 sec


SETI@Home Informational message -9 result_overflow
WU true angle range is : 0.415784
http://setiathome.berkeley.edu/result.php?resultid=5049216828
Run time 11 sec
CPU time 8 sec

-use_sleep only

WU true angle range is : 1.132067
http://setiathome.berkeley.edu/result.php?resultid=5049583939
Run time 2 min 36 sec
CPU time 27 sec

WU true angle range is : 1.087592
http://setiathome.berkeley.edu/result.php?resultid=5049563469
Run time 3 min 9 sec
CPU time 50 sec


WU true angle range is : 0.315959
http://setiathome.berkeley.edu/result.php?resultid=5049554025
Run time 5 min 55 sec
CPU time 2 min 27 sec


WU true angle range is : 0.315959
http://setiathome.berkeley.edu/result.php?resultid=5049553957
Run time 5 min 57 sec
CPU time 2 min 27 sec


WU true angle range is : 0.007006
http://setiathome.berkeley.edu/result.php?resultid=5049583950
Run time 11 min
CPU time 3 min 44 sec


WU true angle range is : 0.007006
http://setiathome.berkeley.edu/result.php?resultid=5049583887
Run time 10 min 55 sec
CPU time 3 min 42 sec

I'll let them run for a while without any commandlines, what else would you like to see?
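
The effect of -use_sleep in the figures above is easiest to see as CPU time divided by run time. Here is a minimal Python sketch, assuming the "N min M sec" wording shown on the result pages quoted in this post; the helper names are hypothetical, and the two pairs of timings are the low-AR results copied from the report (AR 0.007531 without a command line, AR 0.007006 with -use_sleep).

```python
import re


def to_seconds(text):
    """Convert strings such as '2 min 35 sec', '11 min' or '39 sec' to seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    if minutes is None and seconds is None:
        raise ValueError(f"no duration found in {text!r}")
    total = 0
    if minutes:
        total += int(minutes.group(1)) * 60
    if seconds:
        total += int(seconds.group(1))
    return total


def cpu_share(run_time, cpu_time):
    """Fraction of the wall-clock run during which a CPU core was busy."""
    return to_seconds(cpu_time) / to_seconds(run_time)


no_cmdline = cpu_share("10 min 15 sec", "10 min 12 sec")
use_sleep = cpu_share("11 min", "3 min 44 sec")
print(f"no command line: {no_cmdline:.3f}, -use_sleep: {use_sleep:.3f}")
```

For these two tasks the run time grows slightly with -use_sleep while the CPU share drops from nearly a full core to about a third of one.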
ID: 1803577
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1803627 - Posted: 20 Jul 2016, 7:16:36 UTC - in response to Message 1803577.  

Thanks for such a well-structured and detailed report.
Nothing else is needed with this build. I'll post another one soon; it would be very good if you could do a similar run with it too.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1803627
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1803649 - Posted: 20 Jul 2016, 11:04:14 UTC - in response to Message 1803222.  
Last modified: 20 Jul 2016, 11:05:52 UTC

Task names to check?

All the ones with a <stderr> that isn't task: 28jn10ac.17896.6206.12.39.68


<stderr_txt>
 result map took 35.93ms, num_iter_wo_sync=12, mean_per_iter=2.994ms

icfft=194302, FFTLength=4096, sync before triplet result map took 6.859ms, num_iter_wo_sync=3, mean_per_iter=2.286ms

icfft=194315, FFTLength=2048, sync before triplet result map took 35.8ms, num_iter_wo_sync=12, mean_per_iter=2.983ms

icfft=194319, FFTLength=2048, sync before triplet result map took 6.871ms, num_iter_wo_sync=3, mean_per_iter=2.29ms

icfft=194333, FFTLength=4096, sync before triplet result map took 34.14ms, num_iter_wo_sync=13, mean_per_iter=2.626ms

icfft=194336, FFTLength=4096, sync before triplet result map took 6.721ms, num_iter_wo_sync=3, mean_per_iter=2.24ms

icfft=194349, FFTLength=512, sync before triplet result map took 33.37ms, num_iter_wo_sync=13, mean_per_iter=2.567ms

icfft=194355, FFTLength=512, sync before triplet result map took 6.464ms, num_iter_wo_sync=3, mean_per_iter=2.155ms

icfft=194371, FFTLength=4096, sync before triplet result map took 34.14ms, num_iter_wo_sync=12, mean_per_iter=2.845ms

icfft=194374, FFTLength=4096, sync before triplet result map took 6.75ms, num_iter_wo_sync=3, mean_per_iter=2.25ms

icfft=194387, FFTLength=2048, sync before triplet result map took 35.11ms, num_iter_wo_sync=12, mean_per_iter=2.926ms

icfft=194391, FFTLength=2048, sync before triplet result map took 7.229ms, num_iter_wo_sync=3, mean_per_iter=2.41ms

icfft=194405, FFTLength=4096, sync before triplet result map took 36.23ms, num_iter_wo_sync=12, mean_per_iter=3.019ms

icfft=194408, FFTLength=4096, sync before triplet result map took 7.57ms, num_iter_wo_sync=3, mean_per_iter=2.523ms

icfft=194421, FFTLength=1024, sync before triplet result map took 35.89ms, num_iter_wo_sync=12, mean_per_iter=2.991ms



There is no stderr header with AR info available, so such records are skipped.
BTW, verbose builds are not suitable for performance analysis. They are special-purpose ones.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
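
For the curious, the verbose lines quoted above are regular enough to parse. Here is a minimal Python sketch, assuming every complete line follows the exact layout seen in this excerpt; the field names in the returned dictionary are hypothetical labels, and the reading of mean_per_iter as the sync time divided by num_iter_wo_sync is inferred from the numbers rather than from any documentation.

```python
import re

# Layout copied from the verbose stderr lines quoted in this post.
SYNC_LINE = re.compile(
    r"icfft=(?P<icfft>\d+), FFTLength=(?P<fft_length>\d+), "
    r"sync before triplet result map took (?P<took_ms>[\d.]+)ms, "
    r"num_iter_wo_sync=(?P<iterations>\d+), mean_per_iter=(?P<mean_ms>[\d.]+)ms"
)


def parse_sync_line(line):
    """Return the numeric fields of one verbose line, or None if it does not match."""
    match = SYNC_LINE.search(line)
    if match is None:
        return None
    return {
        "icfft": int(match["icfft"]),
        "fft_length": int(match["fft_length"]),
        "took_ms": float(match["took_ms"]),
        "iterations": int(match["iterations"]),
        "mean_ms": float(match["mean_ms"]),
    }


record = parse_sync_line(
    "icfft=194302, FFTLength=4096, sync before triplet result map took 6.859ms, "
    "num_iter_wo_sync=3, mean_per_iter=2.286ms"
)
print(record["took_ms"] / record["iterations"], record["mean_ms"])
```

A truncated line, such as the first one in the excerpt, simply returns None, which mirrors the point that records without the expected header or fields get skipped.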
ID: 1803649
 