SETI applications for NVIDIA GPU improvement - how you can help
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
New set of builds for Windows (r3482) available here: https://cloud.mail.ru/public/J5f8/HGuG3Vp4R Please test whether the iGPU memory leak issues are fixed. Also please test how the GUI responds on different GPUs with completely default options (that is, without any additional tuning and with a single task per GPU). Are the lags acceptable? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Up and running. Don't have an iGPU but not noticing any lags. CPU usage starts at 50% and slowly increases until the work unit is done. Non-GUPPIs are running about 4 min 30 sec, GUPPIs almost 8 minutes, both non-modified. CPU time is almost identical to GPU time; the difference varies anywhere from 9 sec to 30 sec. Do you want -v 6 info or what would you like posted or linked? This is the machine I have r3482 installed on: http://setiathome.berkeley.edu/results.php?hostid=8033686 |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Up and running. How does this compare with r3430?
Add -use_sleep to see any CPU time savings.
Yes, single VLAR, mid-AR and VHAR outputs with -v 6 would be good to see.
Thanks, will look for performance statistics. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Without -use_sleep or -v 6:
WU true angle range is : 203.352810 http://setiathome.berkeley.edu/result.php?resultid=5025774166
WU true angle range is : 0.740043 http://setiathome.berkeley.edu/result.php?resultid=5025858919
WU true angle range is : 0.425325 http://setiathome.berkeley.edu/result.php?resultid=5025774167
WU true angle range is : 0.008864 http://setiathome.berkeley.edu/result.php?resultid=5025858917
Now with -use_sleep and -v 6, CPU usage fluctuates. It starts high at 90% and decreases down to the mid 70s. Occasionally some get to 20% but 3/4 stay up in 70.
-9 overflow
WU true angle range is : 0.442808 http://setiathome.berkeley.edu/result.php?resultid=5025913520
WU true angle range is : 1.265541 http://setiathome.berkeley.edu/result.php?resultid=5026007359
WU true angle range is : 1.046780 http://setiathome.berkeley.edu/result.php?resultid=5025913444
still pending validation
WU true angle range is : 0.417197 http://setiathome.berkeley.edu/result.php?resultid=5025913443
WU true angle range is : 0.011463 http://setiathome.berkeley.edu/result.php?resultid=5025956743
WU true angle range is : 0.007666 http://setiathome.berkeley.edu/result.php?resultid=5025956762
I'll keep an eye out for a high AR work unit. I can't compare r3482 to r3430 as I wasn't running single work units. I would need to go back and downgrade to do that. Eventually I will do that, but before that I want to try to modify the commandlines and run multiple instances so I can see how they compare to my normal processing.
edit... The addition of -high_perf doesn't seem to do much if anything for this build. I see no change in time with it. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Now the app enables the high_perf path where needed on a per-point basis. It seems that for your GPU it does it right:
Fftlength=32,pass=3:Tune: sum=34696.7(ms); min=5.576(ms); max=23.86(ms); mean=14.96(ms); s_mean=11.54; sleep=0(ms); delta=176; N=2320; usual
Fftlength=32,pass=4:Tune: sum=34696.7(ms); min=5.576(ms); max=23.86(ms); mean=14.96(ms); s_mean=11.54; sleep=0(ms); delta=176; N=2320; usual
Fftlength=32,pass=5:Tune: sum=34696.7(ms); min=5.576(ms); max=23.86(ms); mean=14.96(ms); s_mean=11.54; sleep=0(ms); delta=176; N=2320; usual
Fftlength=64,pass=3:Tune: sum=21931.8(ms); min=2.502(ms); max=17.74(ms); mean=14.61(ms); s_mean=13.87; sleep=15(ms); delta=275; N=1501; usual
Fftlength=64,pass=4:Tune: sum=21931.8(ms); min=2.502(ms); max=17.74(ms); mean=14.61(ms); s_mean=13.87; sleep=15(ms); delta=275; N=1501; usual
Fftlength=64,pass=5:Tune: sum=21931.8(ms); min=2.502(ms); max=17.74(ms); mean=14.61(ms); s_mean=13.87; sleep=15(ms); delta=275; N=1501; usual
Fftlength=128,pass=3:Tune: sum=11697.2(ms); min=1.391(ms); max=20.93(ms); mean=11.47(ms); s_mean=12.46; sleep=15(ms); delta=680; N=1020; usual
Fftlength=128,pass=4:Tune: sum=11697.2(ms); min=1.391(ms); max=20.93(ms); mean=11.47(ms); s_mean=12.46; sleep=15(ms); delta=680; N=1020; usual
Fftlength=128,pass=5:Tune: sum=11697.2(ms); min=1.391(ms); max=20.93(ms); mean=11.47(ms); s_mean=12.46; sleep=15(ms); delta=680; N=1020; usual
Fftlength=256,pass=3:Tune: sum=11239.6(ms); min=0.7013(ms); max=20.58(ms); mean=10.47(ms); s_mean=11.63; sleep=0(ms); delta=680; N=1073; usual
Fftlength=256,pass=4:Tune: sum=11239.6(ms); min=0.7013(ms); max=20.58(ms); mean=10.47(ms); s_mean=11.63; sleep=0(ms); delta=680; N=1073; usual
Fftlength=256,pass=5:Tune: sum=11239.6(ms); min=0.7013(ms); max=20.58(ms); mean=10.47(ms); s_mean=11.63; sleep=0(ms); delta=680; N=1073; usual
Fftlength=512,pass=3:Tune: sum=11053.3(ms); min=0.3725(ms); max=10.35(ms); mean=8.236(ms); s_mean=9.647; sleep=0(ms); delta=1363; N=1342; usual
Fftlength=512,pass=4:Tune: sum=11053.3(ms); min=0.3725(ms); max=10.35(ms); mean=8.236(ms); s_mean=9.647; sleep=0(ms); delta=1363; N=1342; usual
Fftlength=512,pass=5:Tune: sum=11053.3(ms); min=0.3725(ms); max=10.35(ms); mean=8.236(ms); s_mean=9.647; sleep=0(ms); delta=1363; N=1342; usual
Fftlength=1024,pass=3:Tune: sum=26917.3(ms); min=0.1883(ms); max=13.13(ms); mean=11.29(ms); s_mean=11.98; sleep=0(ms); delta=2395; N=2385; high_perf
Fftlength=1024,pass=4:Tune: sum=26917.3(ms); min=0.1883(ms); max=13.13(ms); mean=11.29(ms); s_mean=11.98; sleep=0(ms); delta=2395; N=2385; high_perf
Fftlength=1024,pass=5:Tune: sum=26917.3(ms); min=0.1883(ms); max=13.13(ms); mean=11.29(ms); s_mean=11.98; sleep=0(ms); delta=2395; N=2385; high_perf
Fftlength=2048,pass=3:Tune: sum=19290.1(ms); min=1.704(ms); max=4.652(ms); mean=4.222(ms); s_mean=4.222; sleep=0(ms); delta=1; N=4569; high_perf
Fftlength=4096,pass=3:Tune: sum=17411.2(ms); min=0.8059(ms); max=2.15(ms); mean=1.906(ms); s_mean=1.898; sleep=0(ms); delta=1; N=9137; high_perf
Fftlength=8192,pass=3:Tune: sum=21859.6(ms); min=1.033(ms); max=1.705(ms); mean=1.196(ms); s_mean=1.182; sleep=0(ms); delta=1; N=18275; usual
Also, with the default sleep time of 15 ms your GPU just can't provide long enough kernels to enable sleep without a big performance loss, so sleep is mostly disabled. All this is tunable, though, and the default behavior seems OK. Well, just set up the best-performance config as you did for 3430 and compare times there. There are new options available, but the first step would be to see how well the app handles this new parameter space at defaults. So, use the same tuning line as for 3430 as a start (and the same number of simultaneous tasks). SETI apps news We're not gonna fight them. We're gonna transcend them. |
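For anyone who wants to compare these Tune lines between builds, here is a minimal Python sketch that turns them into records. The regular expression is inferred only from the excerpt quoted in this thread, so the field layout is an assumption and not a documented format; `parse_tune` is a hypothetical helper name.

```python
import re

# Layout inferred from the stderr excerpt in this thread (an assumption,
# not a documented format): "Fftlength=N,pass=P:Tune: k=v; ...; usual|high_perf"
TUNE_RE = re.compile(
    r"Fftlength=(?P<fftlen>\d+),pass=(?P<npass>\d+):Tune:\s*"
    r"(?P<fields>.*?);\s*(?P<path>usual|high_perf)\b",
    re.S,  # a record may be wrapped across lines in a pasted log
)


def parse_tune(text):
    """Return one dict per Tune record found in the given stderr text."""
    records = []
    for m in TUNE_RE.finditer(text):
        rec = {
            "fftlength": int(m["fftlen"]),
            "pass": int(m["npass"]),
            "path": m["path"],
        }
        for field in m["fields"].split(";"):
            key, _, value = field.strip().partition("=")
            # Strip the "(ms)" unit suffix before converting to a number.
            rec[key] = float(value.replace("(ms)", ""))
        records.append(rec)
    return records


sample = (
    "Fftlength=512,pass=3:Tune: sum=11053.3(ms); min=0.3725(ms); max=10.35(ms); "
    "mean=8.236(ms); s_mean=9.647; sleep=0(ms); delta=1363; N=1342; usual\n"
    "Fftlength=2048,pass=3:Tune: sum=19290.1(ms); min=1.704(ms); max=4.652(ms); "
    "mean=4.222(ms); s_mean=4.222; sleep=0(ms); delta=1; N=4569; high_perf\n"
)

for rec in parse_tune(sample):
    print(rec["fftlength"], rec["path"], rec["mean"], rec["sleep"])
```

Listing which FFT lengths ended up on the high_perf path, or which got a non-zero sleep, is then a one-line filter over the returned list.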
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Ok, needed some more BLC to compare. For non-GUPPI with the old commandline from r3430, the time to complete is the same. Woo Hoo!! For GUPPI with the old commandline, it is about 120 seconds longer for combined work than r3430. However, it is faster than r3480, so there is improvement. Going to finish up what I have now for Main and move to Beta for a test on 4bit. Temps are too high outside to run during the day now, so don't expect any more results until after dark. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Here is r3484: https://cloud.mail.ru/public/LqtQ/YBsLcr8nf. Please test, especially the iGPU one. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Here is a new set of binaries (r3486): https://cloud.mail.ru/public/A7yG/k8yoZrFvr Please test. For all but iGPU the change is only cosmetic: there is no need to provide -use_sleep explicitly if any of the options from the sleep set is defined. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Stubbles Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0 |
"Ok, needed some more BLC to compare"
Want to try my batch file to send selected (suspended) CPU tasks to GPU queue? |
Stubbles Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0 |
"Here is r3484 https://cloud.mail.ru/public/LqtQ/YBsLcr8nf."
as requested in: msg 1801527
"If you'd like them to be of a specific kind of vlars, I can do so since my little batch program transfers those that I have Suspended."
"Both. Actually - all ARs range. But usually very VLARs cause troubles with usability."
using: NV_SoG_r3484 (no commandline)
processed: 4 vlars in sequence (1WU/GPU)
Results:
- No lag whatsoever on GTX 750 Ti with Win10
- forgot to check AR blc3_2bit_guppi_57451_69387_HIP117559_0023.18368.0.17.26.243.vlar_2 00:26:05 (00:25:56)
- <true_angle_range>0.00823 blc3_2bit_guppi_57451_70772_HIP117779_0027.22322.831.17.26.168.vlar_1 00:25:34 (00:25:28)
- <true_angle_range>0.01140 jn10ab.24885.885.5.32.100.vlar_0 00:31:50 (00:31:45)
- <true_angle_range>0.01140 02jn10ab.24885.885.5.32.227.vlar_1 00:31:51 (00:31:45)
(before test run, non-vlar processed in ~16min +/-2)
Let me know if you'd like me to report with linked URLs in the future (as Zalster did above). As for a bigger batch (with a wider AR range), I'll wait for your modified perl script ...as I find it tedious to go digging for AR values.
Just realized that CPU usage was mostly in the 90s% as I had left 6 cores running vlars ...and sometimes it reached 100% for a few secs at a time. For future tests, I'll reduce CPU to ~80% unless specified otherwise.
Edit: Should I install SoG_r3486?
PS: I have another rig that is almost identical (except it's Win7 & 8GB ram). It is currently set up with Lunatics v0.45 beta3 with the Cuda50 apps with 2WU/GPU. I can also make changes to that one for testing if you'd like...or use it to compare NV_SoG to the "gold standard" for GTX 750 Ti. |
PERPLEXER ~ Thomas Huettinger Send message Joined: 25 Jan 05 Posts: 11 Credit: 395,156,213 RAC: 371 |
Hey Boss! Thank you for your work! Have a nice summer :) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"- No lag whatsoever on GTX 750 Ti with Win10" Fine!
Yep, the most representative results for midAR, VHAR, VLAR and GBT would be good to have, with links to the particular results (stderr contains additional info about how the app made choices for the particular card).
Soon, but perhaps not today; I don't have my netbook with me currently.
To decrease CPU time consumption, try -use_sleep.
Better let it run on beta.
It would be good to compare timings indeed. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Some short guidelines on how to increase app performance by tuning the new params (beware: GUI lags can appear as a result).
1. Increase the value of the -tt F option (default is 15). It is better to increase it in steps of the sleep quantum (if sleep is enabled). The default sleep quantum is 15 (ms). If sleep is not enabled there is no need to use such a step.
2. If sleep is used and the CPU is Intel or a non-APU AMD, try to decrease the sleep quantum via the -sleep_quantum option (default is 15). This will make sleep more effective if the real sleep quantum for the particular host is less than 15 ms.
I hope to write more detailed instructions on Lunatics, with some insights into the algorithm besides those options, if I have some time for that. SETI apps news We're not gonna fight them. We're gonna transcend them. |
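As a rough illustration of guideline 1, the Python sketch below lists candidate -tt values that grow from the default in steps of the sleep quantum. Both helpers (`tt_candidates`, `command_line`) are hypothetical, the option spelling `-sleep_quantum` and the way values are passed on the command line are assumptions, and the 5 ms fallback step for the no-sleep case is an arbitrary choice, since the post only says no particular step is needed there.

```python
def tt_candidates(default_tt=15.0, sleep_quantum=15.0, count=4,
                  sleep_enabled=True, free_step=5.0):
    """Candidate -tt values above the default.

    With sleep enabled the step equals the sleep quantum, as the post
    suggests; otherwise any step works and free_step is an arbitrary pick.
    """
    step = sleep_quantum if sleep_enabled else free_step
    return [default_tt + k * step for k in range(1, count + 1)]


def command_line(tt, sleep_quantum=None):
    """Build an illustrative tuning line (option syntax is an assumption)."""
    parts = []
    if sleep_quantum is not None:
        parts += ["-use_sleep", "-sleep_quantum", f"{sleep_quantum:g}"]
    parts += ["-tt", f"{tt:g}"]
    return " ".join(parts)


# Defaults from the post: -tt 15, sleep quantum 15 ms.
for tt in tt_candidates():
    print(command_line(tt, sleep_quantum=15.0))
```

Each printed line is one configuration to time against the defaults; watch for GUI lag as the -tt value grows.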
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Here it is: http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=498 SETI apps news We're not gonna fight them. We're gonna transcend them. |
Stubbles Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0 |
"Here it is: http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=498"
All I get is a few hundred identical lines of:
AR=-1ElT=0.000000Rev=0ResType=0
The file ExtractTimes_v5.7z doesn't seem to match the description "Info from client_state.xml extraction tool" |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Here it is: http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=498"
What is the content of Times.txt? I checked with an AVX CPU + iGPU MB task mix and it worked OK. P.S. Also check that you have some tasks completed and still not reported (suspend network communications to accumulate such tasks). Otherwise your client_state.xml will not contain any info to extract. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Stubbles Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0 |
In case anyone is interested, I made a small script to import the output into Excel:
cd /D C:\ProgramData\BOINC
perl.exe ExtractTimes_v5.pl>nul
"C:\Program Files (x86)\Microsoft Office\Office12\EXCEL.EXE" .\times.txt
Just copy those 3 lines into a file on your desktop with a filename that ends with .cmd
@Raistmer: Here's what I am currently getting:
Task name   Result Type   Version   Parameter   Elaps   CPUTime
06jn10aj.17549.53905.6.33.198_0   9   3484   0.374938   1,039.1   1,028.3
07jn10ab.29140.248450.3.30.124_1   9   3484   0.740539   746.9   481.4
07jn10ab.29140.248450.3.30.65_0   9   3484   0.740539   740.8   474.1
07jn10ab.29140.248450.3.30.163_0   9   3484   0.740539   739.8   474.5
07jn10ab.29140.248450.3.30.63_0   9   3484   0.740539   745.9   479.2
07jn10ab.29140.248450.3.30.13_0   9   3484   0.740539   739.9   474.7
05dc10ac.29918.149181.12.39.133_1   9   3484   0.34361   1,028.1   1,023.4
05dc10ac.29918.149181.12.39.224_0   9   3484   0.34361   987.3   982.0
07jn10ab.29140.248450.3.30.69_0   9   3484   0.740539   745.8   478.9
blc4_2bit_guppi_57451_64686_HIP117463_0009.20909.831.17.26.179.vlar_1   1   3330   0.008851   8,724.5   8,654.7
What would the perfect report look like for you? I'm guessing you only want Result type = 9 (GPU?). Do you want it only for guppi, or for Arecibo VLARs also? How many decimal places do you want for AR (Parameter) and times? Do you want it sorted by AR? Cheers, Rob :-) |
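The filtering and sorting asked about above could be sketched in Python as follows. It assumes that Times.txt rows are whitespace-separated with the six columns shown in this post, that times may carry thousands separators as they do after the Excel import, and that result type 9 marks GPU tasks, which is only a guess made in this thread. `Row`, `parse_times` and `gpu_report` are hypothetical names.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    result_type: int
    version: int
    ar: float        # the "Parameter" column, i.e. the angle range
    elapsed: float   # seconds
    cpu: float       # seconds


def parse_times(text):
    """Parse rows shaped like the table in this post; skip anything else."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # header, blank or malformed line
        try:
            rows.append(Row(
                parts[0], int(parts[1]), int(parts[2]), float(parts[3]),
                float(parts[4].replace(",", "")),
                float(parts[5].replace(",", "")),
            ))
        except ValueError:
            continue
    return rows


def gpu_report(rows, result_type=9):
    """Rows of one result type (9 is assumed to mean GPU), sorted by AR."""
    return sorted((r for r in rows if r.result_type == result_type),
                  key=lambda r: r.ar)


sample = """Task name Result Type Version Parameter Elaps CPUTime
07jn10ab.29140.248450.3.30.124_1 9 3484 0.740539 746.9 481.4
06jn10aj.17549.53905.6.33.198_0 9 3484 0.374938 1,039.1 1,028.3
blc4_2bit_guppi_57451_64686_HIP117463_0009.20909.831.17.26.179.vlar_1 1 3330 0.008851 8,724.5 8,654.7
"""

for r in gpu_report(parse_times(sample)):
    print(f"{r.ar:.6f}  {r.elapsed:8.1f}  {r.cpu:8.1f}  {r.name}")
```

Changing the format specifiers in the final loop is enough to adjust the number of decimal places in the report.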
Stubbles Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0 |
Hey Raistmer! There might be a small bug with the Perl script ExtractTimes_v5.pl. When the BOINC client is in "Suspend network activity", as required by the script above, on 3 occasions with large batches of tasks in status "Uploading" it was listing 1 to (3?) fewer GPU tasks than the ones I have in status "Uploading". The last time it occurred, I investigated it when 1 GPU task was missing in Times.txt. When ordered by "Completed" times in BoincTasks' History tab, the task not in Times.txt is NOT the first one or the last one of the GPU batch. I haven't seen the issue affecting CPU tasks. I tested afterwards with small batches where the first task to finish is either a CPU or a GPU task, but the issue didn't appear. I will let it run another big batch to see if it happens again. If it does, I'll stop the BOINC client manually to see if that fixes the issue. Let me know if there's another scenario that I didn't think of, R :-) |
Stubbles Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0 |
Hey Raistmer! It happened again even after shutting down the Boinc Client; 1 CPU (3rd) & 1 GPU (6th) task are not showing up in Times.txt (I rearranged the BoincTasks output so that the columns line up)
Time elapsed (CPU)   Completed   Status   Name   app
00:32:06 (00:31:56)   2016-Jul-14 8:18:22 PM   Uploading   21dc10aa.23912.8656.9.36.35.vlar_1   8.12 setiathome_v8 (opencl_nvidia_SoG)
02:27:53 (02:24:56)   2016-Jul-14 8:14:34 PM   Uploading   blc4_2bit_guppi_57451_65998_HIP117463_0013.31811.0.18.27.222.vlar_0   8.00 setiathome_v8
02:12:04 (02:09:15)   2016-Jul-14 8:02:15 PM   Uploading   blc4_2bit_guppi_57451_65342_HIP117463_0011.22853.416.18.27.24.vlar_1   8.00 setiathome_v8
02:25:28 (02:22:48)   2016-Jul-14 7:58:15 PM   Uploading   blc4_2bit_guppi_57451_65998_HIP117463_0013.31811.0.18.27.221.vlar_1   8.00 setiathome_v8
02:36:26 (02:33:32)   2016-Jul-14 7:54:15 PM   Uploading   blc4_2bit_guppi_57451_65014_HIP117463_OFF_0010.31703.831.18.27.103.vlar_1   8.00 setiathome_v8
00:31:19 (00:31:08)   2016-Jul-14 7:41:15 PM   Uploading   21dc10aa.23912.8656.9.36.17.vlar_1   8.12 setiathome_v8 (opencl_nvidia_SoG)
02:27:03 (02:24:16)   2016-Jul-14 7:35:33 PM   Uploading   blc4_2bit_guppi_57451_65670_HIP117463_OFF_0012.27821.416.18.27.169.vlar_0   8.00 setiathome_v8
00:32:04 (00:31:54)   2016-Jul-14 7:09:20 PM   Uploading   21dc10aa.23912.8656.9.36.34.vlar_0   8.12 setiathome_v8 (opencl_nvidia_SoG)
00:20:16 (00:20:03)   2016-Jul-14 6:37:03 PM   Uploading   08jn10af.16639.22971.10.37.87_1   8.12 setiathome_v8 (opencl_nvidia_SoG)
FYI, I saved a copy of my client_state.xml & Times.txt. Also, on my rig running Cuda50, there are no GPU tasks reported in Times.txt. Don't know if that is intentional or not. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Hey Raistmer!"
I never run the script in BOINC's own dir. Copy the xml file into another dir and run it there. Also, check whether the task missing from Times.txt is actually present in client_state.xml and has stderr content stored in that file. SETI apps news We're not gonna fight them. We're gonna transcend them. |
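The two checks suggested here (work on a copy, then see whether the task is present and has stderr stored) could be sketched in Python as below. The element names `<result>`, `<name>` and `<stderr_out>` are assumptions about the client_state.xml layout, and the file is scanned with regular expressions on purpose because it is not guaranteed to parse as strict XML; `check_task` and `check_in_copy` are hypothetical helpers.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Element names below are assumptions about client_state.xml.
RESULT_RE = re.compile(r"<result>(.*?)</result>", re.S)
NAME_RE = re.compile(r"<name>([^<]+)</name>")
STDERR_RE = re.compile(r"<stderr_out>(.*?)</stderr_out>", re.S)


def check_task(state_text, task_name):
    """Return (present, has_stderr) for one task name."""
    for block in RESULT_RE.findall(state_text):
        m = NAME_RE.search(block)
        if m and m.group(1).strip() == task_name:
            s = STDERR_RE.search(block)
            return True, bool(s and s.group(1).strip())
    return False, False


def check_in_copy(state_path, task_name):
    """Copy the state file out of BOINC's directory before reading it."""
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / "client_state.xml"
        shutil.copy(state_path, copy)
        return check_task(copy.read_text(errors="replace"), task_name)


demo = """<client_state>
<result><name>task_a_0</name><stderr_out>Tune: ...</stderr_out></result>
<result><name>task_b_1</name></result>
</client_state>"""

print(check_task(demo, "task_a_0"))
print(check_task(demo, "task_b_1"))
print(check_task(demo, "task_c_0"))
```

A task that is present but has no stderr stored would explain a missing row in Times.txt if the extraction script relies on the stderr text, which is a guess based on the advice above.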