SETI applications for NVIDIA GPU improvement - how you can help
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
For those who experienced GUI lags with the stock build, please try this one: https://cloud.mail.ru/public/GNQz/oRyyF1VQp It hopefully has usability improvements for those GPUs that were left on the edge by the previous release. It could also be faster than the current stock build. For example, here is a test of the ATi siblings on my C-60 (WU: PG0009_v7.wu): MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3472.exe: Elapsed 1173.279 secs, CPU 195.828 secs; setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe: Elapsed 1740.647 secs, CPU 304.810 secs. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Raistmer, you labelled the exe wrong. With all previous versions it was OpenCl_NV_r34XX_SoG.exe; you changed this so it now reads OpenCl_NV_SoG_r34XX.exe. I didn't notice this when I downloaded and installed it, as I thought you had followed the usual nomenclature. However, this caused all my current SoG work to be dumped, with errors listed in the event log. I was able to trace the error and noticed the discrepancy. I don't know if you might want to correct this before others try it and get all their work units dumped. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Are the executable name and its link inside the aistub the same or different? [If I move to automatic creation of the archive, the revision number will come last, just as in this build.] |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Quote: "executable name and its link inside aistub are the same or different?" I correct the app_info with the changed revision number and then import everything from the zip folder. I've never used the aistub before, so I can't comment on that. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
I noticed that as well, when upgrading from r3430 to r3472 for Beta3. Like Zalster, I base the installer aistubs on my own previous work, rather than starting from scratch with the packaged aistub - but I've learned to do the search/replace on the complete file name, not just the revision number. Even so, it took me a while to work out why the display flickered more than usual when I did it: the old name was MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe and the new one is MB8_win_x86_SSE3_OpenCL_NV_SoG_r3472.exe. But I've just checked the release package, and it is internally self-consistent. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Let's explore the -use_sleep_ex N option a little more. Here is an ATI HD5 build for that: https://cloud.mail.ru/public/Lf64/zuZPHgv5y It's a "verbose" one, so don't expect amazing performance from it, but it can tell you something new about your system. Proposed usage: add -use_sleep_ex 1 (or any other sleep time you want to explore) and -v 6 (6 is mandatory here because verbosity level 6 is the one reserved for timing output). What the output looks like and what to observe: starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=32768, deltaP3=547, NDRange={8,1,64}, WG={8,1,8},single_period_size=1.3MB, WG num=8, CU num=2 Partial PulseFind_3 (before buffer read): Awaited 36 iterations for completion Kernel PULSE_PARTIAL execution time: 537.014(ms); min=537.014(ms); max=537.014(ms); mean=537.014(ms); sleep=1(ms); delta=547; Niterations=1 starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=16384, deltaP3=273, NDRange={16,1,32}, WG={16,1,4},single_period_size=1.3MB, WG num=8, CU num=2 Partial PulseFind_3 (before buffer read): Awaited 19 iterations for completion Kernel PULSE_PARTIAL execution time: 279.959(ms); min=279.959(ms); max=279.959(ms); mean=279.959(ms); sleep=1(ms); delta=273; Niterations=1 starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=16384, deltaP3=272, NDRange={16,1,32}, WG={16,1,4},single_period_size=1.3MB, WG num=8, CU num=2 Partial PulseFind_3 (before buffer read): Awaited 18 iterations for completion Kernel PULSE_PARTIAL execution time: 254.941(ms); min=254.941(ms); max=279.959(ms); mean=267.45(ms); sleep=1(ms); delta=272; Niterations=2 And so on. For the experiment, only the first 10 or so such output items are needed. I added basic profiling abilities, so the app can now print the time spent in a particular kernel (in this case the partial PulseFind one). Take the "execution time" and divide it by the "Awaited ... iterations for completion" count. You will get an estimate of how long a single iteration takes.
What I found so far on my C-60: with -use_sleep_ex 0 the iteration time is ~0.75us, so Sleep(0) plus the event-handling overhead is low enough. With -use_sleep_ex 1 the iteration time is ~15ms (!), so Sleep(1) takes ~15ms instead of the promised 1ms (!!!!). You can imagine why -use_sleep, which uses Sleep(1), has such a big impact on performance on hosts with high-performance GPUs. To get an adequate estimate, make a single kernel call last long enough to give an iteration count of ~10 or more. If the kernel runs for, say, 7us and only a single iteration is done, the Sleep(1) estimate comes out as ~7us instead of anything real. On the other hand, if the kernel takes 537ms (as in my example above) and 36 Sleep(1) iterations are done for it, then 537/36 ~ 15 (ms), which is the estimate I got. To increase the execution time of a single kernel call, decrease the value of the -period_iterations_num N option as usual. For this example I used 10 on my C-60. Faster cards will require a value of 1 and an artificial slowdown such as -sbs 48 or something similar. What to test: how does the real sleep time scale as N in Sleep(N) increases? How does it react to increased priority (the -hp switch added to the command line)? 15 ms is a very characteristic time: it is roughly the size of the OS time quantum. With increased priority it may change, because the process, being a high-priority one, can preempt others; this needs checking. Also, my host is an old-generation AMD APU. Maybe other families handle Sleep(1) better? I did similar testing some time ago, but there was no direct profiling info from the kernel at that time. Now we have such info right in stderr. P.S. And, of course, use VLAR tasks, because VLAR has the biggest PulseFind kernels. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hi Raistmer, . . I think I have dropbox sorted out, here are links to the result text files. https://www.dropbox.com/home?preview=stderr_local_slot+0.zip https://www.dropbox.com/home?preview=stderr_trial2_last+WU+error.zip https://www.dropbox.com/home?preview=stderr_trial2_WU2_48per.zip . . I am slow but I think I might get everything working. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks. A more modern NV build (like the AMD HD5 one I posted recently) will be available to explore soon. See the previous post for how to run that testing. EDIT: unfortunately, your links require a Dropbox login. Please make them public. |
Rasputin42 Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
Quote: "It's "verbose" one so don't expect amazing performance from it but it can tell you smth new about your system." If that makes it slow, would it not also affect the figures it is supposed to report? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: "It's "verbose" one so don't expect amazing performance from it but it can tell you smth new about your system." No, it doesn't. The build is slower overall because of the added output overhead, but each kernel call executes at the same speed as before, so you get correct info about kernel execution times. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote: "thanks. Soon will be more modern NV build (like AMD Hd5 one I posted recently) to explore. Look prev post how to manage that testing." . . This should work. https://www.dropbox.com/s/63pvtzut2dh1hnt/stderr_trial2_last%20WU%20error.zip?dl=0 https://www.dropbox.com/s/nmwx8re4xpm4bt6/stderr_local_slot%200.zip?dl=0 https://www.dropbox.com/s/s7h3p99w68w0juu/stderr_trial2_WU2_48per.zip?dl=0 . . Hope this helps. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
They are downloadable, thanks. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Here is the NV sibling of the HD5 ATi build posted earlier: https://cloud.mail.ru/public/HUAE/soM11FDVh Please see this post for what to do with it. So far I have found that on my C-60, changing -use_sleep_ex N from 1 through 4 inclusive barely changes the real sleep time. It remains ~15ms. How it reacts to CPU load and priority changes is still to be explored. P.S. Here is a small Perl script for extracting the relevant data from stderr.txt: use strict; use warnings; my $path = "stderr.txt"; my $results = "times_iterations.txt"; open(my $in, '<', $path) or die "Cannot open $path: $!"; open(my $res, '>', $results) or die "Cannot open $results: $!"; my (@iterations, @exec_time); while (<$in>) { push @iterations, $2 if /Partial PulseFind_3(.*)Awaited (\d+) iterations/; push @exec_time, $1 if /Kernel PULSE_PARTIAL execution time: (\d+(?:\.\d+)?)/; } print $res "exec_time\titerations\n"; for my $i (0 .. $#iterations) { print $res "$exec_time[$i]\t$iterations[$i]\n"; } close $in; close $res; |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote: "Here is NV sibling of posted earlier HD5 ATi build." . . OK, to which file do I add that script? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote: "Here is NV sibling of posted earlier HD5 ATi build." . . OK . . I have downloaded the new version r3475 and extracted it to a sub-folder on the seti drive. I have edited the file: . . mb_cmdline_win_x86_SSE3_OpenCL_NV.txt with the command line . . -use_sleep_ex 1 -sbs 256 -v 6 -period_iterations_num 100 . . I have saved the script you posted to this notepad file: https://www.dropbox.com/s/yus8dzjuoyny0ik/Raistmer_Perl_script.txt?dl=0 . . I can change back to SoG with 0.45 installer Beta(3) but how do I make it use this app instead of the included r3472? And where do I add the script file? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: ". . I can change back to SoG with 0.45 installer Beta(3) but how do I make it use this app instead of the included r3472? And where do I add the script file?" If you have Perl, the script can be used to speed up data extraction from stderr.txt. If not, do it by hand. I put a small Perl interpreter in the pack here: http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=497 |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The sleeping behavior has been greatly reworked. I updated the corresponding post about this option ( http://lunatics.kwsn.info/index.php/topic,1808.msg60933.html#msg60933 ). New builds for testing the usability of the new approach to sleep will be available soon, stay tuned. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Quote: "I can change back to SoG with 0.45 installer Beta(3) but how do I make it use this app instead of the included r3472?" (1) Copy the files from the new package to the SETI@home directory (<BOINC_Data>\projects\setiathome.berkeley.edu\); you can probably skip the .dll files, as they should be the same. (2) Edit app_info.xml with Notepad. (3) Global replace (Ctrl+Home, Ctrl+H) the old .exe name with the new MB8_win_x86_SSE3_OpenCL_NV_r3475.exe (4) Global replace (Ctrl+Home, Ctrl+H) the old MultiBeam_Kernels_rXXXX.cl name with the new MultiBeam_Kernels_r3475.cl (Don't type anything; only copy/paste from the real filenames to avoid mistakes.) (5) Save the edited app_info.xml. (6) Restart BOINC. P.S. Don't use the included MB8_win_x86_SSE3_OpenCL_NV.aistub, since it has only: <version_num>800</version_num> - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
     I can change back to SoG with 0.45 installer Beta(3) but how do I make it use this app instead of the included r3472? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
A faster version of the timing extraction script: use strict; use warnings; my $path = "stderr.txt"; my $results = "times_iterations.txt"; open(my $in, '<', $path) or die "Cannot open $path: $!"; open(my $res, '>', $results) or die "Cannot open $results: $!"; print $res "exec_time\titerations\n"; my $iter = ''; while (<$in>) { if (/Partial PulseFind_3(.*)Awaited (\d+) iterations/) { $iter = $2; } elsif (/Kernel PULSE_PARTIAL execution time: (\d+(?:\.\d+)?)/) { print $res "$1\t$iter\n"; } } close $in; close $res; |
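
For readers without Perl, the same extraction plus the "execution time divided by awaited iterations" estimate described earlier in the thread can be sketched in Python. This is only an illustration, not part of any posted build: the function name `sleep_estimates` and the `SAMPLE` constant are hypothetical, the sample lines are copied from the verbose output quoted above, and it is an assumption that each log record sits on its own line in stderr.txt.

```python
import re

# Patterns mirror the ones used in the Perl scripts in this thread.
AWAITED = re.compile(r"Partial PulseFind_3.*Awaited (\d+) iterations")
EXEC_TIME = re.compile(r"Kernel PULSE_PARTIAL execution time: (\d+(?:\.\d+)?)")


def sleep_estimates(text):
    """Return a list of (exec_time_ms, iterations, ms_per_iteration) tuples.

    Each "Awaited N iterations" record is paired with the next
    "Kernel PULSE_PARTIAL execution time" record that follows it.
    """
    rows = []
    iterations = None
    for line in text.splitlines():
        awaited = AWAITED.search(line)
        if awaited:
            iterations = int(awaited.group(1))
        timing = EXEC_TIME.search(line)
        # Skip timings with no preceding count, and counts of zero
        # (which would otherwise divide by zero).
        if timing and iterations:
            exec_ms = float(timing.group(1))
            rows.append((exec_ms, iterations, exec_ms / iterations))
            iterations = None
    return rows


# Sample records copied from the verbose output quoted in the thread
# (the one-record-per-line layout is an assumption).
SAMPLE = """\
Partial PulseFind_3 (before buffer read): Awaited 36 iterations for completion
Kernel PULSE_PARTIAL execution time: 537.014(ms); min=537.014(ms); max=537.014(ms); mean=537.014(ms); sleep=1(ms); delta=547; Niterations=1
Partial PulseFind_3 (before buffer read): Awaited 19 iterations for completion
Kernel PULSE_PARTIAL execution time: 279.959(ms); min=279.959(ms); max=279.959(ms); mean=279.959(ms); sleep=1(ms); delta=273; Niterations=1
Partial PulseFind_3 (before buffer read): Awaited 18 iterations for completion
Kernel PULSE_PARTIAL execution time: 254.941(ms); min=254.941(ms); max=279.959(ms); mean=267.45(ms); sleep=1(ms); delta=272; Niterations=2
"""

for exec_ms, iterations, per_iter in sleep_estimates(SAMPLE):
    print(f"{exec_ms:.3f} ms / {iterations} iterations = {per_iter:.2f} ms per iteration")
```

To run it against a real log, read the file contents (for example `open("stderr.txt").read()`) and pass the text to `sleep_estimates`. As noted in the thread, the estimate is only meaningful when the iteration count is around 10 or more.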
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.