Distributing 4-bit workunits
Author | Message |
---|---|
Joined: 3 Jan 07 Posts: 1451 Credit: 3,266,661 RAC: 8 |
The displayed exit codes are somewhat erratic at the moment, following a botched attempt to clean them up a couple of weeks ago. Eric, if you could possibly deploy Christian's second attempt at a cleanup (this morning) when you have a moment, I think we should get better tools for diagnosis. Most of the error/exit numbers reported here should be -1, I think. |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
And here comes another batch of these crazy errors. This time they have wrecked my chances of getting any more work on the opencl_nvidia_sah plan_class for a long time. http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092008 http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24091989 http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092024 http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092001 http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092045 So now the "Max tasks per day" quota is used up for both opencl_nvidia_sah and opencl_nvidia_SoG. No more tasks for me. Thanks for the coffee :-) |
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
Quote: "And here comes another batch of these crazy errors." I notice WU 8655361 now has a CPU processing it. It will be interesting to see what result the CPU gives, and whether the opencl_intel_gpu_sah x86_64-pc-linux-gnu system errors out like the other OpenCL systems have or gives the same result as the FX 8350 Windows CPU system. Grant Darwin NT. |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Quote: "And here comes another batch of these crazy errors." Well, it seems as if CPUs always return "-9 result_overflow" on these tasks, and GPUs always return "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". Just got another one, this time on the iGPU: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092537 I think something needs to be fixed or adjusted for this kind of task. |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Wow, now it's all VLARs, both Arecibo and Guppi. I just checked my tasks, and wondered why they all were taking such a long time to crunch. Now I know :-) It's OK, I'm not complaining. |
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
Quote: "Well, it seems as if CPUs always return '-9 result_overflow' on these tasks, and GPUs always return 'ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance'." It's the way the OpenCL application handles them. With CUDA the result is: SETI@Home Informational message -9 result_overflow NOTE: The number of results detected equals the storage space allocated. CUDA just treats it as a noisy WU, whereas the OpenCL application treats it as an error. WU 8655374: if I don't get a CPU or CUDA wingman soon, I'll lose my couple of credits. Or do you get credit if you're the only one to complete a WU? |
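The difference described above (CPU and CUDA builds report an overflow as an informational -9 result_overflow, while the OpenCL build ends the task with an error) can be illustrated with a minimal Python sketch. It is purely hypothetical: the function names, the `Outcome` type, the exit status of -1 and the storage limit of 30 signals are assumptions chosen for illustration, not code or constants taken from the SETI@home applications.

```python
from dataclasses import dataclass

# Hypothetical signal storage limit; the real application's value may differ.
CAPACITY = 30


@dataclass
class Outcome:
    exit_status: int  # 0 = task completes normally; non-zero = task errors out
    message: str


def finish_cpu_or_cuda(signals_found: int, capacity: int = CAPACITY) -> Outcome:
    """Hypothetical CPU/CUDA path: overflow is informational, task completes."""
    if signals_found >= capacity:
        return Outcome(0, "-9 result_overflow")
    return Outcome(0, "normal completion")


def finish_opencl(signals_found: int, capacity: int = CAPACITY) -> Outcome:
    """Hypothetical OpenCL path: the same overflow is escalated to an error."""
    if signals_found >= capacity:
        return Outcome(-1, "ERROR: Possible wrong computation state on GPU")
    return Outcome(0, "normal completion")
```

In this toy model a noisy WU that fills the signal storage completes on a CPU or CUDA host but errors out on an OpenCL host, which matches the pattern reported in this thread; ordinary WUs behave the same on both paths.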
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
Quote: "Wow, now it's all VLARs, both Arecibo and Guppi. I just checked my tasks, and wondered why they all were taking such a long time to crunch." That also explains why my system has become a bit laggy when typing and scrolling. And yeah, Arecibo VLARs really do make the GPU work for it. |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Yup, they are tough, and I really hope the toaster is worth it :-) WARNING!! "THIS IS A SIGNATURE", of the "IT MAY CHANGE AT ANY MOMENT" type. It may, or may not be considered insulting, all depending upon HOW SENSITIVE THE VIEWER IS, to certain inputs to/from the nervous system. |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Geez, that all-VLAR storm was painful. APR (average processing rate) crashed for every GPU app run during that storm. It seems to be over now, though. GUPPIs are of course still VLARs, but they are not half as painful to crunch on the GPU as Arecibo VLARs. |
Joined: 15 Mar 05 Posts: 1547 Credit: 26,981,856 RAC: 717 |
I'm beginning to think we should double the size of the "GPU sanity check" threshold on subsequent versions. ![]() |
Joined: 16 Apr 06 Posts: 13 Credit: 335,690 RAC: 86 |
Quote: "I'm beginning to think we should double the size of the 'GPU sanity check' threshold on subsequent versions." Can somebody please explain to me what this means? Out of interest, is the work unit size being doubled to lower the amount returned to the servers? |
Joined: 11 Nov 12 Posts: 853 Credit: 2,999,363 RAC: 267 |
I meant to ask this before: on my GTX560, the SoG units are running at 95% CPU. Is there any way to lower this? CUDA 50 units run at less than 20% CPU. PS: I am running stock. |
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
Quote: "I meant to ask this before: on my GTX560, the SoG units are running at 95% CPU." Running stock myself. The "-use_sleep" option reduces CPU usage (at the cost of longer run times); someone else will need to explain where it goes (I suspect in the mb_cmdline-8.12_windows_intel__opencl_nvidia_SoG.txt file in the C:\ProgramData\BOINC\projects\setiweb.ssl.berkeley.edu_beta folder, but it's just a guess). On my system I just freed up 1 CPU core; with my 2 GPUs things are still a little sluggish when 2 SoG WUs are running, but the rest of the time things aren't too bad. |
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
Quote: "Out of interest, is the work unit size being doubled to lower the amount returned to the servers?" Nope. Apparently it helps reduce the level of noise in the WUs when crunching, making it easier to find signals and reducing the likelihood of finding false signals. |
Joined: 11 Nov 12 Posts: 853 Credit: 2,999,363 RAC: 267 |
Quote: "I meant to ask this before: on my GTX560, the SoG units are running at 95% CPU." Yes, I have the "-use_sleep" option on my machines at Main, but was not sure if I could use it here, or indeed which file to put it in. I have tried the file you suggested, as that was where I was thinking as well. This machine is only an old dual core, and unthinkingly I let the other core crunch!! |
Joined: 11 Nov 12 Posts: 853 Credit: 2,999,363 RAC: 267 |
Quote: "I meant to ask this before: on my GTX560, the SoG units are running at 95% CPU." Well, unless I did it wrong, I don't think "-use_sleep" goes in the mb_cmdline-8.12_windows_intel__opencl_nvidia_SoG.txt file: that seems to put the GPU to sleep, and the time left quickly rises to over 4000 hours!! |
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 4 |
Thing is, since you are running stock, there isn't a command-line txt file to put that command in. Now, if you were to install an app_info with instructions on where to find all the different parts, then you would have a reference line to the txt file for command lines. But since they want runs without app_info or app_config here, I don't suggest you put those in. Yes, I did that when they first started these and got reprimanded for it, lol... |
Joined: 11 Nov 12 Posts: 853 Credit: 2,999,363 RAC: 267 |
Yes, I realise that now. Unfortunately, I see no reason for a GPU WU to use 95% of the CPU, and it seems the "opencl_nvidia_sah" units do the same. So I will set NNT (No New Tasks), let the units crunch until they are all finished, and then gracefully retire. |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,266,661 RAC: 8 |
Actually, if everyone would learn to read client_state.xml, there's a perfectly good <file_ref> <file_name>mb_cmdline-8.12_windows_intel__opencl_nvidia_sah.txt</file_name> <open_name>mb_cmdline.txt</open_name> </file_ref> referenced in the app_version for the 'opencl_nvidia_sah' plan class, and I have no doubt there would be another one for 'opencl_nvidia_SoG' as well if you make the obvious replacement. And once you know which version-specific file name you're looking for, you can quickly find mb_cmdline-8.12_windows_intel__opencl_nvidia_sah.txt in the project directory. All there, ready for use. (yes, all that verified by doing a "project reset" on a dormant host, and then allowing work to download into the empty directory. Everything copied and pasted from the resulting file downloads, except the letters S-o-G in my presumption about the other app under testing) |
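The lookup described above can be automated. The Python sketch below, using only the standard library, walks every `<app_version>` in client_state.xml and reports which file is opened as mb_cmdline.txt for each plan class. It assumes the file parses as ordinary XML and that `<plan_class>` and `<file_ref>` are direct children of `<app_version>`, as in the fragment quoted; the function names `cmdline_files` and `cmdline_files_from_path` are hypothetical.

```python
import xml.etree.ElementTree as ET


def cmdline_files(root: ET.Element, open_name: str = "mb_cmdline.txt") -> dict:
    """Map each app_version's plan_class to the file it opens as `open_name`."""
    found = {}
    for app_version in root.iter("app_version"):
        # CPU app versions may have no <plan_class>; "" is used for those.
        plan_class = app_version.findtext("plan_class", default="")
        for ref in app_version.findall("file_ref"):
            # Only file_refs with a matching <open_name> are of interest.
            if ref.findtext("open_name") == open_name:
                found[plan_class] = ref.findtext("file_name")
    return found


def cmdline_files_from_path(path: str) -> dict:
    """Parse client_state.xml from disk and apply the lookup above."""
    return cmdline_files(ET.parse(path).getroot())
```

Pointing `cmdline_files_from_path` at client_state.xml in the BOINC data directory (C:\ProgramData\BOINC by default on Windows, assuming a standard install) should list the version-specific command-line file for each GPU plan class, which you can then find in the project directory.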
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Quote: "I'm beginning to think we should double the size of the 'GPU sanity check' threshold on subsequent versions." That would effectively mean disabling the autocorr sanity check completely, because misbehaving GPUs commonly showed values below that with Arecibo tasks. I'll block this check in the next build. News about SETI opt app releases: https://twitter.com/Raistmer |
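Raistmer's objection can be made concrete with a toy model. The Python sketch below assumes, purely for illustration, that the check flags a result when an autocorrelation peak exceeds a fixed threshold, and that bad GPU results typically land only moderately above that threshold; `flags_gpu`, `count_caught`, `THRESHOLD` and the peak values are all hypothetical, not taken from the application source.

```python
# Hypothetical one-sided sanity check on an autocorrelation peak.
# The comparison direction and every number below are assumptions.


def flags_gpu(autocorr_peak: float, threshold: float) -> bool:
    """Flag a 'wrong computation state' if the peak exceeds the threshold."""
    return autocorr_peak > threshold


def count_caught(peaks: list, threshold: float) -> int:
    """Count how many of the given peaks the check would flag."""
    return sum(flags_gpu(p, threshold) for p in peaks)


THRESHOLD = 10.0
# Made-up peaks from "misbehaving" GPUs: above THRESHOLD, below 2 * THRESHOLD.
bad_gpu_peaks = [12.5, 14.0, 17.9]

caught_now = count_caught(bad_gpu_peaks, THRESHOLD)          # 3
caught_doubled = count_caught(bad_gpu_peaks, 2 * THRESHOLD)  # 0
```

Under these assumptions the original threshold catches all three bad results, while the doubled one catches none, so the check would still run but would no longer flag the failures it was meant to detect.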
©2019 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.