Distributing 4-bit workunits

Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 58665 - Posted: 17 Jun 2016, 22:42:36 UTC

The displayed exit codes are somewhat erratic at the moment, following a botched attempt to clean them up a couple of weeks ago.

Eric, if you could possibly deploy Christian's second attempt at a cleanup (this morning) when you have a moment, I think we should get better tools for diagnosis.

Most of the error/exit numbers reported here should be -1, I think.
ID: 58665 · Report as offensive
.
Volunteer tester
Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58668 - Posted: 17 Jun 2016, 23:36:26 UTC

And here comes another batch of these crazy errors. This time destroying my chances to get anything more for a long time on the opencl_nvidia_sah plan_class.

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092008
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24091989
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092024
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092001
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092045

So now the "Max tasks per day" quota is wrecked for both opencl_nvidia_sah and opencl_nvidia_SoG.

No more tasks for me. Thanks for the coffee :-)
ID: 58668 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58669 - Posted: 18 Jun 2016, 2:02:24 UTC - in response to Message 58668.  

And here comes another batch of these crazy errors.


I notice WU8655361 now has a CPU processing it. Will be interesting to see what result the CPU gives; and to see if the opencl_intel_gpu_sah x86_64-pc-linux-gnu system errors out like the other OpenCL systems have, or if it gives the same result as the FX 8350 Windows CPU system.
Grant
Darwin NT.
ID: 58669 · Report as offensive
.
Volunteer tester
Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58671 - Posted: 18 Jun 2016, 16:08:21 UTC - in response to Message 58669.  

And here comes another batch of these crazy errors.


I notice WU8655361 now has a CPU processing it. Will be interesting to see what result the CPU gives; and to see if the opencl_intel_gpu_sah x86_64-pc-linux-gnu system errors out like the other OpenCL systems have, or if it gives the same result as the FX 8350 Windows CPU system.

Well, it seems as if CPUs always return "-9 result_overflow" on these tasks, and GPUs always return "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance"

Just got another one, this time on the iGPU:
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24092537

I think something needs to be fixed or adjusted when it comes to these kinds of tasks.
ID: 58671 · Report as offensive
.
Volunteer tester
Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58672 - Posted: 18 Jun 2016, 19:18:56 UTC
Last modified: 18 Jun 2016, 19:20:27 UTC

Wow, now it's all VLARs, both Arecibo and Guppi. I just checked my tasks, and wondered why they all were taking such a long time to crunch.

Now I know :-)

It's OK, I'm not complaining.
ID: 58672 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58673 - Posted: 18 Jun 2016, 23:08:33 UTC - in response to Message 58671.  
Last modified: 18 Jun 2016, 23:14:14 UTC

Well, it seems as if CPUs always return "-9 result_overflow" on these tasks, and GPUs always return "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance"

It's the way the OpenCL application handles them.
With CUDA the result is:
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.
It just treats it as a noisy WU, whereas the OpenCL application treats it as an error.

8655374
If I don't get a CPU or CUDA wingman soon I'll lose my couple of Credits.
Or do you get Credit if you're the only one to complete a WU?
Grant
Darwin NT.
ID: 58673 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58676 - Posted: 19 Jun 2016, 0:21:53 UTC - in response to Message 58672.  

Wow, now it's all VLARs, both Arecibo and Guppi. I just checked my tasks, and wondered why they all were taking such a long time to crunch.

Now I know :-)

It's OK, I'm not complaining.

Also explains why my system has become a bit laggy when typing & scrolling.

And yeah, Arecibo VLARs really do make the GPU have to work for it.
Grant
Darwin NT.
ID: 58676 · Report as offensive
.
Volunteer tester
Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58677 - Posted: 19 Jun 2016, 0:33:47 UTC - in response to Message 58676.  


And yeah, Arecibo VLARs really do make the GPU have to work for it.

Yup, they are tough, and I really hope the toaster is worth it :-)
WARNING!! "THIS IS A SIGNATURE", of the "IT MAY CHANGE AT ANY MOMENT" type. It may, or may not be considered insulting, all depending upon HOW SENSITIVE THE VIEWER IS, to certain inputs to/from the nervous system.
ID: 58677 · Report as offensive
.
Volunteer tester
Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58679 - Posted: 19 Jun 2016, 19:31:13 UTC

Geez, that all-VLAR storm was painful. APR crashed for all GPU apps that were run during that storm.

Now it seems to be over though. Of course GUPPIs are still VLARs, but they are not half as painful to crunch on the GPU as Arecibo VLARs are.
WARNING!! "THIS IS A SIGNATURE", of the "IT MAY CHANGE AT ANY MOMENT" type. It may, or may not be considered insulting, all depending upon HOW SENSITIVE THE VIEWER IS, to certain inputs to/from the nervous system.
ID: 58679 · Report as offensive
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,960,376
RAC: 1,429
United States
Message 58680 - Posted: 19 Jun 2016, 20:16:27 UTC

I'm beginning to think we should double the size of the "GPU sanity check" threshold on subsequent versions.
ID: 58680 · Report as offensive
Speedy
Volunteer tester

Joined: 16 Apr 06
Posts: 13
Credit: 335,597
RAC: 888
New Zealand
Message 58682 - Posted: 20 Jun 2016, 5:29:17 UTC - in response to Message 58680.  
Last modified: 20 Jun 2016, 5:29:46 UTC

I'm beginning to think we should double the size of the "GPU sanity check" threshold on subsequent versions.

Can somebody please explain to me what this means?
Out of interest, is doubling the work unit size being done to lower the amount returned to the servers?
ID: 58682 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester

Joined: 11 Nov 12
Posts: 851
Credit: 2,992,010
RAC: 238
United Kingdom
Message 58684 - Posted: 20 Jun 2016, 7:22:11 UTC

I meant to ask this before but on my GTX560, the SoG units are running at 95% CPU.

Any way to lower this?

CUDA 50 runs at less than 20% CPU.

PS I am running stock.
ID: 58684 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58685 - Posted: 20 Jun 2016, 8:07:05 UTC - in response to Message 58684.  

I meant to ask this before but on my GTX560, the SoG units are running at 95% CPU.

Any way to lower this?

CUDA 50 runs at less than 20% CPU.

PS I am running stock.

Running stock myself.
The "-use_sleep" option reduces CPU usage (at the cost of longer run times); someone else will need to explain where it goes (I suspect in the mb_cmdline-8.12_windows_intel__opencl_nvidia_SoG.txt file in the C:\ProgramData\BOINC\projects\setiweb.ssl.berkeley.edu_beta folder, but it's just a guess).
On my system I just freed up 1 CPU core; with my 2 GPUs things are still a little sluggish if there are 2 SoG WUs running, but the rest of the time things aren't too bad.
Grant
Darwin NT.
ID: 58685 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58686 - Posted: 20 Jun 2016, 8:09:26 UTC - in response to Message 58682.  

Out of interest, is doubling the work unit size being done to lower the amount returned to the servers?

Nope, apparently it helps reduce the level of noise in the WUs when crunching, making it easier to find signals and reducing the likelihood of finding false signals.
Grant
Darwin NT.
ID: 58686 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester

Joined: 11 Nov 12
Posts: 851
Credit: 2,992,010
RAC: 238
United Kingdom
Message 58689 - Posted: 20 Jun 2016, 12:53:45 UTC - in response to Message 58685.  

I meant to ask this before but on my GTX560, the SoG units are running at 95% CPU.

Any way to lower this?

CUDA 50 runs at less than 20% CPU.

PS I am running stock.

Running stock myself.
The "-use_sleep" option reduces CPU usage (at the cost of longer run times); someone else will need to explain where it goes (I suspect in the mb_cmdline-8.12_windows_intel__opencl_nvidia_SoG.txt file in the C:\ProgramData\BOINC\projects\setiweb.ssl.berkeley.edu_beta folder, but it's just a guess).
On my system I just freed up 1 CPU core; with my 2 GPUs things are still a little sluggish if there are 2 SoG WUs running, but the rest of the time things aren't too bad.


Yes, I have the "-use_sleep" option on my machines at main, but was not sure if I could use it here, or indeed which file to use it in.

I have tried the file you suggested as that was where I was thinking as well.

This machine is only an old dual core, and unthinkingly I let the other core crunch!!
ID: 58689 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester

Joined: 11 Nov 12
Posts: 851
Credit: 2,992,010
RAC: 238
United Kingdom
Message 58692 - Posted: 20 Jun 2016, 15:38:43 UTC - in response to Message 58689.  

I meant to ask this before but on my GTX560, the SoG units are running at 95% CPU.

Any way to lower this?

CUDA 50 runs at less than 20% CPU.

PS I am running stock.

Running stock myself.
The "-use_sleep" option reduces CPU usage (at the cost of longer run times); someone else will need to explain where it goes (I suspect in the mb_cmdline-8.12_windows_intel__opencl_nvidia_SoG.txt file in the C:\ProgramData\BOINC\projects\setiweb.ssl.berkeley.edu_beta folder, but it's just a guess).
On my system I just freed up 1 CPU core; with my 2 GPUs things are still a little sluggish if there are 2 SoG WUs running, but the rest of the time things aren't too bad.


Yes, I have the "-use_sleep" option on my machines at main, but was not sure if I could use it here, or indeed which file to use it in.

I have tried the file you suggested as that was where I was thinking as well.

This machine is only an old dual core, and unthinkingly I let the other core crunch!!

Well, unless I did it wrong, I don't think "-use_sleep" goes in the mb_cmdline-8.12_windows_intel__opencl_nvidia_SoG.txt file; that seems to put the GPU to sleep, and the time left quickly rose to over 4000 hours!!
ID: 58692 · Report as offensive
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 59
United States
Message 58693 - Posted: 20 Jun 2016, 17:02:02 UTC - in response to Message 58692.  

Thing is, since you are running stock, there isn't a command-line txt file in which to put that command.

Now, if you were to install an app_info with instructions on where to find all the different parts, then you would have a reference line to the txt file for command lines.

But since they want non-app_info and non-app_config runs, I don't suggest you put those in.

Yes, I did that when they first started these and got reprimanded for doing such, lol...
ID: 58693 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester

Joined: 11 Nov 12
Posts: 851
Credit: 2,992,010
RAC: 238
United Kingdom
Message 58694 - Posted: 20 Jun 2016, 17:27:25 UTC
Last modified: 20 Jun 2016, 17:29:01 UTC

Yes, I realise that now. Unfortunately, I see no reason for a GPU WU to use 95% of the CPU, and it seems "opencl_nvidia_sah" does the same.

So I will set NNT and let the units crunch till all finished then gracefully retire.
ID: 58694 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 58695 - Posted: 20 Jun 2016, 18:02:21 UTC

Actually, if everyone would learn to read client_state.xml, there's a perfectly good

    <file_ref>
        <file_name>mb_cmdline-8.12_windows_intel__opencl_nvidia_sah.txt</file_name>
        <open_name>mb_cmdline.txt</open_name>
    </file_ref>

referenced in the app_version for the 'opencl_nvidia_sah' plan class, and I have no doubt there would be another one for 'opencl_nvidia_SoG' as well if you make the obvious replacement.

And once you know which version-specific file name you're looking for, you can quickly find

mb_cmdline-8.12_windows_intel__opencl_nvidia_sah.txt

in the project directory. All there, ready for use.

(yes, all that verified by doing a "project reset" on a dormant host, and then allowing work to download into the empty directory. Everything copied and pasted from the resulting file downloads, except the letters S-o-G in my presumption about the other app under testing)
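
For anyone who would rather not read client_state.xml by eye, the lookup described above can be sketched in Python. This is only a sketch under assumptions: it assumes each <app_version> block carries a <plan_class> element alongside the <file_ref> entries shown above, and that the file parses as well-formed XML. The names find_cmdline_files and SAMPLE are made up for illustration, and SAMPLE is a cut-down stand-in, not a real client_state.xml.

```python
import xml.etree.ElementTree as ET


def find_cmdline_files(xml_text, open_name="mb_cmdline.txt"):
    """Map each plan class to the version-specific command-line file name.

    Assumption: every <app_version> has a <plan_class> child next to its
    <file_ref> children. Only the <file_ref> layout comes from the post.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for app_version in root.iter("app_version"):
        plan_class = app_version.findtext("plan_class", default="").strip()
        for file_ref in app_version.iter("file_ref"):
            # The app opens the file under its generic name (open_name);
            # file_name is what actually sits in the project directory.
            if file_ref.findtext("open_name", default="").strip() == open_name:
                found[plan_class] = file_ref.findtext("file_name", default="").strip()
    return found


# Hypothetical, cut-down stand-in for client_state.xml (illustration only).
SAMPLE = """
<client_state>
  <app_version>
    <plan_class>opencl_nvidia_sah</plan_class>
    <file_ref>
        <file_name>mb_cmdline-8.12_windows_intel__opencl_nvidia_sah.txt</file_name>
        <open_name>mb_cmdline.txt</open_name>
    </file_ref>
  </app_version>
</client_state>
"""

if __name__ == "__main__":
    for plan_class, file_name in find_cmdline_files(SAMPLE).items():
        print(plan_class, "->", file_name)
```

To try it on a real host, read the text of client_state.xml from the BOINC data directory and pass it in place of SAMPLE. The function only reads; it does not edit anything.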
ID: 58695 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 58696 - Posted: 20 Jun 2016, 19:04:17 UTC - in response to Message 58680.  

I'm beginning to think we should double the size of the "GPU sanity check" threshold on subsequent versions.

That would amount to disabling the autocorr sanity check completely,
because misbehaving GPUs commonly produced lower values on Arecibo tasks.
I'll disable this check in the next build.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 58696 · Report as offensive
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.