Occasional WUs stall
Joined: 16 Aug 05, Posts: 79, Credit: 71,936,490, RAC: 0
Occasionally some WUs stall; sending 'quit' to the offending BOINC client restarts the WU. Unfortunately, this happens on only one computer (ID: 58772, the only one I have with AVX) and only on WUs processed on the CPU. Here are the messages:

```
Nov 11, 2012 2:02:06 PM desktopapplication1.DesktopApplication1$MyThread run
SEVERE: WU 05ap10al.12456.3753.140733193388039.14.158 is not advancing on 192.168.1.81 Prev: 0.320191, Present: 0.320191 Fraction done in 900.703605 seconds.
Nov 11, 2012 2:02:27 PM desktopapplication1.DesktopApplication1$MyThread sendQuit
SEVERE: Sent quit to 192.168.1.81
Nov 11, 2012 5:39:45 PM desktopapplication1.DesktopApplication1$MyThread run
SEVERE: WU 05ap10al.8909.16841.140733193388038.14.111 is not advancing on 192.168.1.81 Prev: 0.339201, Present: 0.339201 Fraction done in 900.675553 seconds.
Nov 11, 2012 5:40:05 PM desktopapplication1.DesktopApplication1$MyThread sendQuit
SEVERE: Sent quit to 192.168.1.81
Nov 10, 2012 5:21:15 PM desktopapplication1.DesktopApplication1$MyThread run
SEVERE: WU 05ap10al.31522.1708.140733193388038.14.39 is not advancing on 192.168.1.81 Prev: 0.395996, Present: 0.395996 Fraction done in 965.567264 seconds.
Nov 10, 2012 5:21:36 PM desktopapplication1.DesktopApplication1$MyThread sendQuit
SEVERE: Sent quit to 192.168.1.81
Nov 10, 2012 10:53:03 PM desktopapplication1.DesktopApplication1$MyThread run
SEVERE: WU 05ap10al.5819.11115.140733193388037.14.231 is not advancing on 192.168.1.81 Prev: 0.405652, Present: 0.405652 Fraction done in 916.725705 seconds.
Nov 10, 2012 10:53:23 PM desktopapplication1.DesktopApplication1$MyThread sendQuit
SEVERE: Sent quit to 192.168.1.81
Nov 10, 2012 11:08:40 PM desktopapplication1.DesktopApplication1$MyThread run
SEVERE: WU 05ap10al.8909.16432.6.14.10 is not advancing on 192.168.1.81 Prev: 0.215151, Present: 0.210665 Fraction done in 936.702477 seconds.
Nov 10, 2012 11:08:40 PM desktopapplication1.DesktopApplication1$MyThread sendQuit
SEVERE: Sent quit to 192.168.1.81
Nov 11, 2012 6:30:58 AM desktopapplication1.DesktopApplication1$MyThread run
SEVERE: WU ap_22my12ad_B5_P0_00179_20121105_13391.wu is not advancing on 192.168.1.81 Prev: 0.225225, Present: 0.225225 Fraction done in 901.340058 seconds.
Nov 11, 2012 6:31:19 AM desktopapplication1.DesktopApplication1$MyThread sendQuit
SEVERE: Sent quit to 192.168.1.81
```
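The log suggests the watchdog samples each task's fraction done roughly every 900 seconds and sends quit to the host when the value has not increased. Here is a minimal Python sketch of that check, under stated assumptions: the names `StallWatchdog`, `poll_once`, and `CHECK_INTERVAL_S` and the `send_quit` callback are hypothetical, and treating any non-increase (not just an unchanged value) as a stall is inferred from the entry where Present (0.210665) is lower than Prev (0.215151). How the samples are gathered and how quit is delivered (for example, by wrapping `boinccmd --host <addr> --quit`) is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed polling period, matching the ~900 s gaps in the log; the caller
# would sleep this long between poll_once() calls.
CHECK_INTERVAL_S = 900.0


@dataclass
class StallWatchdog:
    """Remembers the last fraction done seen for each WU (hypothetical helper)."""
    last_seen: dict[str, float] = field(default_factory=dict)

    def check(self, wu_name: str, fraction_done: float) -> bool:
        """Record a sample and return True if the WU looks stalled.

        A WU counts as stalled when the new value is not strictly greater
        than the previous one, so a value that went down is flagged too.
        The first sample for a WU is never flagged.
        """
        prev = self.last_seen.get(wu_name)
        self.last_seen[wu_name] = fraction_done
        return prev is not None and fraction_done <= prev


def poll_once(
    watchdog: StallWatchdog,
    samples: dict[str, float],
    send_quit: Callable[[], None],
) -> list[str]:
    """Check one round of {wu_name: fraction_done} samples from a single host.

    If any WU is stalled, call send_quit() once for the host and clear the
    watchdog's state, so restarted tasks begin a fresh history.
    Returns the names of the stalled WUs.
    """
    stalled = [wu for wu, frac in samples.items() if watchdog.check(wu, frac)]
    if stalled:
        send_quit()
        watchdog.last_seen.clear()
    return stalled
```

Clearing all state after a quit is a design choice, not something the log confirms: once the client restarts, tasks may resume from earlier checkpoints, and a drop in fraction done (as in the 0.215151 to 0.210665 entry) could otherwise be mistaken for a new stall.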
Joined: 14 Oct 05, Posts: 1137, Credit: 1,848,733, RAC: 0
> Occasionally some WUs stall; sending 'quit' to offending Boinc restarts WU. Unfortunately, this happens on only one computer (ID: 58772, the only one I have with AVX) and only on WUs processed on CPU. Here are the messages:

I've turned the WU names into links to your task details, for the convenience of anyone else who wants to see the outcome, etc.

So far, I haven't been able to spot anything unusual about the chosen functions or anything else in the stderr output. Whatever the problem is, it differs from the long-standing one where a few hosts sometimes hang during function testing. The last one you listed is, of course, an AP v6 task done on a GPU; all that proves is that nothing is ever as near perfect as we hope.

I'm not familiar with whatever you're using as a watchdog timer. It's certainly a good idea; can you clarify? I tried to get Dr. Anderson to implement an option in the BOINC API that would have served that function, but he didn't see the need.

Joe
©2023 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.