AP 7.01 checkpoints wrong and cause lockup
Author | Message |
---|---|
Joined: 6 Mar 09 Posts: 8 Credit: 72,401 RAC: 0 |
Hi, I have my checkpoint interval set to 3 minutes. When the AstroPulse 7.01 task (opencl_nvidia_100) writes its first checkpoint, after about 25 minutes of runtime or 5 sec of CPU time, the progress is reset to 0.900% (it was at about 1.5%) and does not increase any further. No further checkpoints are written. It uses 1 NVIDIA GPU and 0.0896 CPUs. |
Joined: 6 Mar 09 Posts: 8 Credit: 72,401 RAC: 0 |
Correction: a 2nd checkpoint was written at 11 sec of CPU time. Progress jumps to 1.801% but is otherwise stuck (until the next checkpoint, I guess). |
Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0 |
The AstroPulse GPU builds only show progress in steps of 0.9%. Up until the first checkpoint, if an app doesn't report its progress, BOINC will estimate it. This might show as progress, then a drop to zero, before real progress is shown. The relevant BOINC commits: http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=9136a369d4e15cc727c06b55b50c833e184bf9fc ("client: if app doesn't report fraction done, estimate it") and http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=34f252870310b18c7cbe3e71573daff6b01e768c ("client: if app doesn't report fraction done, estimate fraction done in a way that converges to but never reaches 100%"). Claggy |
Joined: 6 Mar 09 Posts: 8 Credit: 72,401 RAC: 0 |
However, the checkpoint intervals are not 3 minutes (as I set them) but more like 25 minutes. Also, they are not at regular intervals (in either CPU or real time). Progress increases by 0.901% at every checkpoint. So how do I know how much progress has really been made? More concerning is the long time between checkpoints. |
Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0 |
"However, the checkpoint intervals are not 3 minutes (as I set them) but more like 25 minutes." That setting means an app may checkpoint every 3 minutes, NOT that it must checkpoint at 3 minutes. Apps are programmed to checkpoint at particular points; if the app hasn't reached a point where it may checkpoint, then it can't. Your GPU is very slow, so you're just going to have to put up with it only checkpointing every 25 minutes. And if you keep interrupting it before it gets to the first checkpoint, then of course it won't make progress. Restart at 0.00 percent. Claggy |
Joined: 6 Mar 09 Posts: 8 Credit: 72,401 RAC: 0 |
Yes, my GPU is slow. This is why the software needs to compensate for the vast range of GPU speeds that are out there. Other projects can do that. It is important not to lose too much computing time when task switching occurs. If this project does not want slowish GPUs, then exclude them. ALL glory to the latest, greatest, most expensive cards that exist, f*** the rest. |
Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0 |
The project doesn't have the resources to make OpenCL GPU apps, so it is using Raistmer's builds. Those definitely don't waste time on non-essential things, and indeed have been somewhat aimed at high performance crunching. It of course would be possible to add more frequent progress updates contingent on a command line parameter, or perhaps automatic for slower GPUs. This is Beta testing, so requests for enhancement are certainly allowed. But please understand that Raistmer's efforts are volunteered and his view of what's most important may differ from yours. Joe |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
I would say that is quite a misleading and rather irritating conclusion. Until recently, efforts were made to support even pre-OpenCL ATi GPUs (via Brook+), not only almost the whole range of OpenCL ones. Pay more attention to the forum discussions, or pay a hired programmer to increase that support instead. Issue reports and suggestions are welcome; whining is not. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"Yes, my gpu is slow." Before it can advance the computation, the app needs to do some preparation. At resume, that preparation has to be repeated, and this takes time. If a slow GPU can't work uninterrupted for long enough, it will only repeat the preparation and make no progress. In that case it is indeed useless for this project, not because the GPU is slow but because the user of that GPU can't provide uninterrupted work for it. Make your choice. Also, to save state (make a checkpoint), some data has to be returned from the GPU to host memory. This is the slowest memory channel of all. Hence the checkpoints are placed to minimize or fully exclude such transfers. Adding more of them would allow the slowest GPUs to make some progress (though even more slowly than they do now when working uninterrupted), but it would also slow down ALL other types of GPU. Hence it's not viable for a generalized app. A separate app for the slowest GPUs that can't work uninterrupted is definitely technically possible, but resources for such development would have to be provided. The average consumer approach will not work with volunteers. One can blame a corporation for the ugly drivers it releases, since it sells devices that are just trash without drivers. But here we don't make a profit. If you want to help, you are welcome. But this is not a consumer support forum. News about SETI opt app releases: https://twitter.com/Raistmer |
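The interaction described in the thread (checkpoints only at fixed points in the work, a minimum interval set by the user, and preparation that is repeated on every resume) can be illustrated with a small simulation. This is only a rough sketch and not the AstroPulse or BOINC code: the `Task` fields, the function names, and the 60-second preparation time are hypothetical. The 25 minutes between checkpoint opportunities and the roughly 0.9% per step (about 111 steps) are taken from the posts above. A real BOINC app would ask the client via `boinc_time_to_checkpoint()` instead of keeping its own timer.

```python
from dataclasses import dataclass


@dataclass
class Task:
    setup_s: float         # hypothetical: preparation repeated on every start or resume
    chunk_s: float         # compute time between points where state can be saved
    chunks: int            # number of such chunks in the whole task
    min_interval_s: float  # user preference: minimum time between checkpoints


def run_session(task: Task, saved_chunks: int, session_s: float) -> int:
    """Simulate one uninterrupted session of session_s seconds.

    Returns the number of chunks covered by the last checkpoint written;
    work done after that checkpoint is lost when the session is interrupted.
    """
    t = task.setup_s          # preparation comes first, every time
    last_ckpt_t = 0.0         # interval is measured from the session start
    current = saved_chunks
    saved = saved_chunks
    while current < task.chunks and t + task.chunk_s <= session_s:
        t += task.chunk_s
        current += 1
        # A checkpoint is only possible at a chunk boundary, and only if the
        # minimum interval has elapsed (or the task has just finished).
        if t - last_ckpt_t >= task.min_interval_s or current == task.chunks:
            saved = current
            last_ckpt_t = t
    return saved


def sessions_to_finish(task: Task, session_s: float):
    """Count sessions needed to finish, or return None if no progress is possible."""
    saved, sessions = 0, 0
    while saved < task.chunks:
        new_saved = run_session(task, saved, session_s)
        sessions += 1
        if new_saved == saved:
            return None  # every session just repeats the preparation
        saved = new_saved
    return sessions


# Slow GPU from the thread: about 25 minutes per 0.9% step, 3-minute preference.
slow_gpu = Task(setup_s=60.0, chunk_s=25 * 60.0, chunks=111, min_interval_s=180.0)
print(sessions_to_finish(slow_gpu, session_s=20 * 60.0))
print(sessions_to_finish(slow_gpu, session_s=60 * 60.0))
```

With these assumed numbers, a 20-minute session ends before the first checkpoint opportunity, so the task never advances no matter how many sessions are run, while one-hour sessions save two chunks each. The 3-minute preference has no effect here because the checkpoint opportunities are already much further apart than that.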
©2021 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.