Distributing 4-bit workunits
Author | Message |
---|---|
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
For me SaH caused some Error while computing. Never saw those with the SoG. Are you running multiple WUs at the same time? Here in Beta I've left everything alone, so it's all stock with default settings. At the moment average turnaround time isn't a good indicator, as my account settings were copied over from Main and I've since dropped my cache down to a bit under 4 hours. But by application, the average processing rate is: SETI@home v8 8.01 windows_intelx86 (cuda42): 161.70; SETI@home v8 8.01 windows_intelx86 (cuda50): 143.28; SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_sah): 124.11; SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_SoG): 102.56. My initial batch of work was all SoG, and all Guppie, bar 1 WU which was CUDA50 and 24mr10ac. I don't know what happened overnight, but all of my current CUDA50 work is 24mr10ac bar 1 Guppie. Most of the CUDA42 work is 24mr10ac with several Guppies mixed in. And the SaH tasks are mostly 24mr10ac with some Guppies. On the tasks I have seen run: SaH Guppie 30-32 min; CUDA42 Guppie 38-40 min; CUDA50 Guppie 37 min; SoG Guppie 24-29 min; CUDA42 24mr10ac 16-17 min; CUDA50 24mr10ac 15 min. When the crap hits the fan, it doesn't spread evenly. With multiple work types going out (shortie, mid range, VLAR, Guppies, each with their short, mid and long run times), the manager can easily select the slowest application. And running more than 1 WU at a time has a big effect on processing times: getting a Guppie and an Arecibo WU on the same GPU, for example, can double the processing time for some WUs. It would probably be good on Beta if it were able to just keep cycling through each application (10 WUs to this application, 10 to that, 10 to the next and so on); that way the APR for a given host and its applications might end up closer to realistic values. But whether it would take days, weeks or months, depending on the availability of work types being split, to give reasonably representative values, I've no idea. Grant Darwin NT. |
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 4 |
I don't know if you were asking me or Tut? In the beginning I was running multiple with an app_config and app_info, plus commandlines. But after Raistmer wanted stock I removed all of those and ran singles per GPU. Interestingly, I never got any SoG once I did that. Got Cuda 42, Cuda 50 and SaH. Guppi work units: SoG 12-13 minutes; SaH was running 12-14 minutes (however, errors were reproducible); cuda 42 34-35 minutes; cuda 50 32-35 minutes. Nonguppi: cuda 42 and 50 6-7 minutes; SaH 5-6 minutes; SoG 5-6 minutes. Currently testing his SoG r 3472. |
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
I don't know if you were asking me or Tut? King Tut. But it's interesting to see that the performance for you is similar to mine; it's just not as obvious on your system because of the extra horsepower you've got. Grant Darwin NT. |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
As can be seen from my stderrs, I'm running 3 at a time on my GTX980, and 1 on the iGPU. Same settings as on main, since those are the optimal settings for my GTX980 Strix, something I've discovered after thousands upon thousands of WUs crunched with different numbers of WUs at a time and different settings in the commandline file. Same settings in the commandline files as on main too. |
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
As can be seen from my stderrs I'm running 3 at a time on my GTX980 Learn something new every day. Managed to find it the 3rd time through. Grant Darwin NT. |
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 4 |
We could just send binary data rather than encoded data to get most of the same benefit without loading the splitters or servers. Maybe I'll try that next. Is this something we will see here soon or is that for a future endeavor? |
Joined: 15 Mar 05 Posts: 1547 Credit: 26,981,856 RAC: 717 |
It won't be until after this test at the earliest. Since we're not bandwidth constrained right now it's not a huge priority. That said, it would be nice if we could get enough additional people to be bandwidth constrained again. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Example of false sanity check triggering task: http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646966 Columns: Task (click for details), Computer, Sent, Time reported or deadline (explain), Status, Run time (sec), CPU time (sec), Credit, Application. 24065974 77866 15 Jun 2016, 14:07:06 UTC 15 Jun 2016, 18:26:01 UTC Error while computing 332.84 32.06 --- SETI@home v8 v8.10 (opencl_nvidia_sah) x86_64-pc-linux-gnu; 24065975 77035 15 Jun 2016, 13:36:49 UTC 15 Jun 2016, 15:52:29 UTC Error while computing 315.12 10.22 --- SETI@home v8 v8.12 (opencl_ati5_SoG_nocal) windows_intelx86; 24065976 79247 15 Jun 2016, 13:40:00 UTC 15 Jun 2016, 16:14:52 UTC Error while computing 312.62 10.55 --- SETI@home v8 v8.12 (opencl_nvidia_sah) windows_intelx86; 24075232 75035 16 Jun 2016, 9:34:24 UTC 16 Jun 2016, 18:30:41 UTC Error while computing 320.12 17.45 --- SETI@home v8 v8.10 (opencl_ati5_cat132) x86_64-pc-linux-gnu; 24075233 78555 16 Jun 2016, 11:29:47 UTC 16 Jun 2016, 18:42:04 UTC Error while computing 318.85 15.65 --- SETI@home v8 v8.12 (opencl_ati5_cat132) windows_intelx86; 24075278 75681 16 Jun 2016, 9:17:13 UTC 17 Jun 2016, 2:31:22 UTC Error while computing 328.81 23.49 --- SETI@home v8 v8.12 (opencl_ati5_SoG_cat132) windows_intelx86; 24084056 75292 17 Jun 2016, 4:07:33 UTC 9 Aug 2016, 9:07:15 UTC In progress --- --- --- SETI@home v8 v8.12 (opencl_nvidia_SoG) windows_intelx86; 24084057 78121 17 Jun 2016, 4:33:35 UTC 17 Jun 2016, 5:13:36 UTC Error while computing 313.53 9.80 --- SETI@home v8 v8.12 (opencl_nvidia_SoG) windows_intelx86; 24084104 79164 17 Jun 2016, 4:07:25 UTC 9 Aug 2016, 9:07:07 UTC In progress --- --- --- SETI@home v8 v8.04 windows_intelx86; 24093360 --- --- --- Unsent --- --- --- ---. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0 |
Example of false sanity check triggering task: Did you save the wu-file for later testing? And here is another one with a different problem: http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646877 The CPU produces an early -9 overflow and the GPU apps run into the false sanity check. _\|/_ U r s |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,266,661 RAC: 8 |
WU files are still on the server. http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646966 http://boinc2.ssl.berkeley.edu/beta/download/19c/blc3_2bit_guppi_57451_20612_HIP62472_0007.30887.416.18.21.170.vlar http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646877 http://boinc2.ssl.berkeley.edu/beta/download/6e/blc3_2bit_guppi_57451_20612_HIP62472_0007.30887.416.18.21.130.vlar Same section of the same tape, I see. |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Picked up 5 errors on the SaH app I got two of those too today: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24083965 http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24084056 |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
No need. The app reacted absolutely correctly within its own boundaries. No bug or anything like that. The question is how often such tasks will occur. The boundaries of what to consider an error state may need to be changed. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Edited to actually read what I meant :-) Hmm, strange behaviour. Today again, the server is starting to send out lots of opencl_nvidia_sah. Now it seems to send out 50/50 or so of opencl_nvidia_sah and opencl_nvidia_SoG. It had previously settled down fine on opencl_nvidia_SoG as being the fastest app, and looking at "Application details for host 75292", it is still very clear that opencl_nvidia_SoG is very much the fastest app. So, the question is, why this unnecessary "re-testing" of opencl_nvidia_sah? From Application details for host 75292: SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_sah): Number of tasks completed 178; Max tasks per day 64; Number of tasks today 59; Consecutive valid tasks 37; Average processing rate 133.75 GFLOPS; Average turnaround time 0.43 days. SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_SoG): Number of tasks completed 2039; Max tasks per day 86; Number of tasks today 86; Consecutive valid tasks 54; Average processing rate 169.90 GFLOPS; Average turnaround time 0.20 days. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Hmm, strange behaviour. Today again, the server starting to send out lots of opencl_intel_gpu_sah. That's an app for a totally different device: the iGPU, Intel's GPU, not NV. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Hmm, strange behaviour. Today again, the server starting to send out lots of opencl_intel_gpu_sah. So much for checking what I write before I post :-) I wrote the wrong plan_class name of course. The text should of course have been: Hmm, strange behaviour. Today again, the server is starting to send out lots of opencl_nvidia_sah. Now it seems to send out 50/50 or so of opencl_nvidia_sah and opencl_nvidia_SoG. It had previously settled down fine on opencl_nvidia_SoG as being the fastest app, and looking at "Application details for host 75292", it is still very clear that opencl_nvidia_SoG is very much the fastest app. So, the question is, why this unnecessary "re-testing" of opencl_nvidia_sah? The Application details for host 75292 are still the right ones though. Next time I will try to check what I write before I post it :-) |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Well, if the difference is not too big, there is a good chance of getting work under another plan class while the number of completed tasks is still not too big. Regarding current testing, it seems mostly new tasks are in play currently, and we have run them under all apps already. So, no immediate failures, and initial testing completed with success. Do we need a second phase, collecting statistics on completed results per se, to check whether all thresholds for splitting/processing are set correctly for these increased-sensitivity tasks with possibly different SNR? Eric? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Well, if difference not too big there is the good chance to get work under another plan class until number of completed tasks will be not too big. Hmm, I would have thought that 2039 completed (opencl_nvidia_SoG) tasks versus 178 completed opencl_nvidia_sah, would have really finished the race of which app was the fastest. But obviously, that's not the case, even though opencl_nvidia_SoG has an APR of 169.90 GFLOPS, and opencl_nvidia_sah only reaches an APR of 133.75 GFLOPS. Yeah well, time will tell.... |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
Ah, now I get it. I have had three of these autocorr sanity check errors today, all of them with the SoG app, and those errors are driving down my "Max tasks per day", so I can't get any more SoG tasks. That's why the non-SoG OpenCL app came back into play all of a sudden. SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_SoG): Number of tasks completed 2053; Max tasks per day 36; Number of tasks today 100; Consecutive valid tasks 3; Average processing rate 173.37 GFLOPS; Average turnaround time 0.20 days. It's no good when a "faked" error, or at least an error that has nothing to do with the local computer's function, drives down the Max tasks per day like that and forces the host to start running a slower app instead. |
Joined: 15 Jun 16 Posts: 45 Credit: 1,836,741 RAC: 0 |
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646466 4294967295 (0xffffffff) Unknown exit code for OpenCL applications, processed as a noisy WU by SETI@home v8 v8.04 x86_64-pc-linux-gnu Grant Darwin NT. |
Joined: 10 Mar 12 Posts: 1659 Credit: 12,789,696 RAC: 7,691 |
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646466 Yup, too many of those recently. Not good at all. |
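
A side note on the exit code quoted in the last two posts: 4294967295 (0xffffffff) is the largest unsigned 32-bit value, which is the same bit pattern as -1 when read as a signed 32-bit integer. The Python sketch below only shows that reinterpretation. The helper name `as_signed32` is made up for illustration and is not part of BOINC or the SETI@home apps, and whether the OpenCL app really returned -1 is an assumption; the thread doesn't say.

```python
def as_signed32(code: int) -> int:
    """Reinterpret an unsigned 32-bit exit code as a signed 32-bit integer.

    Hypothetical helper for illustration only; not a BOINC function.
    """
    code &= 0xFFFFFFFF              # keep only the low 32 bits
    if code & 0x80000000:           # sign bit set: value is negative in two's complement
        return code - 0x100000000
    return code


exit_code = 4294967295              # value reported on the task page
print(hex(exit_code))               # hexadecimal form of the same number
print(as_signed32(exit_code))       # the same bits read as a signed value
```

Read this way, the "unknown" code may simply be a generic -1 failure return shown as unsigned, but that is a guess rather than something the task pages confirm.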
©2019 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.