Distributing 4-bit workunits

Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58629 - Posted: 16 Jun 2016, 2:49:34 UTC - in response to Message 58627.  

For me SaH caused some "Error while computing" results. Never saw those with SoG.

I need to test Raistmer's newest r3472, so I'll be back tomorrow and see if I can get some more SoG here.

Like you, the server seems to think SaH is better, but looking at the times I know SoG was better.

I think all those download errors confused the server into thinking SaH was better, since only 5 tasks came with it vs 50 (download errors).

Well, I don't know really. It seems to go up and down here as to which one is faster. The same AR can at times be faster with opencl_nvidia_SoG; other times the same AR is faster with opencl_nvidia_sah. That goes for both Arecibo and Guppi tasks.

I think I'm getting a tiny bit confused here.

Are you running multiple WUs at the same time?

Here in Beta I've left everything alone, so it's all stock with default settings.
At the moment average turnaround time isn't a good indicator as my account settings were copied over from Main, and I've since dropped my cache down to a bit under 4 hours.

But by application:
       Application                                Average processing rate
SETI@home v8 8.01 windows_intelx86 (cuda42)              161.70
SETI@home v8 8.01 windows_intelx86 (cuda50)              143.28
SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_sah)   124.11
SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_SoG)   102.56
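For context, the scheduler turns an APR into a job runtime estimate as roughly seconds = rsc_fpops_est / (APR in flops/s), so by that arithmetic these APRs rank SoG slowest, even though the measured times below say otherwise. A minimal Python sketch with a made-up 30 Tflop per-WU estimate (real rsc_fpops_est values vary per workunit):

    # Rough sketch of the runtime estimate a BOINC server derives from APR:
    # seconds = rsc_fpops_est / (APR in flops). The 30 Tflop per-WU figure
    # below is hypothetical; real rsc_fpops_est values vary per workunit.
    RSC_FPOPS_EST = 30e12

    apr_gflops = {
        "cuda42": 161.70,
        "cuda50": 143.28,
        "opencl_nvidia_sah": 124.11,
        "opencl_nvidia_SoG": 102.56,
    }

    for app, apr in apr_gflops.items():
        minutes = RSC_FPOPS_EST / (apr * 1e9) / 60
        # Lower APR -> longer estimate, so SoG looks slowest on paper here.
        print(f"{app:18s} -> {minutes:5.1f} min expected")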


My initial batch of work was all SoG, and all Guppie, bar 1 WU which was CUDA50 and 24mr10ac.
I don't know what happened overnight, but all of my current CUDA50 work is 24mr10ac bar 1 Guppie. Most of the CUDA 42 work is 24mr10ac with several Guppies mixed in. And the SaH tasks are mostly 24mr10ac with some Guppies.

On the tasks I have seen run,
SaH Guppie       30-32min
CUDA42 Guppie    38-40min
CUDA50 Guppie    37   min
SoG Guppie       24-29min
CUDA42 24mr10ac  16-17min
CUDA50 24mr10ac  15   min


When the crap hits the fan, it doesn't spread evenly. With multiple work types going out (shortie, mid range, VLAR and Guppie, with their short, mid & long run times), the manager can easily select the slowest application.
And running more than 1 WU at a time has a big effect on processing times (e.g. getting a Guppie & an Arecibo WU on the same GPU can double the processing time for some WUs).


It would probably be good on Beta if the scheduler were able to just keep cycling through each application (10 WUs to this application, 10 to that, 10 to the next & so on); that way the APR for a given host and its applications might end up closer to realistic values.
But whether it would take days, weeks or months to give reasonably representative values, depending on the availability of the work types being split, I've no idea.
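A minimal sketch of that cycling idea, assuming a hypothetical dispatcher; the app list and 10-WU batch size mirror the suggestion above, but this is not actual BOINC scheduler code:

    # Hypothetical round-robin dispatch: 10 WUs to each app version in turn,
    # so every version accumulates completed tasks (and thus APR samples)
    # at the same rate. Illustrative only, not BOINC server logic.
    from collections import Counter
    from itertools import cycle

    APPS = ["cuda42", "cuda50", "opencl_nvidia_sah", "opencl_nvidia_SoG"]
    BATCH = 10  # WUs sent to one version before moving on to the next

    def round_robin(num_wus):
        """Yield an app version for each of num_wus workunits."""
        versions = cycle(APPS)
        current = next(versions)
        for i in range(num_wus):
            if i and i % BATCH == 0:
                current = next(versions)  # batch done, rotate to next app
            yield current

    print(Counter(round_robin(40)))  # 10 WUs per version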
Grant
Darwin NT.
ID: 58629
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 59
United States
Message 58630 - Posted: 16 Jun 2016, 3:16:19 UTC - in response to Message 58629.  
Last modified: 16 Jun 2016, 3:16:54 UTC

I don't know if you were asking me or Tut?

In the beginning I was running multiple WUs at a time, with an app_config and app_info plus command lines.

But after Raistmer wanted stock, I removed all of those and ran a single WU per GPU.

Interestingly, I never got any SoG once I did that.

Got CUDA 42, CUDA 50 and SaH.

Guppi work units:
SoG      12-13 minutes
SaH      12-14 minutes (however, the errors were reproducible)
CUDA 42  34-35 minutes
CUDA 50  32-35 minutes

Non-guppi:
CUDA 42 and 50  6-7 minutes
SaH             5-6 minutes
SoG             5-6 minutes

Currently testing his SoG r3472.
ID: 58630
Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58631 - Posted: 16 Jun 2016, 3:27:45 UTC - in response to Message 58630.  
Last modified: 16 Jun 2016, 3:28:01 UTC

I don't know if you were asking me or Tut?

King Tut.

But it's interesting to see that the performance for you is similar to mine; it's just not as obvious on your system because of the extra horsepower you've got.
Grant
Darwin NT.
ID: 58631
.
Volunteer tester

Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58632 - Posted: 16 Jun 2016, 4:22:28 UTC

As can be seen from my stderrs I'm running 3 at a time on my GTX980, and 1 on the iGPU.

Same settings as on Main, since those are the optimal settings for my GTX980 Strix; something I've discovered after thousands upon thousands of WUs crunched with different numbers of WUs at a time, and different settings in the commandline file.

Same settings in the commandline files as on Main, too.
ID: 58632
Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58633 - Posted: 16 Jun 2016, 4:48:46 UTC - in response to Message 58632.  

As can be seen from my stderrs I'm running 3 at a time on my GTX980

Learn something new every day.
Managed to find it the 3rd time through.
Grant
Darwin NT.
ID: 58633
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 59
United States
Message 58634 - Posted: 16 Jun 2016, 5:22:53 UTC - in response to Message 58620.  

We could just send binary data rather than encoded data to get most of the same benefit without loading the splitters or servers. Maybe I'll try that next.


Is this something we will see here soon or is that for a future endeavor?
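For a sense of the saving being described, a minimal sketch assuming the encoded form is base64-like (the actual workunit file encoding may differ):

    # Scale of the saving from sending raw binary instead of text-encoded
    # payloads: base64-style encoding inflates data by about a third.
    # Illustration only; the real WU file encoding isn't shown here.
    import base64
    import os

    packed = os.urandom(512 * 1024)  # stand-in for 512 KiB of packed samples
    encoded = base64.b64encode(packed)

    overhead = 100 * (len(encoded) - len(packed)) / len(packed)
    print(f"raw binary: {len(packed)} bytes")
    print(f"encoded:    {len(encoded)} bytes ({overhead:.0f}% larger)")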
ID: 58634
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 15 Mar 05
Posts: 1547
Credit: 26,960,376
RAC: 1,429
United States
Message 58638 - Posted: 16 Jun 2016, 15:39:55 UTC - in response to Message 58634.  

It won't be until after this test at the earliest. Since we're not bandwidth constrained right now it's not a huge priority.

That said, it would be nice if we could get enough additional people to be bandwidth constrained again.
ID: 58638
Raistmer
Volunteer tester

Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 58645 - Posted: 17 Jun 2016, 9:02:06 UTC

Example of a task falsely triggering the sanity check:
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646966
Task (click for details) | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
24065974 | 77866 | 15 Jun 2016, 14:07:06 UTC | 15 Jun 2016, 18:26:01 UTC | Error while computing | 332.84 | 32.06 | --- | SETI@home v8 v8.10 (opencl_nvidia_sah) x86_64-pc-linux-gnu
24065975 | 77035 | 15 Jun 2016, 13:36:49 UTC | 15 Jun 2016, 15:52:29 UTC | Error while computing | 315.12 | 10.22 | --- | SETI@home v8 v8.12 (opencl_ati5_SoG_nocal) windows_intelx86
24065976 | 79247 | 15 Jun 2016, 13:40:00 UTC | 15 Jun 2016, 16:14:52 UTC | Error while computing | 312.62 | 10.55 | --- | SETI@home v8 v8.12 (opencl_nvidia_sah) windows_intelx86
24075232 | 75035 | 16 Jun 2016, 9:34:24 UTC | 16 Jun 2016, 18:30:41 UTC | Error while computing | 320.12 | 17.45 | --- | SETI@home v8 v8.10 (opencl_ati5_cat132) x86_64-pc-linux-gnu
24075233 | 78555 | 16 Jun 2016, 11:29:47 UTC | 16 Jun 2016, 18:42:04 UTC | Error while computing | 318.85 | 15.65 | --- | SETI@home v8 v8.12 (opencl_ati5_cat132) windows_intelx86
24075278 | 75681 | 16 Jun 2016, 9:17:13 UTC | 17 Jun 2016, 2:31:22 UTC | Error while computing | 328.81 | 23.49 | --- | SETI@home v8 v8.12 (opencl_ati5_SoG_cat132) windows_intelx86
24084056 | 75292 | 17 Jun 2016, 4:07:33 UTC | 9 Aug 2016, 9:07:15 UTC | In progress | --- | --- | --- | SETI@home v8 v8.12 (opencl_nvidia_SoG) windows_intelx86
24084057 | 78121 | 17 Jun 2016, 4:33:35 UTC | 17 Jun 2016, 5:13:36 UTC | Error while computing | 313.53 | 9.80 | --- | SETI@home v8 v8.12 (opencl_nvidia_SoG) windows_intelx86
24084104 | 79164 | 17 Jun 2016, 4:07:25 UTC | 9 Aug 2016, 9:07:07 UTC | In progress | --- | --- | --- | SETI@home v8 v8.04 windows_intelx86
24093360 | --- | --- | --- | Unsent | --- | --- | --- | ---

News about SETI opt app releases: https://twitter.com/Raistmer
ID: 58645
Urs Echternacht
Volunteer tester

Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 58647 - Posted: 17 Jun 2016, 14:30:10 UTC - in response to Message 58645.  
Last modified: 17 Jun 2016, 14:40:02 UTC

Example of a task falsely triggering the sanity check:
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646966

Did you save the WU file for later testing?

And here is another one with a different problem:

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646877

The CPU produces an early -9 overflow, and the GPU apps run into the false sanity check.
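For readers new to the shorthand, "-9 overflow" is the result-overflow condition: a noisy WU hits the classic 30-signal cap and the app ends early. A toy sketch of the rule:

    # "-9 overflow" refers to SETI@home's result-overflow condition: a noisy
    # WU hits the 30-signal cap and processing ends early. Toy sketch only.
    MAX_SIGNALS = 30  # classic per-result signal cap

    def check_overflow(signals_found: int) -> str:
        if signals_found >= MAX_SIGNALS:
            return "result overflow (-9): noisy WU, ending early"
        return "processing continues"

    print(check_overflow(30))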
_\|/_
U r s
ID: 58647
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 58651 - Posted: 17 Jun 2016, 15:37:23 UTC - in response to Message 58647.  

ID: 58651
.
Volunteer tester

Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58652 - Posted: 17 Jun 2016, 15:48:52 UTC - in response to Message 58623.  

Picked up 5 errors on the SaH app

ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance
GPU device sync requested... ...GPU device synche


https://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24065993
https://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24065968
https://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24066051
https://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24065976
https://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24065955

GPUs run fine for CUDA SoG and CUDA 42. Not sure why these were erroring out.


Too-big autocorr values happened too often; that triggered the sanity check.
I discussed this with Eric - the decision was to leave it as is for now.

I got two of those too today:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24083965
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=24084056
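A minimal sketch of the kind of sanity check being discussed, with an invented threshold (the real app's limits and error path differ):

    # Toy version of the autocorr sanity check: if the best autocorrelation
    # value exceeds a plausibility bound, treat the computation state as
    # suspect. The threshold here is invented for illustration.
    AUTOCORR_SANITY_LIMIT = 1.0e3

    def autocorr_sane(peak: float) -> bool:
        return 0.0 <= peak <= AUTOCORR_SANITY_LIMIT

    for peak in (17.3, 2.4e5):
        if autocorr_sane(peak):
            print(f"peak {peak:g}: ok")
        else:
            print(f"peak {peak:g}: possible wrong computation state on GPU")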
ID: 58652
Raistmer
Volunteer tester

Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 58653 - Posted: 17 Jun 2016, 17:49:48 UTC - in response to Message 58647.  


Did you save the wu-file for later testing ?

No need. The app reacted absolutely correctly within its own boundaries. No bug or anything. The question is how often such tasks will occur. The boundaries of what to consider an error state may need to be changed.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 58653
.
Volunteer tester

Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58654 - Posted: 17 Jun 2016, 17:59:19 UTC
Last modified: 17 Jun 2016, 18:43:53 UTC

Edited to actually read what I meant :-)

Hmm, strange behaviour. Today again, the server started to send out lots of opencl_nvidia_sah. Now it seems to send out roughly 50/50 opencl_nvidia_sah and opencl_nvidia_SoG.

It had previously settled down fine on opencl_nvidia_SoG as the fastest app, and looking at "Application details for host 75292", it is still very clear that opencl_nvidia_SoG is very much the fastest app. So the question is, why this unnecessary "re-testing" of opencl_nvidia_sah?

From Application details for host 75292:

SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_sah)

Number of tasks completed 178
Max tasks per day 64
Number of tasks today 59
Consecutive valid tasks 37
Average processing rate 133.75 GFLOPS
Average turnaround time 0.43 days

SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_SoG)

Number of tasks completed 2039
Max tasks per day 86
Number of tasks today 86
Consecutive valid tasks 54
Average processing rate 169.90 GFLOPS
Average turnaround time 0.20 days
ID: 58654
Raistmer
Volunteer tester

Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 58655 - Posted: 17 Jun 2016, 18:02:04 UTC - in response to Message 58654.  

Hmm, strange behaviour. Today again, the server started to send out lots of opencl_intel_gpu_sah.

That's an app for a totally different device: the iGPU, Intel's GPU, not NV.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 58655
.
Volunteer tester

Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58659 - Posted: 17 Jun 2016, 18:34:51 UTC - in response to Message 58655.  
Last modified: 17 Jun 2016, 18:36:15 UTC

Hmm, strange behaviour. Today again, the server started to send out lots of opencl_intel_gpu_sah.

That's an app for a totally different device: the iGPU, Intel's GPU, not NV.


So much for checking what I write before I post :-)

I wrote the wrong plan_class name, of course. The text should of course have been:

Hmm, strange behaviour. Today again, the server started to send out lots of opencl_nvidia_sah. Now it seems to send out roughly 50/50 opencl_nvidia_sah and opencl_nvidia_SoG.

It had previously settled down fine on opencl_nvidia_SoG as the fastest app, and looking at "Application details for host 75292", it is still very clear that opencl_nvidia_SoG is very much the fastest app. So the question is, why this unnecessary "re-testing" of opencl_nvidia_sah?

The Application details for host 75292 are still the right ones, though.

Next time I will try to check what I write, before I post it :-)
ID: 58659
Raistmer
Volunteer tester

Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 58660 - Posted: 17 Jun 2016, 18:43:48 UTC - in response to Message 58659.  

Well, if the difference is not too big, there is a good chance of getting work under another plan class until the number of completed tasks becomes large.

Regarding the current testing: it seems mostly new tasks are in play at the moment, and we have already run them under all apps. So, no immediate failures, and the initial testing completed successfully.

Do we need a second phase, collecting statistics for completed results per se, to check whether all the thresholds for splitting/processing are set correctly for these increased-sensitivity tasks with possibly different SNR?
Eric?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 58660
.
Volunteer tester

Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58661 - Posted: 17 Jun 2016, 19:03:08 UTC - in response to Message 58660.  

Well, if the difference is not too big, there is a good chance of getting work under another plan class until the number of completed tasks becomes large.

Hmm, I would have thought that 2039 completed opencl_nvidia_SoG tasks versus 178 completed opencl_nvidia_sah tasks would have settled the race over which app is the fastest.

But obviously that's not the case, even though opencl_nvidia_SoG has an APR of 169.90 GFLOPS and opencl_nvidia_sah only reaches an APR of 133.75 GFLOPS.

Yeah well, time will tell....
ID: 58661
.
Volunteer tester

Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58662 - Posted: 17 Jun 2016, 20:22:20 UTC
Last modified: 17 Jun 2016, 20:27:05 UTC

Ah, now I get it. I have had three of these autocorr sanity check errors today, all of them with the SoG app, and those errors are driving down my "Max tasks per day", so I can't get any more SoG tasks. That's why the non-SoG OpenCL app came back into play all of a sudden.


SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_SoG)

Number of tasks completed 2053
Max tasks per day 36
Number of tasks today 100
Consecutive valid tasks 3
Average processing rate 173.37 GFLOPS
Average turnaround time 0.20 days

It's no good when a "fake" error, or at least an error that has nothing to do with the local computer's functioning, drives down the Max tasks per day like that and forces the host to start running a slower app instead.
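A simplified model of the quota mechanics in play, assuming the common BOINC pattern of valid results raising the per-day limit and errors cutting it; the constants and update rules here are illustrative, not the server's:

    # Simplified per-(host, app version) daily quota model: valid results
    # nudge the limit up, errors cut it hard, so a burst of false errors can
    # starve the fastest app. Constants and rules are illustrative only.
    MAX_QUOTA = 100

    def update_quota(quota: int, valid: bool) -> int:
        if valid:
            return min(quota + 1, MAX_QUOTA)
        return max(quota // 2, 1)  # an error halves the limit in this model

    quota = 86
    for _ in range(3):  # three sanity-check "errors" in one day
        quota = update_quota(quota, valid=False)
    print(quota)  # 10: low enough that the scheduler turns to another app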
ID: 58662
Grant (SSSF)
Volunteer tester

Joined: 15 Jun 16
Posts: 45
Credit: 1,836,741
RAC: 0
Australia
Message 58663 - Posted: 17 Jun 2016, 22:25:57 UTC - in response to Message 58662.  

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646466

4294967295 (0xffffffff) Unknown exit code for OpenCL applications,
processed as a noisy WU by SETI@home v8 v8.04 x86_64-pc-linux-gnu
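That exit code is just -1 reinterpreted as an unsigned 32-bit value, which is why no named exit code matches it:

    # 4294967295 (0xffffffff) is -1 seen through a 32-bit unsigned lens.
    import struct

    unsigned = 4294967295
    (signed,) = struct.unpack("<i", struct.pack("<I", unsigned))
    print(hex(unsigned))  # 0xffffffff
    print(signed)         # -1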
Grant
Darwin NT.
ID: 58663
.
Volunteer tester

Joined: 10 Mar 12
Posts: 1659
Credit: 12,549,090
RAC: 6,819
Sweden
Message 58664 - Posted: 17 Jun 2016, 22:32:35 UTC - in response to Message 58663.  

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8646466

4294967295 (0xffffffff) Unknown exit code for OpenCL applications,
processed as a noisy WU by SETI@home v8 v8.04 x86_64-pc-linux-gnu

Yup, too many of those recently. Not good at all.
ID: 58664