Tests of new scheduler features.

Message boards : News : Tests of new scheduler features.

Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,986,665
RAC: 982
United States
Message 45679 - Posted: 1 May 2013, 18:18:48 UTC

I've added some new scheduler features to hopefully ensure that you (eventually) get the fastest application version for your processor/GPU. But to make sure I did it right, I'm going to need to reset the app_version and per-host app version statistics.

It'll take a week or so for things to get back to normal. You'll initially see some non-optimal application choices (like CUDA 3.2 on your CUDA 5 card, or CAL on your ATI card w/ OpenCL). This is deliberate. Please let them run; it's the only way to make the slow versions stop being sent. Once I'm sure things work, I'll set them back the way they were.
ID: 45679
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,661
RAC: 8
United Kingdom
Message 45680 - Posted: 1 May 2013, 18:35:43 UTC - in response to Message 45679.  

What would be the most helpful thing for anonymous platform users to do? Like our friend CElliott?
ID: 45680
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 3
United Kingdom
Message 45682 - Posted: 1 May 2013, 18:42:49 UTC - in response to Message 45680.  

What would be the most helpful thing for anonymous platform users to do? Like our friend CElliott?

And myself: since I'm running the AMD/ATI r1817 MB app, which works correctly with Cat 12.10/APP runtime 1016.4 and Cat 13.2/13.3/13.4/APP runtime 1124.2, it now only needs to not be sent to Cat 13.1/APP runtime 1084.4 hosts.
Note that the AMD/ATI Astropulse apps have now been fixed for Cat 13.2/13.3/13.4/APP runtime 1124.2 too.

http://lunatics.kwsn.net/12-gpu-crunching/update-for-ati-mb-gpu-app.0.html

Claggy
ID: 45682
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,986,665
RAC: 982
United States
Message 45683 - Posted: 1 May 2013, 20:16:25 UTC - in response to Message 45680.  

What would be the most helpful thing for anonymous platform users to do? Like our friend CElliott?


Just keep running as normal.
ID: 45683
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,986,665
RAC: 982
United States
Message 45684 - Posted: 1 May 2013, 20:17:17 UTC - in response to Message 45682.  

I hope to be distributing a new ATI app next week, FYI.
ID: 45684
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 45685 - Posted: 1 May 2013, 22:15:04 UTC - in response to Message 45684.  

I hope to be distributing a new ATI app next week, FYI.

Hopefully it will not be only one this time; there are at least four new ATI apps.
_\|/_
U r s
ID: 45685
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,986,665
RAC: 982
United States
Message 45686 - Posted: 2 May 2013, 3:54:23 UTC - in response to Message 45685.  

OK. Be sure to hold me to that.
ID: 45686
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1661
Credit: 12,808,212
RAC: 7,044
Sweden
Message 45689 - Posted: 2 May 2013, 16:13:12 UTC - in response to Message 45679.  

I've added some new scheduler features to hopefully ensure that you (eventually) get the fastest application version for your processor/GPU. But to make sure I did it right, I'm going to need to reset the app_version and per-host app version statistics.

It'll take a week or so for things to get back to normal. You'll initially see some non-optimal application choices (like CUDA 3.2 on your CUDA 5 card, or CAL on your ATI card w/ OpenCL). This is deliberate. Please let them run; it's the only way to make the slow versions stop being sent. Once I'm sure things work, I'll set them back the way they were.


Oh well, my NVIDIA 315M got a CUDA50 WU, and even though it was considerably slower than CUDA23 and 32, it did crunch it without any protest :-)

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=13671187
WARNING!! "THIS IS A SIGNATURE", of the "IT MAY CHANGE AT ANY MOMENT" type. It may, or may not be considered insulting, all depending upon HOW SENSITIVE THE VIEWER IS, to certain inputs to/from the nervous system.
ID: 45689
William
Volunteer tester
Joined: 14 Feb 13
Posts: 606
Credit: 588,843
RAC: 0
Message 45690 - Posted: 2 May 2013, 16:24:18 UTC
Last modified: 2 May 2013, 16:33:28 UTC

I hope this is transient:
host 62763

cuda 2.2

<flops>9739745.861848</flops>

estimated CPU time remaining: 7261720.044040 [that's ~2000 hours - BOINC's wording; it's actually the initial estimate]

saturated 717141815.35 busy 717141815.35 [that's the next 22 years]

This task was the one sent with a flops value several orders of magnitude too small.

NB: the task will take approx. 10k seconds and hasn't started yet.
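[Editor's note: the huge estimate follows directly from the tiny <flops> value if one assumes the simple pre-APR rule that estimated runtime = task fpops / sent flops (a simplification of BOINC's actual logic). A minimal sketch; the task size rsc_fpops_est is back-computed from the numbers in the post, not taken from the source:]

```python
# Rough sketch: BOINC's pre-APR runtime estimate, simplified to
#   estimated seconds = task floating-point ops / device speed (flops).
# flops_sent and est_seconds are from the post above; rsc_fpops_est is
# back-computed from them and is NOT a value given in the thread.

flops_sent = 9_739_745.861848        # the <flops> the server sent (~9.7 MFLOPS)
est_seconds = 7_261_720.044040       # BOINC's "estimated CPU time remaining"

rsc_fpops_est = est_seconds * flops_sent   # implied task size in fp ops (~7.07e13)
print(f"implied task size: {rsc_fpops_est:.3e} fpops")
print(f"estimate in hours: {est_seconds / 3600:.0f}")  # ~2017 hours, i.e. the "~2000 hours"
```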
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 45690
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,986,665
RAC: 982
United States
Message 45691 - Posted: 2 May 2013, 16:32:46 UTC - in response to Message 45690.  

Hopefully the bad FLOPs estimates will fix themselves. If not, well, that's why we're doing this test.
ID: 45691
William
Volunteer tester
Joined: 14 Feb 13
Posts: 606
Credit: 588,843
RAC: 0
Message 45692 - Posted: 2 May 2013, 16:52:07 UTC

I forgot to say that the card proclaims to have 91 GFLOPS peak :)
I've no idea where those flops came from.

Since new flops values are received only on work allocation, I'll probably have to abort tasks to get new work with fresh flops allocated.
We probably want to see the numbers before 11 tasks validate and APR kicks in (and even that will take a few days, given that host's expected turnover).
ID: 45692
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,661
RAC: 8
United Kingdom
Message 45693 - Posted: 2 May 2013, 17:31:42 UTC - in response to Message 45692.  
Last modified: 2 May 2013, 17:50:09 UTC

OK, tried the equivalent test with William's other host (62652). That has been doing CPU work only up till now, to generate reference results.

GPU has a peak flops rating of 336 Gfl. Cuda50 was the allocated application (probably not optimal for a 9800GT, but we'll wait on that). Server calculated <flops> at 23,578,084,425 - that sounds very much like 10x the <p_fpops> benchmark, which is 2,350,462,673 [and nothing like the long-established APR of 157 the same machine has on the main project].

Edit - first task returned: run time 48 minutes, against an initial estimate of 70 minutes. That's actually very good.
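[Editor's note: a quick arithmetic check of the "10x the benchmark" observation above. Both values are copied from the post; the 10x factor is the poster's inference, not a documented server formula:]

```python
# Check the observation that the server-calculated <flops> for cuda50
# is roughly 10x the host's <p_fpops> CPU benchmark (values from the post).
flops_cuda50 = 23_578_084_425   # server-calculated <flops>
p_fpops = 2_350_462_673         # host CPU benchmark <p_fpops>

ratio = flops_cuda50 / p_fpops
print(f"ratio: {ratio:.2f}")    # ~10.03, consistent with "10x the benchmark"
```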
ID: 45693
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,986,665
RAC: 982
United States
Message 45694 - Posted: 2 May 2013, 17:39:25 UTC - in response to Message 45693.  

It turns out that the host and app version stats will be messed up until all of the GPU results sent out before the server change are returned. That could take a while. I'll keep resetting the stats in the meantime to speed up the process.
ID: 45694
William
Volunteer tester
Joined: 14 Feb 13
Posts: 606
Credit: 588,843
RAC: 0
Message 45697 - Posted: 3 May 2013, 9:11:00 UTC - in response to Message 45693.  

OK, tried the equivalent test with William's other host (62652). That has been doing CPU work only up till now, to generate reference results.

GPU has a peak flops rating of 336 Gfl. Cuda50 was the allocated application (probably not optimal for a 9800GT, but we'll wait on that). Server calculated <flops> at 23,578,084,425 - that sounds very much like 10x the <p_fpops> benchmark, which is 2,350,462,673 [and nothing like the long-established APR of 157 the same machine has on the main project].

Edit - first task returned: run time 48 minutes, against an initial estimate of 70 minutes. That's actually very good.

That value sounds familiar...

Tasks first came in with an estimate in the 3k-second region, which translates into the same 23.5e9, give or take.
So I'm wondering where that initial flops estimate derives from, and where the far too small one originated.
Oh, runtime estimates are linked into CreditNew, aren't they? Nuff' said...
ID: 45697
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,661
RAC: 8
United Kingdom
Message 45698 - Posted: 3 May 2013, 10:57:22 UTC

This is going to get a bit weird. Host 62652 has now been allocated test work from all five cuda apps. Only the first one (cuda50) has any completed tasks and hence an APR, but not yet the 11 completions that would trigger usage of that APR in runtime estimation. But the flops values are all over the place:

    <flops>23520469976.920357</flops>
    <plan_class>cuda22</plan_class>

    <flops>141034284.159673</flops>
    <plan_class>cuda23</plan_class>

    <flops>77735925.456133</flops>
    <plan_class>cuda32</plan_class>

    <flops>100125333.859577</flops>
    <plan_class>cuda42</plan_class>

    <flops>23578084425.701229</flops>
    <plan_class>cuda50</plan_class>

cuda22 is estimating 50 minutes, cuda32 is estimating 252 hours - same WU batch.
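[Editor's note: the spread follows directly if one assumes the usual pre-APR rule that the runtime estimate is the workunit's fpops estimate divided by the server-sent <flops> (an assumption, not stated in the thread). A sketch, with the workunit size back-computed from the 50-minute cuda22 estimate since it isn't given in the post:]

```python
# Same workunit, five plan classes: the runtime estimate scales inversely
# with the server-sent <flops>. Flops values are copied from the post;
# rsc_fpops_est is back-computed from the 50-minute cuda22 estimate and
# is NOT a value given in the thread.

flops = {
    "cuda22": 23_520_469_976.920357,
    "cuda23":    141_034_284.159673,
    "cuda32":     77_735_925.456133,
    "cuda42":    100_125_333.859577,
    "cuda50": 23_578_084_425.701229,
}

rsc_fpops_est = 50 * 60 * flops["cuda22"]   # ~7.06e13 fp ops implied by the 50-min estimate

for plan, f in sorted(flops.items()):
    hours = rsc_fpops_est / f / 3600
    print(f"{plan}: {hours:8.1f} h")
# cuda32 lands near 252 hours for the same workunit, matching the estimate above.
```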
ID: 45698
S@NL - John van Gorsel
Volunteer tester
Joined: 26 Jun 10
Posts: 22
Credit: 588,646
RAC: 245
Netherlands
Message 45699 - Posted: 3 May 2013, 13:41:11 UTC

Did something change in the credit system as well? All tasks that validated after the scheduler change received very low credit (0.5 or less for a task that normally yields around 50 credits).
I checked a few wingmen, including Eric, and everyone seems to be receiving low credit.
ID: 45699
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,661
RAC: 8
United Kingdom
Message 45700 - Posted: 3 May 2013, 14:15:38 UTC - in response to Message 45698.  

First tasks are in for all versions now. I think all WUs are comparable.

cuda22 2,780 seconds
cuda23 1,254 seconds
cuda32 1,122 seconds
cuda42 1,429 seconds
cuda50 2,897 seconds

(I've taken the first task in each case, to allow for fair comparison with wisdom generation)

A useful reminder that with older hardware, biggest is not necessarily best.
ID: 45700
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,986,665
RAC: 982
United States
Message 45702 - Posted: 3 May 2013, 22:41:42 UTC - in response to Message 45699.  

The credit system didn't change, but it appears the huge runtime estimates for the results are causing tiny credit grants when a CUDA result validates against another CUDA result. I'll fix those later. Hopefully credits will converge toward normal values as time goes on.

ID: 45702
Sabroe_SMC
Volunteer tester
Joined: 12 Dec 08
Posts: 7
Credit: 3,116,266
RAC: 0
Germany
Message 45707 - Posted: 4 May 2013, 11:54:33 UTC - in response to Message 45702.  
Last modified: 4 May 2013, 11:57:56 UTC

Meanwhile, every second workunit comes with low credit.
For example: http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=5209949
Granted credit of 0.23, validating ATI-OpenCL against ATI-OpenCL.
Gimme the whole credit, please! ;-)
ID: 45707
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1661
Credit: 12,808,212
RAC: 7,044
Sweden
Message 45708 - Posted: 4 May 2013, 13:13:22 UTC - in response to Message 45707.  
Last modified: 4 May 2013, 13:15:11 UTC

Meanwhile, every second workunit comes with low credit.
For example: http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=5209949
Granted credit of 0.23, validating ATI-OpenCL against ATI-OpenCL.
Gimme the whole credit, please! ;-)

We don't need no steenking credit.
ID: 45708


 
©2019 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.