Marked difference in fpop/AR slope between CPU and GPU

William
Volunteer tester
Joined: 14 Feb 13
Posts: 606
Credit: 588,843
RAC: 0
Message 45549 - Posted: 11 Apr 2013, 17:48:14 UTC

OK, this is to kick off a discussion on the rsc_fpops_est(AR) formula.

The formula was last dialled in 'before my time' (i.e. before mid 2010) under enormous effort. There are still graphs around (dating from 2009 iirc) that will show the varying slope of fpops/AR. At that time there were no GPUs. Now we have a lot of them. And we find that their graph differs from the CPU one.
Up until now it hasn't mattered much, the difference was small enough.

With V7 and autocorrelation this has changed. And, as the GPU apps continue to mature, it will change again, so we can only talk about a snapshot. [IIRC Jason considers the autocorrelation on CUDA as essentially un-optimised]
But if changes need to be made, we need to talk about it now, with what we have now.

Please consider host 62763.

Mid AR take about 22ksec on CPU and 17ksec on (small) NV GPU.
VHAR take 7ksec on CPU and 11ksec on GPU.

So on CPU VHAR are about 3x faster. But on GPU they are only 1.5x faster.

That played hell with the estimates (remember we don't have dcf any more). Now the host has gone past its 11 validations for APR driven estimates. And it has had the (mis-)fortune to receive largely VHAR at first (for GPU). Consequently the APR is about half what it needs to be for mid AR. And mid AR tasks end up being overestimated.
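To make the mechanism concrete, here is a rough sketch in Python. It rests on two assumptions that are not stated above: that APR behaves like rsc_fpops_est divided by elapsed time, and that the mid AR / VHAR ratio of rsc_fpops_est follows the CPU runtime ratio. The mid AR fpops value is therefore hypothetical; only the timings and the 47.5e12 VHAR figure (reported later in this thread) come from the host.

```python
# Timings quoted above for host 62763, in seconds.
CPU_MID, CPU_VHAR = 22_000, 7_000
GPU_MID, GPU_VHAR = 17_000, 11_000

# Enhanced VHAR estimate as reported later in this thread.
FPOPS_VHAR = 47.5e12
# ASSUMPTION: the estimate curve tracks the CPU runtime ratio.
# This mid AR value is hypothetical, not taken from the splitter.
FPOPS_MID = FPOPS_VHAR * CPU_MID / CPU_VHAR


def apr_from(fpops_est, elapsed):
    """Simplified APR: estimated operations per elapsed second."""
    return fpops_est / elapsed


def estimate(fpops_est, apr):
    """Simplified runtime estimate: estimated operations divided by APR."""
    return fpops_est / apr


# APR the GPU app version learns if it only ever sees VHAR tasks.
gpu_apr_vhar = apr_from(FPOPS_VHAR, GPU_VHAR)
# APR that would make mid AR estimates match the observed runtime.
gpu_apr_needed = apr_from(FPOPS_MID, GPU_MID)
# Runtime estimate a mid AR task gets with the VHAR-trained APR.
mid_estimate = estimate(FPOPS_MID, gpu_apr_vhar)

print(f"CPU: VHAR faster than mid AR by {CPU_MID / CPU_VHAR:.2f}x")
print(f"GPU: VHAR faster than mid AR by {GPU_MID / GPU_VHAR:.2f}x")
print(f"APR learned / APR needed: {gpu_apr_vhar / gpu_apr_needed:.2f}")
print(f"mid AR estimate {mid_estimate:.0f} s vs actual {GPU_MID} s")
```

Under those assumptions the VHAR-trained APR comes out at roughly half of what mid AR needs, and the mid AR estimate lands at about twice the observed 17 ksec.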

Let's face it, the formula to calculate APR from recent results is c***. It makes too many (false, oversimplifying) assumptions. (and it's worse for Anon, where you don't even get a fresh record with each new app).

Unless I (or somebody else) can instigate a cooperation between Jason and David to use Jason's aDCF algorithm (or a derivative thereof) for APR, we are stuck with something suboptimal and have to make the best of it.

The other field to play with is of course rsc_fpops_est.
The problem is however, if the difference in performance is that marked, how can you find a middle ground?

NB those are results from just one very small GPU. Can others please provide their CPU/GPU values for a few model cases (i.e. pre-Fermi, Fermi, Kepler)? Also, what is the situation on ATI cards?


A person who won't read has no advantage over one who can't read. (Mark Twain)
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 45551 - Posted: 11 Apr 2013, 18:23:24 UTC - in response to Message 45549.  

The previous research was done over the winter of 2007-08: Estimates and Deadlines revisited
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 45554 - Posted: 12 Apr 2013, 6:10:40 UTC - in response to Message 45549.  

...
The other field to play with is of course rsc_fpops_est.
The problem is however, if the difference in performance is that marked, how can you find a middle ground?
...

As discussed in one of the News threads, Eric did code a change to the rsc_fpops_est curve. It increased the estimate by 48% at VHAR and 14% at midrange, a constant addition since all WUs get the same amount of Autocorr searching. That's more of an adjustment than CPUs needed, less than GPUs needed.

However, Murphy's law struck again. That change is in the splitter_fft path but not in the splitter_pfb path. So a WU which has <filter>fft</filter> in the header has SETI@home v7 rsc_fpops_est values, but one with <filter>polyphase</filter> has SETI@home Enhanced rsc_fpops_est values. Since almost all WUs here are using the polyphase version, we're getting the wrong estimates, and since many WUs at main are still being produced with the fft version, those are wrong too.

When I wrote Eric about this at the end of February, I suggested that the revised estimates should be made contingent on a non-zero <autocorr_fftlen> and he agreed that was a good plan. More pressing issues are presumably the reason it hasn't been implemented. Whether the compromise adjustment will be good enough has not really been tested.
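The suggested rule can be sketched in a few lines of Python. This is not splitter code: the function name and parameters are hypothetical, the constant addition is taken as the difference between the two VHAR values reported later in this thread (70.7e12 and 47.5e12), and 131072 is used only as an example of a non-zero <autocorr_fftlen>.

```python
def fpops_est_for_wu(base_est, autocorr_fftlen, autocorr_fpops):
    """Hypothetical helper: pick the estimate from the header, not the splitter path.

    base_est        -- Enhanced-style estimate for the WU's angle range
    autocorr_fftlen -- value of <autocorr_fftlen> in the WU header (0 = no autocorr)
    autocorr_fpops  -- constant extra work for the autocorrelation search
    """
    if autocorr_fftlen > 0:
        # Every WU with autocorrelation gets the same fixed addition.
        return base_est + autocorr_fpops
    return base_est


ENHANCED_VHAR = 47.5e12              # VHAR value before the change
AUTOCORR_EXTRA = 70.7e12 - 47.5e12   # inferred constant addition

v7_est = fpops_est_for_wu(ENHANCED_VHAR, 131072, AUTOCORR_EXTRA)
enhanced_est = fpops_est_for_wu(ENHANCED_VHAR, 0, AUTOCORR_EXTRA)

print(f"with autocorr:    {v7_est:.3e}")
print(f"without autocorr: {enhanced_est:.3e}")
print(f"increase at VHAR: {(v7_est / enhanced_est - 1) * 100:.0f}%")
```

Keying on the header field means the fft and polyphase paths would produce the same estimate for the same WU, which is the point of the suggestion.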
Joe
William
Volunteer tester
Joined: 14 Feb 13
Posts: 606
Credit: 588,843
RAC: 0
Message 45558 - Posted: 12 Apr 2013, 10:18:48 UTC

Thanks, Joe!

Ok, so we need to look at this more closely again when the adjusted rsc_fpops_est values are active.

I hope I won't have forgotten by then - memory seems to be a problem lately :(
William
Volunteer tester
Joined: 14 Feb 13
Posts: 606
Credit: 588,843
RAC: 0
Message 45641 - Posted: 26 Apr 2013, 12:03:34 UTC

OK, it appears the change is live - VHAR are now showing an rsc_fpops_est of 70.7e12 instead of 47.5e12, along with a three-week deadline.

Estimates on 62763 consequently went from 10,035 sec to 14,918 sec for GPU and from 7,175 sec to 10,666 sec for CPU.
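As a quick sanity check, the figures above can be compared in Python. The only assumption is that, with APR unchanged, the runtime estimate scales linearly with rsc_fpops_est; all numbers are the ones quoted in this post.

```python
OLD_FPOPS, NEW_FPOPS = 47.5e12, 70.7e12

# (old estimate, new estimate) in seconds on host 62763
estimates = {
    "GPU": (10_035, 14_918),
    "CPU": (7_175, 10_666),
}

fpops_ratio = NEW_FPOPS / OLD_FPOPS
print(f"rsc_fpops_est ratio: {fpops_ratio:.4f}")

estimate_ratios = {}
for device, (old, new) in estimates.items():
    estimate_ratios[device] = new / old
    print(f"{device} estimate ratio: {estimate_ratios[device]:.4f}")
```

Both estimate ratios agree with the rsc_fpops_est ratio to within the rounding of the displayed values, which is consistent with the roughly 48% VHAR increase Joe described.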

Now, APR needs to adjust again (*), and then we need some mid AR to see what the estimates look like there.

(*) growl at David for stupid APR calculation algorithm and note to self to get finger out and try to achieve cooperation on getting superior aDCF algorithm adapted and implemented instead.
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 45651 - Posted: 27 Apr 2013, 1:19:45 UTC - in response to Message 45641.  

For some time we'll be dealing with a mix. Although initial replication tasks (_0 and _1) will have v7 estimates, pendings and reissues (_2 and above) may have the Enhanced estimates, so there will be a delay in APR readjustment.
Joe


 