Workunits with inconsistent results with CUDA v6.98 apps involved

Message boards : SETI@home Enhanced : Workunits with inconsistent results with CUDA v6.98 apps involved

Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 44165 - Posted: 18 Oct 2012, 21:13:33 UTC
Last modified: 18 Oct 2012, 21:22:38 UTC

To match the ATi thread already started.

I'll start it off with workunit 4133491

I bench-tested that task overnight, and found a mis-matched best gaussian at lines 850/851 - the same weak similarity with every cuda application I tried:

Lunatics_x41g_win32_cuda32.exe
Lunatics_x41zb_win32_cuda32.exe
Lunatics_x41zb_win32_cuda42.exe
setiathome_6.98_windows_intelx86__cuda42.exe

Reference app was setiathome_6.98_windows_intelx86.exe, so an equivalent test to the inconclusive live run.

It would be interesting if somebody could run that task on an ATI card.

Edit - here's that best gaussian:

CUDA (this from Lunatics_x41zb_win32_cuda42.exe)
<best_gaussian>
  <peak_power>3.1339707374573</peak_power>
  <mean_power>0.51620918512344</mean_power>
  <time>2455292.6055129</time>
  <ra>11.49758206374</ra>
  <decl>19.650241570748</decl>
  <q_pix>0</q_pix>
  <freq>1419740200.0427</freq>
  <detection_freq>1419743213.6436</detection_freq>
  <barycentric_freq>0</barycentric_freq>
  <fft_len>16384</fft_len>
  <chirp_rate>33.574693025784</chirp_rate>
  <rfi_checked>0</rfi_checked>
  <rfi_found>0</rfi_found>
  <reserved>0</reserved>
  <sigma>3.7909734249115</sigma>
  <chisqr>1.1859285831451</chisqr>
  <null_chisqr>2.0525963306427</null_chisqr>
  <score>0</score>
  <max_power>6.9975395202637</max_power>
  <pot length=194 encoding="x-csv">
    21,11,10,12,8,27,5,14,68,37,22,19,12,5,12,3,4,2,35,9,34,4,23,43,6,18,57,
    10,11,24,25,1,6,33,18,33,10,30,4,7,5,56,10,27,1,64,4,72,14,17,139,196,
    9,255,65,197,174,89,4,54,7,88,2,48
  </pot>
</best_gaussian>

CPU (stock)
<best_gaussian>
  <peak_power>2.7500243186951</peak_power>
  <mean_power>0.51439195871353</mean_power>
  <time>2455292.6055517</time>
  <ra>11.498516637755</ra>
  <decl>19.65026491074</decl>
  <q_pix>0</q_pix>
  <freq>1419737750.8879</freq>
  <detection_freq>1419742978.3341</detection_freq>
  <barycentric_freq>0</barycentric_freq>
  <fft_len>16384</fft_len>
  <chirp_rate>-48.738113544999</chirp_rate>
  <rfi_checked>0</rfi_checked>
  <rfi_found>0</rfi_found>
  <reserved>0</reserved>
  <sigma>3.7909734249115</sigma>
  <chisqr>1.2279376983643</chisqr>
  <null_chisqr>2.0739624500275</null_chisqr>
  <score>0</score>
  <max_power>6.3048386573792</max_power>
  <pot length=200 encoding="x-csv">
    17,2,57,30,1,0,28,6,3,70,49,6,41,3,22,50,69,49,1,26,4,7,10,19,8,50,3,18,
    52,15,6,11,15,32,10,2,16,2,15,43,23,35,17,9,13,1,14,27,9,146,126,105,192,
    114,214,16,255,110,101,19,53,101,2,21
  </pot>
</best_gaussian>
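
For anyone who wants to line the two blocks up field by field, here is a minimal Python sketch. It assumes the blocks are pasted in as plain text. The regex parsing, the function names and the 1% relative tolerance are illustrative assumptions, not what the project validator or the bench comparison tool actually uses.

```python
import math
import re

# Matches simple scalar fields such as <peak_power>3.13</peak_power>.
# The <pot ...> element is skipped because its opening tag has attributes.
FIELD_RE = re.compile(r"<(\w+)>\s*([-+0-9.eE]+)\s*</\1>")


def parse_fields(block: str) -> dict:
    """Return {tag: float} for every scalar field in a result block."""
    return {tag: float(value) for tag, value in FIELD_RE.findall(block)}


def mismatched_fields(a: dict, b: dict, rel_tol: float = 0.01) -> list:
    """List the tags whose values differ by more than rel_tol (assumed 1%)."""
    return sorted(
        tag
        for tag in a.keys() & b.keys()
        if not math.isclose(a[tag], b[tag], rel_tol=rel_tol, abs_tol=1e-9)
    )


# A few fields copied from the CUDA and CPU blocks above.
cuda = parse_fields("""
<peak_power>3.1339707374573</peak_power>
<mean_power>0.51620918512344</mean_power>
<chirp_rate>33.574693025784</chirp_rate>
<sigma>3.7909734249115</sigma>
""")
cpu = parse_fields("""
<peak_power>2.7500243186951</peak_power>
<mean_power>0.51439195871353</mean_power>
<chirp_rate>-48.738113544999</chirp_rate>
<sigma>3.7909734249115</sigma>
""")

print(mismatched_fields(cuda, cpu))  # ['chirp_rate', 'peak_power']
```

With these four fields the chirp rate and peak power stand out, while mean power and sigma agree within the assumed tolerance.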
ID: 44165
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 44166 - Posted: 18 Oct 2012, 21:25:27 UTC - in response to Message 44165.  
Last modified: 18 Oct 2012, 21:56:14 UTC

Can you post the stock 6.98 reference please, and I'll run the WU on my HD7770.

Edit: Bench is now running with r1643

Claggy
ID: 44166
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44167 - Posted: 18 Oct 2012, 21:54:38 UTC - in response to Message 44165.  

And can you post a download link to the task WU? Generated with your magic tool ;)
ID: 44167
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 44168 - Posted: 18 Oct 2012, 21:58:03 UTC - in response to Message 44167.  

And can you post a download link to the task WU? Generated with your magic tool ;)


http://boinc2.ssl.berkeley.edu/beta/download/f3/05ap10al.4475.10297.140733193388035.14.229

Claggy
ID: 44168
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 44169 - Posted: 18 Oct 2012, 23:21:28 UTC

OK, where do I put WU 4162406?
ID: 44169
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 44170 - Posted: 18 Oct 2012, 23:25:17 UTC - in response to Message 44165.  

The r1643 app produced high Q's and was Strongly similar:

MB7_win_x86_SSE_OpenCL_ATi_r1643.exe -verb -nog / 05ap10al.4475.10297.140733193388035.14.229.wu :
AppName: MB7_win_x86_SSE_OpenCL_ATi_r1643.exe
AppArgs: -verb -nog
TaskName: 05ap10al.4475.10297.140733193388035.14.229.wu
Started at : 22:50:02.603
Ended at : 23:15:51.340
1548.697 secs Elapsed
53.992 secs CPU time

R2: .\ref\ref-setiathome_6.98_windows_intelx86.exe-05ap10al.4475.10297.140733193388035.14.229.wu.res
Result : Strongly similar, Q= 99.97%

<best_gaussian>
  <peak_power>2.750018119812</peak_power>
  <mean_power>0.51439273357391</mean_power>
  <time>2455292.6055517</time>
  <ra>11.498516637755</ra>
  <decl>19.65026491074</decl>
  <q_pix>0</q_pix>
  <freq>1419737750.8879</freq>
  <detection_freq>1419742978.3342</detection_freq>
  <barycentric_freq>0</barycentric_freq>
  <fft_len>16384</fft_len>
  <chirp_rate>-48.738113544999</chirp_rate>
  <rfi_checked>0</rfi_checked>
  <rfi_found>0</rfi_found>
  <reserved>0</reserved>
  <sigma>3.7909734249115</sigma>
  <chisqr>1.2279407978058</chisqr>
  <null_chisqr>2.0739617347717</null_chisqr>
  <score>0</score>
  <max_power>6.304801940918</max_power>
  <pot length=200 encoding="x-csv">
    17,2,57,30,1,0,28,6,3,70,49,6,41,3,22,50,69,49,1,26,4,7,10,19,8,50,3,18,
    52,15,6,11,15,32,10,2,16,2,15,43,23,35,17,9,13,1,14,27,9,146,126,105,192,
    114,214,16,255,110,101,19,53,101,2,21
  </pot>
</best_gaussian>


Claggy
ID: 44170
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 44174 - Posted: 19 Oct 2012, 7:58:14 UTC - in response to Message 44165.  

I'll start it off with workunit 4133491

The WU has now been validated by a third user - all three tasks are OK.

I should have checked earlier - there are no reportable gaussians in the result. So does it make sense for the validator to check for the best of none?
ID: 44174
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44178 - Posted: 19 Oct 2012, 11:09:39 UTC - in response to Message 44169.  

OK, where do I put WU 4162406?


Depends on what the CPU version says ;)
ID: 44178
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44179 - Posted: 19 Oct 2012, 11:10:47 UTC - in response to Message 44174.  

I'll start it off with workunit 4133491

The WU has now been validated by a third user - all three tasks are OK.

I should have checked earlier - there are no reportable gaussians in the result. So does it make sense for the validator to check for the best of none?


At design time, this question was answered "yes" for MB and "no" for AP :)
ID: 44179
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44274 - Posted: 28 Oct 2012, 12:29:14 UTC
Last modified: 28 Oct 2012, 12:32:42 UTC

Here, http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4178828 is inconclusive between stock CPU 6.98 and cuda32.
The CUDA result is not overflowed, so it may be worth analysing.

CUDA:
Spike count: 2
Autocorr count: 1
Pulse count: 5
Triplet count: 0
Gaussian count: 0

CPU:
Spike count: 1
Autocorr count: 0
Pulse count: 5
Triplet count: 0
Gaussian count: 0
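
A quick way to see which signal types disagree between two such summaries is sketched below in Python. The helper names are hypothetical and the parsing simply assumes the `Name count: N` layout shown above; it is not part of any project tool.

```python
def parse_counts(summary: str) -> dict:
    """Turn lines like 'Spike count: 2' into {'Spike': 2}."""
    counts = {}
    for line in summary.strip().splitlines():
        name, _, value = line.partition(" count:")
        counts[name.strip()] = int(value)
    return counts


def count_differences(a: dict, b: dict) -> dict:
    """Return {signal_type: (a_count, b_count)} where the counts differ."""
    return {
        name: (a.get(name, 0), b.get(name, 0))
        for name in sorted(a.keys() | b.keys())
        if a.get(name, 0) != b.get(name, 0)
    }


cuda = parse_counts("""
Spike count: 2
Autocorr count: 1
Pulse count: 5
Triplet count: 0
Gaussian count: 0
""")
cpu = parse_counts("""
Spike count: 1
Autocorr count: 0
Pulse count: 5
Triplet count: 0
Gaussian count: 0
""")

print(count_differences(cuda, cpu))  # {'Autocorr': (1, 0), 'Spike': (2, 1)}
```

Here the CUDA result has one extra spike and one extra autocorrelation; the pulse, triplet and gaussian counts agree.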
ID: 44274
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 44276 - Posted: 28 Oct 2012, 12:50:20 UTC - in response to Message 44274.  
Last modified: 28 Oct 2012, 12:55:51 UTC

An anonymous wingmate with a laptop CPU and a desktop GPU, who has aborted most of his or her tasks? The omens aren't good, but I'll give it a bench.

http://boinc2.ssl.berkeley.edu/beta/download/1c1/05ap10al.28705.1299.140733193388039.14.254
ID: 44276
Darkknight900
Volunteer tester
Joined: 27 Nov 09
Posts: 3
Credit: 78,707
RAC: 0
Germany
Message 44288 - Posted: 1 Nov 2012, 18:11:38 UTC

Something wrong here?

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4203153

Posting in both the CUDA and ATI threads, since both versions are involved...
ID: 44288
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 44290 - Posted: 1 Nov 2012, 19:03:17 UTC - in response to Message 44288.  

Both hosts are anonymous, never a good start for troubleshooting.

But when one host in a validation pair gets 'result overflow' (in this case, the ATI), and the other doesn't, then usually the overflow is a host problem.
ID: 44290
Darkknight900
Volunteer tester
Joined: 27 Nov 09
Posts: 3
Credit: 78,707
RAC: 0
Germany
Message 44292 - Posted: 1 Nov 2012, 20:01:39 UTC

http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=60134 is my host... I had forgotten about that; I've edited the settings now...
ID: 44292
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44417 - Posted: 7 Dec 2012, 13:13:53 UTC
Last modified: 7 Dec 2012, 13:17:11 UTC

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4122389
2 CUDA32, both non-overflowed but different number of signals.
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4119498
CUDA vs CPU 6.98, different number of signals, non-overflowed
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4253284
CUDA22 vs ATi both non-overflowed, different number of signals.
ID: 44417
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 44427 - Posted: 7 Dec 2012, 19:09:03 UTC - in response to Message 44417.  
Last modified: 7 Dec 2012, 19:14:45 UTC

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4122389
2 CUDA32, both non-overflowed but different number of signals.
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4119498
CUDA vs CPU 6.98, different number of signals, non-overflowed
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4253284
CUDA22 vs ATi both non-overflowed, different number of signals.


These are all running the CUDA 6.98 apps, not the bug-fixed CUDA 6.99 apps.

Claggy
ID: 44427
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44431 - Posted: 7 Dec 2012, 19:31:36 UTC - in response to Message 44427.  

Yes, that's why I'm writing in the 6.98 thread and not the 6.99 one ;)
BTW, my host has many CUDA tasks... but all 6.98 and no 6.99 at all :/
Wondering if I should do a project reset...
ID: 44431
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 44434 - Posted: 7 Dec 2012, 19:36:54 UTC - in response to Message 44431.  

Yes, that's why I'm writing in the 6.98 thread and not the 6.99 one ;)
BTW, my host has many CUDA tasks... but all 6.98 and no 6.99 at all :/
Wondering if I should do a project reset...


I've started a Cuda 6.99 thread, feel free to do a project reset.

Claggy

ID: 44434
Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 44443 - Posted: 8 Dec 2012, 10:31:02 UTC

My first and only invalid unit with CPU.
Both wingmen are on 6.98 cuda.
I hope this is fixed in 6.99.
Sadly, there is still no result printout in stderr.

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4257838

With each crime and every kindness we birth our future.
ID: 44443
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 44445 - Posted: 8 Dec 2012, 11:09:36 UTC - in response to Message 44443.  

My first and only invalid unit with CPU.
Both wingmen are on 6.98 cuda.
I hope this is fixed in 6.99.

I doubt it. All three results found 30 autocorrelations; the problem is that the CPU and GPU apps do their searches in slightly different orders, so they find different signals in these edge cases.
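
To illustrate the effect, here is a toy Python sketch, not the real search code. It assumes each app stops reporting once it has 30 signals (the cap is inferred from the "30" above), and it invents 40 hypothetical above-threshold candidates. Two apps that visit the same candidates in different orders then report different sets of 30.

```python
MAX_SIGNALS = 30  # cap assumed from the "30" mentioned in the post


def report(candidates, order):
    """Visit candidates in the given order and stop once the cap is hit."""
    found = []
    for index in order:
        found.append(candidates[index])
        if len(found) == MAX_SIGNALS:
            break  # assumed overflow behaviour: remaining candidates are never visited
    return set(found)


# 40 hypothetical above-threshold candidates, identified by number only.
candidates = list(range(40))

cpu_order = range(40)          # toy "CPU" order: first to last
gpu_order = range(39, -1, -1)  # toy "GPU" order: last to first

cpu_found = report(candidates, cpu_order)
gpu_found = report(candidates, gpu_order)

print(len(cpu_found), len(gpu_found))  # 30 30
print(len(cpu_found & gpu_found))      # 20
```

Both toy apps report a full 30 signals, yet only 20 of them are common to both, which is the kind of mismatch a validator would then see.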

Claggy
ID: 44445