SETI@home v8 beta to begin on Tuesday

Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57366 - Posted: 16 Mar 2016, 5:59:52 UTC - in response to Message 57349.  


Meanwhile, prog(ress) never made it past 40...

(I didn't bother with 1-second sampling - I think this shows the effect clearly enough. I'll perhaps try again tomorrow.)

What prog value is recorded in state.sah when the task finishes?
I would like to have a test case for offline checking.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57366 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57367 - Posted: 16 Mar 2016, 7:48:49 UTC - in response to Message 57364.  


Or the file via e-mail?

Better this way.


Should I test it here or on Main?

Testing on beta is preferable because results remain visible long enough. Also, if the app is not working properly, it's not wise to use it on main.


Could a SETI GPU app 'destroy' the AMD driver?

No. It's just Windows that destroys (restarts) the driver. Most probably the Windows watchdog timer was exceeded.


This was with Crimson 16.3 Hotfix (Beta), which is why I reinstalled Crimson 15.12 (in case the 16.3 driver is no longer usable?), using DDU in between, of course ;-).

With which driver did r3330 work well?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57367 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57374 - Posted: 16 Mar 2016, 12:25:52 UTC - in response to Message 57366.  

Meanwhile, prog(ress) never made it past 40...

(I didn't bother with 1-second sampling - I think this shows the effect clearly enough. I'll perhaps try again tomorrow.)

What prog value is recorded in state.sah when the task finishes?
I would like to have a test case for offline checking.

I managed to snag one of the very, very high AR WUs that started all this off.

I won't bore you with all 464 lines, but here are the significant ones.

wu_name: 24no10ab.7605.6611.8.42.47
WU true angle range is :  136.505732

            <prog>        <fraction_done>
11:35:33
11:35:34    0.00000745
11:35:35    0.00000745
11:35:36    0.00031612    0.000012
11:35:37    0.00056272    0.000012
11:35:38    0.00086394    0.000012
11:35:39    0.00111551    0.000012
11:35:40    0.00136377    0.000012
<snip>
11:43:12    0.12065697    0.000012
11:43:13    0.12165332    0.000012
11:43:14    0.12312135    0.000012
11:43:15    0.12312135    1.000000
11:43:16    0.12312135    1.000000

I also saw what happens to estimates if you tell BOINC that your application has only made 0.0012% progress in 7 minutes:
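To put a rough number on it: a naive linear extrapolation projects total runtime as elapsed / fraction_done. This is only an illustrative assumption (BOINC's real estimator is more involved and is not reproduced here), and naive_remaining_seconds is a hypothetical helper name.

```python
def naive_remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation: projected total = elapsed / fraction_done.

    Illustrative assumption only, not BOINC's actual estimator.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_s = elapsed_s / fraction_done
    return total_s - elapsed_s


# Values from the run above: about 7 minutes elapsed while the app
# reported fraction_done = 0.000012 (i.e. 0.0012%).
remaining = naive_remaining_seconds(7 * 60, 0.000012)
print(f"{remaining / 86400:.0f} days")  # about 405 days
```

With the true runtime under 8 minutes, any estimate derived from that fraction is off by several orders of magnitude.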



I've kept the WU file (saves downloading it again later), and I'll try and organise an offline run after the budget speech.
ID: 57374 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57375 - Posted: 16 Mar 2016, 15:06:46 UTC - in response to Message 57366.  

What prog value is recorded in state.sah when the task finishes?

OK, offline results.

1) Running MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG with standard 60-second checkpoint interval.

Final checkpoint file timestamped 16 March 2016, 14:34:41
App Ended at : 14:34:43.804 (2.8 seconds later)

Final state.sah starts

<ncfft>99182</ncfft>
<cr>-9.999761e+001</cr>
<fl>32768</fl>
<prog>0.12312135</prog>
<potfreq>-1</potfreq>
<potactivity>0</potactivity>
<signal_count>5</signal_count>
<flops>964166035.817873</flops>
<spike_count>5</spike_count>
<autocorr_count>0</autocorr_count>
<pulse_count>0</pulse_count>
<gaussian_count>0</gaussian_count>
<triplet_count>0</triplet_count>


2) Reference run with Lunatics_x41zi_win32_cuda50

Final checkpoint file timestamped 16 March 2016, 14:26:01
App Ended at : 14:27:05.574 (64.5 seconds later)

Final state.sah starts

<ncfft>84154</ncfft>
<cr>2.916597e+001</cr>
<fl>131072</fl>
<prog>0.85205204</prog>
<potfreq>-1</potfreq>
<potactivity>0</potactivity>
<signal_count>4</signal_count>
<flops>14347800817644.363000</flops>
<spike_count>4</spike_count>
<autocorr_count>0</autocorr_count>
<pulse_count>0</pulse_count>
<gaussian_count>0</gaussian_count>
<triplet_count>0</triplet_count>

(but validated Q= 99.96% - the fifth spike must have been found in the last minute)


3) Running MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG with special 1-second checkpoint interval.

Final checkpoint file timestamped 16 March 2016, 14:55:22
App Ended at : 14:55:24.560 (2.5 seconds later)

Final state.sah starts

<ncfft>99182</ncfft>
<cr>-9.999761e+001</cr>
<fl>32768</fl>
<prog>0.12312135</prog>
<potfreq>-1</potfreq>
<potactivity>0</potactivity>
<signal_count>5</signal_count>
<flops>964166035.817873</flops>
<spike_count>5</spike_count>
<autocorr_count>0</autocorr_count>
<pulse_count>0</pulse_count>
<gaussian_count>0</gaussian_count>
<triplet_count>0</triplet_count>

I was checking at intervals throughout both SoG runs, and state.sah was being updated at the prescribed intervals - but it looks as if Murphy's law intervened and the final 60-second checkpoint occurred just as the app was preparing to clean up anyway. It looks to me as if both <prog> and <fraction_done> are broken, but in different ways.
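For anyone repeating this offline, the <prog> values can be pulled out of saved state.sah copies with a short script. This is a rough sketch, assuming the file is a flat list of <tag>value</tag> lines like the excerpts above; read_tag is a hypothetical helper, not part of any SETI@home tool, and the two strings are abbreviated copies of the final states from runs 1 and 2.

```python
import re


def read_tag(text: str, tag: str):
    """Return the numeric value of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>\s*([^<]+?)\s*</{re.escape(tag)}>"
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None


# Abbreviated final states copied from the runs above.
SOG_FINAL = """<ncfft>99182</ncfft>
<fl>32768</fl>
<prog>0.12312135</prog>
<signal_count>5</signal_count>"""

CUDA_FINAL = """<ncfft>84154</ncfft>
<fl>131072</fl>
<prog>0.85205204</prog>
<signal_count>4</signal_count>"""

for name, state in (("SoG r3401", SOG_FINAL), ("x41zi cuda50", CUDA_FINAL)):
    prog = read_tag(state, "prog")
    signals = read_tag(state, "signal_count")
    print(f"{name}: prog={prog:.8f}, signals={signals:.0f}")
```

A regular expression is used instead of an XML parser because the excerpt has no single root element.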

I have to put this research to one side now, and go out - back tomorrow (Thursday) evening, and we can pick it up again then.
ID: 57375 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57382 - Posted: 16 Mar 2016, 22:22:14 UTC - in response to Message 57375.  
Last modified: 16 Mar 2016, 22:46:54 UTC

I think I have an idea how to fix the resulting readings a little (though it's all cosmetic).

EDIT: what if BOINC showed 100% completion instead of 0.0012% a few seconds after task start? Would that be preferable to sitting at 0.0012% for most of the time?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57382 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 7 Jun 09
Posts: 285
Credit: 2,822,466
RAC: 0
Germany
Message 57394 - Posted: 17 Mar 2016, 19:25:24 UTC - in response to Message 57367.  
Last modified: 17 Mar 2016, 19:32:23 UTC


Or the file via e-mail?

Better this way.


Should I test it here or on Main?

Testing on beta is preferable because results remain visible long enough. Also, if the app is not working properly, it's not wise to use it on main.


Could a SETI GPU app 'destroy' the AMD driver?

No. It's just Windows that destroys (restarts) the driver. Most probably the Windows watchdog timer was exceeded.


This was with Crimson 16.3 Hotfix (Beta), which is why I reinstalled Crimson 15.12 (in case the 16.3 driver is no longer usable?), using DDU in between, of course ;-).

With which driver did r3330 work well?


So 'nothing' could 'destroy' the AMD driver in such a way that I would need to install the AMD driver again?

With every driver from (Catalyst 15.7.1(?) up to 15.11) Crimson 15.11 through Crimson 16.3 Hotfix (Beta), r3330 works/worked fine, IIRC.
-> The PC has been running since about October last year, and I always had the newest/current (including beta) AMD drivers installed.

But the default cpu_lock of r3330 doesn't work properly (with any of the drivers used).
With the default enabled, all 4 GPU apps (1 WU/GPU) are pinned to Core#0.
I need to use -no_cpu_lock so that all cores are used.

With r3401 the default cpu_lock works as it should:
Core#0, #1, #2 and #3 each have one GPU app pinned.

Please send me your E-Mail via private message. ;-)
ID: 57394 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 57405 - Posted: 18 Mar 2016, 2:17:28 UTC - in response to Message 57363.  
Last modified: 18 Mar 2016, 2:22:18 UTC

Let me know if you need any additional information.

Thanks,

Chris
Chris, my reruns of two WUs from your missed-Gaussians results have finished (see the results if interested).
No signals were found to be missing. From my point of view the Mac apps work OK for these WUs.

Additionally, I've looked over almost all the results in your result list and found that at some point the second GPU starts to run at a lower frequency.
That could point to a driver crash, but I'm not quite sure yet.
Do your system logs show any problem that could come from a crashed GPU driver on the second GPU?

There is also another Mac with AMD D700 GPUs at beta, which seems to have no trouble at all with these apps.
https://setiweb.ssl.berkeley.edu/beta//show_host_detail.php?hostid=71984
_\|/_
U r s
ID: 57405 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 57414 - Posted: 18 Mar 2016, 12:08:35 UTC - in response to Message 57405.  

Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well.

Thanks,

Chris
ID: 57414 · Report as offensive
Profile Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 57417 - Posted: 18 Mar 2016, 13:25:14 UTC

But the default cpu_lock of r3330 doesn't work properly (with any of the drivers used).
With the default enabled, all 4 GPU apps (1 WU/GPU) are pinned to Core#0.
I need to use -no_cpu_lock so that all cores are used.

With r3401 the default cpu_lock works as it should:
Core#0, #1, #2 and #3 each have one GPU app pinned.


That's why I suggest using 3401.
It's also more stable.

I just have to run a few more speed benches to make sure about the speed settings.
With each crime and every kindness we birth our future.
ID: 57417 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 57422 - Posted: 18 Mar 2016, 16:32:00 UTC - in response to Message 57414.  

Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well.

FYI, I looked at my system logs. I don't see any driver restarts per se, but it looks like there is an OpenCL error every so often with the 8.07 app. My guess is that it is due to the beta version of OS X I'm running. I'm about to update to the 7th beta of it, so we'll see if it continues.

Thanks,

Chris
ID: 57422 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 57423 - Posted: 18 Mar 2016, 18:24:42 UTC - in response to Message 57422.  

Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well.

FYI, I looked at my system logs. I don't see any driver restarts per se, but it looks like there is an OpenCL error every so often with the 8.07 app. My guess is that it is due to the beta version of OS X I'm running. I'm about to update to the 7th beta of it, so we'll see if it continues.

Thanks,

Chris


The system log also reports that 8.07 is making too many wakeup calls: it allows 150 per second, and sometimes there are as many as 1300 per second. That doesn't seem to cause a crash exactly, but it is reported in the system log.

Chris
ID: 57423 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 57425 - Posted: 18 Mar 2016, 22:27:02 UTC - in response to Message 57423.  
Last modified: 18 Mar 2016, 22:27:57 UTC

Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well.

FYI, I looked at my system logs. I don't see any driver restarts per se, but it looks like there is an OpenCL error every so often with the 8.07 app. My guess is that it is due to the beta version of OS X I'm running. I'm about to update to the 7th beta of it, so we'll see if it continues.

Thanks,

Chris


The system log also reports that 8.07 is making too many wakeup calls: it allows 150 per second, and sometimes there are as many as 1300 per second. That doesn't seem to cause a crash exactly, but it is reported in the system log.

Chris

Does the log say whether ati5 or ati5_SoG causes the wakeup calls?
_\|/_
U r s
ID: 57425 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 57426 - Posted: 18 Mar 2016, 23:55:48 UTC - in response to Message 57425.  

Both the SoG and non-SoG are guilty based on the log files.

Chris
ID: 57426 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 57430 - Posted: 19 Mar 2016, 18:08:39 UTC - in response to Message 57426.  

Both the SoG and non-SoG are guilty based on the log files.

Chris

Was it with optimized settings or with defaults?

Could someone with a similar Mac (Pro, 2x D300/500/700 GPUs) please check how many wakeup calls happen on their host when OpenCL apps are running, to see if this is normal?
_\|/_
U r s
ID: 57430 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 57432 - Posted: 20 Mar 2016, 3:57:10 UTC - in response to Message 57430.  

If you don't hear from anyone, I'll have a second Mac Pro with D500 cards in about a month and a half, and I'll see if it does the same. It looks like the wakeup calls were occurring with both default and optimized settings. It's kind of hard to correlate the log files to specific WUs (I actually think it may match some of the inconclusives as well, but it's hard to figure that out exactly), but I'll go back to default settings tomorrow and verify for you.

Thanks,

Chris
ID: 57432 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57447 - Posted: 21 Mar 2016, 14:15:47 UTC

It seems Sutaru found a bug in the current RC apps.
I need a task with an AR of 1.047818 for offline benchmarking. Please find one.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57447 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57448 - Posted: 21 Mar 2016, 14:37:56 UTC - in response to Message 57447.  

It seems Sutaru found a bug in the current RC apps.
I need a task with an AR of 1.047818 for offline benchmarking. Please find one.

How precise do you need the match to be (+/- ?)

I had a number around that range during last week's test run, like

24no10ab.26598.1703.8.42.240_0

with WU true angle range is : 1.050587

And if that's not close enough, you can always edit the header for testing...
ID: 57448 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57454 - Posted: 22 Mar 2016, 7:54:47 UTC - in response to Message 57448.  

It seems that changing just the AR field is not enough to reproduce the same PulseFind geometry throughout the task, so I would prefer to get the exact task.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57454 · Report as offensive
Profile Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 57457 - Posted: 22 Mar 2016, 9:17:31 UTC

It will not be easy to find exactly the same AR out in the field.
With each crime and every kindness we birth our future.
ID: 57457 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57458 - Posted: 22 Mar 2016, 10:05:22 UTC - in response to Message 57457.  

You could try a command like

findstr "<true_angle_range>1.0478" *.*

in a batch file that you run periodically in the project directory of a machine with a busy cache - either interactively with a pause command to eyeball the results, or scheduled with a redirect/append to a log file for later analysis.

I did it like that, with a 4 decimal place trim to catch near misses, on this machine, but nothing.
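The same search can also be done numerically, so the tolerance is explicit rather than a string-prefix trim. This is only a sketch under stated assumptions: it assumes the <true_angle_range> tag sits within the first 64 KiB of each WU file, and find_matching_wus is a hypothetical helper, not an existing tool.

```python
import re
from pathlib import Path

# Matches the header field that the findstr command above searches for.
AR_PATTERN = re.compile(
    r"<true_angle_range>\s*([0-9.eE+-]+)\s*</true_angle_range>"
)


def find_matching_wus(directory, target_ar, tolerance):
    """Return (file name, AR) pairs whose AR is within tolerance of target_ar."""
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            # Assumption: the header is near the start of the file,
            # so only the first 64 KiB is read.
            with path.open("r", errors="ignore") as handle:
                head = handle.read(65536)
        except OSError:
            continue  # file removed or locked while BOINC is running
        match = AR_PATTERN.search(head)
        if match is None:
            continue
        try:
            ar = float(match.group(1))
        except ValueError:
            continue
        if abs(ar - target_ar) <= tolerance:
            hits.append((path.name, ar))
    return hits
```

Calling find_matching_wus(".", 1.047818, 0.0001) from the project directory roughly corresponds to the 4-decimal-place trim; the result can be printed or appended to a log file when run on a schedule.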
ID: 57458 · Report as offensive


 