SETI@home v8 beta to begin on Tuesday

Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61471 - Posted: 12 Nov 2017, 20:59:27 UTC - in response to Message 61469.  

about what is expected before an application is accepted for deployment as a stock application on that SETI Main server,

Richard, the binary passed beta testing. In every possible sense, there is nothing to discuss. Beta testing itself was inadequate due to a lack of tasks and/or host diversity. If it stays on beta for one more year, nothing will change until beta itself is improved.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61471 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 61472 - Posted: 12 Nov 2017, 21:33:29 UTC - in response to Message 61471.  

about what is expected before an application is accepted for deployment as a stock application on that SETI Main server,
Richard, the binary passed beta testing. In every possible sense, there is nothing to discuss. Beta testing itself was inadequate due to a lack of tasks and/or host diversity. If it stays on beta for one more year, nothing will change until beta itself is improved.
But the post you're quoting from makes no mention of 'testing at Beta'. I don't think any of the application versions you and I have worked on together have relied exclusively on testing at Beta. We've done offline bench testing with known stock apps for comparison: run under app_info at Main (which produces exactly the same result files as stock deployment): monitored host result lists for valid/inconclusive/invalid: downloaded live data files for checking in bench tests when results look unusual: and so on.

I suspect the issue is that Eric assumes that the whole gamut of testing has been run before an app is submitted to him for deployment: others may assume that an app offered to Eric for Beta deployment subsequently goes through all necessary testing stages before being transferred to Main.
ID: 61472 · Report as offensive
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 61473 - Posted: 13 Nov 2017, 0:17:59 UTC - in response to Message 61470.  
Last modified: 13 Nov 2017, 0:46:17 UTC

Well, we definitely have 2 issues here:
1) Lack of adequate testing tasks on beta.
2) Lack of direct communication between OS X apps builder/tester (at least partial tester) and Eric.

All recent OS X binary updates passed through my understanding of the current OS X situation (I have no Mac hardware and not the slightest desire to have any).
Maybe it would be better if a more direct route were in place.

What can validly be demanded of TBar as the releaser of the binaries, from my point of view, is to be as descriptive as possible about what hardware and what software/OS are involved. Personally I'm lost in all those Mavericks and whatever (it seems Google tries to be the next Apple on all terms, including proprietary APIs and inconsistent naming schemes). What the plan class sees is the OS version number, not all those fancy PR names. Without a clear picture of which subset of hosts a binary is intended for, creating a plan_class is not possible. Maybe some help from other volunteers on the Mac side of the world is required here.
You forgot the third one,
3) A Sanity checker set a touch too high.
You could also add a fourth,
4) A New version of BOINC that can't read the GPU Memory correctly helping the Sanity Check Fail on some machines.

The Failures with the nVidia version on hardware slightly different than stock indicate a Sanity Checker a little too unforgiving. The nVidia version should be replaced as well, you shouldn't get Sanity failures from just using third party hardware. I believe if you check you will find the machines with the most failures are the ones reporting Zero or Lower GPU Memory. Some Projects are also having Failures from BOINC reporting Negative GPU Memory, Negative amount of GPU VRAM and valid Einstein results discarded. The requirements for the ATI App are Extremely simple, the OpenCL driver is built-in to the OS. All you need for that ATI App is OSX 10.7.5 (Darwin 11.4.2) or above, you can't get much simpler.
There is nothing I can do about software that has a Sanity Checker set too high, there also isn't much you can do about BOINC reporting Negative GPU Memory to the Sanity Check that is already set too high.
Fortunately, My ATI Cards along with all the other HD 5000 & HD 6000's didn't have the problem with the Sanity Checker. If you check the Macs at Beta you will find many of them are HD 5770 and some HD 6700 & 6900s which wouldn't have problems with the Sanity. That leaves few Macs that would have found the problem, and apparently those few at Beta are working well enough to pass the Sanity Check. It appears the r3610 version has a more tolerant Checker as the machines running it on Main haven't had any trouble in the months they've been running it.

No one person, or couple of people, can check the software better than running it on Beta. To keep harping that more testing should be done before arriving at Beta is just a lame attempt to Blame someone else rather than placing the blame on the Beta procedure where it belongs.
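
The OS requirement stated above ("Darwin 11.4.2 or above") is the kind of condition that can be checked purely from the version number a host reports. A minimal sketch in Python, assuming dotted numeric version strings; the function names are hypothetical and this is not the server's actual plan class configuration:

```python
from typing import Tuple


def parse_darwin(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '16.7.0' into (16, 7, 0)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(host_version: str, minimum: str = "11.4.2") -> bool:
    """Return True if the host's Darwin version is at or above the minimum.

    Tuples compare element by element, so (16, 7, 0) >= (11, 4, 2),
    which a plain string comparison would not get right for '9.8.0'.
    """
    return parse_darwin(host_version) >= parse_darwin(minimum)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "9.8.0" sorting after "11.4.2".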
ID: 61473 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61474 - Posted: 13 Nov 2017, 4:37:24 UTC - in response to Message 61473.  
Last modified: 13 Nov 2017, 4:39:20 UTC

What do you mean by "Sanity checker set too high"?
The sanity check is a procedure that compares acquired results with theoretically possible (more precisely, impossible) ones.
The single adjustable limit is for the Autocorr search. If the sanity check fails on another type of signal, it means processing gave a theoretically impossible result; there is nothing to adjust.
More or fewer sanity check failures then simply mean that the app's computations are less or more stable for some reason.
Maybe in low-memory conditions some of the kernels fail silently (it could be a cuFFT kernel, because cuFFT requires a huge amount of RAM, or some of the new Petri kernels, or some of the baseline kernels). Such a failure results in theoretically impossible numbers in some of the result fields. That causes the sanity check to fail, but that's exactly what it is designed for: to catch errors.
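
A minimal sketch of the idea in Python, for illustration only. The actual checks inside the application are not shown in this thread, so the rule used here (a peak must be a finite, positive number) and the Autocorr limit value are assumptions, and all names are hypothetical:

```python
import math

# Hypothetical placeholder: per the post, only the Autocorr search has an
# adjustable limit. The real value is not given in this thread.
AUTOCORR_PEAK_LIMIT = 1000.0


def is_sane(signal: dict) -> bool:
    """Reject results that no correct computation could have produced."""
    peak = signal["peak"]
    # Assumed rule: NaN, infinite or non-positive peaks are impossible.
    if not math.isfinite(peak) or peak <= 0:
        return False
    # The only tunable part of the check applies to Autocorr signals.
    if signal["type"] == "Autocorr" and peak > AUTOCORR_PEAK_LIMIT:
        return False
    return True
```

With a check of this shape, a kernel that fails silently and leaves garbage in a result field is caught because the value is impossible, not because a threshold is too strict.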
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61474 · Report as offensive
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 61475 - Posted: 13 Nov 2017, 5:15:22 UTC - in response to Message 61474.  
Last modified: 13 Nov 2017, 5:56:06 UTC

Whatever is causing the problem is occurring at startup, the App is failing to start numerous times. What else could cause random startup failures? Since the cards that aren't affected are the Legacy cards it would appear it's some difference with the newer driver. The 5000 & 6000 series were labeled Legacy by AMD some time ago, the 7000 series and higher use a different driver and are the ones suffering the startup failures. The same problem with the nVidia version. Most people using the Apple Supplied GPUs are using the Built-in Apple Supplied driver while those using the Non-Apple cards are using the Web Driver from nVidia that must be downloaded and installed. So, what's being randomly triggered by the different drivers in r3552 that's not being triggered by r3610?

This is a normal startup in r3552;
LotOfMem path: no
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=50
Triplet: peak=11.68754, time=37.56, period=23.58, d_freq=2225387591.59, chirp=0.64733, fft_len=128
Autocorr: peak=19.66915, time=17.18, delay=3.7208, d_freq=2225385894.55, chirp=-8.8951, fft_len=128k
Pulse: peak=1.974179, time=45.9, period=3.657, d_freq=2225389062.62, score=1.023, chirp=-19.039, fft_len=2k
D: threshold 0.3587262; unscaled peak power: 0.3640594 exceeds threshold for 1.487%
Triplet: peak=11.68612, time=65.65, period=20.01, d_freq=2225385645.68, chirp=-38.806, fft_len=32

This is a bad startup in r3552;
LotOfMem path: no
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=50
OpenCL platform detected: Apple
Number of OpenCL devices found : 2

So what is happening right after 'period_iterations_num=50' is printed to cause the task to restart?
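
The two excerpts above differ in what follows 'period_iterations_num=50': signal lines in the normal case, a fresh 'OpenCL platform detected' line in the bad case. A minimal Python sketch that counts such restarts in a task's stderr text, assuming the log looks like the excerpts shown here; the function name is hypothetical:

```python
def count_silent_restarts(stderr_text: str) -> int:
    """Count how often startup output is followed directly by a new startup.

    A restart is assumed whenever a 'period_iterations_num=' line is
    immediately followed by an 'OpenCL platform detected' line, i.e. the
    app began again before reporting any signal.
    """
    lines = [ln.strip() for ln in stderr_text.splitlines() if ln.strip()]
    restarts = 0
    for current, following in zip(lines, lines[1:]):
        if (current.startswith("period_iterations_num=")
                and following.startswith("OpenCL platform detected")):
            restarts += 1
    return restarts
```

Run over the stderr of many results, a count like this would show which hosts restart at startup, without needing access to the hardware itself.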
ID: 61475 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61476 - Posted: 13 Nov 2017, 7:03:48 UTC - in response to Message 61475.  

That could be checked with an increased-verbosity build. But we have no hardware to test on.
I have asked for a guinea pig host many times already; no one volunteered.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61476 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61477 - Posted: 13 Nov 2017, 7:09:34 UTC - in response to Message 61472.  

But the post you're quoting from makes no mention of 'testing at Beta'. I don't think any of the application versions you and I have worked on together have relied exclusively on testing at Beta. We've done offline bench testing with known stock apps for comparison: run under app_info at Main (which produces exactly the same result files as stock deployment): monitored host result lists for valid/inconclusive/invalid: downloaded live data files for checking in bench tests when results look unusual: and so on.

I suspect the issue is that Eric assumes that the whole gamut of testing has been run before an app is submitted to him for deployment: others may assume that an app offered to Eric for Beta deployment subsequently goes through all necessary testing stages before being transferred to Main.


All that is right. But how does it relate to the current situation with the ATi app release?
Why did we do the testing? Because I had no suitable hardware and you had.
Do we have an eligible host currently? No.

Regarding collecting bad cases: there were no such cases (according to TBar) before deployment on main, so what was there to collect?
There were no failures on the hardware at the offline testers' disposal. That is, after your post we still can't do anything additional to improve the situation. Besides that, I fully agree.

The single reason I encourage using the beta servers is the much bigger diversity they can provide versus the limited diversity the offline testers team and I could provide.
And now we clearly see that even beta diversity is not enough. So? We need to improve that! Load different tapes.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61477 · Report as offensive
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 61478 - Posted: 13 Nov 2017, 7:48:06 UTC - in response to Message 61476.  
Last modified: 13 Nov 2017, 8:09:34 UTC

That could be checked with an increased-verbosity build. But we have no hardware to test on.
I have asked for a guinea pig host many times already; no one volunteered.
Well, my machine doesn't have the startup problem. It also, like all other 5000 & 6000 series ATI cards, doesn't have the problem with BOINC 7.8.2 & 7.8.3 detecting Negative GPU Memory. Check it out, I suppose it's just another one of those coincidences. The GPUs that don't have the startup problem with r3552 also don't have the GPU memory problem with BOINC 7.8.x. Instead of trying to troubleshoot year old software wouldn't it be easier to just use the current software that doesn't have the problem? You could just do a limited release of r3610 to see if it really does work, say, release it to the latest OS version, Darwin 17.0.0 and above, and to the rest later.

Coprocessors : AMD ATI Radeon HD 5770 (1024MB) OpenCL: 1.2
Operating System : Darwin 16.7.0
BOINC version : 7.8.3

Coprocessors : AMD ATI Radeon HD 5770 (1024MB) OpenCL: 1.2
Operating System : Darwin 17.3.0
BOINC version : 7.8.3

Coprocessors [2] AMD AMD Radeon HD - FirePro D500 Compute Engine (-1024MB) OpenCL: 1.2
Operating System : Darwin 16.7.0
BOINC version : 7.8.3

Coprocessors : AMD AMD Radeon R9 M290X Compute Engine (-2048MB) OpenCL: 1.2
Operating System : Darwin 17.2.0
BOINC version : 7.8.3

Coprocessors : AMD ATI Radeon HD 6750M (512MB) OpenCL: 1.2
Operating System : Darwin 16.7.0
BOINC version : 7.8.3
etc...etc....etc...
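
The host summaries above can be screened mechanically for the zero or negative memory values. A minimal Python sketch, assuming the "(…MB)" format shown in these 'Coprocessors' lines; the function names are hypothetical:

```python
import re
from typing import Optional

# Matches the memory figure in parentheses, e.g. "(1024MB)" or "(-2048MB)".
MEM_RE = re.compile(r"\((-?\d+)MB\)")


def gpu_memory_mb(coprocessor_line: str) -> Optional[int]:
    """Extract the reported GPU memory in MB, or None if it is not present."""
    match = MEM_RE.search(coprocessor_line)
    return int(match.group(1)) if match else None


def has_bad_memory(coprocessor_line: str) -> bool:
    """True when the host reports zero or negative GPU memory."""
    mem = gpu_memory_mb(coprocessor_line)
    return mem is not None and mem <= 0
```

Applied to the lines above, the HD 5770 and HD 6750M entries pass, while the FirePro D500 and R9 M290X entries are flagged.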

BTW, anyone is free to look through the ATI Hosts at Beta and see if you can find a single Error result declaring Too Many Exits. I couldn't find any.
ID: 61478 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61480 - Posted: 13 Nov 2017, 15:00:22 UTC - in response to Message 61478.  

What toolset was used to build r3552 and r3610? Different ones?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61480 · Report as offensive
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 61481 - Posted: 13 Nov 2017, 15:25:02 UTC - in response to Message 61480.  
Last modified: 13 Nov 2017, 16:24:33 UTC

Both Apps were built in OS X 10.12.6 (Darwin 16.7.0) with the same tools. The difference is I tested different Defines and came up with a different set from what You had suggested for r3552. The set I used for r3610 works better on all the machines that tested it.
r3552; Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit
r3610; Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_CHIRP3 ASYNC_SPIKE FFTW SSSE3 64bit
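
The difference between the two feature lists above can be read off with a set comparison. A minimal Python sketch using the two 'Build features' strings exactly as quoted; the function name is hypothetical:

```python
R3552 = ("SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY "
         "OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit")
R3610 = ("SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx "
         "OCL_CHIRP3 ASYNC_SPIKE FFTW SSSE3 64bit")


def feature_diff(old: str, new: str):
    """Return (removed, added) feature names between two build strings."""
    old_set, new_set = set(old.split()), set(new.split())
    return sorted(old_set - new_set), sorted(new_set - old_set)


removed, added = feature_diff(R3552, R3610)
print("removed:", removed)  # removed: ['JSPF', 'OCL_ZERO_COPY']
print("added:", added)      # added: []
```

So r3610 differs from r3552 by dropping OCL_ZERO_COPY and JSPF; nothing was added.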

The version for nVidia works better with adding OCL_ZERO_COPY.
r3551, Build features: SETI8 Non-graphics OpenCL USE_OPENCL_INTEL OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit
The newer version of the nVidia App also doesn't use JSPF, SSSE3x OS X 64bit Build 3709

BTW, this Windows SoG App was found to have a Bad Best Gaussian when compared against the ATI Non-SoG App, the CPU agrees with the ATI App, SSSE3x OS X 64bit Build 3710
ID: 61481 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61482 - Posted: 13 Nov 2017, 17:34:52 UTC - in response to Message 61481.  

Is r3610 on the beta servers already? Under what plan class?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61482 · Report as offensive
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 61483 - Posted: 13 Nov 2017, 18:19:02 UTC - in response to Message 61482.  

8.22 (opencl_ati5_mac) : 17 Oct 2017, 23:48:14 UTC : 45 GigaFLOPS http://setiweb.ssl.berkeley.edu/beta/setiathome_v8_x86_64-apple-darwin__opencl_ati5_mac.html
ID: 61483 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61484 - Posted: 13 Nov 2017, 21:06:47 UTC - in response to Message 61483.  
Last modified: 13 Nov 2017, 21:07:16 UTC

OK, let's try...

Still, I would prefer to know for sure whether r3610 is really free from these silent terminations or not. But this requires a volunteer with the right hardware, and it seems we don't have one.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61484 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61485 - Posted: 13 Nov 2017, 21:09:22 UTC - in response to Message 61481.  

The newer version of the nVidia App also doesn't use JSPF, SSSE3x OS X 64bit Build 3709

For what reason?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61485 · Report as offensive
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 61486 - Posted: 13 Nov 2017, 21:47:43 UTC - in response to Message 61485.  
Last modified: 13 Nov 2017, 21:52:39 UTC

I don't see it here anywhere, https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/AKv8/ConfigureOSX_AKv8d_OPENCL_SSE3_MBv8.txt
In fact, I don't see it in any GPU Application. The only place I see JSPF is in CPU Apps, so, why did You suggest putting it in the OSX GPU Apps? The Apps seem to work just fine without it.
ID: 61486 · Report as offensive
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 61487 - Posted: 13 Nov 2017, 22:27:42 UTC - in response to Message 61484.  

Still, I would prefer to know for sure whether r3610 is really free from these silent terminations or not. But this requires a volunteer with the right hardware, and it seems we don't have one.
These two machines have been running r3610 for months, the only Errors I see are from what I suspect is the cmdline 'Target kernel sequence time set to 600ms'. The other machine isn't using that line and doesn't have any Errors;
Running 2 instances per GPU, https://setiathome.berkeley.edu/results.php?hostid=8243589
Running 3 instances per GPU with a problematic cmdline, https://setiathome.berkeley.edu/results.php?hostid=6105482
A refugee from Q & A still clearing Ghosts, https://setiathome.berkeley.edu/results.php?hostid=8248108
I don't see any Too Many Exits there.
ID: 61487 · Report as offensive
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 61492 - Posted: 15 Nov 2017, 16:16:38 UTC - in response to Message 61485.  
Last modified: 15 Nov 2017, 16:38:33 UTC

The newer version of the nVidia App also doesn't use JSPF, SSSE3x OS X 64bit Build 3709

For what reason?

Have you ever put JSPF in one of Your GPU Apps? The other Mac App 8.20 r3556, and even 8.10 r3430, don't have JSPF. None of my other GPU builds have it either. So, why do you think JSPF should be in the Mac GPU Apps?
The Only GPU Apps that I know of with JSPF are r3552 & r3551, which are the only two Apps I know of that have the Too Many Exits problem. Perhaps it would be better without it?

BTW, Chris says he's had other problems with his D700 Mac, and you should just look at his other machine, the D500 one, https://setiathome.berkeley.edu/results.php?hostid=8243589. He also says the D500 is also running Three Tasks at once, just like the D700. As far as I know the other Platforms' AMD GPUs can't run 3 or even 2 Tasks at once.
ID: 61492 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 61495 - Posted: 16 Nov 2017, 18:30:13 UTC - in response to Message 61492.  

So, why do you think JSPF should be in the Mac GPU Apps?
The Only GPU Apps that I know of with JSPF are r3552 & 3551, which are the only two Apps I know of that have the Too Many Exits problem. Perhaps it would be better without it?

This option governs CPU-based Pulse computations, so it should have minimal influence on GPU builds.
The correlation you noticed may be a random one (or not).

Regarding running several at once: the stock build is intended to run correctly with one task per device. Anything beyond that is the user's responsibility and choice.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 61495 · Report as offensive
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1700
Credit: 13,216,373
RAC: 0
Sweden
Message 61500 - Posted: 19 Nov 2017, 12:10:34 UTC
Last modified: 19 Nov 2017, 12:13:51 UTC

16 days later, and still 100% *DIAG_KIC*
Isn't the idea of Beta, that we should test all kinds of WU's and apps?
One WU type is surely not enough.
Yes I know that there really isn't much testing going on right now, but nevertheless,
as long as we get tasks, provide a good mix at least.

Not many running beta though (377 Users in last 24 hours). Even when we had new apps to test,
it was rarely over 400 Users in last 24 hours.
With so few testers/computers, there's no way in h*** we can find all the bugs.
There's a need for a Beta push, to get new Beta participants for the next round of app tests.
And of course a better mix of WU's.
ID: 61500 · Report as offensive
Profile Gary Charpentier
Volunteer tester
Joined: 9 Apr 07
Posts: 1701
Credit: 4,622,751
RAC: 0
United States
Message 61502 - Posted: 19 Nov 2017, 16:06:34 UTC - in response to Message 61500.  

16 days later, and still 100% *DIAG_KIC*
Isn't the idea of Beta, that we should test all kinds of WU's and apps?
One WU type is surely not enough.
Yes I know that there really isn't much testing going on right now, but nevertheless,
as long as we get tasks, provide a good mix at least.

Not many running beta though (377 Users in last 24 hours). Even when we had new apps to test,
it was rarely over 400 Users in last 24 hours.
With so few testers/computers, there's no way in h*** we can find all the bugs.
There's a need for a Beta push, to get new Beta participants for the next round of app tests.
And of course a better mix of WU's.

Well, make cafe SETI at Beta a hip place to be and at least a few more will have to crunch to be there.
ID: 61502 · Report as offensive
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.