SETI@home v8 beta to begin on Tuesday
Author | Message |
---|---|
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Well, re-reading the posts about this issue: 1) High Sierra = Darwin 17.x.x. 2) There is a single replacement app, the ATi HD5 r3610 one. 3) SSSE3+. The confusion came from the CPU app being packaged with the GPU ones. 4) The question remains. 5) This plan class should be limited to Darwin < 17.0.0. An additional plan class should be released with "Mac OS X/64-bit Intel 8.19 (opencl_ati5_mac), 8 Nov 2016, 23:03:25 UTC, 18 GigaFLOPS" from beta. That plan class on beta should be replaced with ATi HD5 r3610. Well, I'll write Eric this roadmap. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
1) "What version number of 'High Sierra'?" MacOS 13.0.0, Darwin 17.0.0. 2) "All 3 are non-SoG?" The Only one that is SoG is 8.20r3556, which was compiled in MacOS 13.4 in hopes of better compatibility. 3) http://www.arkayn.us/forum/index.php?PHPSESSID=qg2ujvtrk8jp1rte5605lb7fq6&topic=191.msg4368#msg4368 "I see only AVX+ here. Does it mean that all Mac CPUs support at least AVX?" No, it means that Most active Macs support AVX. The CPU Apps are just below, where you can Clearly see both an SSSE3 & an SSE4.1 CPU App. If someone with an Older Mac needs an older CPU App, they can just swap in the CPU App they need. 4) "What about this app/plan class: Mac OS X/64-bit Intel 8.10 (opencl_ati_mac)..." That Plan Class was for the Older HD4 GPUs that were being tested Here; Main doesn't have that Plan Class. Someone Here will have to decide if Beta still needs an App that is only useful for the HD4 GPUs. 5) "On main this app/plan class has biggest impact on ATi GPU on Mac:" Actually the r3610 App is just as fast as the Older SoG App that now doesn't work with the current Apple OS. The Older 8.19r3552 is Almost as fast as the 8.20 App and is as fast on some machines. This is the Highest ranked AMD Mac at SETI; it is running r3610 and the current RAC is Higher than it ever was when running 8.20r3556, https://setiathome.berkeley.edu/show_host_detail.php?hostid=6105482. If you want to bother with Plan Classes in order to keep an App that doesn't work with the Current OS, so be it. The Current 8.19r3552 is just about the same as the App that doesn't work, and r3610 is a little better. The r3610 App has been around since early this year and has worked in every Mac that has run it, as far as I know. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Detailed instructions sent to Eric. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
Any Update on the Mac fix? SETI is losing quite a bit of Mac work in the meantime; State: All (416) · In progress (138) · Validation pending (51) · Validation inconclusive (0) · Valid (9) · Invalid (0) · Error (218) A simple Block on the SoG App with Darwin 17.0.0+ would work in the short term as the other App on Main still works, 8.00 (opencl_ati5_mac) : 22 Jan 2016, 0:38:52 UTC : 3,224 GigaFLOPS |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Current state on main is: Mac OS X/64-bit Intel 8.20 (opencl_ati5_mac), 17 Oct 2017, 23:49:50 UTC, 8,790 GigaFLOPS; Mac OS X/64-bit Intel 8.20 (opencl_ati5_SoG_mac), 28 Dec 2016, 23:34:07 UTC, 11,893 GigaFLOPS. Current state on beta is: Mac OS X/64-bit Intel 8.10 (opencl_ati_mac), 7 Apr 2016, 1:01:54 UTC, 15 GigaFLOPS; Mac OS X/64-bit Intel 8.20 (opencl_ati5_SoG_mac), 8 Nov 2016, 23:03:25 UTC, 20 GigaFLOPS; Mac OS X/64-bit Intel 8.22 (opencl_ati5_mac), 17 Oct 2017, 23:48:14 UTC, 47 GigaFLOPS. It seems all needed changes are implemented. Those with ATi Macs, please report the rev numbers for the apps deployed on 17 Oct. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
I'm seeing some errors with a number of machines running r3552 on Main. It appears the App is restarting numerous times on start without leaving any comments, https://setiathome.berkeley.edu/results.php?hostid=8144040&state=6&appid= Of course, you don't see any of those types of errors here on Beta. Looking at the top two AMD Macs on Main, both running r3610, I don't see that particular error either. I do see different errors with the Top Mac, but I attribute that to running 3 instances with aggressive settings. It would appear the current r3552 on Main is displaying some new type of error causing it to exit early with; Exit status -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS ??? |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"It would appear the current r3552 on Main is displaying some new type of error causing it to exit early with;" To make things easier to follow, please specify the rev along with the plan class it was issued for. r3552 is V8.20 on main, for example? And what revision is Mac OS X/64-bit Intel 8.22 (opencl_ati5_mac), 17 Oct 2017, 23:48:14 UTC? And what did you spot as peculiarities of those hosts? Your example is a dual-GPU one. Do we have dual-GPU Macs on beta? And more regarding your particular example of a host with failures: SETI@home v8 8.20 x86_64-apple-darwin (opencl_ati5_SoG_mac): Number of tasks completed 47953; Max tasks per day 3780; Number of tasks today 0; Consecutive valid tasks 3747; Average processing rate 285.15 GFLOPS; Average turnaround time 0.54 days. SETI@home v8 8.20 x86_64-apple-darwin (opencl_ati5_mac): Number of tasks completed 1696; Max tasks per day 402; Number of tasks today 267; Consecutive valid tasks 369; Average processing rate 292.84 GFLOPS; Average turnaround time 0.45 days. So, seems it ran SoG build quite OK before and didn't require update actually. How so? And one more issue: https://setiathome.berkeley.edu/results.php?hostid=8144040&offset=0&show_names=0&state=3&appid= A big number of inconclusives. Should they be attributed to the host or to the app? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
Also got this from Charlie Fenton: [image not preserved] |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
I did say there are a number of machines showing this Error on Main, it's not restricted to just one type. They are easy to find, just look at the Mac's error list. What's troubling is the r3552 App has been on Beta for almost a Year and I haven't found a single example of that error at Beta. So, why didn't any of the Macs on Beta produce this error? It seems unlikely none of the Beta machines have this error. The last one is easy, the new OS doesn't work at all with the SoG App, I'd say that is good reason for an update. The question is what changed between Beta and Main to produce this new error? Is the API setting the same? I'll be swapping over to an ATI card later today and see if I can produce this error. In reply to: "It would appear the current r3552 on Main is displaying some new type of error causing it to exit early with;" and "So, seems it ran SoG build quite OK before and didn't require update actually." |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"The last one is easy, the new OS doesn't work at all with the SoG App, I'd say that is good reason for an update." The OS for that sample case is 16.7 - is it new or not? SoG worked OK on that host. It's possible that he updated the OS just very recently though... News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"Also got this from Charlie Fenton:" Maintenance in this case would mean changing the driver version to a working one. For Mac it probably means changing the OS to a working one. Unfortunately (fortunately, actually) I have none of these proprietary devices available anywhere in the vicinity, so I can hardly do anything about that error. So, if no one has a better option, we should go the restrictive way again here: determine the subset of hosts with the issue and forbid them from downloading the app. Another option would be a "guinea pig" with an ATi-based Mac that shows the issue, who would kick their dev forums until the root of the issue is established. It could be some line in the code that their runtime cannot understand, as has happened before. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
And while we are on active page - there is another issue to think about - validness of validation process. When a "tiebreaker" task is sent to the same anonymous platform app that the 2 (or more) other tasks went to, it can't be called "validation". Preferably, each pair should be sent to apps of different plan classes. And if a third result is required, the task should preferably be sent to a stock CPU plan class host. With the current scheduling of resends, we will easily get subsets of incorrect results that passed the validator just because they happened to land on hosts with the same malfunction (that probability can be GREATLY reduced by choosing different plan class apps, especially for GPU plan classes and anonymous platform apps). News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
"The last one is easy, the new OS doesn't work at all with the SoG App, I'd say that is good reason for an update." He could update the OS tonight, or sooner for all we know. I'm going to change the "number" to Most. From what I just saw, Most Macs in All OSes are having this "New" Error. So, we go from none of the Macs on Beta having this error for almost a Year, to, Most of the Macs on Main in all OS versions having this error. How likely is that? |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
"And while we are on active page - there is another issue to think about - validness of validation process." I vote we do this for the Windows AstroPulse ATI machines. I just got robbed again by two of these notoriously Bad Windows AMD machines. I'm going to get robbed again very soon, as once again two of these known bad machines are teaming up against my known Good Linux ATI machine. My ATI machine has a perfect record until it gets assaulted by two of these Windows machines, https://setiathome.berkeley.edu/workunit.php?wuid=2719529050 Can we please stop this robbery? |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"And while we are on active page - there is another issue to think about - validness of validation process." This should be done at the scheduler server level, for all plan classes and all apps. Regarding Windows ATi AP - could you define the subset that should be excluded? I know my own ATi Windows GPU handles AP just fine, for example. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"The last one is easy, the new OS doesn't work at all with the SoG App, I'd say that is good reason for an update." Quite unlikely. Eric, could you check that CL file not omitted for example or named wrongly or smth alike. Silent death usually occurs when no correct CL file is found at all. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
"Eric, could you check that CL file not omitted for example or named wrongly or smth alike." Raistmer, you can test that in seconds, by getting someone with a Mac to attach to both Main and Beta: compare the files you receive and the entries made (especially <app_version>) in client_state.xml. @ Eric and Raistmer, have a look at the main project thread "Mac Client Bug", which user 'Havoc' started this morning. It is abundantly clear that: a) 'CL file missing' fails as an explanation; b) TBar is talking rubbish, as usual. The app is showing errors, but at a very low rate - 1% or 2% max. That can't be a deployment error. Havoc linked a workunit where two different Mac hosts, both running the same app, had both errored out with the same error: two Windows computers had completed and validated. All of this suggests to me that the application fails on certain types / classes / examples of workunits only: from the examples in the linked threads, the vast majority of tasks run successfully. We have had this before: one of Raistmer's apps passed testing here at Beta, but failed with excessive memory usage when exposed to a wider range of tasks on Main. I suspect something similar is happening now. If the app has been deployed for testing here at Beta for a year, then it probably also needs revising to take account of the substantial changes in Mac OS X last month. |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
Just a little update on a few Mac Apps. These Apps have more in common than the version numbers; (Main) 8.19r3551 (opencl_nvidia_mac), (Main) 8.20r3552 (opencl_ati5_mac), (Beta) 8.19r3553 (opencl_intel_gpu_sah). They are from the Exact same code, in fact, r3551 & 3553 are the Exact same Apps. The only difference is the Name. The only difference with the ATI App is it uses the HD5 tag instead of Intel, otherwise, it is the same. So, when one of these three is accused of suffering some Bug, it might help your case if the other two displayed the same symptoms. That is Not the case in this instance. The nVidia App has been running on Main for about a year, and along with its Intel sister on Beta hasn't suffered any startup errors, or alleged particular WU problems. This is why Raistmer and myself deem it unlikely the ATI r3552 App has suddenly, after a year, developed some isolated problem with a certain WU. The Non-SoG ATI App is much closer to the Intel builds than the SoG build, and the other two 355x Apps are still going strong. I believe if you check the results at main you will find the problem exists with a wide range of WUs, GPUs, and Operating systems. Machines running Mavericks are suddenly having this problem. Mavericks is from 2013, someone care to explain to me what recent changes Mavericks has suffered? No trash, just Fact. |
Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0 |
No great mix of work here now. It's 100% *DIAG_KIC* |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
"Just a little update on a few Mac Apps. These Apps have more in common than the version numbers;" Yes, I'd agree with that. r3551 was Raistmer's "OpenCL MB: - improvement of validation rate on overflows - debug output": r3552 and r3553 were Eric Korpela working on the Android apps, which use a completely different codebase - so his changes don't concern us here. But it goes on from there:
Rev  Age  Author  Log Message
r3551  12 months  raistmer  OpemCL MB: - improvement of validation rate on overflows - debug output …
r3555  12 months  raistmer  OpenCL MB: TwinChirp path updates
r3556  12 months  raistmer  OpenCL MB: - massive improvement in overflow validation rate - bugfix for …
r3557  12 months  raistmer  OpenCL MB: - Twinchirp path separated to make compiler for FERMI-class …
r3564  12 months  raistmer  opt MB: - FLOP counter incrementing disabled (obsolete) - progress for GPU …
r3565  12 months  raistmer  Urs' changes: - fix : SmallInts to double conversion problems in …
r3566  12 months  raistmer  OpenCL MB: -WG size fix
r3567  12 months  raistmer  OpenCL/CPU MB: - fix for PulseFind in TwinChirp path - WG size selection …
r3568  12 months  raistmer  OpenCL MB: -CPUlock re-done - SoG and APU paths incompatibility check …
r3570  11 months  raistmer  OpenCL MB: - compatibility fixes for Linux.modern compilers - …
r3571  11 months  raistmer  Windows OpenCL MB: - new configs added
r3574  11 months  raistmer  OpenCL MB TwinChirp: -bugfixes
r3578  11 months  raistmer  OpenCL SoG MB: - make sure that at least 100 times through task progress …
r3581  11 months  raistmer  OpenCL MB: - subtle synchronization bug fixed in TwinChirp path
r3584  11 months  raistmer  OpenCL MB: -Urs' *nix changes: + version to 8.22 + several make file …
r3585  11 months  raistmer  OpenCL MB: - iGPU event handling
r3600  10 months  raistmer  MB: -added missing VS2010 solution file.
r3602  10 months  raistmer  OpenCL: - making errors in profiling recoverable
r3604  10 months  raistmer  MultiBeam: - making sources compatible with new (C++11) compilers (thanks …
r3605  10 months  raistmer  MultiBeam: -enabling higher than SSE2 SIMD levels for MSVC-based x64 …
r3623  9 months  raistmer  MultiBeam: - making VS2010 config for x64 graphics binary workable
r3631  9 months  raistmer  MultiBeam for Linux on ARM hard float: - merging EABI params …
r3632  9 months  raistmer  MultiBeam: - making VS2008 solution buildable for Win x64, both app and …
r3633  9 months  raistmer  MultiBeam: - fix VFP ChirpData for armhf - both NEON and VFP chirp enabled …
r3643  8 months  raistmer  MultiBeam: - Tom Rinehart's fix for fpu_ChirpData in ARM path
r3658  7 months  raistmer  -added VS2015 config -make sources compatible with VC2015
r3674  6 months  raistmer  MultiBeam: -added VS2017 config -preparations for PGO
And that's just the code changes. As you know from your own experience with BOINC code, different operating systems require different compilers and different build settings. Let me give you an example of what can happen. Just before that sequence of changes, round about r3541, Raistmer and I were having a similarly inconclusive conversation about the poor validation rates for the intel_gpu app on Haswell chips under Windows. We weren't making any progress, so I went out and bought an HD530 (i5-6500) for testing. Raistmer and I spent a whole weekend running through the possibilities: every hour or two, he'd email me another build, I'd run it under the bench test suite, and I'd reply "No, still not accurate enough." The initial assumption was that the iGPU drivers were causing the problem? Nope. The Intel FFT implementation? Nope. Something in code? Couldn't find it. Finally - Eureka: the accuracy shot up and we were good to go. That turned out to be a compiler optimisation flag set too high - the code ran fast, but lost accuracy. With a lower setting, less speed, but better results. In the thread at Main, people are saying that the temporary exit event log message is "Task postponed: Suspicious pulse results, host needs reboot or maintenance". I'm sorry, but I don't believe that every Mac attached to BOINC needs maintenance that often: it sounds more like an inaccuracy in the application triggering a sanity check more than necessary. I'd suggest that it's the app that needs maintenance to find and correct the cause of the inaccuracy, the same as Raistmer and I did last year for the iGPU. |
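
For readers following the plan class discussion, the short-term fix suggested earlier in the thread (block the SoG app on Darwin 17.0.0+ while leaving the other ATi app available) could be sketched roughly as below. This is only an illustration of the version comparison, not BOINC scheduler code: the function names and the idea of passing the Darwin version as a plain string are assumptions made for the example, and only the plan class names and the 17.0.0 cutoff come from the posts above.

```python
# Hypothetical sketch, not part of BOINC: decide whether a host may be sent
# a given plan class, based only on its Darwin version string.

# Assumption taken from the thread: the SoG app stops working at Darwin 17.0.0.
SOG_PLAN_CLASS = "opencl_ati5_SoG_mac"
SOG_FIRST_BROKEN = (17, 0, 0)


def parse_darwin_version(text):
    """Turn a version string such as '16.7.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def plan_class_allowed(plan_class, darwin_version):
    """Return False only for the SoG plan class on Darwin 17.0.0 or later."""
    if plan_class == SOG_PLAN_CLASS:
        return parse_darwin_version(darwin_version) < SOG_FIRST_BROKEN
    return True


if __name__ == "__main__":
    for version in ("16.7.0", "17.0.0", "17.2.0"):
        for plan_class in (SOG_PLAN_CLASS, "opencl_ati5_mac"):
            print(version, plan_class, plan_class_allowed(plan_class, version))
```

Comparing tuples of integers rather than raw strings avoids the usual trap where "9.8.0" would sort after "17.0.0" in a text comparison.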
©2023 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.