SETI@home v8 beta to begin on Tuesday
Author | Message |
---|---|
Send message Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 ![]() |
"Just a little update on a few Mac Apps. These Apps have more in common than the version numbers;" "Yes, I'd agree with that. r3551 was Raistmer's "OpenCL MB: - improvement of validation rate on overflows - debug output": r3552 and r3553 were Eric Korpela working on the Android apps, which use a completely different codebase - so his changes don't concern us here...." I'll repeat it in a little more detail. All three of those Apps were compiled from the Exact same r3551 folder. The only reason the version numbers are different is so they will produce a different Wisdom file on the same machine. Make clean was used between compiles, with slight changes between compiles; they are basically the Same App, meaning they will all have the exact same bugs, or not. I'm still not buying that one App out of the three has suddenly developed a bug that didn't exist for the Year of the Apps' existence. Those current Errors are occurring in Older OSes and GPUs that have Not Changed in the past year. That would mean it is the App that has changed somehow. I would suggest trying a fresh copy of the App and, if that doesn't work, going with r3610, which has been run on machines on Main without any trouble for the better part of a year. |
Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 ![]() |
Were the exact same three compilers, with the same three sets of configuration settings, used for all apps? I buy the rest of the explanation, but not that one. |
Send message Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 ![]() |
"Were the exact same three compilers, with the same three sets of configuration settings, used for all apps? I buy the rest of the explanation, but not that one." All three Apps were compiled during a 3 hour period using the same source folder and probably the same Terminal window. The only difference is one used -DUSE_OPENCL_HD5xxx while the other 2 used -DUSE_OPENCL_INTEL, and the version numbers were changed in the Makefile. Basically, if you're having problems with one, you should have the same problems with the other two. Check the creation times, they will read: MBv8_8.19r3552_ati5_ssse3_x86_64-apple-darwin, 5,547,108 bytes (5.6 MB on disk), Saturday, November 5, 2016 at 1:41 AM; MBv8_8.19r3551_NV_ssse3_x86_64-apple-darwin, 5,543,040 bytes (5.5 MB on disk), Saturday, November 5, 2016 at 4:05 AM; MBv8_8.19r3553_Intel_ssse3_x86_64-apple-darwin, 5,543,040 bytes (5.5 MB on disk), Saturday, November 5, 2016 at 4:18 AM. Busy night. |
Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 ![]() |
Fair enough. But be aware that Raistmer makes extensive use of #if constructs in his code, so changing those compiler directive flags effectively means you're changing the contents of the code files - you're using different parts of them. For example, `... #elif USE_OPENCL_HD5xxx #if __APPLE__ strcpy(buildoptions,"-w -D__APPLE__ -cl-unsafe-math-optimizations -DUSE_OPENCL_HD5xxx"); #else strcpy(buildoptions,"-w -cl-unsafe-math-optimizations -DUSE_OPENCL_HD5xxx -fno-bin-amdil"); #endif ...` - that's one line of code that will only be used with ATI HD5 on a Mac, with a different version for other platforms (it's from GPU_lock.cpp, line 702). There will be many other examples scattered throughout the code files. |
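
The effect of those flags can be illustrated with a small model. This is only a sketch in Python, not code from the app: the function `select_build_options` and its `defines` argument are hypothetical, the two option strings are copied from the snippet quoted above, and the final fallback is a placeholder because the rest of the #if chain is not shown in the post.

```python
def select_build_options(defines):
    """Model the quoted #elif/#if block: choose OpenCL build options from a set of defines.

    `defines` is a set of macro names passed at compile time (e.g. via -D flags).
    This mirrors only the branch shown in the post; other branches are unknown.
    """
    if "USE_OPENCL_HD5xxx" in defines:
        if "__APPLE__" in defines:
            # Branch used only for ATI HD5 builds on a Mac
            return "-w -D__APPLE__ -cl-unsafe-math-optimizations -DUSE_OPENCL_HD5xxx"
        # Same flag on other platforms yields a different string
        return "-w -cl-unsafe-math-optimizations -DUSE_OPENCL_HD5xxx -fno-bin-amdil"
    # Remaining branches of the chain are elided in the post
    return None


mac_hd5 = select_build_options({"USE_OPENCL_HD5xxx", "__APPLE__"})
other_hd5 = select_build_options({"USE_OPENCL_HD5xxx"})
print(mac_hd5 != other_hd5)  # same source file, different effective code
```

The point of the model is simply that the same source line contributes different text to the build depending on which defines are set.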
Send message Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 ![]() |
It all becomes moot once you accept that just about All the Mac AMD machines on Main are having this error within seconds of launch at least every day or two, and that NONE of the machines at Beta had this error over almost a year. Have you accepted that most of the Macs on Main are having this error? Because once you accept that, it's impossible for a rational person to think this error went unnoticed on Beta for almost a year. There are just too many machines having this error to think it escaped Beta if it was occurring on that many machines. It sounds as though you've accepted it back here: "I'm sorry, but I don't believe that every Mac attached to BOINC needs maintenance that often." Can you rationally explain how such a widespread problem could escape Beta for almost a year? Oh, am I still talking 'rubbish' now that you've accepted that just about every AMD Mac on Main is having this Error? |
Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 ![]() |
And they are having it relatively rarely. As Tut pointed out again today, the Beta site commonly supplies a very limited range of work types. And as I've pointed out during this conversation, there is at least one previous example of an application having no problems at Beta, but developing problems at Main when they encountered workunit characteristics that weren't tested here. The usual next step is to download the datafile for one or more of the tasks which have failed in live running, and try them again offline 'under the microscope' as a bench test on the type of computer which is failing them. |
Send message Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 ![]() |
It should be the shortest test in some time. The App launches and in about 7 seconds it either continues running or spits out an error and moves to the next task, not much to test there. Was this memory error you keep mentioning that easy to see, or did you have to look at some tool and see how much memory was being used? This one is pretty easy to see -> run for 7 seconds and error with an error result which on Beta stays for quite a while. From what I've seen the Macs on Main are having this error with all types of WUs, OSes, and GPUs above the 6 series. I haven't seen an AMD 5 series with this error...yet. Certainly there are machines on Beta that would have produced this error after a year of trying, if the error was present on Beta. The Error appeared on Main after just one day. |
Send message Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 ![]() |
After another scroll down the Hosts list it seems most of the machines receiving -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS with the v8.20r3552 (opencl_ati5_mac) App are running Arecibo tasks. There are a few running v8.20r3552 with BLC tasks receiving a different error, 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED. However, it seems the TIME_LIMIT_EXCEEDED error was also occurring with the older v8.00r3321 App, https://setiathome.berkeley.edu/results.php?hostid=8353212&state=6 whereas the TOO_MANY_EXITS error is new to Main. I did find a couple of machines running the nVidia v8.19r3551 receiving the -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS error, but they were running Non-Apple GPUs on a Mac, or were running a Hackintosh with Non-Apple GPUs. I didn't find any Apple nVidia GPUs receiving the TOO_MANY_EXITS error. So, it appears r3551 may be a little touchy with the TOO_MANY_EXITS error even though there aren't any of those errors on Beta. |
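
The two notations used for each error above, a signed decimal and a hexadecimal value, are the same number viewed as a 32-bit quantity. A minimal Python sketch of that conversion follows; the helper name `exit_code_hex` is made up for illustration and is not part of BOINC.

```python
def exit_code_hex(code):
    """Format a signed exit code as an unsigned 32-bit hexadecimal string."""
    # Masking with 0xFFFFFFFF gives the two's-complement bit pattern
    # of a negative code in 32 bits.
    return "0x{:08X}".format(code & 0xFFFFFFFF)


print(exit_code_hex(-226))  # 0xFFFFFF1E
print(exit_code_hex(197))   # 0x000000C5
```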
Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 ![]() |
"So, it appears r3551 may be a little touchy with the TOO_MANY_EXITS error even though there aren't any of those errors on Beta." Yes, I would agree with that description. I'd also agree that a loss of 7 seconds elapsed time doesn't look too bad on the face of it, but I'm a little suspicious about BOINC's time recording at those very small levels - I'm not sure it fully accounts for the setup time before crunching really gets under way. Also, if an app exits before the first checkpoint, the elapsed time is rewound to zero for the restart. We might be looking at 700 seconds (for the 100 attempts) - we'd probably have to pore through an event log to check that. The memory problem caused quite a stir on the Main forum when it happened, but it was some time ago: IIRC, it was after one of the big upgrades, possibly SAH v6 to v7. It'll take some finding, but I'll have a look. |
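
The arithmetic behind that 700-second figure can be sketched as follows. This is a deliberately simplified, hypothetical model in Python, not BOINC's actual accounting: it assumes every attempt lasts the same 7 seconds, that each restart happens before any checkpoint, and it ignores setup time entirely.

```python
def elapsed_times(attempts, seconds_per_attempt):
    """Return (reported_elapsed, wall_clock) for repeated restarts without a checkpoint."""
    reported = 0.0
    wall_clock = 0.0
    for _ in range(attempts):
        # No checkpoint was written, so the recorded elapsed time
        # is rewound to zero before the restart.
        reported = 0.0
        reported += seconds_per_attempt
        # The time really spent keeps accumulating regardless.
        wall_clock += seconds_per_attempt
    return reported, wall_clock


reported, wall_clock = elapsed_times(100, 7)
print(reported, wall_clock)  # 7.0 700.0
```

Under these assumptions the task page would show about 7 seconds while roughly 700 seconds were actually consumed; only an event log could confirm the real figure.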
Send message Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 ![]() |
The best fix for the AMD GPUs is running on Beta right now, and appears to be doing well, http://setiweb.ssl.berkeley.edu/beta/setiathome_v8_x86_64-apple-darwin__opencl_ati5_mac.html The question is how long will it take to move it to Main. The other OpenCL r3551 App is doing well with the Apple nVidia GPUs, but giving errors with some of the Non-Apple supplied nVidia GPUs. I use the CUDA App, which doesn't have any trouble with the Non-Apple nVidia GPUs...providing you use the correct CUDA driver. One CUDA App that is giving problems is the 8.11 (cuda75_mac) App. As with the Linux Apps, the older nVidia GPUs don't work well with anything above CUDA 6.0, and most of the Older GPUs are not working with the 8.11 (cuda75_mac) Baseline App as seen here, https://setiathome.berkeley.edu/results.php?hostid=7823473&state=5. For some reason the Server isn't sending that Host the CUDA 4.2 App which would probably work just fine with the older GPU. Since the Baseline CUDA 75 Mac App is just slightly faster than the CUDA 42 App, and doesn't work on the older Apple supplied nVidia GPUs, it should probably be removed. Any GPU that can work with the Baseline CUDA 75 App can also work with the CUDA 42 App, and probably work successfully. I would recommend the 8.11 (cuda75_mac) Baseline App be removed from Main and Beta and the 8.11 (cuda42_mac) plan class allow the App to work on all nVidia GPUs. That would probably eliminate a few Inconclusive/Invalid results being produced by the Older GPUs running the 8.11 (cuda75_mac) Baseline App. The newer Non-Apple nVidia GPUs with CC=5.0 and above should be using the CUDA 75 Special App anyway, it's much better than the Baseline App and seems to be working well; https://setiathome.berkeley.edu/results.php?hostid=7942417&offset=320 https://setiathome.berkeley.edu/results.php?hostid=4726043&offset=120 |
![]() ![]() Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 ![]() |
Those seconds will be weeks in real life it seems. News about SETI opt app releases: https://twitter.com/Raistmer |
![]() ![]() Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 ![]() |
The only memory issue I can recall is an overflow with AstroPulse. |
![]() ![]() Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 ![]() |
"The best fix for the AMD GPUs is running on Beta right now, and appears to be doing well, http://setiweb.ssl.berkeley.edu/beta/setiathome_v8_x86_64-apple-darwin__opencl_ati5_mac.html" Who can guarantee that the same issue will not appear after deployment on Main? The previous deployment was made under the same conditions - a "stable" binary from Beta... I will not recommend any promotion until the reason for the current failure, with the binary we already deployed, is found. |
![]() ![]() Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 ![]() |
"Fair enough. But be aware that Raistmer makes extensive use of #if constructs in his code, so changing those compiler directive flags effectively means you're changing the contents of the code files - you're using different parts of them. For example," Definitely. A different set of defines DOES constitute a different build path, FOR ANY APP; that's just what a define is (as long as we are not talking about parameter defines, which are a poor substitute for constants). So calling those binaries "the same" is pointless and just distracts from bug-hunting. Even if the binaries really were the SAME, arguing that the NV and AMD _binaries_ are similar is STILL pointless, because the GPU code is deployed as a high-level language source file (the CL file) and is COMPILED DIFFERENTLY not only by each vendor's runtime but even by different driver versions of the same runtime; we have been through that many times already. So all that "similar" stuff is irrelevant. Instead, one needs to find the similarities and differences in host setups and/or WU properties between failed and good tasks. Or, better, to provide a guinea pig host. P.S. Different behavior on Beta and Main is a bad surprise that (if we really can exclude a bad deployment) means we don't have an adequate testing platform. Most probably that is because the tasks we get on Beta are too homogeneous _AND_ different from those on Main. That's something the project CAN fix. Another possibility (one the project could fix only partly) is simply a lack of diversity in the testing hardware/software base. That could mostly be fixed not by the project but by the community of Mac proprietary hardware fans here. (With the new iPhone madness around the world, I think that community really could have a better point of interest :P ) |
![]() ![]() Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 ![]() |
"the 8.11 (cuda42_mac) plan class allow the App to work on all nVidia GPUs." What limitations does it have currently? A host can receive CUDA75 instead of CUDA42 not because of limitations in the plan class, but just because CUDA75 happened to build a bigger APR on that particular host. |
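
The selection behaviour described here can be sketched as picking whichever app version has the highest APR (average processing rate) on a host. The Python below is a simplified illustration only: the function `pick_app_version` and the APR numbers are hypothetical, and the real scheduler takes more into account than this.

```python
def pick_app_version(candidates):
    """Return the plan class with the highest APR.

    `candidates` is a list of (plan_class, apr) tuples for one host.
    """
    best_plan_class, _best_apr = max(candidates, key=lambda item: item[1])
    return best_plan_class


# Hypothetical APR values for one host, for illustration only
host_versions = [("cuda42_mac", 95.0), ("cuda75_mac", 110.0)]
print(pick_app_version(host_versions))  # cuda75_mac
```

In this model a host keeps receiving cuda75_mac simply because its recorded rate is higher, with no plan-class restriction on cuda42_mac involved.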
![]() ![]() Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 ![]() |
"Any GPU that can work with the Baseline CUDA 75 App can also work with the CUDA 42 App, and probably work successfully." That "probably" could knock a whole set of newer GPUs out of the game if the probability was wrongly estimated. Better to check that for sure, at least on a few case studies. |
![]() ![]() Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 ![]() |
That is, to have that guinea pig + capable operator on its behalf. Any volunteers? |
![]() ![]() Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 ![]() |
"It all becomes moot once you accept that just about All the Mac AMD machines on Main are having this error within seconds of launch at least every day or two, and that NONE of the machines at Beta had this error over almost a year. Have you accepted that most of the Macs on Main are having this error? Because once you accept that, it's impossible for a rational person to think this error went unnoticed on Beta for almost a year. There are just too many machines having this error to think it escaped Beta if it was occurring on that many machines." That (if the situation really is described correctly) means that on Main the app encountered a new type of WU. That definitely means shame on the Beta work supply, because it isn't doing its job: to TEST the app, not just to WASTE energy processing meaningless similar tasks over and over. |
Send message Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 ![]() |
"the 8.11 (cuda42_mac) plan class allow the App to work on all nVidia GPUs." I have no idea what limitations are in place, but something is causing the Server to not send the CUDA 42 App to that host. You can see here that host has never been sent the CUDA 42 App, which on My machine works with every CUDA card from the 8800GT to the GTX 1060. https://setiathome.berkeley.edu/host_app_versions.php?hostid=7823473 SETI@home v8 8.11 x86_64-apple-darwin (cuda42_mac): Number of tasks completed : 0; Max tasks per day : 1; Number of tasks today : 0; Consecutive valid tasks : 0; Average turnaround time : 0.00 days. As noted previously, the Older GPUs have trouble with any CUDA above 6.0; I've seen a number of older GPUs have trouble with the CUDA 75 Baseline App, which was built Before the BLC tasks were released. At this point the Baseline CUDA 75 App is more trouble than it's worth. |
Send message Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 ![]() |
"Any GPU that can work with the Baseline CUDA 75 App can also work with the CUDA 42 App, and probably work successfully." The CUDA 42 App works with every GPU from the 8800GT to the GTX 1060 on My Machine. From experience we know the CUDA 75 Baseline App only works with Maxwell and above GPUs, which means it will Not work correctly with any of the Current Apple supplied nVidia cards. Anyone with a Maxwell or above card in a Mac shouldn't be using the Baseline CUDA 75 anyway, as the Special App works on those GPUs, making the Baseline CUDA 75 App Worthless. Why on Earth would you use the Baseline 75 App when you can use the Much faster Special 75 App with your Maxwell cards? I dunno, I suppose it might make sense to someone. The following was run in Darwin 16.7 on a GTX 1060. The MBv8_8.19r3551_NV App uses the Intel Path, it's the same App as on Main. The MBv8_8.22r3608_NV App uses the Use_NV path and never made it off my machine for obvious reasons. The x41p_zi3xs3 App is the current build for Sierra & High Sierra, the older OSes use the CUDA 75 Special App. The Baseline CUDA 75 App works on my Pascal GPU, but so does the Baseline CUDA 42 App, and the CUDA 42 App also works with most of the current Apple NVidia GPUs whereas the Baseline CUDA 75 App Doesn't. 
Listing wu-file(s) in /testWUs :
01ap07ad.15720.22976.8.35.178.wu
18dc09ah.26284.16432.6.33.125.wu
31oc16ab.6546.18275.10.37.120.wu

Listing executable(s) in /APPS :
MBv8_8.19r3551_NV_ssse3_x86_64-apple-darwin
MBv8_8.22r3608_NV_ssse3_x86_64-apple-darwin
setiathome_x41p_zi3xs3_x86_64-apple-darwin_cuda90
setiathome_x41zi_x86_64-apple-darwin_cuda42
setiathome_x41zi_x86_64-apple-darwin_cuda75

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin

Current WU: 01ap07ad.15720.22976.8.35.178.wu
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time : 8206 seconds

Running app with command : MBv8_8.19r3551_NV_ssse3_x86_64-apple-darwin -sbs 256 -period_iterations_num 10 -device 1
517.78 real 109.61 user 219.57 sys
Elapsed Time : 517 seconds
Speed compared to default : 1587 %
Comparing results
Result : Strongly similar, Q= 97.79%

Running app with command : MBv8_8.22r3608_NV_ssse3_x86_64-apple-darwin -sbs 256 -period_iterations_num 10 -device 1
895.78 real 135.22 user 394.09 sys
Elapsed Time : 895 seconds
Speed compared to default : 916 %
Comparing results
Result : Strongly similar, Q= 97.79%

Running app with command : setiathome_x41p_zi3xs3_x86_64-apple-darwin_cuda90 -device 1
265.81 real 39.81 user 29.46 sys
Elapsed Time : 266 seconds
Speed compared to default : 3084 %
Comparing results
Result : Strongly similar, Q= 99.82%

Done with 01ap07ad.15720.22976.8.35.178.wu.

Current WU: 18dc09ah.26284.16432.6.33.125.wu
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time : 3517 seconds

Running app with command : MBv8_8.19r3551_NV_ssse3_x86_64-apple-darwin -sbs 256 -period_iterations_num 10 -device 1
275.40 real 86.21 user 78.28 sys
Elapsed Time : 276 seconds
Speed compared to default : 1274 %
Comparing results
Result : Strongly similar, Q= 99.48%

Running app with command : MBv8_8.22r3608_NV_ssse3_x86_64-apple-darwin -sbs 256 -period_iterations_num 10 -device 1
693.74 real 258.24 user 180.21 sys
Elapsed Time : 694 seconds
Speed compared to default : 506 %
Comparing results
Result : Strongly similar, Q= 99.49%

Running app with command : setiathome_x41p_zi3xs3_x86_64-apple-darwin_cuda90 -device 1
107.29 real 16.78 user 11.98 sys
Elapsed Time : 107 seconds
Speed compared to default : 3286 %
Comparing results
Result : Strongly similar, Q= 99.70%

Running app with command : setiathome_x41zi_x86_64-apple-darwin_cuda42
286.01 real 126.02 user 91.49 sys
Elapsed Time : 286 seconds
Speed compared to default : 1229 %
Comparing results
Result : Strongly similar, Q= 99.69%

Running app with command : setiathome_x41zi_x86_64-apple-darwin_cuda75
273.28 real 96.73 user 88.46 sys
Elapsed Time : 273 seconds
Speed compared to default : 1288 %
Comparing results
Result : Strongly similar, Q= 99.70%

Done with 18dc09ah.26284.16432.6.33.125.wu.

Current WU: 31oc16ab.6546.18275.10.37.120.wu
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time : 6567 seconds

Running app with command : MBv8_8.19r3551_NV_ssse3_x86_64-apple-darwin -sbs 256 -period_iterations_num 10 -device 1
376.29 real 65.04 user 134.45 sys
Elapsed Time : 376 seconds
Speed compared to default : 1746 %
Comparing results
Result : Strongly similar, Q= 97.90%

Running app with command : MBv8_8.22r3608_NV_ssse3_x86_64-apple-darwin -sbs 256 -period_iterations_num 10 -device 1
658.38 real 78.39 user 270.34 sys
Elapsed Time : 659 seconds
Speed compared to default : 996 %
Comparing results
Result : Strongly similar, Q= 97.90%

Running app with command : setiathome_x41p_zi3xs3_x86_64-apple-darwin_cuda90 -device 1
165.38 real 26.36 user 19.43 sys
Elapsed Time : 165 seconds
Speed compared to default : 3980 %
Comparing results
Result : Strongly similar, Q= 99.95%

Running app with command : setiathome_x41zi_x86_64-apple-darwin_cuda42
351.65 real 139.76 user 99.03 sys
Elapsed Time : 352 seconds
Speed compared to default : 1865 %
Comparing results
Result : Strongly similar, Q= 99.97%

Running app with command : setiathome_x41zi_x86_64-apple-darwin_cuda75
338.80 real 109.87 user 95.20 sys
Elapsed Time : 339 seconds
Speed compared to default : 1937 %
Comparing results
Result : Strongly similar, Q= 99.97%

Done with 31oc16ab.6546.18275.10.37.120.wu

The Short version: if you have a Maxwell or Pascal GPU in your Classic Mac Pro, use the CUDA Special App, Not the BASELINE CUDA Apps. The only use for the Baseline CUDA Apps is to work on the Older GPUs that can't run the Special App. The CUDA 42 App works on the Older GPUs, whereas the Baseline CUDA 75 Doesn't work on the Older GPUs, https://setiathome.berkeley.edu/results.php?hostid=7823473&state=5 |
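
The "Speed compared to default" figures in the benchmark output above are consistent with dividing the reference app's elapsed seconds by the tested app's elapsed seconds and truncating to a whole percentage. The Python below is a sketch of that arithmetic, not the benchmark script itself; the function name `speed_percent` is invented here, and truncation (rather than rounding) is inferred from the reported numbers.

```python
def speed_percent(reference_seconds, app_seconds):
    """Speed of an app relative to the reference run, as a truncated percentage."""
    # Integer floor division reproduces the truncation seen in the log,
    # e.g. 3517 s / 107 s = 32.869..., reported as 3286 %.
    return reference_seconds * 100 // app_seconds


# Elapsed times taken from the benchmark output for each work unit
print(speed_percent(8206, 517))  # 1587
print(speed_percent(3517, 107))  # 3286
print(speed_percent(6567, 352))  # 1865
```

Read this way, the Baseline CUDA 42 and CUDA 75 Apps differ by only a few percent on the same work unit, while the Special App runs at roughly two to three times their speed.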
©2023 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.