Large difference in SoG speed in Mac and Windows?

Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58114 - Posted: 2 May 2016, 19:19:45 UTC

I finally put SoG on a Windows 10 computer with an ATI 7990, a card that has two 7970's on it running at 1000MHz. I had been fairly pleased with my Mac Pro's SoG run times (it was getting around 53,000 RAC before the Greenback wu's nuked everyone's RAC). The Mac Pro has D700 cards, which are the same as the 7990 except that they run at 850MHz, so depending on boost speeds and all of that they should be around 15-17% slower. However, run times are more than 40% slower on the Mac, despite its stronger CPU/memory subsystem.
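For reference, the 15-17% figure can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes run time scales inversely with GPU clock, which ignores boost behaviour, memory bandwidth, and driver differences. The function names are made up for illustration.

```python
# Clock speeds quoted in the post, in MHz.
WINDOWS_7990_MHZ = 1000
MAC_D700_MHZ = 850


def clock_deficit(slow_mhz, fast_mhz):
    """Fraction by which the slower clock trails the faster one."""
    return 1 - slow_mhz / fast_mhz


def expected_runtime_increase(slow_mhz, fast_mhz):
    """Extra run time expected on the slower card, assuming run time ~ 1/clock."""
    return fast_mhz / slow_mhz - 1


deficit = clock_deficit(MAC_D700_MHZ, WINDOWS_7990_MHZ)
increase = expected_runtime_increase(MAC_D700_MHZ, WINDOWS_7990_MHZ)

print(f"Clock deficit on the D700: {deficit:.1%}")
print(f"Expected extra run time on the D700: {increase:.1%}")
```

The two numbers bracket the quoted range: the clock is 15% lower, which under this simple model means roughly 17-18% more run time. Anything well beyond that would have to come from something other than clock speed.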

I made the settings the same on the two systems and it didn't bring them really any closer at all. Could this difference be due to the lack of CPU_Lock on the Mac? Within the Mac's console logs I also see system diagnostic reports over "wakeups", on the order of 1200 per second for around 30 seconds. These are happening every 5 minutes or so according to the logs.

The Mac Pro was running 3 at a time until around 17:00 UTC, at which point I switched over to 2 at a time to match the Windows settings.

We've discussed the Mac Pro a couple of times before because it has a high number of inconclusives, typically because it has missed a gaussian. Unfortunately there aren't many other Macs running SoG, so I don't have any good comparison. There is another Mac on beta running a couple of SoG wu's a day, and one of its inconclusives misses a gaussian too (https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=8476208).

That said, all the other Mac Pro's on beta are only running a few a day, vs the 400-500 a day I'm pushing through, so it's statistically difficult to compare the occurrences on the other machines. I'll be getting another Mac Pro with D500 cards in it next week, so I'll have a better collection of data once I have it up and running. That wasn't the point of this message, but I did want to make sure it was known that we are talking about the same machine that has been discussed before.

The two systems are here:

Windows

http://setiathome.berkeley.edu/results.php?hostid=7330085

Mac Pro

http://setiathome.berkeley.edu/results.php?hostid=6105482

Just curious if anyone had any ideas why there would be this much difference between the two systems running basically the same cards, albeit at slightly different clock speeds.

Thanks,

Chris
ID: 58114 · Report as offensive
Profile Raistmer
Volunteer tester
Avatar

Send message
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 58117 - Posted: 2 May 2016, 20:35:25 UTC - in response to Message 58114.  

What about speed of other, non-SoG, ATi apps on that Mac?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 58117 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58118 - Posted: 2 May 2016, 20:54:44 UTC - in response to Message 58117.  

I haven't tried them on Main to get a good variety of angle ranges, but on beta the non-SoG apps have been slower than the SoG apps by 10-15%, so I expect moving to the non-SoG version will just be slower still, but I'd need to verify that more rigorously before I say that definitively. The difference in AstroPulse run times between the computers correlates almost exactly to the difference in GPU clock speed.

I'll see about running the non-SoG version of 8.10 for a bit this evening to see how it does.

Chris
ID: 58118 · Report as offensive
Urs Echternacht
Volunteer tester
Avatar

Send message
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 58126 - Posted: 3 May 2016, 13:37:44 UTC

Chris,

Have you thought about using different optimization settings on OS X than on Windows?
Apple might have altered the GPU driver in a way that they think works "best".
_\|/_
U r s
ID: 58126 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58134 - Posted: 4 May 2016, 12:13:58 UTC - in response to Message 58126.  

I've played around a lot with the different settings on the Mac, much more so than the PC. That said, I won't claim that I have found the very best settings on the Mac. I usually run the Mac with higher SBS for example. But frankly, aside from setting the oclfft local radix size to 16 and dropping the Pulsefind iterations down, nothing produces a very noticeable change in performance.

Another quirk I still see: on Windows, when BOINC reports 100% complete the run time stops. On the Mac, when it reaches 100% the wu will still run for another 20-30 seconds even though GPU and CPU utilization on that wu has stopped. But that's only 20-30 seconds, and I'm seeing around 300 seconds overall difference between the two machines running 2 at a time.

I've been pretty swamped the last few days but I'll try playing around with the settings some more. I'll have my new Mac Pro next week too, so I'll be able to see if the errors, inconclusives, and general slowness persist as well.

Thanks,

Chris
ID: 58134 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58140 - Posted: 4 May 2016, 20:24:36 UTC - in response to Message 58134.  

As for the Idle Wake Ups (2000+ per second), this same SoG app does not cause this on my laptop that has an AMD 6750M. Maybe it's because it is much slower, not sure. I'm not even sure that it is related to the slow processing by the D700's. I will say, looking at all the D700's on main, that they all seem to be running slower than expected at stock. They are barely faster than the D500's running stock and in most cases no faster at all. Considering the D500's have only 24 cu's compared to the 32 in the D700, I would expect to see a little bit better numbers. Of course I have no idea what those folks are doing with their computers, so until I have mine with a D500 next week I can't really do a fair comparison.

Any suggestions on which variables to change? Astropulse seems to respond well to changing the oclfft plan class and my MB times go down with it as well. I've tried a couple of options for Tune but there is only an insignificant difference.

SpikeFind threshold doesn't seem to matter a whole lot whether it is there or not. I'm not real sure what the oclfft max local fft size does and which memory on the card its trying to fit into best, i.e. register memory, local memory, global, etc. I did change the number of local memory banks to 32 from the recommended 64 because the Tahiti based chips only have 32 per AMD documentation. I can't say there was any noticeable difference in changing that though.

Lastly, I'm not sure what, if anything, the coalesced widths do for me. The AMD chips don't support coalesced reads, I think; it might be writes. I don't remember off the top of my head which their documentation said.

Anyway, I'll fiddle around with some of the settings this evening and see if there is any noticeable difference.

Thanks,

Chris
ID: 58140 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58141 - Posted: 4 May 2016, 23:47:56 UTC - in response to Message 58140.  
Last modified: 5 May 2016, 0:12:44 UTC

Some quick testing of different versions.

The following settings were used and are the only ones that seem to make a difference in run times on the Mac:

-tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -period_iterations_num 5
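When checking that two hosts really do run with the same settings, it can help to split a line like the one above into its options and diff them. The sketch below is a generic tokenizer and makes no assumption about what each option means to the app. The second command line is hypothetical, invented only to show a difference.

```python
def parse_options(command_line):
    """Split a command line into {option: [values]}, keeping values as strings."""
    options = {}
    current = None
    for token in command_line.split():
        # Treat "-name" as an option; anything else is a value of the last option.
        if token.startswith("-") and token[1:2].isalpha():
            current = token
            options[current] = []
        elif current is not None:
            options[current].append(token)
    return options


def diff_options(a, b):
    """Return the options whose values differ between two parsed command lines."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Settings quoted in the post.
MAC_SETTINGS = ("-tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 "
                "-oclfft_tune_wg 256 -period_iterations_num 5")
# Hypothetical second host, differing in one value (not from the thread).
OTHER_SETTINGS = ("-tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 "
                  "-oclfft_tune_wg 256 -period_iterations_num 10")

mac = parse_options(MAC_SETTINGS)
other = parse_options(OTHER_SETTINGS)
print(diff_options(mac, other))
```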

Mac

AR .39 1 wu, 3430 SoG: 549s
AR .39 1 wu, 3430 non-SoG: 653s
AR .39 1 wu, 3347 Tbar build: 590s

AR .39 2 wu, 3430 SoG: 975s
AR .39 2 wu, 3430 non-SoG: 1055s

Windows

AR .38 2 wu, 3430 SoG: 675s
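To put those timings side by side, here is a small sketch that turns them into effective seconds per task and a relative gap. The numbers are copied from the lists above. The dictionary layout and function name are just for illustration, and the two hosts ran slightly different angle ranges (.39 vs .38), so the comparison is approximate.

```python
# Run times in seconds, keyed by (host, tasks running at once, build).
run_times = {
    ("mac", 1, "3430 SoG"): 549,
    ("mac", 1, "3430 non-SoG"): 653,
    ("mac", 1, "3347 TBar"): 590,
    ("mac", 2, "3430 SoG"): 975,
    ("mac", 2, "3430 non-SoG"): 1055,
    ("windows", 2, "3430 SoG"): 675,
}


def seconds_per_task(key):
    """Effective wall-clock seconds per task when several run concurrently."""
    _host, concurrent, _build = key
    return run_times[key] / concurrent


mac_two = seconds_per_task(("mac", 2, "3430 SoG"))
win_two = seconds_per_task(("windows", 2, "3430 SoG"))
mac_vs_win = mac_two / win_two - 1

# Same host, one task at a time: how much slower is non-SoG than SoG?
non_sog_penalty = (run_times[("mac", 1, "3430 non-SoG")]
                   / run_times[("mac", 1, "3430 SoG")] - 1)

print(f"Mac, 2 at a time: {mac_two:.1f} s per task")
print(f"Windows, 2 at a time: {win_two:.1f} s per task")
print(f"Mac takes {mac_vs_win:.0%} longer than Windows")
print(f"Non-SoG takes {non_sog_penalty:.0%} longer than SoG on the Mac")
```

On these figures the Mac needs about 44% more time per task than the Windows host, which is well beyond what the clock difference alone would suggest.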

It should also be noted that only build 3430, both SoG and non-SoG, causes the idle wake ups to be in the thousands per second. TBar's build (the only "stock" type build I readily had available) only causes minimal wake ups, 200-300 per second. Again, I have no idea if that has anything to do with the slow run times or not.

Thanks,

Chris
ID: 58141 · Report as offensive
Urs Echternacht
Volunteer tester
Avatar

Send message
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 58143 - Posted: 5 May 2016, 2:25:12 UTC - in response to Message 58141.  

Do you run with BOINC's default priority settings, or have you set <no_priority_change>1</no_priority_change> in "cc_config.xml"?
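For anyone who has not edited that file before, here is a minimal sketch that builds a cc_config.xml containing just this one option. It assumes the usual BOINC layout, where client options sit inside an <options> element under <cc_config>. The helper name is made up, and a real file may hold other settings that should be preserved rather than overwritten.

```python
import xml.etree.ElementTree as ET


def build_cc_config(no_priority_change=True):
    """Return a minimal cc_config.xml document as a string."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "no_priority_change")
    flag.text = "1" if no_priority_change else "0"
    return ET.tostring(root, encoding="unicode")


xml_text = build_cc_config()
print(xml_text)
```

The client has to re-read its config files (or be restarted) before a change like this takes effect.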
_\|/_
U r s
ID: 58143 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58144 - Posted: 5 May 2016, 2:34:37 UTC - in response to Message 58143.  

I've never fiddled with that, so it's probably default. I'll make that change and take a look.

Thanks,

Chris
ID: 58144 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 58156 - Posted: 5 May 2016, 11:18:44 UTC
Last modified: 5 May 2016, 11:23:49 UTC

I was compiling a couple cuda apps in Mountain Lion and decided to see how the SoG compile would go. The App seems to work about the same as my older non-SoG App with the exception of the progress switching every few seconds. This can be annoying if you have the tasks sorted by progress, is this normal? It can switch from just a few percent to more than double, triple, or higher right up until it's finished. The amount of the switch seems to vary as to the actual percentage completed. I'm also seeing the console message;
5/5/16 6:57:55.000 AM kernel[0]: process MBv8_8.08r3452_a[6300] caught causing excessive wakeups. Observed wakeups rate (per sec): 570; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 45053
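To compare these reports across builds, the numbers can be pulled out of the console line with a regular expression. This is a sketch that assumes the message always has the same shape as the one line quoted above. The pattern and field names are my own, not anything defined by OS X.

```python
import re

WAKEUP_PATTERN = re.compile(
    r"process (?P<name>\S+)\[(?P<pid>\d+)\] caught causing excessive wakeups\. "
    r"Observed wakeups rate \(per sec\): (?P<observed>\d+); "
    r"Maximum permitted wakeups rate \(per sec\): (?P<limit>\d+); "
    r"Observation period: (?P<period>\d+) seconds; "
    r"Task lifetime number of wakeups: (?P<lifetime>\d+)"
)


def parse_wakeup_report(line):
    """Extract the numeric fields from an 'excessive wakeups' console line."""
    match = WAKEUP_PATTERN.search(line)
    if match is None:
        return None
    report = {"name": match["name"]}
    for field in ("pid", "observed", "limit", "period", "lifetime"):
        report[field] = int(match[field])
    # How many times over the permitted rate the process was observed.
    report["times_over_limit"] = report["observed"] / report["limit"]
    return report


sample = (
    "5/5/16 6:57:55.000 AM kernel[0]: process MBv8_8.08r3452_a[6300] caught "
    "causing excessive wakeups. Observed wakeups rate (per sec): 570; "
    "Maximum permitted wakeups rate (per sec): 150; Observation period: "
    "300 seconds; Task lifetime number of wakeups: 45053"
)

report = parse_wakeup_report(sample)
print(report)
```

For the line above this gives an observed rate of 570 against a limit of 150, i.e. 3.8 times the permitted rate.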

All of my machines have been using the <no_priority_change> option for years.
If anyone wants to test this app they are more than welcome. I'm going to put the nVidia cards back in the machine.
ID: 58156 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58157 - Posted: 5 May 2016, 12:01:15 UTC - in response to Message 58156.  

Yup, the progress bounces around on mine as well. I think there has been some discussion on here about it on Windows at least, and on there it manifests itself as an erratic jump in percent complete vs 10% to 45% and back to 10% over and over. I think it basically was because the GPU is not reporting anything back to the CPU on a regular basis like it was before.

For what it's worth, changing the priority setting didn't change my run times in a perceptible way.

Those are the same console messages I'm getting as well, except they are on the order of 2000-3000 per second. I've seen it as high as 17,000 as well.

Chris
ID: 58157 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 58159 - Posted: 5 May 2016, 13:43:20 UTC - in response to Message 58157.  

Hmmm, I thought the progress 'anomaly' had been solved. I guess I haven't been paying attention. Hey, those last couple of tasks with the sbs set at 256 look interesting; normal times on the 6850 were around 23-24 minutes for that AR. Just over 20 minutes could hint at an improvement, http://setiathome.berkeley.edu/result.php?resultid=4910016299. Too bad I don't have a second Mac I could place those old ATI cards in, I have a few laying around.
ID: 58159 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58160 - Posted: 5 May 2016, 14:43:37 UTC - in response to Message 58159.  
Last modified: 5 May 2016, 15:40:57 UTC

My D700 actually seems to like an SBS value of 1536; luckily each card has 6GB, so running 3 at a time is doable. My computer with D500's shows up at work Tuesday, so I'll be interested to see how it fares compared to my D700. It should be pretty comparable to your Cayman card, same CU's but clocked a hair less. That said, at this point I'm hoping there is something wrong with my D700's, as it will be easier to have them pulled and replaced than it will be to track down whatever the issue is that is slowing their processing down so much. That said, they work perfectly with Astropulse, actually a little faster, as there is only a 15% difference between the two machines.

Did you put your app on CA?

I thought the progress problem was resolved too but then I figured it was just resolved on the Windows side. May be an "anomaly" in my thinking however.=)

You can pick up a Mac Pro 4,1 for pretty cheap these days, some of the 5,1's aren't bad either.

Thanks,

Chris
ID: 58160 · Report as offensive
Zalster
Volunteer tester

Send message
Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 58162 - Posted: 5 May 2016, 16:30:12 UTC - in response to Message 58157.  
Last modified: 5 May 2016, 16:31:32 UTC

Yup, the progress bounces around on mine as well. I think there has been some discussion on here about it on Windows at least, and on there it manifests itself as an erratic jump in percent complete vs 10% to 45% and back to 10% over and over.


That was something I brought up with Raistmer when the SoG first came out.

Since the majority of the work was being done on the GPU, it wasn't moving any of the data or compete checks back to the CPU. ( if I am getting the terminology right)

It was waiting until near completion, then it would move those results. Resulting in CPU usage going from 40% to 100%

What it was doing, running at a set rate until 70% complete then it crawled until 100% completion. The time it took for 2/3 of the work was equal to the time for the last 1/3. He eventually fixed that.

The other issue had to do with the number of work units exceeding the number of cores.

They resolved that as well, fixing something with the cpu_lock and commands in 2 separate areas, one in the command line txt and one in the app_config.xml. Both were needed to specify the total number of work units per card and whole number in total. Before that, what was happening was work progressed to different points, then started over again. For example, it would say 40 or 60% complete, then drop down again to 0 or 10%. Eventually they said it was an issue with the work not being tied to a physical core. That is what was corrected in the later versions of SoG.
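As an illustration of the app_config.xml side of that, here is a hedged sketch that writes the GPU share for a given number of tasks per card. It assumes the standard BOINC app_config.xml layout (an <app> with a <gpu_versions> block), and the app name "setiathome_v8" is my assumption, so check the real name in client_state.xml. The matching command-line text file is not covered here.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpus_per_task=1.0):
    """Return an app_config.xml string that runs tasks_per_gpu tasks per card."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, so N tasks fit on one card.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:.2f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpus_per_task:.2f}"
    return ET.tostring(root, encoding="unicode")


# Assumed app name; two tasks per card, one CPU core reserved per task.
config_text = build_app_config("setiathome_v8", tasks_per_gpu=2)
print(config_text)
```

With one full core reserved per task, the number of GPU tasks cannot exceed the number of cores BOINC is allowed to use, which is the constraint described above.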

Don't know if either of these are what you are seeing.
ID: 58162 · Report as offensive
Profile Raistmer
Volunteer tester
Avatar

Send message
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 58164 - Posted: 5 May 2016, 21:24:37 UTC
Last modified: 5 May 2016, 21:25:28 UTC

What is that "wake up" thing about?
Also, for the listed completion times I don't see a difference between SoG and non-SoG. But the SoG times fluctuate too much.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 58164 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 58181 - Posted: 6 May 2016, 10:23:52 UTC - in response to Message 58164.  

From what I can tell, excessive wake ups are an indication that the App is not responsive. The higher the number, the less responsive the App. Usually, excessive wakes happen prior to a hang. You might want to research it yourself, but, that's what it appears to me.

Did you put your app on CA?

I was waiting to see if the tasks validated, it appears most have. I'm also a little reluctant to post something that has the progress skipping around so much.
Maybe a different route than posting it.
ID: 58181 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58201 - Posted: 7 May 2016, 14:44:22 UTC - in response to Message 58162.  

Yup, the progress bounces around on mine as well. I think there has been some discussion on here about it on Windows at least, and on there it manifests itself as an erratic jump in percent complete vs 10% to 45% and back to 10% over and over.


That was something I brought up with Raistmer when the SoG first came out.

Since the majority of the work was being done on the GPU, it wasn't moving any of the data or compete checks back to the CPU. ( if I am getting the terminology right)

It was waiting until near completion, then it would move those results. Resulting in CPU usage going from 40% to 100%

What it was doing, running at a set rate until 70% complete then it crawled until 100% completion. The time it took for 2/3 of the work was equal to the time for the last 1/3. He eventually fixed that.

The other issue had to do with the number of work units exceeding the number of cores.

They resolved that as well, fixing something with the cpu_lock and commands in 2 separate areas, one in the command line txt and one in the app_config.xml. Both were needed to specify the total number of work units per card and whole number in total. Before that, what was happening was work progressed to different points, then started over again. For example, it would say 40 or 60% complete, then drop down again to 0 or 10%. Eventually they said it was an issue with the work not being tied to a physical core. That is what was corrected in the later versions of SoG.

Don't know if either of these are what you are seeing.


May well be what's happening since I don't believe there is a -cpu_lock implemented on the Mac side of things. Anyone have any idea how to view that possibility on the Mac? I don't see anything in activity monitor that shows which CPU a thread is running on.

Thanks,

Chris
ID: 58201 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 2 Jul 13
Posts: 505
Credit: 5,019,318
RAC: 0
United States
Message 58206 - Posted: 7 May 2016, 18:47:07 UTC - in response to Message 58160.  

My D700 actually seems to like an SBS value of 1536; luckily each card has 6GB, so running 3 at a time is doable. My computer with D500's shows up at work Tuesday, so I'll be interested to see how it fares compared to my D700. It should be pretty comparable to your Cayman card, same CU's but clocked a hair less. That said, at this point I'm hoping there is something wrong with my D700's, as it will be easier to have them pulled and replaced than it will be to track down whatever the issue is that is slowing their processing down so much. That said, they work perfectly with Astropulse, actually a little faster, as there is only a 15% difference between the two machines...

Is that a Beta OS? The highest I can find at Apple is 10.11.4. Seems strange other machines don't have the Gaussian problem. Do you have an external drive or some other method to possibly run Darwin 15.4?
ID: 58206 · Report as offensive
Chris Adamek
Volunteer tester

Send message
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 58208 - Posted: 7 May 2016, 22:20:15 UTC - in response to Message 58206.  

Yes it is; however, it was doing it on 15.4 as well. I keep updating, hoping that there is a fix in their latest driver, but no luck so far. Is there a verbose setting that would show what the app is doing/finding during Gaussian search, to try and figure out if it is even running at all? Are there any settings that affect the sensitivity/optimizations for the Gaussian searches? It's bizarre that neither card ever seems to find any...

Thanks,

Chris
ID: 58208 · Report as offensive
Profile Raistmer
Volunteer tester
Avatar

Send message
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 58209 - Posted: 7 May 2016, 22:31:37 UTC - in response to Message 58208.  

Yes it is; however, it was doing it on 15.4 as well. I keep updating, hoping that there is a fix in their latest driver, but no luck so far. Is there a verbose setting that would show what the app is doing/finding during Gaussian search, to try and figure out if it is even running at all? Are there any settings that affect the sensitivity/optimizations for the Gaussian searches? It's bizarre that neither card ever seems to find any...

Thanks,

Chris

Compare the counter values with your wingman's and report if they differ in the PoT transferred to CPU part.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 58209 · Report as offensive


 
©2023 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.