SETI@home v7 6.98 for ATI OpenCL released.

Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 44198 - Posted: 19 Oct 2012, 22:16:40 UTC - in response to Message 44196.  
Last modified: 19 Oct 2012, 22:44:24 UTC

Sounds like the description of my card. I now have 5 valid results.


What's odd about your results is that you got small credits a few times.
Even so, you should add -sbs 256 to MB_cmdline.txt to speed things up a little.
With each crime and every kindness we birth our future.
ID: 44198

Christoph
Volunteer tester
Joined: 16 Oct 09
Posts: 58
Credit: 662,990
RAC: 0
Germany
Message 44200 - Posted: 19 Oct 2012, 22:51:07 UTC - in response to Message 44198.  
Last modified: 19 Oct 2012, 22:58:32 UTC

Done. Thanks for the advice.

Of course I forgot to check memory consumption before the change. Now it is 384 MB static and 60 MB dynamic. It would be no problem if it went higher.
ID: 44200

Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 44202 - Posted: 19 Oct 2012, 23:16:30 UTC - in response to Message 44200.  
Last modified: 19 Oct 2012, 23:17:31 UTC

Done. Thanks for the advice.

Of course I forgot to check memory consumption before the change. Now it is 384 MB static and 60 MB dynamic. It would be no problem if it went higher.



Currently allocated 145 MB for GPU buffers

With each crime and every kindness we birth our future.
ID: 44202

Ron Gottheiner
Volunteer tester
Joined: 18 Jun 12
Posts: 1
Credit: 70,213
RAC: 0
United States
Message 44219 - Posted: 21 Oct 2012, 1:11:47 UTC

I continually get an error message saying "can't resolve hostname". I am new at this, so I'm sorry, but I also don't know what other info you need to go along with this.

Thanks,

Ron
ID: 44219

EoD
Volunteer tester
Joined: 21 Oct 12
Posts: 4
Credit: 4,548
RAC: 0
Germany
Message 44226 - Posted: 21 Oct 2012, 23:00:58 UTC
Last modified: 21 Oct 2012, 23:02:22 UTC

So far I have only one validated workunit, but the card has already completed quite a few without any problems from a user's perspective.

The "problem" I noticed is that the GPU load is quite steady at about 75% +/- 5% and never reached 90% or more. I checked the GPU load via GPU Shark.
ID: 44226

TRuEQ & TuVaLu
Volunteer tester
Joined: 28 Jan 11
Posts: 619
Credit: 2,580,051
RAC: 0
Sweden
Message 44227 - Posted: 22 Oct 2012, 4:41:09 UTC

The app is running very nicely on my 5850.

But I noticed there are no .VLAR tasks yet.

http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=55958
ID: 44227

Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 44231 - Posted: 22 Oct 2012, 14:00:21 UTC - in response to Message 44227.  

The app is running very nicely on my 5850.

But I noticed there are no .VLAR tasks yet.

http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=55958


No VLARs are being sent to GPUs anymore.
It's a result of the NVIDIA fix at main, I guess.

With each crime and every kindness we birth our future.
ID: 44231

Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44234 - Posted: 22 Oct 2012, 20:44:51 UTC

Eric wanted to fix this some time ago.
Don't know if he succeeded.
ID: 44234

EoD
Volunteer tester
Joined: 21 Oct 12
Posts: 4
Credit: 4,548
RAC: 0
Germany
Message 44246 - Posted: 24 Oct 2012, 23:32:09 UTC - in response to Message 44226.  
Last modified: 24 Oct 2012, 23:52:21 UTC

The "problem" I noticed is that the GPU load is quite steady at about 75% +/- 5% and never reached 90% or more. I checked the GPU load via GPU Shark.


I got the GPU usage of my HD6870 up to 75%-80% (and rarely below) with
-sbs 512 -period_iterations_num 5 -unroll 7


This reduced the task time from 38 minutes to 26 minutes: Task 11361409

Any suggestions on how to get above 90%?
ID: 44246

Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 44249 - Posted: 25 Oct 2012, 7:25:45 UTC - in response to Message 44246.  

Any suggestions on how to get above 90%?


Try the -hp (high priority) command-line switch.

Claggy
ID: 44249

Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44254 - Posted: 25 Oct 2012, 9:49:46 UTC - in response to Message 44246.  
Last modified: 25 Oct 2012, 9:52:10 UTC

The "problem" I noticed is that the GPU load is quite steady at about 75% +/- 5% and never reached 90% or more. I checked the GPU load via GPU Shark.


I got the GPU usage of my HD6870 up to 75%-80% (and rarely below) with
-sbs 512 -period_iterations_num 5 -unroll 7


This reduced the task time from 38 minutes to 26 minutes: Task 11361409

Any suggestions on how to get above 90%?

Since you are getting into tuning territory and seem to care about what your host is doing, you could consider switching to anonymous platform, which provides more opportunities for tweaking and tuning. For example, you could run 2 tasks simultaneously.
Disadvantage: you will need to keep looking after your host and do the needed upgrades manually (not a "set and forget" approach).
Beta is mostly about correctness and compatibility, but if you have already proved that the app works stably and correctly in the default config, you can do further optimizations. If not, I would recommend staying with the default config for now.

EDIT: btw, the -unroll param isn't defined for the MB7 app and is ignored. Remove it; it's for the AP6 app.
ID: 44254

TRuEQ & TuVaLu
Volunteer tester
Joined: 28 Jan 11
Posts: 619
Credit: 2,580,051
RAC: 0
Sweden
Message 44256 - Posted: 25 Oct 2012, 14:59:32 UTC - in response to Message 44246.  
Last modified: 25 Oct 2012, 14:59:53 UTC

The "problem" I noticed is that the GPU load is quite steady at about 75% +/- 5% and never reached 90% or more. I checked the GPU load via GPU Shark.


I got the GPU usage of my HD6870 up to 75%-80% (and rarely below) with
-sbs 512 -period_iterations_num 5 -unroll 7


This reduced the task time from 38 minutes to 26 minutes: Task 11361409

Any suggestions on how to get above 90%?



If I look at your task: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=11361409

I see:
Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 64MB
max WG size: 256

The -sbs 512 switch and "Single buffer allocation size: 64MB" refer to the same setting, if I have understood it right.

Can anyone correct me here, please?

Shouldn't -sbs be set to 128 or possibly 256 for a 6870?
ID: 44256

Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44257 - Posted: 25 Oct 2012, 15:11:15 UTC - in response to Message 44256.  

There are no signs in stderr that the app received any params at all, so it is using the defaults.
-sbs N should result in a message about the new value, and the value actually used will be reported as "single buffer ...".
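
For anyone who wants to check this on their own results, here is a minimal Python sketch. It assumes the task's stderr output has been saved as plain text and contains a "Single buffer allocation size" line in the format quoted in this thread; the function name reported_sbs_mb is only illustrative and not part of any SETI@home tool.

```python
import re


def reported_sbs_mb(stderr_text):
    """Return the reported single buffer allocation size in MB, or None if absent."""
    # Matches lines such as "Single buffer allocation size: 64MB"
    match = re.search(r"Single buffer allocation size:\s*(\d+)\s*MB", stderr_text)
    return int(match.group(1)) if match else None


# Sample text copied from the stderr excerpt quoted earlier in the thread.
sample = """Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 64MB
max WG size: 256
"""

print(reported_sbs_mb(sample))
```

If the value printed does not match the -sbs value you set, the switch most likely never reached the app.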
ID: 44257

Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 44259 - Posted: 25 Oct 2012, 20:40:37 UTC
Last modified: 25 Oct 2012, 20:41:28 UTC

It is set to 512.

Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 256MB

Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 512MB

He changed it to 256 first and then 512.
With each crime and every kindness we birth our future.
ID: 44259

Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 44260 - Posted: 25 Oct 2012, 20:48:20 UTC
Last modified: 25 Oct 2012, 21:17:03 UTC

Shouldn't -sbs be set to 128 or possibly 256 for a 6870?


With -sbs 256 you should see a speedup of ~5% - 10%.

MB7_win_x86_SSE_OpenCL_ATi_r1643.exe -period_iterations_num 15 -verbose -hp -sbs 64 :
Elapsed 96.181 secs
CPU 36.676 secs

MB7_win_x86_SSE_OpenCL_ATi_r1643.exe -period_iterations_num 15 -verbose -hp -sbs 256 :
Elapsed 90.480 secs, speedup: 5.93% ratio: 1.06
CPU 31.403 secs, speedup: 14.38% ratio: 1.17

MB7_win_x86_SSE_OpenCL_ATi_r1643.exe -period_iterations_num 15 -verbose -hp -sbs 512 :
Elapsed 248.997 secs, speedup: -158.88% ratio: 0.39
CPU 34.632 secs, speedup: 5.57% ratio: 1.06

As you can see, -sbs 512 is a waste of time,
even with my 18 CUs.
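
The speedup and ratio figures above can be reproduced from the elapsed times. This is a small Python sketch, assuming speedup is (baseline - candidate) / baseline and ratio is baseline / candidate; those formulas are inferred from the posted numbers, not taken from the benchmark tool itself, and the function name compare is illustrative.

```python
def compare(baseline_secs, candidate_secs):
    """Return (speedup_percent, ratio) of a candidate run relative to the baseline."""
    speedup = (baseline_secs - candidate_secs) / baseline_secs * 100.0
    ratio = baseline_secs / candidate_secs
    return speedup, ratio


baseline = 96.181  # elapsed secs with -sbs 64, from the run above

for label, secs in (("-sbs 256", 90.480), ("-sbs 512", 248.997)):
    speedup, ratio = compare(baseline, secs)
    print(f"{label}: speedup {speedup:.2f}% ratio {ratio:.2f}")
```

A negative speedup simply means the run took longer than the -sbs 64 baseline.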
With each crime and every kindness we birth our future.
ID: 44260

EoD
Volunteer tester
Joined: 21 Oct 12
Posts: 4
Credit: 4,548
RAC: 0
Germany
Message 44264 - Posted: 26 Oct 2012, 22:19:16 UTC - in response to Message 44259.  
Last modified: 26 Oct 2012, 22:20:38 UTC

EDIT: btw, the -unroll param isn't defined for the MB7 app and is ignored. Remove it; it's for the AP6 app.

Ok, I removed it, thanks :)

It is set to 512.

Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 256MB

Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 512MB

He changed it to 256 first and then 512.

Yes, I played a bit with the values before I let it run for most of its part.

MB7_win_x86_SSE_OpenCL_ATi_r1643.exe -period_iterations_num 15 -verbose -hp -sbs 512 :
Elapsed 248.997 secs, speedup: -158.88% ratio: 0.39
CPU 34.632 secs, speedup: 5.57% ratio: 1.06

As you can see, -sbs 512 is a waste of time,
even with my 18 CUs.

How can it be that it's slower with 512MB than with 256MB? My GPU RAM usage never got above 750MB (of 1GB), so I don't see any reason why going lower decreases calculation time.
ID: 44264

Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44265 - Posted: 27 Oct 2012, 5:51:02 UTC - in response to Message 44264.  
Last modified: 27 Oct 2012, 5:52:43 UTC


How can it be that it's slower with 512MB than with 256MB? My GPU RAM usage never got above 750MB (of 1GB), so I don't see any reason why going lower decreases calculation time.


It can, and the reason is quite simple.
Short answer: it is a balance between CU load and memory subsystem load.
More detailed explanation: when the app uses a larger amount of memory, it launches more separate workitems, each doing fewer iterations.
While the GPU engine is not fully loaded, this is beneficial.
From some point on, all GPU processing elements are already busy and new workitems just go into a queue (this by itself neither increases nor decreases performance). But each workitem requires some initial data to be fetched from GPU memory.
The more workitems there are, the more data has to be read. While all workitems become active immediately, the GPU hardware can use broadcast mechanisms to supply a required value to many workitems at once, so the load on the GPU memory subsystem stays low.
But when a workitem goes into the queue first and becomes active only after some waiting, it needs to fetch its data from GPU memory from scratch, which increases the load on the GPU memory subsystem.
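
To make the trade-off concrete, here is a deliberately crude toy model in Python. Everything in it is an assumption for illustration only: the capacity, the cost constants, and the rule that the first wave shares one broadcast fetch while every queued workitem pays for its own fetch. It is not how the app or the GPU actually schedules work, and the numbers are arbitrary units, not measurements.

```python
import math


def toy_runtime(workitems, capacity=64, total_iterations=6400,
                iteration_cost=1.0, fetch_cost=5.0):
    """Toy cost model in arbitrary units (hypothetical, not measured data)."""
    # Workitems execute in waves of at most `capacity` at a time.
    waves = math.ceil(workitems / capacity)
    # More workitems means fewer iterations for each one.
    iterations_per_item = total_iterations / workitems
    compute = waves * iterations_per_item * iteration_cost
    # Assumption: the first wave shares one broadcast fetch, while every
    # workitem that had to wait in the queue fetches its data on its own.
    queued = max(0, workitems - capacity)
    memory = fetch_cost * (1 + queued)
    return compute + memory


for n in (16, 64, 256):
    print(n, toy_runtime(n))
```

In this sketch the runtime drops as workitems are added until the GPU is fully occupied, then rises again once extra workitems only add memory fetches, which is the shape of the effect described above.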
ID: 44265

EoD
Volunteer tester
Joined: 21 Oct 12
Posts: 4
Credit: 4,548
RAC: 0
Germany
Message 44266 - Posted: 27 Oct 2012, 9:16:19 UTC - in response to Message 44265.  

I tried one workunit with sbs 256 and it seemed to be faster.

Did you try values between 256 and 512, like 384 on your HD69xx?
ID: 44266

Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 44267 - Posted: 28 Oct 2012, 9:15:57 UTC - in response to Message 44266.  

I tried one workunit with sbs 256 and it seemed to be faster.

Did you try values between 256 and 512, like 384 on your HD69xx?


I did.
256 is fastest.

With each crime and every kindness we birth our future.
ID: 44267

zombie67 [MM]
Volunteer tester
Joined: 18 May 06
Posts: 280
Credit: 26,477,429
RAC: 0
United States
Message 44293 - Posted: 2 Nov 2012, 0:47:15 UTC
Last modified: 2 Nov 2012, 0:47:50 UTC

Is there a FAQ, readme, or something similar that lists all the parameters and their value ranges? I asked before and was told to look at the Lunatics forum. I looked there, but I was never able to find it.
Dublin, California
Team: SETI.USA

ID: 44293
 
©2023 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.