Radeon VII Seti performance vs 1080ti SoG?

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1995591 - Posted: 28 May 2019, 8:20:56 UTC - in response to Message 1995583.  

Are you running the -instances_per_device N parameter in your command line to go with your -cpu_lock?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1995645 - Posted: 28 May 2019, 18:44:44 UTC

I am not sure I understand your question.

I use the parameter set from Mike above:
-sbs 2048 -period_iterations_num 10 -high_perf -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -cpu_lock

I only changed -period_iterations_num from 10 to 1, as mentioned.
I don't see -instances_per_device there, so I think I'm not using it.

The 2 WUs in parallel are managed by an app_config (0.5 GPU and 1 CPU per WU).
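For readers new to app_config.xml: the "0.5 GPU and 1 CPU per WU" split corresponds to the `gpu_usage` and `cpu_usage` values, and BOINC runs as many tasks on one GPU as fit into a whole device. The sketch below uses hypothetical helper names and a deliberately naive thread budget that simply adds the numbers up (the real client scheduler is more involved), applied to the 16-thread Ryzen 7 1700 that appears in this thread.

```python
import math
from fractions import Fraction


def tasks_per_gpu(gpu_usage: str) -> int:
    """How many tasks fit on one GPU when each one claims `gpu_usage` of it."""
    return math.floor(1 / Fraction(gpu_usage))  # Fraction keeps "0.5" exact


def spare_threads(total_threads: int, cpu_tasks: int,
                  gpu_tasks: int, cpu_usage: str) -> int:
    """Logical CPUs left after CPU tasks plus the CPU share budgeted per GPU task.

    Naive assumption: the budget is plain addition, rounded up to whole threads.
    """
    budgeted = cpu_tasks + gpu_tasks * Fraction(cpu_usage)
    return total_threads - math.ceil(budgeted)


if __name__ == "__main__":
    per_gpu = tasks_per_gpu("0.5")
    print(per_gpu, spare_threads(16, 10, per_gpu, "1"), spare_threads(16, 14, per_gpu, "1"))
```

With 10 CPU tasks this naive budget leaves 4 threads spare; with 14 it leaves none.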

What exactly does -cpu_lock do?
Keith Myers
Message 1995649 - Posted: 28 May 2019, 18:56:34 UTC

I understood that the -cpu_lock parameter is supposed to be used with the -instances_per_device N parameter, per Raistmer's instructions. In case you aren't aware, Raistmer is the Windows SoG developer.

NOTE: -cpu_lock can't be used to run several app instances simultaneously unless -instances_per_device N is supplied with the proper (or greater) number of instances.
Without that option, the first instance will be locked to CPU 0, but the second will be suspended until the first finishes.
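The rule in that NOTE can be turned into a quick check of a tuning line. This is a hypothetical helper, not part of the SoG app: it only encodes the stated rule (with `-cpu_lock` present, `-instances_per_device N` must be at least the number of tasks BOINC runs per GPU) and assumes that a missing switch means one instance per device.

```python
import shlex
from typing import Optional


def cpu_lock_warning(cmdline: str, tasks_per_gpu: int) -> Optional[str]:
    """Flag a tuning line that uses -cpu_lock without a big enough -instances_per_device N."""
    args = shlex.split(cmdline)
    if "-cpu_lock" not in args:
        return None  # no CPU lock requested, nothing to check
    instances = 1  # assumption: without the switch, treat it as one instance per device
    if "-instances_per_device" in args:
        i = args.index("-instances_per_device")
        if i + 1 < len(args):
            instances = int(args[i + 1])
    if instances < tasks_per_gpu:
        return (f"-cpu_lock with {tasks_per_gpu} tasks per GPU: add "
                f"-instances_per_device {tasks_per_gpu} (or greater)")
    return None
```

Run against MagicEye's command line with 2 tasks per GPU, it returns a warning; appending `-instances_per_device 2` clears it.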

If you read his discussions about the app at Lunatics, in the "Loading APU to the limit" thread and the "Some considerations regarding OpenCL MultiBeam app tuning" thread, you might gain a better understanding of the parameters.

http://lunatics.kwsn.info/index.php/topic,1735.msg61165/topicseen.html#new
http://lunatics.kwsn.info/index.php/topic,1808.msg61251.html#msg61251
MagicEye
Message 1995654 - Posted: 28 May 2019, 19:46:39 UTC

Thank you!
As a quick first fix, I will remove -cpu_lock.
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34492
Credit: 79,922,639
RAC: 80
Germany
Message 1995665 - Posted: 28 May 2019, 20:53:02 UTC - in response to Message 1995649.  

Yes, but it also helps to pin the GPU app to the correct CPU cores.
With each crime and every kindness we birth our future.
MagicEye
Message 1995671 - Posted: 28 May 2019, 21:17:54 UTC
Last modified: 28 May 2019, 21:18:15 UTC

Even without the CPU lock, one WU gets stuck. It is marked as active, not waiting or anything like that, but there is no progress, and the fan slows down with just one really active WU.
In the meantime, 3 other WUs were started and finished.
OK, at the weekend I will read up a bit and try to understand how to pin a GPU program to a CPU core.
Mike
Message 1995673 - Posted: 28 May 2019, 21:37:19 UTC - in response to Message 1995671.  

I forgot you are running Linux.
IIRC this parameter does not work on Linux.

Stderr should show a notice like: Info: CPU affinity mask used: 1; system mask is ffff

Just to make sure, reduce CPU tasks to 10 and see if this helps.
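If -cpu_lock does not take effect on Linux, one manual alternative is to set the affinity from outside the app. Below is a minimal Linux-only sketch, assuming you already know the PID of the running science app (finding it, e.g. with `ps`, is left out); `pin_process` is a hypothetical name, and this is not how `-cpu_lock` works internally. Because `sched_setaffinity` applies to a single thread, the sketch loops over `/proc/<pid>/task`. BOINC starts a new process for each task, so pinning has to be repeated per task, and changing another user's process needs sufficient privileges. The util-linux `taskset -a -p` command does the same job from a shell.

```python
import os
from typing import Set


def pin_process(pid: int, cpus: Set[int]) -> None:
    """Restrict every thread of process `pid` to the logical CPUs in `cpus` (Linux only)."""
    # sched_setaffinity() acts on one thread ID, so walk all threads of the process.
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
        except ProcessLookupError:
            pass  # the thread exited between listing and pinning
```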
Keith Myers
Message 1995675 - Posted: 28 May 2019, 21:49:12 UTC - in response to Message 1995673.  

Stderr should give a notice....... Info: CPU affinity mask used: 1; system mask is ffff

That is what I always found strange in his stderr.txt outputs: no sign that it is using the CPU affinity mask.
I've seen that line in other ATI hosts' outputs.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13913
Credit: 208,696,464
RAC: 304
Australia
Message 1995717 - Posted: 29 May 2019, 5:03:34 UTC - in response to Message 1995675.  

From a WU processed by my RTX 2060:
Target kernel sequence time set to 1500ms
Number of period iterations for PulseFind set to:1
High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line
System timer will be set in high resolution mode
CPU affinity adjustment enabled
Maximum single buffer size set to:2048MB
SpikeFind FFT size threshold override set to:4096
TUNE: kernel 1 now has workgroup size of (64,1,4)
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:256
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:64
oclFFT minimal memory coalesce width set to:64
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
GPUlock enabled. Use -instances_per_device N switch to provide number of instances to run if BOINC is configured to launch few tasks per device.
BOINC assigns device 1, slots 1 to 1 (including) will be checked
Used slot is 1;	Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 2; system mask is fff
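Assuming these values are ordinary hexadecimal bitmasks with one bit per logical CPU, "mask used: 2" would mean the app is locked to CPU 1 and "system mask is fff" would mean 12 logical CPUs are available; Mike's example (1 / ffff) would be CPU 0 out of 16. A small parser for that stderr line, with hypothetical function names:

```python
import re
from typing import List, Optional, Tuple

AFFINITY_RE = re.compile(
    r"CPU affinity mask used: ([0-9a-fA-F]+); system mask is ([0-9a-fA-F]+)")


def cpus_in_mask(mask_hex: str) -> List[int]:
    """Logical CPU indices whose bits are set in a hexadecimal affinity mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def parse_affinity_line(line: str) -> Optional[Tuple[List[int], int]]:
    """Return (CPUs the app is locked to, number of CPUs in the system mask), or None."""
    m = AFFINITY_RE.search(line)
    if m is None:
        return None  # no affinity line in this stderr
    return cpus_in_mask(m.group(1)), len(cpus_in_mask(m.group(2)))
```

Scanning a whole stderr.txt with `parse_affinity_line` is a quick way to see whether the affinity notice appears at all.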

Grant
Darwin NT
MagicEye
Message 1995823 - Posted: 29 May 2019, 19:18:45 UTC - in response to Message 1995673.  
Last modified: 29 May 2019, 19:21:41 UTC

I thought the number of these stuck WUs had gone down, but there is a time limit, and I have a lot of them that failed without any notice; I only found them in my task list.
The good thing is, they don't block the GPU for hours or days...
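A stalled task still accumulates elapsed time, so the client eventually aborts it. Assuming BOINC's usual rule, that limit is the workunit's FLOP bound (`rsc_fpops_bound`) divided by the client's speed estimate for the app version, and the two "G" figures in the message appear to be those two numbers. Computing the ratio from the logged figures gives about 4,881 s, close to but not exactly the 4791.83 s the client printed, so treat this sketch (hypothetical function name) as an approximation of the client's bookkeeping.

```python
import re
from typing import Tuple

LIMIT_RE = re.compile(
    r"exceeded elapsed time limit ([\d.]+) \(([\d.]+)G/([\d.]+)G\)")


def parse_time_limit(message: str) -> Tuple[float, float]:
    """Return (limit printed by the client, fpops bound / flops estimate), both in seconds."""
    m = LIMIT_RE.search(message)
    if m is None:
        raise ValueError("not an elapsed-time-limit message")
    limit, bound_g, flops_g = (float(x) for x in m.groups())
    # Both figures carry a "G" (1e9) suffix, so the scale cancels in the ratio.
    return limit, bound_g / flops_g
```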

Here is the stderr of a "bad task":
[spoiler]
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 4791.83 (3677746.15G/753.52G)</message>
<stderr_txt>
Maximum single buffer size set to:2048MB
Number of period iterations for PulseFind set to 1 
High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line
SpikeFind FFT size threshold override set to:4096
TUNE: kernel 1 now has workgroup size of (64,1,4)
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:256
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:64
oclFFT minimal memory coalesce width set to:64
OpenCL platform detected: Advanced Micro Devices, Inc.
Number of OpenCL devices found : 1 
BOINC assigns slot on device #0.
Info: BOINC provided OpenCL device ID used

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSE2 64bit 
 System: Linux  x86_64  Kernel: 4.15.0-50-generic
 CPU   : AMD Ryzen 7 1700 Eight-Core Processor
 16 core(s), Speed :  3017.782 MHz
 L1 : 64 KB, Cache : 512 KB
 Features : FPU TSC PAE APIC MTRR MMX SSE  SSE2 HT PNI SSSE3 SSE4A SSE4_1 SSE4_2 AVX  AVX2  

OpenCL-kernels filename : MultiBeam_Kernels_r3584.cl 
ar=0.008117  NumCfft=116331  NumGauss=0  NumPulse=47020601216  NumTriplet=59991734432
Currently allocated 4121 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
Linux optimized setiathome_v8 application
Version info: SSE2x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE2x Linux64 Build 3584 , Ported by : Raistmer, JDWhale, Urs Echternacht


OpenCL version by Raistmer, r3584

AMD HD5 version by Raistmer

Number of OpenCL platforms:				 1


 OpenCL Platform Name:					 AMD Accelerated Parallel Processing
Number of devices:				 1
  Max compute units:				 60
  Max work group size:				 256
  Max clock frequency:				 1802Mhz
  Max memory allocation:			 4244635648
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 16978542592
  Constant buffer size:				 4244635648
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 65536
  Queue properties:				 
    Out-of-Order:				 No
  Profiling timer offset:			 1
  Global free memory:				 1
  SIMD per compute unit:			 4
  SIMD width:					 16
  SIMD instruction width:			 1
  Wavefront width:				 64
  Global mem channels:				 128
  Global mem channel banks:			 4
  Global mem channel bank width:		 256
  Local mem size per compute unit:		 65536
  Local mem banks:				 32
  Thread trace supported:			 Yes
  Board Name:					 AMD Radeon VII
  Name:						 gfx906
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 2841.4 (PAL,HSAIL)
  Version:					 OpenCL 2.0 AMD-APP (2841.4)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes 


Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  0.008117
Used GPU device parameters are:
	Number of compute units: 60
	Single buffer allocation size: 4048MB
	Total device global memory: 16192MB
	max WG size: 256
	local mem type: Real
	LotOfMem path: yes
	LowPerformanceGPU path: no
	HighPerformanceGPU path: yes
period_iterations_num=1

</stderr_txt>
]]>

[/spoiler]

Now I am reducing the number of CPU cores used from 14 to 10.
MagicEye
Message 1995834 - Posted: 29 May 2019, 20:31:31 UTC - in response to Message 1995823.  


NoW I reduce the number of CPU cores used from 14 to 10.

There was a W missing.
MagicEye
Message 1997581 - Posted: 9 Jun 2019, 18:52:00 UTC
Last modified: 9 Jun 2019, 18:56:25 UTC

This weekend I had time to test again.
Now I have removed all optimizations; the txt files are empty.
I also run only 1 WU on the GPU, and the CPU threads are still at 10 of 16 used.
And I still have WUs that get stuck and later time out.
The next step will be testing on Windows.
MagicEye
Message 1997939 - Posted: 12 Jun 2019, 18:01:31 UTC

Windows/Lunatics is also running fine with just 1 WU on the GPU, for 24 hours without any errors.
But switching to 2 WUs means that they hang after a few minutes of run time. Instead of timing out, the graphics driver resets.

My conclusion is: Seti@Home runs poorly on the Radeon VII. I will continue to crunch this project on NVIDIA.
The VII runs perfectly with Einstein and Milkyway.
MagicEye
Message 1997944 - Posted: 12 Jun 2019, 19:19:31 UTC - in response to Message 1981122.  

Up and crunching now under Ubuntu 18.04. Vega 56 and Radeon VII in the same system.

https://setiathome.berkeley.edu/show_host_detail.php?hostid=6993114


@Sean: I would like to know your settings for this PC.
Just the stock app and only 1 WU per GPU?
Sputnik
Joined: 20 Oct 01
Posts: 3
Credit: 10,453,636
RAC: 131
Germany
Message 2001910 - Posted: 10 Jul 2019, 10:21:12 UTC - in response to Message 1997939.  

Same conclusion here:
SETI GPU crunching on the Radeon VII (Win10) with 1 WU per GPU is stable, but doesn't use the card's potential.
2 (or more) WUs will reset the GPU driver or hang the thread... I tried different Adrenalin versions and different frequencies/voltages.
With 1 WU, GPU utilization jumps high and low within seconds (e.g. Task Manager|GPU|Compute1... or AMD WATTMAN|Activity) and is overall far, far away from 100%. There is enough free raw GPU power, but today it can't be used.

MilkyWay and Einstein run like hell...
beep-beep-beep...


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.