Radeon VII Seti performance vs 1080ti SoG?
Author | Message |
---|---|
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Are you running the -instances_per_device N parameter in your command line to go with your -cpu_lock? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
I am not sure I understand your question. I use the parameter set from Mike above: -sbs 2048 -period_iterations_num 1 I only changed the 10 iterations to 1, as mentioned. I don't see -instances_per_device there, so I think I don't use it. The 2 WUs in parallel are managed by an app_config (0.5 GPU and 1 CPU per WU). What does -cpu_lock do exactly? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I understood that the -cpu_lock parameter was supposed to be used with the -instances_per_device N parameter, per Raistmer's instructions. In case you aren't aware, Raistmer is the Windows SoG developer. "NOTE: -cpu_lock can't be used to run few app instances simultaneously until -instances_per_device N supplied with proper (or greater) number of instances." If you read his discussions of the app at Lunatics, in the "Loading APU to the limit" thread and the "Some considerations regarding OpenCL MultiBeam app tuning" thread, you might gain a better understanding of the parameters. http://lunatics.kwsn.info/index.php/topic,1735.msg61165/topicseen.html#new http://lunatics.kwsn.info/index.php/topic,1808.msg61251.html#msg61251 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
Thank you! As a first quick solution I will remove -cpu_lock. |
Joined: 17 Feb 01 Posts: 34492 Credit: 79,922,639 RAC: 80 |
"I understood that the -cpu_lock parameter was supposed to be used with the -instances_per_device N parameter, per Raistmer's instructions. In case you aren't aware, Raistmer is the Windows SoG developer." Yes, but it also helps to pin the GPU app to the correct CPU cores. With each crime and every kindness we birth our future. |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
Even without -cpu_lock, one WU gets stuck. It's marked as active, not waiting or anything like that. But there is no progress, and the fan slows down because only one WU is really active. In the meantime, 3 other WUs were started and finished. OK, at the weekend I will read a bit and try to understand how to pin a GPU program to a CPU core. |
Joined: 17 Feb 01 Posts: 34492 Credit: 79,922,639 RAC: 80 |
"Even without -cpu_lock, one WU gets stuck. It's marked as active, not waiting or anything like that. But there is no progress, and the fan slows down because only one WU is really active." I forgot you are running Linux. IIRC this param does not work on Linux. Stderr should give a notice: "Info: CPU affinity mask used: 1; system mask is ffff" Just to make sure, reduce CPU tasks to 10 to see if this helps. With each crime and every kindness we birth our future. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Stderr should give a notice....... Info: CPU affinity mask used: 1; system mask is ffff" That is what I always found strange in his stderr.txt outputs. There is no sign that it is using the CPU affinity mask. I've seen that line in the outputs of other ATI hosts. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13913 Credit: 208,696,464 RAC: 304 |
"Stderr should give a notice....... Info: CPU affinity mask used: 1; system mask is ffff" From a WU processed by my RTX 2060: Target kernel sequence time set to 1500ms Number of period iterations for PulseFind set to:1 High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line System timer will be set in high resolution mode CPU affinity adjustment enabled Maximum single buffer size set to:2048MB SpikeFind FFT size threshold override set to:4096 TUNE: kernel 1 now has workgroup size of (64,1,4) oclFFT global radix override set to:256 oclFFT local radix override set to:16 oclFFT max WG size override set to:256 oclFFT max local FFT size override set to:512 oclFFT number of local memory banks set to:64 oclFFT minimal memory coalesce width set to:64 Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation GPUlock enabled. Use -instances_per_device N switch to provide number of instances to run if BOINC is configured to launch few tasks per device. BOINC assigns device 1, slots 1 to 1 (including) will be checked Used slot is 1; Info: BOINC provided OpenCL device ID used Info: CPU affinity mask used: 2; system mask is fff Grant Darwin NT |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
"Even without -cpu_lock, one WU gets stuck. It's marked as active, not waiting or anything like that. But there is no progress, and the fan slows down because only one WU is really active." I thought the number of these WUs had gone down, but there is a time limit, and I had a lot of them without noticing; I just found them in my task list. The good thing is that they don't block the GPU for hours or days... Here is the stderr of a "bad task": [spoiler] <core_client_version>7.9.3</core_client_version> <![CDATA[ <message> exceeded elapsed time limit 4791.83 (3677746.15G/753.52G)</message> <stderr_txt> Maximum single buffer size set to:2048MB Number of period iterations for PulseFind set to 1 High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line SpikeFind FFT size threshold override set to:4096 TUNE: kernel 1 now has workgroup size of (64,1,4) oclFFT global radix override set to:256 oclFFT local radix override set to:16 oclFFT max WG size override set to:256 oclFFT max local FFT size override set to:512 oclFFT number of local memory banks set to:64 oclFFT minimal memory coalesce width set to:64 OpenCL platform detected: Advanced Micro Devices, Inc. Number of OpenCL devices found : 1 BOINC assigns slot on device #0. 
Info: BOINC provided OpenCL device ID used Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSE2 64bit System: Linux x86_64 Kernel: 4.15.0-50-generic CPU : AMD Ryzen 7 1700 Eight-Core Processor 16 core(s), Speed : 3017.782 MHz L1 : 64 KB, Cache : 512 KB Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT PNI SSSE3 SSE4A SSE4_1 SSE4_2 AVX AVX2 OpenCL-kernels filename : MultiBeam_Kernels_r3584.cl ar=0.008117 NumCfft=116331 NumGauss=0 NumPulse=47020601216 NumTriplet=59991734432 Currently allocated 4121 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Linux optimized setiathome_v8 application Version info: SSE2x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSE2x Linux64 Build 3584 , Ported by : Raistmer, JDWhale, Urs Echternacht OpenCL version by Raistmer, r3584 AMD HD5 version by Raistmer Number of OpenCL platforms: 1 OpenCL Platform Name: AMD Accelerated Parallel Processing Number of devices: 1 Max compute units: 60 Max work group size: 256 Max clock frequency: 1802Mhz Max memory allocation: 4244635648 Cache type: Read/Write Cache line size: 64 Cache size: 16384 Global memory size: 16978542592 Constant buffer size: 4244635648 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 65536 Queue properties: Out-of-Order: No Profiling timer offset: 1 Global free memory: 1 SIMD per compute unit: 4 SIMD width: 16 SIMD instruction width: 1 Wavefront width: 64 Global mem channels: 128 Global mem channel banks: 4 Global mem channel bank width: 256 Local mem size per compute unit: 65536 Local mem banks: 32 Thread trace supported: Yes Board Name: AMD Radeon VII Name: gfx906 Vendor: Advanced Micro Devices, Inc. 
Driver version: 2841.4 (PAL,HSAIL) Version: OpenCL 2.0 AMD-APP (2841.4) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes Work Unit Info: ............... Credit multiplier is : 2.85 WU true angle range is : 0.008117 Used GPU device parameters are: Number of compute units: 60 Single buffer allocation size: 4048MB Total device global memory: 16192MB max WG size: 256 local mem type: Real LotOfMem path: yes LowPerformanceGPU path: no HighPerformanceGPU path: yes period_iterations_num=1 </stderr_txt> ]]> [/spoiler] No I reduce the number of CPU cores used from 14 to 10. |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
There was a W missing: "Now", not "No". |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
This weekend I had time to test again. I have now removed all optimizations; the txt files are empty. I also run only 1 WU on the GPU, and still use 10 of the 16 CPU threads. And I still have WUs that get stuck and later time out. The next step will be testing in Windows. |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
Windows/Lunatics also runs fine with just 1 WU on the GPU, for 24h without any error. But switching to 2 WUs means that they hang after a few minutes of run time. Instead of a timeout, though, the graphics driver resets. My conclusion: Seti@Home runs poorly on the Radeon VII. I will continue to crunch this project on NVIDIA. The VII runs perfectly with Einstein and Milkyway. |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
Up and crunching now under Ubuntu 18.04. Vega 56 and Radeon VII in the same system. @Sean: I would like to know what your settings are for this PC. Just the stock app and only 1 WU per GPU? |
Joined: 20 Oct 01 Posts: 3 Credit: 10,453,636 RAC: 131 |
Same conclusion here: SETI GPU crunching on the Radeon VII (Win10) with 1 WU per GPU is stable, but doesn't use the card's potential. 2 (or more) WUs will reset the GPU driver or hang the thread... I tried different Adrenalin versions and different frequencies/voltages. With 1 WU, GPU utilization jumps up and down within seconds (e.g. Task Manager > GPU > Compute1... or AMD WattMan > Activity) and is overall far, far away from 100%. There is enough free raw GPU power, but today it can't be used. MilkyWay and Einstein run like hell... beep-beep-beep... |
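
For anyone who wants to check their own stderr.txt for the affinity line discussed in this thread, here is a minimal Python sketch. It assumes the two masks in the "Info: CPU affinity mask used: ...; system mask is ..." line are hexadecimal bitmasks where bit 0 stands for core 0; the thread does not state this, although values like ffff and fff suggest it. The function names are hypothetical and are not part of the SETI@home app or BOINC.

```python
import re

# Line format copied from the stderr excerpts quoted in this thread.
# Assumption: both masks are hexadecimal bitmasks, bit 0 = core 0.
AFFINITY_RE = re.compile(
    r"CPU affinity mask used:\s*([0-9a-fA-F]+);\s*system mask is\s*([0-9a-fA-F]+)"
)


def mask_to_cores(mask_hex):
    """Return the zero-based core indexes whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def affinity_from_stderr(stderr_text):
    """Return (used_cores, system_cores), or None if no affinity line exists."""
    match = AFFINITY_RE.search(stderr_text)
    if match is None:
        return None
    return mask_to_cores(match.group(1)), mask_to_cores(match.group(2))
```

Under that assumption, the RTX 2060 line "CPU affinity mask used: 2; system mask is fff" decodes to core 1 out of 12 logical cores, and a stderr without the line, like the Radeon VII one posted above, gives None.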