oclFFT's -34 failure in last AP builds debugging
Author | Message |
---|---|
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Please try this debug build: https://www.dropbox.com/s/6to8zalyo6silhc/AP7_win_x86_SSE2_OpenCL_ATI_r2559_debug.7z It will generate a very long stderr; I'm interested in the first few dozen lines, plus the separately generated clFFT_dumpPLAN.txt file, if any. Example of stderr from my Loveland (offline run): Not using ap_cmdline.txt-file, using commandline options. 02:31:44 (5840): Can't set up shared mem: -1. Will run in standalone mode. Priority of worker thread raised successfully Priority of process adjusted successfully, below normal priority class used GPU device # not found in init_data.xml WARNING: BOINC was unable to find GPU device, using own enumeration OpenCL platform detected: Advanced Micro Devices, Inc. WARNING: BOINC supplied wrong platform! call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 0 WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities call 'clGetDeviceIDs (second call)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 919 call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. 
(clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 Used GPU device parameters are: Number of compute units: 2 Single buffer allocation size: 167MB Total device global memory: 669MB max WG size: 256 -unroll default value used: 2 -ffa_block default value used: 512 -ffa_block_fetch default value used: 256 Build features: Non-graphics BLANKIT OpenCL TWIN_FFA OCL_VERBOSE OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86 CPUID: AMD C-60 APU with Radeon(tm) HD Graphics Cache: L1=64K L2=512K CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4A AstroPulse v7 Windows x86 rev 2559, V7 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2 OpenCL version by Raistmer oclFFT fix for ATI GPUs by Urs Echternacht ffa threshold mods by Joe Segur SSE3 dechirping by JDWhale Combined dechirp kernel by Frizz Built with uncommitted modifications Number of OpenCL platforms: 1 OpenCL Platform Name: AMD Accelerated Parallel Processing Number of devices: 1 Max compute units: 2 Max work group size: 256 Max clock frequency: 275Mhz Max memory allocation: 175374336 Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 701497344 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Queue properties: Out-of-Order: No Name: Loveland Vendor: Advanced Micro Devices, Inc. 
Driver version: 1268.1 (VM) Version: OpenCL 1.2 AMD-APP (1268.1) Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_dx9_media_sharing cl_amd_image2d_from_buffer_read_only state.fold_buf_size_short=65536; state.fold_buf_size_long=262144 INFO: can't open binary kernel file: .\\AstroPulse_Kernels_r2559.cl_Loveland.bin_V7_TWIN_FFA_12681VM, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585 INFO: binary kernel file created call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130 call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139 call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459 call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474 call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480 call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486 call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516 call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549 call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' 
is finished OK in file ..\..\ap_science.cpp near line 551 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577 call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580 call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582 call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583 call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 WARNING: can't open binary kernel file for oclFFT plan: .\\AP_clFFTplan_Loveland_32768_r2559.bin_12681VM, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_setup.cpp near line 723 oclFFT context=6d1b58 AP main context=6d1b58 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 in ap oclFFT_1 ok. in ap oclFFT_1 ok. in ap oclFFT_1 ok. call 'clFFT_ExecuteInterleaved_ap' is finished OK in file ..\..\ap_science.cpp near line 2680 call 'Setting kernel argument: dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 2117 News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
The nVidia app is working; INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_GeForce8800GT.bin_V7_TWIN_FFA_26658, continue with recompile... INFO: binary kernel file created WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_GeForce8800GT_32768_r2559.bin_26658, continue with recompile... The two ATI cards aren't. They both say; Error in ap oclFFT_1: -34 ERROR: OpenCL kernel/call 'clFFT_ExecuteInterleaved_ap' call failed (-34) in file ..\..\ap_science.cpp near line 2680. Waiting 30 sec before restart... The first few lines of the 6770 say; Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation OpenCL platform detected: Advanced Micro Devices, Inc. call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 0, slots 0 to 0 (including) will be checked Used slot is 0; Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. 
(clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 Info: CPU affinity mask used: 1 clFFT_dumpPLAN.txt; Run kernel fft0 with global dim = {4096*BatchSize}, local dim={128} Run kernel fft1 with global dim = {4096*BatchSize}, local dim={128} Run kernel fft2 with global dim = {4096*BatchSize}, local dim={256} The first few lines of the 4670 say; Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation OpenCL platform detected: Advanced Micro Devices, Inc. call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 1, slots 1 to 1 (including) will be checked Used slot is 1; Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. 
(clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 Info: CPU affinity mask used: 2 clFFT_dumpPLAN.txt; Run kernel fft0 with global dim = {4096*BatchSize}, local dim={32} Run kernel fft1 with global dim = {4096*BatchSize}, local dim={32} Run kernel fft2 with global dim = {4096*BatchSize}, local dim={32} #ifndef M_PI #define M_PI 0x1.921fb54442d18p+1 #endif #define complexMul(a,b) ((float2)(mad(-(a).y, (b).y, (a).x * (b).x), mad((a).y, (b).x, (a).x * (b).y))) #define cos_sinLUT1(res,dir,i,cossinLUT)\ {\ (res)=(float2)((cossinLUT)[i].x , (dir)*(cossinLUT)[i].y);\ } #define cos_sinLUT2(res,dir,_i,_k,cossinLUT1,cossinLUT2) \ { float _sin_1= (cossinLUT1)[_i].y; \ float _sin_2= (cossinLUT2)[_k].y; \ float _cos_1= (cossinLUT1)[_i].x; \ float _cos_2= (cossinLUT2)[_k].x; \ float _cos_res = _cos_1 * _cos_2 - _sin_1 * _sin_2; \ float _sin_res = (dir) * (_sin_1 * _cos_2 + _cos_1 * _sin_2); \ (res)=(float2)(_cos_res,_sin_res); \ } #define conj(a) ((float2)((a).x, -(a).y)) #define conjTransp(a) ((float2)(-(a).y, (a).x)) #define fftKernel2(a,dir) \ { \ float2 c = (a)[0]; \ (a)[0] = c + (a)[1]; \ (a)[1] = c - (a)[1]; \ } #define fftKernel2S(d1,d2,dir) \ { \ float2 c = (d1); \ (d1) = c + (d2); \ (d2) = c - (d2); \ } #define fftKernel4(a,dir) \ { \ fftKernel2S((a)[0], (a)[2], dir); \ fftKernel2S((a)[1], (a)[3], dir); \ fftKernel2S((a)[0], (a)[1], dir); \ (a)[3] = (float2)(dir)*(conjTransp((a)[3])); \ fftKernel2S((a)[2], (a)[3], dir); \ float2 c = (a)[1]; \ (a)[1] = (a)[2]; \ (a)[2] = c; \ } #define fftKernel4s(a0,a1,a2,a3,dir) \ { \ fftKernel2S((a0), (a2), dir); \ fftKernel2S((a1), (a3), dir); \ fftKernel2S((a0), (a1), dir); \ (a3) = 
(float2)(dir)*(conjTransp((a3))); \ fftKernel2S((a2), (a3), dir); \ float2 c = (a1); \ (a1) = (a2); \ (a2) = c; \ } #define bitreverse8(a) \ { \ float2 c; \ c = (a)[1]; \ (a)[1] = (a)[4]; \ (a)[4] = c; \ c = (a)[3]; \ (a)[3] = (a)[6]; \ (a)[6] = c; \ } #define fftKernel8(a,dir) \ { \ const float2 w1 = (float2)(0x1.6a09e6p-1f, dir*0x1.6a09e6p-1f); \ const float2 w3 = (float2)(-0x1.6a09e6p-1f, dir*0x1.6a09e6p-1f); \ float2 c; \ fftKernel2S((a)[0], (a)[4], dir); \ fftKernel2S((a)[1], (a)[5], dir); \ fftKernel2S((a)[2], (a)[6], dir); \ fftKernel2S((a)[3], (a)[7], dir); \ (a)[5] = complexMul(w1, (a)[5]); \ (a)[6] = (float2)(dir)*(conjTransp((a)[6])); \ (a)[7] = complexMul(w3, (a)[7]); \ fftKernel2S((a)[0], (a)[2], dir); \ fftKernel2S((a)[1], (a)[3], dir); \ fftKernel2S((a)[4], (a)[6], dir); \ fftKernel2S((a)[5], (a)[7], dir); \ (a)[3] = (float2)(dir)*(conjTransp((a)[3])); \ (a)[7] = (float2)(dir)*(conjTransp((a)[7])); \ fftKernel2S((a)[0], (a)[1], dir); \ fftKernel2S((a)[2], (a)[3], dir); \ fftKernel2S((a)[4], (a)[5], dir); \ fftKernel2S((a)[6], (a)[7], dir); \ bitreverse8((a)); \ } #define bitreverse4x4(a) \ { \ float2 c; \ c = (a)[1]; (a)[1] = (a)[4]; (a)[4] = c; \ c = (a)[2]; (a)[2] = (a)[8]; (a)[8] = c; \ c = (a)[3]; (a)[3] = (a)[12]; (a)[12] = c; \ c = (a)[6]; (a)[6] = (a)[9]; (a)[9] = c; \ c = (a)[7]; (a)[7] = (a)[13]; (a)[13] = c; \ c = (a)[11]; (a)[11] = (a)[14]; (a)[14] = c; \ } #define fftKernel16(a,dir) \ { \ const float w0 = 0x1.d906bcp-1f; \ const float w1 = 0x1.87de2ap-2f; \ const float w2 = 0x1.6a09e6p-1f; \ fftKernel4s((a)[0], (a)[4], (a)[8], (a)[12], dir); \ fftKernel4s((a)[1], (a)[5], (a)[9], (a)[13], dir); \ fftKernel4s((a)[2], (a)[6], (a)[10], (a)[14], dir); \ fftKernel4s((a)[3], (a)[7], (a)[11], (a)[15], dir); \ (a)[5] = complexMul((a)[5], (float2)(w0, dir*w1)); \ (a)[6] = complexMul((a)[6], (float2)(w2, dir*w2)); \ (a)[7] = complexMul((a)[7], (float2)(w1, dir*w0)); \ (a)[9] = complexMul((a)[9], (float2)(w2, dir*w2)); \ (a)[10] = 
(float2)(dir)*(conjTransp((a)[10])); \ (a)[11] = complexMul((a)[11], (float2)(-w2, dir*w2)); \ (a)[13] = complexMul((a)[13], (float2)(w1, dir*w0)); \ (a)[14] = complexMul((a)[14], (float2)(-w2, dir*w2)); \ (a)[15] = complexMul((a)[15], (float2)(-w0, dir*-w1)); \ fftKernel4((a), dir); \ fftKernel4((a) + 4, dir); \ fftKernel4((a) + 8, dir); \ fftKernel4((a) + 12, dir); \ bitreverse4x4((a)); \ } #define bitreverse32(a) \ { \ float2 c1, c2; \ c1 = (a)[2]; (a)[2] = (a)[1]; c2 = (a)[4]; (a)[4] = c1; c1 = (a)[8]; (a)[8] = c2; c2 = (a)[16]; (a)[16] = c1; (a)[1] = c2; \ c1 = (a)[6]; (a)[6] = (a)[3]; c2 = (a)[12]; (a)[12] = c1; c1 = (a)[24]; (a)[24] = c2; c2 = (a)[17]; (a)[17] = c1; (a)[3] = c2; \ c1 = (a)[10]; (a)[10] = (a)[5]; c2 = (a)[20]; (a)[20] = c1; c1 = (a)[9]; (a)[9] = c2; c2 = (a)[18]; (a)[18] = c1; (a)[5] = c2; \ c1 = (a)[14]; (a)[14] = (a)[7]; c2 = (a)[28]; (a)[28] = c1; c1 = (a)[25]; (a)[25] = c2; c2 = (a)[19]; (a)[19] = c1; (a)[7] = c2; \ c1 = (a)[22]; (a)[22] = (a)[11]; c2 = (a)[13]; (a)[13] = c1; c1 = (a)[26]; (a)[26] = c2; c2 = (a)[21]; (a)[21] = c1; (a)[11] = c2; \ c1 = (a)[30]; (a)[30] = (a)[15]; c2 = (a)[29]; (a)[29] = c1; c1 = (a)[27]; (a)[27] = c2; c2 = (a)[23]; (a)[23] = c1; (a)[15] = c2; \ } #define fftKernel32(a,dir) \ { \ fftKernel2S((a)[0], (a)[16], dir); \ fftKernel2S((a)[1], (a)[17], dir); \ fftKernel2S((a)[2], (a)[18], dir); \ fftKernel2S((a)[3], (a)[19], dir); \ fftKernel2S((a)[4], (a)[20], dir); \ fftKernel2S((a)[5], (a)[21], dir); \ fftKernel2S((a)[6], (a)[22], dir); \ fftKernel2S((a)[7], (a)[23], dir); \ fftKernel2S((a)[8], (a)[24], dir); \ fftKernel2S((a)[9], (a)[25], dir); \ fftKernel2S((a)[10], (a)[26], dir); \ fftKernel2S((a)[11], (a)[27], dir); \ fftKernel2S((a)[12], (a)[28], dir); \ fftKernel2S((a)[13], (a)[29], dir); \ fftKernel2S((a)[14], (a)[30], dir); \ fftKernel2S((a)[15], (a)[31], dir); \ (a)[17] = complexMul((a)[17], (float2)(0x1.f6297cp-1f, dir*0x1.8f8b84p-3f)); \ (a)[18] = complexMul((a)[18], (float2)(0x1.d906bcp-1f, 
dir*0x1.87de2ap-2f)); \ (a)[19] = complexMul((a)[19], (float2)(0x1.a9b662p-1f, dir*0x1.1c73b4p-1f)); \ (a)[20] = complexMul((a)[20], (float2)(0x1.6a09e6p-1f, dir*0x1.6a09e6p-1f)); \ (a)[21] = complexMul((a)[21], (float2)(0x1.1c73b4p-1f, dir*0x1.a9b662p-1f)); \ (a)[22] = complexMul((a)[22], (float2)(0x1.87de2ap-2f, dir*0x1.d906bcp-1f)); \ (a)[23] = complexMul((a)[23], (float2)(0x1.8f8b84p-3f, dir*0x1.f6297cp-1f)); \ (a)[24] = complexMul((a)[24], (float2)(0x0p+0f, dir*0x1p+0f)); \ (a)[25] = complexMul((a)[25], (float2)(-0x1.8f8b84p-3f, dir*0x1.f6297cp-1f)); \ (a)[26] = complexMul((a)[26], (float2)(-0x1.87de2ap-2f, dir*0x1.d906bcp-1f)); \ (a)[27] = complexMul((a)[27], (float2)(-0x1.1c73b4p-1f, dir*0x1.a9b662p-1f)); \ (a)[28] = complexMul((a)[28], (float2)(-0x1.6a09e6p-1f, dir*0x1.6a09e6p-1f)); \ (a)[29] = complexMul((a)[29], (float2)(-0x1.a9b662p-1f, dir*0x1.1c73b4p-1f)); \ (a)[30] = complexMul((a)[30], (float2)(-0x1.d906bcp-1f, dir*0x1.87de2ap-2f)); \ (a)[31] = complexMul((a)[31], (float2)(-0x1.f6297cp-1f, dir*0x1.8f8b84p-3f)); \ fftKernel16((a), dir); \ fftKernel16((a) + 16, dir); \ bitreverse32((a)); \ } __kernel void \ clFFT_1DTwistInterleaved(__global float2 *in, unsigned int startRow, unsigned int numCols, unsigned int N, unsigned int numRowsToProcess, int dir) \ { \ float2 a, w; \ float ang; \ unsigned int j; \ unsigned int i = get_global_id(0); \ unsigned int startIndex = mad24(startRow, numCols, i); \ \ if(i < numCols) \ { \ for(j = 0; j < numRowsToProcess; j++) \ { \ a = in[startIndex]; \ ang = 2.0f * M_PI * dir * i * (startRow + j) / N; \ w = (float2)(native_cos(ang), native_sin(ang)); \ a = complexMul(a, w); \ in[startIndex] = a; \ startIndex += numCols; \ } \ } \ } \ __kernel void fft0(__global float2 *in, __global float2 *out, int dir, int S, __global float2 * cossinLUT1, __global float2 * cossinLUT2 ) { __local float sMem[260]; int i, j, r, indexIn, indexOut, index, tid, bNum, xNum, k, l; int s, ii, jj, offset; float2 w; float ang, angf, ang1; __local 
float *lMemStore, *lMemLoad; float2 a[8]; int lId = get_local_id( 0 ); int groupId = get_group_id( 0 ); bNum = groupId & 127; xNum = groupId >> 7; indexIn = mul24(bNum, 4); tid = indexIn; i = tid >> 0; j = tid & 0; indexOut = mad24(i, 64, j); indexIn += (xNum << 15); indexOut += (xNum << 15); tid = lId; i = tid & 3; j = tid >> 2; indexIn += mad24(j, 512, i); in += indexIn; a[0] = in[0]; a[1] = in[4096]; a[2] = in[8192]; a[3] = in[12288]; a[4] = in[16384]; a[5] = in[20480]; a[6] = in[24576]; a[7] = in[28672]; fftKernel8(a, dir); ang = dir*(0x1.921fb54442d18p-4) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[1] = complexMul(a[1], w); ang = dir*(1.9634954631e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[2] = complexMul(a[2], w); ang = dir*(2.9452431947e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[3] = complexMul(a[3], w); ang = dir*(3.9269909263e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[4] = complexMul(a[4], w); ang = dir*(4.9087386578e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[5] = complexMul(a[5], w); ang = dir*(5.8904863894e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[6] = complexMul(a[6], w); ang = dir*(6.8722341210e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[7] = complexMul(a[7], w); indexIn = mad24(j, 32, i); lMemStore = sMem + tid; lMemLoad = sMem + indexIn; lMemStore[0] = a[0].x; lMemStore[32] = a[1].x; lMemStore[64] = a[2].x; lMemStore[96] = a[3].x; lMemStore[128] = a[4].x; lMemStore[160] = a[5].x; lMemStore[192] = a[6].x; lMemStore[224] = a[7].x; barrier(CLK_LOCAL_MEM_FENCE); a[0].x = lMemLoad[0]; a[1].x = lMemLoad[4]; a[2].x = lMemLoad[8]; a[3].x = lMemLoad[12]; a[4].x = lMemLoad[16]; a[5].x = lMemLoad[20]; a[6].x = lMemLoad[24]; a[7].x = lMemLoad[28]; barrier(CLK_LOCAL_MEM_FENCE); lMemStore[0] = a[0].y; lMemStore[32] = a[1].y; lMemStore[64] = a[2].y; lMemStore[96] = a[3].y; lMemStore[128] = a[4].y; lMemStore[160] = a[5].y; 
lMemStore[192] = a[6].y; lMemStore[224] = a[7].y; barrier(CLK_LOCAL_MEM_FENCE); a[0].y = lMemLoad[0]; a[1].y = lMemLoad[4]; a[2].y = lMemLoad[8]; a[3].y = lMemLoad[12]; a[4].y = lMemLoad[16]; a[5].y = lMemLoad[20]; a[6].y = lMemLoad[24]; a[7].y = lMemLoad[28]; barrier(CLK_LOCAL_MEM_FENCE); fftKernel8(a + 0, dir); l = ((bNum << 2) + i) >> 0; k = j << 0; ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 0)); w = (float2)(native_cos(ang), native_sin(ang)); a[0] = complexMul(a[0], w); ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 8)); w = (float2)(native_cos(ang), native_sin(ang)); a[1] = complexMul(a[1], w); ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 16)); w = (float2)(native_cos(ang), native_sin(ang)); a[2] = complexMul(a[2], w); ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 24)); w = (float2)(native_cos(ang), native_sin(ang)); a[3] = complexMul(a[3], w); ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 32)); w = (float2)(native_cos(ang), native_sin(ang)); a[4] = complexMul(a[4], w); ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 40)); w = (float2)(native_cos(ang), native_sin(ang)); a[5] = complexMul(a[5], w); ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 48)); w = (float2)(native_cos(ang), native_sin(ang)); a[6] = complexMul(a[6], w); ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 56)); w = (float2)(native_cos(ang), native_sin(ang)); a[7] = complexMul(a[7], w); lMemStore = sMem + mad24(i, 65, j << 0); lMemLoad = sMem + mad24(tid >> 6, 65, tid & 63); lMemStore[ 0] = a[0].x; lMemStore[ 8] = a[1].x; lMemStore[ 16] = a[2].x; lMemStore[ 24] = a[3].x; lMemStore[ 32] = a[4].x; lMemStore[ 40] = a[5].x; lMemStore[ 48] = a[6].x; lMemStore[ 56] = a[7].x; barrier(CLK_LOCAL_MEM_FENCE); a[0].x = lMemLoad[0]; a[1].x = lMemLoad[32]; a[2].x = lMemLoad[65]; a[3].x = lMemLoad[97]; a[4].x = lMemLoad[130]; a[5].x = lMemLoad[162]; a[6].x = lMemLoad[195]; a[7].x = lMemLoad[227]; barrier(CLK_LOCAL_MEM_FENCE); lMemStore[ 0] = a[0].y; lMemStore[ 8] = a[1].y; lMemStore[ 16] = a[2].y; lMemStore[ 24] 
= a[3].y; lMemStore[ 32] = a[4].y; lMemStore[ 40] = a[5].y; lMemStore[ 48] = a[6].y; lMemStore[ 56] = a[7].y; barrier(CLK_LOCAL_MEM_FENCE); a[0].y = lMemLoad[0]; a[1].y = lMemLoad[32]; a[2].y = lMemLoad[65]; a[3].y = lMemLoad[97]; a[4].y = lMemLoad[130]; a[5].y = lMemLoad[162]; a[6].y = lMemLoad[195]; a[7].y = lMemLoad[227]; barrier(CLK_LOCAL_MEM_FENCE); indexOut += tid; out += indexOut; out[0] = a[0]; out[32] = a[1]; out[64] = a[2]; out[96] = a[3]; out[128] = a[4]; out[160] = a[5]; out[192] = a[6]; out[224] = a[7]; } __kernel void fft1(__global float2 *in, __global float2 *out, int dir, int S, __global float2 * cossinLUT1, __global float2 * cossinLUT2 ) { __local float sMem[256]; int i, j, r, indexIn, indexOut, index, tid, bNum, xNum, k, l; int s, ii, jj, offset; float2 w; float ang, angf, ang1; __local float *lMemStore, *lMemLoad; float2 a[8]; int lId = get_local_id( 0 ); int groupId = get_group_id( 0 ); bNum = groupId & 127; xNum = groupId >> 7; indexIn = mul24(bNum, 4); tid = indexIn; i = tid >> 6; j = tid & 63; indexOut = mad24(i, 4096, j); indexIn += (xNum << 15); indexOut += (xNum << 15); tid = lId; i = tid & 3; j = tid >> 2; indexIn += mad24(j, 512, i); in += indexIn; a[0] = in[0]; a[1] = in[4096]; a[2] = in[8192]; a[3] = in[12288]; a[4] = in[16384]; a[5] = in[20480]; a[6] = in[24576]; a[7] = in[28672]; fftKernel8(a, dir); ang = dir*(0x1.921fb54442d18p-4) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[1] = complexMul(a[1], w); ang = dir*(1.9634954631e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[2] = complexMul(a[2], w); ang = dir*(2.9452431947e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[3] = complexMul(a[3], w); ang = dir*(3.9269909263e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[4] = complexMul(a[4], w); ang = dir*(4.9087386578e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[5] = complexMul(a[5], w); ang = dir*(5.8904863894e-001) * (j); w = (float2)(native_cos(ang), 
native_sin(ang)); a[6] = complexMul(a[6], w); ang = dir*(6.8722341210e-001) * (j); w = (float2)(native_cos(ang), native_sin(ang)); a[7] = complexMul(a[7], w); indexIn = mad24(j, 32, i); lMemStore = sMem + tid; lMemLoad = sMem + indexIn; lMemStore[0] = a[0].x; lMemStore[32] = a[1].x; lMemStore[64] = a[2].x; lMemStore[96] = a[3].x; lMemStore[128] = a[4].x; lMemStore[160] = a[5].x; lMemStore[192] = a[6].x; lMemStore[224] = a[7].x; barrier(CLK_LOCAL_MEM_FENCE); a[0].x = lMemLoad[0]; a[1].x = lMemLoad[4]; a[2].x = lMemLoad[8]; a[3].x = lMemLoad[12]; a[4].x = lMemLoad[16]; a[5].x = lMemLoad[20]; a[6].x = lMemLoad[24]; a[7].x = lMemLoad[28]; barrier(CLK_LOCAL_MEM_FENCE); lMemStore[0] = a[0].y; lMemStore[32] = a[1].y; lMemStore[64] = a[2].y; lMemStore[96] = a[3].y; lMemStore[128] = a[4].y; lMemStore[160] = a[5].y; lMemStore[192] = a[6].y; lMemStore[224] = a[7].y; barrier(CLK_LOCAL_MEM_FENCE); a[0].y = lMemLoad[0]; a[1].y = lMemLoad[4]; a[2].y = lMemLoad[8]; a[3].y = lMemLoad[12]; a[4].y = lMemLoad[16]; a[5].y = lMemLoad[20]; a[6].y = lMemLoad[24]; a[7].y = lMemLoad[28]; barrier(CLK_LOCAL_MEM_FENCE); fftKernel8(a + 0, dir); l = ((bNum << 2) + i) >> 6; k = j << 0; ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 0)); w = (float2)(native_cos(ang), native_sin(ang)); a[0] = complexMul(a[0], w); ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 8)); w = (float2)(native_cos(ang), native_sin(ang)); a[1] = complexMul(a[1], w); ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 16)); w = (float2)(native_cos(ang), native_sin(ang)); a[2] = complexMul(a[2], w); ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 24)); w = (float2)(native_cos(ang), native_sin(ang)); a[3] = complexMul(a[3], w); ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 32)); w = (float2)(native_cos(ang), native_sin(ang)); a[4] = complexMul(a[4], w); ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 40)); w = (float2)(native_cos(ang), native_sin(ang)); a[5] = complexMul(a[5], w); ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 48)); w = 
(float2)(native_cos(ang), native_sin(ang)); a[6] = complexMul(a[6], w); ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 56)); w = (float2)(native_cos(ang), native_sin(ang)); a[7] = complexMul(a[7], w); indexOut += mad24(j, 64, i); out += indexOut; out[0] = a[0]; out[512] = a[1]; out[1024] = a[2]; out[1536] = a[3]; out[2048] = a[4]; out[2560] = a[5]; out[3072] = a[6]; out[3584] = a[7]; } __kernel void fft2(__global float2 *in, __global float2 *out, int dir, int S, __global float2 * cossinLUT1, __global float2 * cossinLUT2 ) { int i, j, r, indexIn, indexOut, index, tid, bNum, xNum, k, l; int s, ii, jj, offset; float2 w; float ang, angf, ang1; __local float *lMemStore, *lMemLoad; float2 a[8]; int lId = get_local_id( 0 ); int groupId = get_group_id( 0 ); bNum = groupId & 127; xNum = groupId >> 7; indexIn = mul24(bNum, 32); tid = indexIn; i = tid >> 12; j = tid & 4095; indexOut = mad24(i, 32768, j); indexIn += (xNum << 15); indexOut += (xNum << 15); tid = lId; i = tid & 31; j = tid >> 5; indexIn += mad24(j, 4096, i); in += indexIn; a[0] = in[0]; a[1] = in[4096]; a[2] = in[8192]; a[3] = in[12288]; a[4] = in[16384]; a[5] = in[20480]; a[6] = in[24576]; a[7] = in[28672]; fftKernel8(a, dir); indexOut += mad24(j, 32768, i); out += indexOut; out[0] = a[0]; out[4096] = a[1]; out[8192] = a[2]; out[12288] = a[3]; out[16384] = a[4]; out[20480] = a[5]; out[24576] = a[6]; out[28672] = a[7]; } :-( |
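The launch sizes in the plan dumps above can be checked with a little arithmetic. The sketch below is in C and assumes an FFT length of 32768 (taken from the `AP_clFFTplan_..._32768` file names) and 8 complex points per work-item (from `float2 a[8]` in the dumped kernels). The function names are hypothetical and are not part of the AstroPulse or oclFFT sources.

```c
#include <stddef.h>

/* Work-items needed for one transform when every work-item holds
   points_per_item complex values (float2 a[8] in the dumped kernels).
   Returns 0 for an invalid points_per_item. */
size_t fft_items_per_transform(size_t fft_len, size_t points_per_item)
{
    if (points_per_item == 0)
        return 0;
    return fft_len / points_per_item;
}

/* Work-groups per transform for a given local size.
   Returns 0 if the local size is zero or does not divide the item count. */
size_t fft_groups_per_transform(size_t items, size_t local_size)
{
    if (local_size == 0 || items % local_size != 0)
        return 0;
    return items / local_size;
}
```

With these assumptions, 32768 / 8 gives the 4096 work-items per batch element that the dump reports as `global dim = {4096*BatchSize}`. A local size of 32 (the 4670 dump) gives 128 groups per transform, which appears consistent with `bNum = groupId & 127` and `xNum = groupId >> 7` in the dumped kernel source; a local size of 128 (fft0 and fft1 on the 6770) gives 32 groups.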
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
I suppose it's getting late over there. But... I got tired of watching the ATI cards sitting there with suspended tasks, so I installed Cat 12.1. I'm not getting any ATI errors, but I'm also not getting any ATI activity. It's been over 10 minutes and SIV still shows no ATI GPU load; GPU-Z says the same. The nVidia task is about a third of the way finished, or else I'd pull the app_info and download some stock apps. Is there a link to the stock Win ATI app so I can just swap out the apps and change the app_info? No GPU load on the 6770 or 4670... |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
stderr logs too short, post next 20-30 lines too. And it's better to put long files such as the fftPlan dump on pastebin or a similar site. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Well, another failing device's log: call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 oclFFT context=1051930 AP main context=1051930 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 Error in ap oclFFT_1: -34 ERROR: OpenCL kernel/call 'clFFT_ExecuteInterleaved_ap' call failed (-34) in file ..\..\ap_science.cpp near line 2680. Waiting 30 sec before restart... Hence, it seems there are no issues with the context per se: both the main loop and oclFFT use exactly the same context. Need to dig further. |
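For reference, -34 is the value of `CL_INVALID_CONTEXT` in the standard OpenCL headers, which is presumably why the debug build prints both context handles. Below is a minimal sketch in C of a lookup that turns such numeric codes into names when reading stderr. The table and `cl_status_to_name` are hypothetical helpers, not part of the AstroPulse source; only the numeric values come from the Khronos `cl.h` header, and the table is deliberately partial.

```c
#include <stddef.h>

/* A partial table of OpenCL status codes (values as defined in cl.h). */
struct cl_status_name {
    int code;
    const char *name;
};

static const struct cl_status_name cl_status_names[] = {
    {   0, "CL_SUCCESS" },
    {  -4, "CL_MEM_OBJECT_ALLOCATION_FAILURE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    { -30, "CL_INVALID_VALUE" },
    { -33, "CL_INVALID_DEVICE" },
    { -34, "CL_INVALID_CONTEXT" },
    { -36, "CL_INVALID_COMMAND_QUEUE" },
    { -38, "CL_INVALID_MEM_OBJECT" },
    { -52, "CL_INVALID_KERNEL_ARGS" },
    { -54, "CL_INVALID_WORK_GROUP_SIZE" },
};

/* Return the symbolic name for a status code, or a fallback string
   for codes that are not in the partial table above. */
const char *cl_status_to_name(int code)
{
    size_t n = sizeof(cl_status_names) / sizeof(cl_status_names[0]);
    for (size_t i = 0; i < n; i++) {
        if (cl_status_names[i].code == code)
            return cl_status_names[i].name;
    }
    return "unknown OpenCL status";
}
```

Calling `cl_status_to_name(-34)` returns `"CL_INVALID_CONTEXT"`. Since the two printed context handles match, the name alone does not explain the failure, which fits the conclusion that more digging is needed.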
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
stderr logs too short, post next 20-30 lines too. I saved the files before I updated to Catalyst 12.1. Everything seems to be working with Cat 12.1. Strange that one driver version would make such a difference. Tasks for Computer 72229 More of the stderr.txt from the 6770 with Cat 11.12; INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_Juniper.bin_V7_TWIN_FFA_CAL141646, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585 INFO: binary kernel file created call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130 call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139 call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459 call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474 call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480 call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486 call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516 call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549 call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' 
is finished OK in file ..\..\ap_science.cpp near line 551 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577 call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580 call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582 call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583 call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Juniper_32768_r2559.bin_CAL141646, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_setup.cpp near line 723 oclFFT context=27bb8a0 AP main context=27bb8a0 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 Error in ap oclFFT_1: -34 ERROR: OpenCL kernel/call 'clFFT_ExecuteInterleaved_ap' call failed (-34) in file ..\..\ap_science.cpp near line 2680. Waiting 30 sec before restart... Running on device number: 0 DATA_CHUNK_UNROLL set to:4 FFA thread block override value:4096 FFA thread fetchblock override value:2048 CPU affinity adjustment enabled GPUlock enabled. 
Use -instances_per_device N switch to provide number of instances to run if BOINC is configured to launch few tasks per device. Maximum single buffer size set to:256MB Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation OpenCL platform detected: Advanced Micro Devices, Inc. call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 0, slots 0 to 0 (including) will be checked Used slot is 0; Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 Info: CPU affinity mask used: 1 Used GPU device parameters are: Number of compute units: 10 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 256 |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
More from the 4670 with Cat 11.12; INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_ATIRV730.bin_V7_TWIN_FFA_CAL141646, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585 INFO: binary kernel file created call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130 call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139 call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459 call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474 call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480 call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486 call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516 call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549 call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' 
is finished OK in file ..\..\ap_science.cpp near line 551 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577 call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580 call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582 call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583 call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_ATIRV730_32768_r2559.bin_CAL141646, continue with recompile... WARNING: patching required max_kernel_wg_size=32 oclFFT context=27bb8a0 AP main context=27bb8a0 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 Error in ap oclFFT_1: -34 ERROR: OpenCL kernel/call 'clFFT_ExecuteInterleaved_ap' call failed (-34) in file ..\..\ap_science.cpp near line 2680. Waiting 30 sec before restart... Running on device number: 1 DATA_CHUNK_UNROLL set to:4 FFA thread block override value:4096 FFA thread fetchblock override value:2048 CPU affinity adjustment enabled GPUlock enabled. 
Use -instances_per_device N switch to provide number of instances to run if BOINC is configured to launch few tasks per device. Maximum single buffer size set to:256MB Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation OpenCL platform detected: Advanced Micro Devices, Inc. call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 1, slots 1 to 1 (including) will be checked Used slot is 1; Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 Info: CPU affinity mask used: 2 Used GPU device parameters are: Number of compute units: 8 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 128 |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
I'm glad that this issue shows up only with a limited number of AMD driver versions. Please try running this one: https://www.dropbox.com/s/6g2s6f09y928aqd/AP7_win_x86_SSE2_OpenCL_ATI_r2559_oclFFT_debug_2.7z under the driver that experienced this issue. Again, I need the stderr log. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
I'm glad that this issue shows up only with a limited number of AMD driver versions. I can't open the file. I've downloaded it twice. Is something wrong with the file? |
Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0 |
I'm glad that this issue shows up only with a limited number of AMD driver versions. Are you using 7-Zip to extract it? It extracted OK here. Claggy |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
I suppose I forgot that you have to click on the link instead of doing a "Save As". So, I installed it while running Cat 12.1 and I get the same thing as before. BOINC Manager says it's running, but SIV and GPU-Z don't show any GPU load. Before I go through changing drivers, is that normal? Is it supposed to show no load with Catalyst 12.1? |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
I suppose I forgot that you have to click on the link instead of doing a "Save As". It's a debug build whose single purpose is to generate a log under 11.12. |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
Here's the stderr.txt for the 6770 with cat 11.12; call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 0 Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 ... INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_Juniper.bin_V7_TWIN_FFA_CAL141646, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585 INFO: binary kernel file created call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130 call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139 call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459 call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474 call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480 call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486 call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516 call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549 call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' 
is finished OK in file ..\..\ap_science.cpp near line 551 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577 call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580 call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582 call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583 call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Juniper_32768_r2559.bin_CAL141646, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_setup.cpp near line 723 oclFFT context=7641178 AP main context=7641178 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375. Waiting 30 sec before restart... 
Running on device number: 0 DATA_CHUNK_UNROLL set to:6 FFA thread block override value:2280 FFA thread fetchblock override value:1140 Maximum single buffer size set to:256MB Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation OpenCL platform detected: Advanced Micro Devices, Inc. call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 0 Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 ... ### Restart at 0.00 percent. state.fold_buf_size_short=65536; state.fold_buf_size_long=262144 call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130 call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139 call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459 call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474 call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480 call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486 call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516 call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' 
is finished OK in file ..\..\ap_science.cpp near line 549 call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577 call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580 call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582 call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583 call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 oclFFT context=7641178 AP main context=7641178 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375. Waiting 30 sec before restart... The 4670; call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 1 Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. 
(clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 ... INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_ATIRV730.bin_V7_TWIN_FFA_CAL141646, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585 INFO: binary kernel file created call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130 call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139 call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459 call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474 call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480 call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486 call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516 call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549 call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' 
is finished OK in file ..\..\ap_science.cpp near line 551 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577 call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580 call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582 call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583 call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_ATIRV730_32768_r2559.bin_CAL141646, continue with recompile... WARNING: patching required max_kernel_wg_size=32 oclFFT context=7641178 AP main context=7641178 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375. Waiting 30 sec before restart... 
Running on device number: 1 DATA_CHUNK_UNROLL set to:6 FFA thread block override value:2280 FFA thread fetchblock override value:1140 Maximum single buffer size set to:256MB Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation OpenCL platform detected: Advanced Micro Devices, Inc. call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 1 Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 ... ### Restart at 0.00 percent. state.fold_buf_size_short=65536; state.fold_buf_size_long=262144 call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130 call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139 call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459 call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474 call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480 call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486 call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516 call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' 
is finished OK in file ..\..\ap_science.cpp near line 549 call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577 call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580 call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582 call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583 call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_ATIRV730_32768_r2559.bin_CAL141646, continue with recompile... WARNING: patching required max_kernel_wg_size=32 oclFFT context=7641178 AP main context=7641178 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375. Waiting 30 sec before restart... |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Thanks, that's the line I expected to see: ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375. So it's not a kernel call failure but a failure while setting up the kernel parameters. That allows further debugging. I will post the next attempt soon. |
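For readers decoding the numbers in these logs: in the Khronos `CL/cl.h` header, -34 is `CL_INVALID_CONTEXT` and -38 is `CL_INVALID_MEM_OBJECT`. A minimal lookup sketch follows; `cl_error_name` is a hypothetical helper written for this thread, not something taken from the AstroPulse source, and it lists only a handful of codes.

```cpp
#include <string>

// Symbolic names for a few OpenCL status codes, as defined in the Khronos
// CL/cl.h header. Only codes relevant to this thread are listed.
// cl_error_name is a hypothetical helper, not part of the AstroPulse source.
std::string cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -30: return "CL_INVALID_VALUE";
        case -34: return "CL_INVALID_CONTEXT";
        case -36: return "CL_INVALID_COMMAND_QUEUE";
        case -38: return "CL_INVALID_MEM_OBJECT";
        case -48: return "CL_INVALID_KERNEL";
        case -49: return "CL_INVALID_ARG_INDEX";
        case -50: return "CL_INVALID_ARG_VALUE";
        case -51: return "CL_INVALID_ARG_SIZE";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        default:  return "unknown (" + std::to_string(code) + ")";
    }
}
```

Read this way, the -38 reported at 'oclFFT_1 params' says the driver rejected something passed as a memory object while the kernel arguments were being set, which fits the "parameters setup failure" reading above.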
Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0 |
Thanks, that's the line I expected to see: With TWIN_FFA these should be multiples of 128, I thought! Good values carried over from the old APv6 might not work anymore: FFA thread block override value:2280 FFA thread fetchblock override value:1140 _\|/_ U r s |
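The two override values quoted from the log are indeed not multiples of 128: 2280 = 17·128 + 104 and 1140 = 8·128 + 116. A small sketch of that check is below. It assumes the multiple-of-128 rule of thumb mentioned in the post really applies, which the thread does not confirm; the helper names are hypothetical.

```cpp
// Hypothetical helpers illustrating the multiple-of-128 rule of thumb
// mentioned in the post; whether the application enforces it is unconfirmed.
constexpr int kAssumedGranularity = 128;

// True if value is a positive multiple of 128.
constexpr bool is_multiple_of_128(int value) {
    return value > 0 && value % kAssumedGranularity == 0;
}

// Rounds down to the nearest multiple of 128, never returning less than 128.
constexpr int round_down_to_128(int value) {
    return value < kAssumedGranularity
               ? kAssumedGranularity
               : (value / kAssumedGranularity) * kAssumedGranularity;
}
```

Under that assumption, 2280 and 1140 would round down to 2176 and 1024, while the 4096 and 2048 seen in the earlier logs already satisfy it.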
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
....might not work anymore : Well, how would it not work? It seems to be working? http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17413256 http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17411937 |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Thanks, that's the line I expected to see: I hope not. No additional restrictions were added. EDIT: also, the failure is in oclFFT; TWIN_FFA is an FFA modification, while the FFT runs in the main loop. They should not be connected (that is, even if those settings fail inside FFA, they should not prevent normal FFT work; only -unroll N influences the FFT calls). |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
New build to try: https://www.dropbox.com/s/fo7cda1sz7pg9se/AP7_win_x86_SSE2_OpenCL_ATI_r2567_oclFFT_debug_3.7z Example of valid output: .... oclFFT context=33074b8 AP main context=33074b8 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 oclFFT1 param0 value (read buf): 3d573b0 call 'oclFFT1, param 0' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 383 oclFFT1 param1 value (write buf): 3d572c0 call 'oclFFT1, param 1' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 386 oclFFT1 param2 value (direction): -1 call 'oclFFT1, param 2' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 389 oclFFT1 param3 value (s,batch size): 2 call 'oclFFT1, param 3' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 392 oclFFT1 param4 value (cossin_LUT_d1): 0 call 'oclFFT1, param 4' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 395 oclFFT1 param5 value (cossin_LUT_d2): 0 call 'oclFFT1, param 5' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 398 call 'oclFFT_1 params' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 399 in ap oclFFT_1 ok. .... And I have a suspicion about what fails with the 11.12 driver. In the latest versions I updated oclFFT, merging improvements made by Einstein@Home's developers (Oliver and others). They added an additional path (via lookup tables), but currently non-iGPU AP uses the old native trigonometry (precise enough on everything but iGPUs, and the fastest option), hence the LUT buffers are NULL (the two cossin_LUT params with value 0 in the output). Older versions just didn't have those params at all. So it's possible that 11.12 doesn't accept a NULL pointer as a valid value for a kernel param of cl_mem type. We will see when the logs are available. 
|
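To test the NULL-buffer suspicion against a long stderr dump, it is enough to find the first call that reports a failure and see whether it is one of the LUT parameters. The sketch below does that with a regular expression over the message format visible in the logs above; `FailedCall` and `first_failed_call` are hypothetical names, not part of the application.

```cpp
#include <regex>
#include <string>

// Result of scanning a stderr dump. Hypothetical type for illustration.
struct FailedCall {
    bool found;        // false if no ERROR entry was present
    std::string call;  // text between the quotes, e.g. "oclFFT1, param 4"
    int code;          // OpenCL status code, e.g. -38
};

// Finds the first "ERROR: OpenCL kernel/call '<name>' call failed (<code>)"
// entry in the log text. Works on flattened (single-line) logs as well.
FailedCall first_failed_call(const std::string& log) {
    static const std::regex pattern(
        R"re(ERROR: OpenCL kernel/call '([^']*)' call failed \((-?\d+)\))re");
    std::smatch m;
    if (!std::regex_search(log, m, pattern)) {
        return FailedCall{false, "", 0};
    }
    return FailedCall{true, m[1].str(), std::stoi(m[2].str())};
}
```

If the reported call turns out to be 'oclFFT1, param 4' or 'oclFFT1, param 5' with code -38, that would be consistent with the driver rejecting the NULL LUT buffers; a failure at param 0 or param 1 would point elsewhere.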
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
Still getting the restart. 6770; call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 0 Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 ... state.fold_buf_size_short=65536; state.fold_buf_size_long=262144 INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2567.cl_Juniper.bin_V7_TWIN_FFA_CAL141646, continue with recompile... call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585 INFO: binary kernel file created call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130 call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139 call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459 call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474 call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480 call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486 call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516 call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' 
is finished OK in file ..\..\ap_science.cpp near line 549 call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576 call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577 call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580 call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582 call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583 call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593 call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595 call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611 call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621 WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Juniper_32768_r2567.bin_CAL141646, continue with recompile... 
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_setup.cpp near line 723 oclFFT context=27ba158 AP main context=27ba158 Dumping clFFT Plan to file call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719 call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580 oclFFT1 param0 value (read buf): 27daba8 call 'oclFFT1, param 0' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 383 oclFFT1 param1 value (write buf): 281a118 call 'oclFFT1, param 1' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 386 oclFFT1 param2 value (direction): -1 call 'oclFFT1, param 2' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 389 oclFFT1 param3 value (s,batch size): 6 call 'oclFFT1, param 3' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 392 oclFFT1 param4 value (cossin_LUT_d1): 0 ERROR: OpenCL kernel/call 'oclFFT1, param 4' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 395. Waiting 30 sec before restart... 4670; call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752 BOINC assigns device 1 Info: BOINC provided OpenCL device ID used call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936 call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961 call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287 call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317 ... 
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2567.cl_ATIRV730.bin_V7_TWIN_FFA_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585
INFO: binary kernel file created
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_ATIRV730_32768_r2567.bin_CAL141646, continue with recompile...
WARNING: patching required max_kernel_wg_size=32
oclFFT context=27ba158 AP main context=27ba158
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
oclFFT1 param0 value (read buf): 744bc50
call 'oclFFT1, param 0' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 383
oclFFT1 param1 value (write buf): 27b4f10
call 'oclFFT1, param 1' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 386
oclFFT1 param2 value (direction): -1
call 'oclFFT1, param 2' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 389
oclFFT1 param3 value (s,batch size): 6
call 'oclFFT1, param 3' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 392
oclFFT1 param4 value (cossin_LUT_d1): 0
ERROR: OpenCL kernel/call 'oclFFT1, param 4' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 395.
Waiting 30 sec before restart... |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Just as I thought! 11.12 can't accept NULL for a buffer argument. I would say it's a driver bug, since there is nothing about it in the OpenCL spec as far as I can recall. But a workaround is possible. I will code the workaround, stay tuned. News about SETI opt app releases: https://twitter.com/Raistmer |
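For anyone else digging through these long debug logs, here is a minimal Python sketch that finds the first failed call and the last parameter value logged before it. It is not part of the AP build: the function name `first_failure` and both regular expressions are hypothetical, written only against the message format seen in the stderr above. The error names come from the Khronos `cl.h` header, where -38 is `CL_INVALID_MEM_OBJECT` and -34 is `CL_INVALID_CONTEXT`. On the log above it reports that `cossin_LUT_d1` was logged as 0 right before the -38, which fits the NULL-buffer diagnosis.

```python
import re

# A few OpenCL status codes, as defined in the Khronos cl.h header.
CL_ERROR_NAMES = {
    -5: "CL_OUT_OF_RESOURCES",
    -34: "CL_INVALID_CONTEXT",
    -38: "CL_INVALID_MEM_OBJECT",
    -49: "CL_INVALID_ARG_INDEX",
    -50: "CL_INVALID_ARG_VALUE",
    -51: "CL_INVALID_ARG_SIZE",
}

# Patterns assumed from the debug-build messages quoted in this thread.
ERROR_RE = re.compile(r"ERROR: OpenCL kernel/call '([^']+)' call failed \((-?\d+)\)")
VALUE_RE = re.compile(r"(\w+) param(\d+) value \(([^)]*)\): (\S+)")


def first_failure(stderr_text):
    """Return details of the first failed call, or None if nothing failed."""
    err = ERROR_RE.search(stderr_text)
    if err is None:
        return None
    code = int(err.group(2))

    # Remember the last "paramN value" message printed before the error.
    last_value = None
    for match in VALUE_RE.finditer(stderr_text, 0, err.start()):
        last_value = match

    return {
        "call": err.group(1),
        "code": code,
        "name": CL_ERROR_NAMES.get(code, "unknown"),
        "arg_label": last_value.group(3) if last_value else None,
        "arg_value": last_value.group(4) if last_value else None,
        # A logged value of 0 for a buffer parameter suggests NULL was passed.
        "value_is_zero": bool(last_value) and last_value.group(4) == "0",
    }


# Excerpt copied from the stderr posted above.
SAMPLE = r"""
oclFFT1 param3 value (s,batch size): 6
call 'oclFFT1, param 3' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 392
oclFFT1 param4 value (cossin_LUT_d1): 0
ERROR: OpenCL kernel/call 'oclFFT1, param 4' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 395.
"""

if __name__ == "__main__":
    print(first_failure(SAMPLE))
```

The patterns do not rely on line breaks, so the same function should also work on a stderr that was pasted as one long line. To use it on a real run, read the stderr file into a string and pass it to `first_failure`.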
©2021 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.