Some tasks postponing...
Author | Message |
---|---|
Loren Datlof Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
I have noticed that WUs whose names start with 15fe, 15ap, 12jl and 12mr run for about 15 seconds and are then postponed on my t1700 machine. The WUs starting with blc26 run all the way to completion. I have had to suspend the postponed tasks to get a blc26 WU to start. The computer is running Ubuntu 18.04.2 LTS (4.15.0-51-generic) and I am using the Beta SOG app for the GPUs. The only thing I changed was going from 50% to 25% CPU use through BOINC Manager. I have since changed it back to 50% and restarted BOINC, but no dice. Does anyone have any ideas how to fix this? |
Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
Does it say anything useful in the event log? |
Loren Datlof Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
"Does it say anything useful in the event log?"<br>Not that I can see.<br>Thu 06 Jun 2019 11:27:36 AM PDT \| SETI@home \| Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:27:52 AM PDT \| SETI@home \| Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:28:06 AM PDT \| SETI@home \| Task 12mr09aa.24054.23794.9.36.36_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:28:23 AM PDT \| SETI@home \| Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:28:39 AM PDT \| SETI@home \| Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:28:54 AM PDT \| SETI@home \| Task 12mr09aa.24054.23794.9.36.36_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:29:11 AM PDT \| SETI@home \| Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:29:27 AM PDT \| SETI@home \| Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:29:41 AM PDT \| SETI@home \| Task 12mr09aa.24054.23794.9.36.36_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:29:58 AM PDT \| SETI@home \| Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:30:14 AM PDT \| SETI@home \| Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:30:28 AM PDT \| SETI@home \| Task 12mr09aa.24054.23794.9.36.36_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:30:45 AM PDT \| SETI@home \| Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:31:01 AM PDT \| SETI@home \| Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:31:15 AM PDT \| SETI@home \| Task 12mr09aa.24054.23794.9.36.36_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:31:33 AM PDT \| SETI@home \| Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:31:49 AM PDT \| SETI@home \| Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:32:03 AM PDT \| SETI@home \| Task 12mr09aa.24054.23794.9.36.36_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:32:20 AM PDT \| SETI@home \| Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:32:36 AM PDT \| SETI@home \| Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:32:50 AM PDT \| SETI@home \| Task 12mr09aa.24054.23794.9.36.36_0 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:33:07 AM PDT \| SETI@home \| Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:<br>Thu 06 Jun 2019 11:33:23 AM PDT \| SETI@home \| Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds: |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
My guess is that the compute kernel primitives are corrupted. I would exit BOINC, delete all the files in the ComputeCache folder and then restart BOINC. On Ubuntu 18.04 the ComputeCache is here: /home/{username}/.nv/ComputeCache<br>The .nv directory is a hidden directory. You can view it by turning on Show Hidden Files in the File Manager hamburger menu. The GPU app will recreate all the primitives once it starts crunching again, and that should clear up the too many exits.<br>Seti@Home classic workunits:20,676 CPU time:74,226 hours<br>A proud member of the OFA (Old Farts Association) |
Loren Datlof Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
I deleted all files and folders in the ComputeCache folder and the postponing problem still persists. BTW, the ComputeCache folder was here: /var/lib/boinc-client/.nv/ and not in the home directory. The contents of the ComputeCache folder:<br>/var/lib/boinc-client/.nv/ComputeCache# ls<br>0 1 2 3 4 5 6 7 8 9 a b c d e f index<br>Also, this is a headless system, so I have to ssh into the computer using the terminal. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, I didn't know you were running the repo client or were headless. Reinstall the graphics drivers if deleting the compute cache didn't work. |
Loren Datlof Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
Ok. I upgraded from 390 to 430 (I probably should have just reinstalled 390) and when I restarted boinc I got a missing .cl file error. I aborted the WUs and of course I am in the penalty box. I will keep you updated when I get some GPU WUs. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Ok. I upgraded from 390 to 430 (I probably should have just reinstalled 390) and when I restarted boinc I got a missing .cl file error. I aborted the WUs and of course I am in the penalty box. I will keep you updated when I get some GPU WUs."<br>You shouldn't have aborted; you should have just stopped BOINC and installed the OpenCL drivers. For some reason, some people don't get the full meta package of the new drivers and end up missing the OpenCL components. All you have to do is install them with:<br>sudo apt-get install ocl-icd-libopencl1 |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Since I hate dumping work because of a simple oversight, I always run clinfo after any changes to the system to verify both the CUDA and OpenCL drivers are present before I fire up BOINC. |
Loren Datlof Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
"Ok. I upgraded from 390 to 430 (I probably should have just reinstalled 390) and when I restarted boinc I got a missing .cl file error. I aborted the WUs and of course I am in the penalty box. I will keep you updated when I get some GPU WUs."<br>I tried to install it and it came back I already have the newest version. I must have gotten it when I installed 430. I installed clinfo. Meanwhile I am still in the penalty box... |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"I tried to install it and it came back I already have the newest version. I must have gotten it when I installed 430."<br>So what does the output of clinfo show for your OpenCL driver component? Look at the end of the output in the Terminal window for the status of OpenCL. |
Loren Datlof Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
sudo apt-get install ocl-icd-libopencl1<br>Reading package lists... Done<br>Building dependency tree<br>Reading state information... Done<br>ocl-icd-libopencl1 is already the newest version (2.2.11-1ubuntu1).<br>0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.<br><br>clinfo<br>NULL platform behavior<br>clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform<br>clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform<br>clCreateContext(NULL, ...) [default] No platform<br>clCreateContext(NULL, ...) [other] Success [NV]<br>clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No platform<br>clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform<br>clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform<br>clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform<br>clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform<br>clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform<br><br>nvcc --version<br>nvcc: NVIDIA (R) Cuda compiler driver<br>Copyright (c) 2005-2017 NVIDIA Corporation<br>Built on Fri_Sep__1_21:08:03_CDT_2017<br>Cuda compilation tools, release 9.0, V9.0.176<br><br>nvidia-smi<br>Fri Jun 7 07:15:10 2019<br>NVIDIA-SMI 430.14 Driver Version: 430.14 CUDA Version: 10.2<br>GPU Name Persistence-M \| Bus-Id Disp.A \| Volatile Uncorr. ECC<br>Fan Temp Perf Pwr:Usage/Cap \| Memory-Usage \| GPU-Util Compute M.<br>0 GeForce GT 720 Off \| 00000000:01:00.0 N/A \| N/A<br>19% 41C P0 N/A / N/A \| 0MiB / 980MiB \| N/A Default<br>1 GeForce GT 730 Off \| 00000000:05:00.0 N/A \| N/A<br>30% 41C P0 N/A / N/A \| 0MiB / 2002MiB \| N/A Default<br>Processes: GPU Memory<br>GPU PID Type Process name Usage<br>0 Not Supported<br>1 Not Supported |
Loren Datlof Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
I got it fixed. The t1700 is crunching away on its GPUs. I changed to this special app: MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu and added this file: MultiBeam_Kernels_r3602.cl to the /var/lib/boinc-client/projects/setiathome.berkeley.edu folder. Thanks for the help. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I didn't realize you had the CUDA toolkit installed. That was probably the problem. The toolkit installs its own files and drivers in a different location from the normally downloaded versions. |
Loren Datlof Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
Shows how much I know. I thought that the CUDA toolkit was necessary to crunch. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Shows how much I know. I thought that the CUDA toolkit was necessary to crunch."<br>No, the CUDA toolkit is not necessary to crunch. All that is needed is the manufacturer drivers with both the CUDA and OpenCL components. The CUDA toolkit installs its own driver version, which is normally not the same as the current manufacturer driver version, unless you specifically tell the toolkit installer NOT to install the toolkit driver and to keep the manufacturer drivers. The toolkit drivers get installed into a different location from the normal manufacturer drivers. They also change the PATH and LD_LIBRARY_PATH environment variables. That is likely why the OpenCL libraries were not found even after you installed them separately. |
Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3 |
I have this same problem. Just started on a new 18.04 system I put together yesterday. I googled around and found several reports of the "postponed for 30 seconds" problem, but none of the solutions seem to work, and I probably need to enable some log flags to get more info about what is happening. The ATI drivers were installed from amdgpu-pro-19.10-785425-ubuntu-18.04.tar using<br>sudo ./amdgpu-install --OpenCL=legacy<br>WCG tasks run fine but GPU tasks have problems:<br>all Milkyway tasks error out (no 30 second stuff)<br>all setiathome tasks run 2 seconds and generate that 30 second error message<br>Going to post this over at BOINC questions also. This is the first Linux system I have built that uses those S9000 "pro" AMD cards. I noticed that "amdgpu-pro-install" is just a link to amdgpu-install, so there is no longer any specific "pro" driver from AMD for Linux, unlike Windows.<br><br>1 7/19/2019 12:00:16 PM Starting BOINC client version 7.14.2 for x86_64-pc-linux-gnu<br>2 7/19/2019 12:00:16 PM log flags: file_xfer, sched_ops, task<br>3 7/19/2019 12:00:16 PM Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3<br>4 7/19/2019 12:00:16 PM Data directory: /var/lib/boinc-client<br>5 7/19/2019 12:00:17 PM OpenCL: AMD/ATI GPU 0: ATI FirePro V (FireGL V) Graphics Adapter (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 3072MB, 3072MB available, 1613 GFLOPS peak)<br>6 7/19/2019 12:00:17 PM OpenCL: AMD/ATI GPU 1: ATI FirePro V (FireGL V) Graphics Adapter (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 3072MB, 3072MB available, 1613 GFLOPS peak)<br>7 7/19/2019 12:00:17 PM OpenCL: AMD/ATI GPU 2: ATI FirePro V (FireGL V) Graphics Adapter (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 3072MB, 3072MB available, 1613 GFLOPS peak)<br>8 7/19/2019 12:00:17 PM [libc detection] gathered: 2.27, Ubuntu GLIBC 2.27-3ubuntu1<br>9 7/19/2019 12:00:17 PM Host name: jysdualxeon<br>10 7/19/2019 12:00:17 PM Processor: 24 GenuineIntel Intel(R) Xeon(R) CPU X5675 @ 3.07GHz [Family 6 Model 44 Stepping 2]<br>11 7/19/2019 12:00:17 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmper<br>12 7/19/2019 12:00:17 PM OS: Linux Ubuntu: Ubuntu 18.04.2 LTS [4.18.0-25-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]<br>13 7/19/2019 12:00:17 PM Memory: 11.72 GB physical, 2.00 GB virtual<br>14 7/19/2019 12:00:17 PM Disk: 109.53 GB total, 95.54 GB free<br>15 7/19/2019 12:00:17 PM Local time is UTC -5 hours<br>16 7/19/2019 12:00:17 PM Config: GUI RPC allowed from any host<br>17 7/19/2019 12:00:17 PM Config: GUI RPCs allowed from:<br>18 7/19/2019 12:00:17 PM jysarea51<br>19 7/19/2019 12:00:17 PM Config: use all coprocessors<br>20 Milkyway@Home 7/19/2019 12:00:17 PM URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 810090; resource share 100<br>21 SETI@home 7/19/2019 12:00:17 PM URL http://setiathome.berkeley.edu/; Computer ID 8772790; resource share 100<br><br>and 100's of the following, but no errored tasks:<br><br>70 SETI@home 7/19/2019 12:01:51 PM Task 17jl19ab.1022.17245.15.42.242.vlar_1 postponed for 30 seconds:<br>71 SETI@home 7/19/2019 12:01:52 PM Task blc45_2bit_guppi_58642_03722_HIP55210_0016.25435.818.22.45.130.vlar_1 postponed for 30 seconds:<br>72 SETI@home 7/19/2019 12:01:53 PM Task blc42_2bit_guppi_58642_03722_HIP55210_0016.26020.409.22.45.219.vlar_1 postponed for 30 seconds:<br>73 SETI@home 7/19/2019 12:01:54 PM Task blc42_2bit_guppi_58642_03722_HIP55210_0016.26020.409.22.45.216.vlar_1 postponed for 30 seconds:<br>74 SETI@home 7/19/2019 12:01:55 PM Task blc42_2bit_guppi_58642_04042_HIP55426_0017.26054.409.22.45.119.vlar_1 postponed for 30 seconds:<br>75 SETI@home 7/19/2019 12:01:56 PM Task blc42_2bit_guppi_58642_04362_HIP55210_0018.26043.409.22.45.161.vlar_1 postponed for 30 seconds:<br>76 SETI@home 7/19/2019 12:01:57 PM Task 17jl19ab.1022.17245.15.42.252.vlar_0 postponed for 30 seconds: |
Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
Plugged directly into the MB or on Riser cards? When the riser card setup is not quite right, it postpones a task but then the GPU tasks are "waiting", because it starts another one up. So it generates a LOT of waiting GPU tasks. This doesn't exactly sound like your symptoms, though.<br>Tom<br>A proud member of the OFA (Old Farts Association). |
Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3 |
"Plugged directly into the MB or on Riser cards?"<br>I just checked that and the risers are locked into position correctly. I actually damaged an RX570 once by not watching how the riser fit in and have been careful ever since. While there are fuses for the 12v line, there are none for the 3.3v, and while there is no obvious damage on the PCB (the lights blink and the fans spin), the board never worked again. |
Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
"Plugged directly into the MB or on Riser cards?"<br>Unfortunately, I have had to resort to "unplug/plug in again" as well as replacing the USB cables (if that is what you are using) with higher-grade Ugreen data transfer 3.0 cables.<br>Tom |
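
For anyone chasing the same symptom, it can help to tally how often each task is postponed, since both event logs in this thread just repeat the same "postponed for 30 seconds" message. Below is a minimal Python sketch, assuming the event log has been saved as plain text with one entry per line; the function name `count_postponed` and the sample text are illustrative only and are not part of BOINC.

```python
import re
from collections import Counter

# Matches the message part of an event log entry such as
# "... Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:"
# The timestamp/project columns before it are ignored, so both log
# layouts quoted in the thread should match.
POSTPONED = re.compile(r"Task (\S+) postponed for (\d+) seconds")


def count_postponed(log_text):
    """Return a Counter mapping task name -> number of postponements."""
    counts = Counter()
    for line in log_text.splitlines():
        match = POSTPONED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample copied from the first log excerpt in the thread.
sample = """\
Thu 06 Jun 2019 11:27:36 AM PDT | SETI@home | Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:
Thu 06 Jun 2019 11:27:52 AM PDT | SETI@home | Task 15ap10ab.29110.4161.4.31.14_0 postponed for 30 seconds:
Thu 06 Jun 2019 11:28:06 AM PDT | SETI@home | Task 12mr09aa.24054.23794.9.36.36_0 postponed for 30 seconds:
Thu 06 Jun 2019 11:28:23 AM PDT | SETI@home | Task 12jl12ab.17477.81727.10.37.195_1 postponed for 30 seconds:
"""

for task, n in count_postponed(sample).most_common():
    print(f"{task}: postponed {n} time(s)")
```

A task that shows up again and again is being restarted and postponed in a loop, which matches what was reported above; the tally does not say why, so the driver and OpenCL checks discussed in the thread are still needed.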