Host keeps on freezing [4] NVIDIA GeForce GTX 690 (1999MB) driver: 430.40 OpenCL: 1.2
Author | Message |
---|---|
elec999 Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141 |
Host keeps on freezing [4] NVIDIA GeForce GTX 690 (1999MB) driver: 430.40 OpenCL: 1.2. This host freezes every few days, and I have to pull the power and plug it back in. Is there any way to see why? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13882 Credit: 208,696,464 RAC: 304 |
"Anyway to see why?" No. You need to track down the cause by process of elimination. Check the GPU temperatures. Test the system's memory. Try it with 1, 2, 3, then 4 GPUs. Try a different power supply if you have one. Run something like Process Explorer and make sure some other software process isn't causing issues. Considering that all it is doing is pumping out mostly errors, I'd suggest running it with just one video card at a time for several days to make sure each card is OK; then, once they all check out, add another card and monitor the results. I'd also keep an eye on PSU voltages as you add cards. <![CDATA[ <message> too many boinc_temporary_exit()s</message> <stderr_txt> Not using mb_cmdline.txt-file, using commandline options. Running on device number: 1 WARNING: boinc_get_opencl_ids failed with code -1 Error: Getting Platforms. (clGetPlatformsIDs) BOINC assigns slot on device #1. WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities ERROR: OpenCL kernel/call 'clGetDeviceIDs (second call)' call failed (-32) in file ../../src/GPU_lock.cpp near line 1315. Waiting 30 sec before restart... Not using mb_cmdline.txt-file, using commandline options. Running on device number: 3 WARNING: boinc_get_opencl_ids failed with code -1 Error: Getting Platforms. (clGetPlatformsIDs) BOINC assigns slot on device #3. WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities ERROR: OpenCL kernel/call 'clGetDeviceIDs (second call)' call failed (-32) in file ../../src/GPU_lock.cpp near line 1315. Waiting 30 sec before restart... Not using mb_cmdline.txt-file, using commandline options. Running on device number: 0 WARNING: boinc_get_opencl_ids failed with code -1 Error: Getting Platforms. (clGetPlatformsIDs) BOINC assigns slot on device #0. WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities ERROR: OpenCL kernel/call 'clGetDeviceIDs (second call)' call failed (-32) in file ../../src/GPU_lock.cpp near line 1315. Waiting 30 sec before restart... Not using mb_cmdline.txt-file, using commandline options. Running on device number: 2 WARNING: boinc_get_opencl_ids failed with code -1 Error: Getting Platforms. (clGetPlatformsIDs) BOINC assigns slot on device #2. WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities ERROR: OpenCL kernel/call 'clGetDeviceIDs (second call)' call failed (-32) in file ../../src/GPU_lock.cpp near line 1315. Waiting 30 sec before restart... Not using mb_cmdline.txt-file, using commandline options. Running on device number: 1 WARNING: boinc_get_opencl_ids failed with code -1 Error: Getting Platforms. (clGetPlatformsIDs) BOINC assigns slot on device #1. WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities etc, etc, etc, etc, etc... Grant Darwin NT |
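A rough way to read a stderr dump like the one Grant posted is to tally the failed OpenCL calls per device number. The sketch below is only an illustration: the function name and the sample string are made up for this example, and the regular expressions assume the log wording stays exactly as it appears above. The one status code it names, -32, is `CL_INVALID_PLATFORM` in the Khronos OpenCL headers.

```python
import re
from collections import Counter

# OpenCL status code seen in the posted log; -32 is CL_INVALID_PLATFORM
# in the Khronos headers. Other codes are left as plain numbers.
CL_ERROR_NAMES = {-32: "CL_INVALID_PLATFORM"}

# One pattern with two alternatives, so matches come back in log order:
#   group 1      -> device number from "Running on device number: N"
#   groups 2, 3  -> failing call name and status code from an ERROR line
TOKEN_RE = re.compile(
    r"Running on device number: (\d+)"
    r"|ERROR: OpenCL kernel/call '([^']+)' call failed \((-?\d+)\)"
)


def tally_opencl_failures(stderr_text):
    """Count failed OpenCL calls, keyed by (device, call, status code)."""
    failures = Counter()
    current_device = None  # stays None if an ERROR precedes any device line
    for match in TOKEN_RE.finditer(stderr_text):
        if match.group(1) is not None:
            current_device = int(match.group(1))
        else:
            failures[(current_device, match.group(2), int(match.group(3)))] += 1
    return failures


# Shortened sample built from lines in the posted log (not the full dump).
sample = (
    "Running on device number: 1 WARNING: boinc_get_opencl_ids failed with code -1 "
    "ERROR: OpenCL kernel/call 'clGetDeviceIDs (second call)' call failed (-32) "
    "in file ../../src/GPU_lock.cpp near line 1315. Waiting 30 sec before restart... "
    "Running on device number: 3 WARNING: boinc_get_opencl_ids failed with code -1 "
    "ERROR: OpenCL kernel/call 'clGetDeviceIDs (second call)' call failed (-32) "
    "in file ../../src/GPU_lock.cpp near line 1315."
)

for (device, call, code), count in sorted(tally_opencl_failures(sample).items()):
    print(device, call, CL_ERROR_NAMES.get(code, code), count)
```

If every device number shows the same failing call and the same code, as in the posted log, that may point toward something shared (driver, OpenCL platform, power) rather than one bad card, though it does not prove it.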
Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
"Host keeps on freezing [4] NVIDIA GeForce GTX 690 (1999MB) driver: 430.40 OpenCL: 1.2" How much power is that computer consuming? Do you have a Kill A Watt meter on it? How big is your PSU? |
elec999 Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141 |
PSU is XFX 1250watt |
Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
"PSU is XFX 1250watt" I think your PSU is insufficient for those 4 GPUs. I run an EVGA T2 1600W for my four 1080 Ti FTW cards. The only way to know for sure is to get a watt meter. Amazon sells them, and so do electronics stores. https://www.amazon.com/P3-International-P4460-Electricity-Monitor/dp/B000RGF29Q/ref=sr_1_5?crid=1X0Q09PNJ5LE3&keywords=watt+a+meter&qid=1565620124&s=gateway&sprefix=watt+a+m%2Caps%2C154&sr=8-5 |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
1200W should be more than enough. However, it's possible the PSU could still be at fault for other reasons. I'm not sure XFX has the best track record for PSUs, and that model is quite old now. BOINC reports 4x GPUs because the 690 is a dual-GPU card; he only has 2 physical cards, each with 2 GPUs, so BOINC says it's 4. Each card is rated for ~300W. Your 4x 1080 Ti will use more power than his 2x 690. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
"1200W should be more than enough. however it's possible the PSU could still be at fault for other reasons. I'm not sure XFX has the best track record for PSUs and that model is quite old now." I knew that about the 690 (Juan had some, I believe), but he himself posted that he has 4x 690s, not 2x 690s. If it really is 4, then he's running at a minimum of 1200 watts (usually it's more), and then you have the degradation of the PSU over time. I'm just surprised it hadn't acted up before now. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Based on the list of hosts you have, I can't tell which system used to have the 690s in it. Which host ID was it? There are several OpenCL-related errors in the file Grant posted, and I see several of your systems are lacking OpenCL drivers. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
"but he himself post that he has 4 - 690s not 2 -690s" He copied and pasted the BOINC reporting info. I do not believe he actually has 4x 690 cards; BOINC would show [8] in that case. He actually confirmed that it is only 2x 690s in a previous post, "What's me doing wrong with my 690x2 host": https://setiathome.berkeley.edu/forum_thread.php?id=84490&postid=2005512#2005512 |
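The power argument in the posts above comes down to simple arithmetic, sketched here for both readings of the card count. The 1250 W PSU rating and the ~300 W per card come from the thread; the 200 W allowance for CPU, motherboard, and drives is purely an assumption for illustration, and a wall-socket meter would give a real number.

```python
def psu_headroom(psu_watts, cards, watts_per_card, rest_of_system_watts):
    """Return (estimated load, spare watts, load as a fraction of PSU rating)."""
    load = cards * watts_per_card + rest_of_system_watts
    return load, psu_watts - load, load / psu_watts


PSU_WATTS = 1250        # from the thread: XFX 1250 W
WATTS_PER_CARD = 300    # from the thread: each GTX 690 rated for ~300 W
REST_OF_SYSTEM = 200    # assumption: CPU, motherboard, drives, fans

# 2 cards is what BOINC's "[4]" implies for dual-GPU 690s; 4 cards is the
# other reading that was discussed.
for cards in (2, 4):
    load, spare, fraction = psu_headroom(
        PSU_WATTS, cards, WATTS_PER_CARD, REST_OF_SYSTEM
    )
    print(f"{cards} cards: ~{load} W load, {spare} W spare, "
          f"{fraction:.0%} of PSU rating")
```

Under these assumptions two cards land around 800 W, comfortably inside the rating, while four cards would land around 1400 W and exceed it. This says nothing about whether an aging PSU still delivers its rated output.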
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
"based on the list of hosts you have, I can't tell which system used to have the 690s in it. which host ID was it?" It appears to be this host: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8782986 It says OpenCL is installed, but it may be a good idea to wipe out all the drivers and reinstall them fresh. You will have to do some hands-on troubleshooting. Possibilities: (1) one or more GPUs are defective; (2) one or more GPUs are having thermal issues (what are the temps when it's running?); (3) driver problems (try a fresh install); (4) while your PSU has enough capacity, it's an old model and could also be degraded or failing; (5) other hardware issues, such as defective memory or SSD/HDD/MB. You'll need to check all of these things. |
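For the thermal item in that list, one possible approach is to poll `nvidia-smi` and flag any GPU above a chosen limit. This is a sketch under assumptions: the 85 °C limit is an arbitrary illustrative threshold, not a figure from the thread or from NVIDIA, the function names are invented for this example, and the sample readings are made up so the parsing can be shown without a GPU present.

```python
import csv
import io
import subprocess

# nvidia-smi query mode; prints one CSV row per GPU without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_temps(csv_text):
    """Parse 'index, name, temperature' rows into (int, str, int or None)."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        index, name, temp = (field.strip() for field in fields)
        try:
            temp_c = int(temp)
        except ValueError:
            temp_c = None  # some cards report a non-numeric placeholder
        rows.append((int(index), name, temp_c))
    return rows


def hot_gpus(rows, limit_c=85):
    """Return the rows whose temperature is known and at or above limit_c."""
    return [row for row in rows if row[2] is not None and row[2] >= limit_c]


def read_gpu_temps():
    """Run nvidia-smi and parse its output (needs the NVIDIA driver installed)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_temps(result.stdout)


# Made-up readings, only to show the parsing without a GPU present.
sample_output = "0, GeForce GTX 690, 71\n1, GeForce GTX 690, 88\n"
print(hot_gpus(parse_gpu_temps(sample_output)))
```

On the real host, calling `read_gpu_temps()` every minute or so and logging the result to a file would leave a record of the last readings before a freeze.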
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
From my 690 days... a long time ago, IIRC. Besides the usual problems (old PSU, bad capacitors, etc.), look at the memory usage. The 690 has 2 GB, but each half (each GPU) could use only 1 GB, so keep that in mind in your config file or you will easily get a lot of errors. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
That's a good point, Juan: only 1GB available to each GPU. |
elec999 Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141 |
"From my 690 days.... long time ago... IIRC" How can I fix the RAM issue? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
If you have a -sbs statement in any command line, reduce it to 512 or 768 to stay under the 1024 limit for each GPU. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
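The `-sbs` advice can be checked mechanically. The helper below is a hypothetical sketch, not part of BOINC or the SETI@home apps: it takes a command-line string, finds any `-sbs N` value, and caps it at 768, one of the two values suggested above. Where that command line actually lives on a given host is not covered here.

```python
import re

# Matches "-sbs" followed by whitespace and an integer value.
SBS_RE = re.compile(r"(-sbs\s+)(\d+)")


def clamp_sbs(cmdline, max_mb=768):
    """Return cmdline with every -sbs value capped at max_mb."""
    def _cap(match):
        value = min(int(match.group(2)), max_mb)
        return match.group(1) + str(value)

    return SBS_RE.sub(_cap, cmdline)


print(clamp_sbs("-sbs 1024"))
print(clamp_sbs("-sbs 512"))
```

A value already at or below the cap is left untouched, and a command line without `-sbs` is returned unchanged.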