Message boards :
Number crunching :
have more GPUs than actually exist
Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3
I posted similar problems months ago on the BOINC forum, but they claim it is an AMD driver problem. Going to post it here, though it won't do much good, as they (BOINC) claim I am the only one with this problem. I do have a solution.

Using a really old Core 2 Quad motherboard that has 5 PCI-E slots, I first put in 4 RX 560 GPUs on a fresh install of Windows 10 and got SETI crunching on 4 V8-ATI units. All was fine until I added the remaining RX 560 for a total of 5. On reboot, Windows Device Manager shows 5 RX 560s, TechPowerUp's GPU-Z shows 5, and so does CPUID. All Windows apps show 5 GPUs but, unfortunately, BOINC 7.14.2 thinks I have 10 of them and assigns 10 tasks, 5 of which will never complete, as I have gone through this before.

My solution then and now was to edit coproc_info.xml, remove the extra 5 GPUs, and then make that file read-only so it cannot be changed back to 10 GPUs when BOINC restarts. Hope this helps someone. Yes, 5 can be done.
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
It would help us BOINC developers to track down this problem if you could provide the exact details. I'd like to see the Event Log entries showing the 10 devices detected at startup, please - an exact copy'n'paste, showing driver version numbers and suchlike.

I think you'll find help here from other setizens who have seen the same as you - you're not actually alone, but you need to collaborate with the wider community to get it sorted. Remember you'll need to unlock coproc_info.xml if you ever decide to change your real GPUs again.
Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3
OK, I was able to duplicate the problem by renaming that coproc_info file and restarting BOINC.

First, I want to clarify what happened originally. The motherboard had one x16 slot and four x1 slots, and I put an RX 560 into the x16, which covered the adjacent x1. The other three x1 slots had risers for their RX 560s. The monitor was connected to the RX 560 in the x16 slot. After verifying SETI was crunching on 4 work units, I powered down, pulled the x16 card, which exposed the x1, and then used another pair of risers to get all 5 RX boards up and running. I suspect that if I had put all 5 GPUs into position at first, the problem may not have happened.

Here are the 10 work units supposedly being crunched. IMG
Here are the events corresponding to the 10 GPUs. TEXT
Here is the 10 GPU coproc_info.xml file. TEXT
Here are the 5 GPUs that reflect reality (from the event file). TEXT
Here is the working coproc file. TEXT

If you compare the two coproc_info files, you will see that I deleted the bottom five </ati_opencl> paragraphs. After the edit, I marked the file as read-only to prevent it from being re-created.

(One of the) original discussions of the problem is HERE.

[EDIT] That XML was edited to delete only the GPUs numbered from "5" on. Windows and GPU-Z report only 5; BOINC is not getting correct info.
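The manual edit described above could also be scripted. A minimal sketch, assuming coproc_info.xml contains one `<ati_opencl>...</ati_opencl>` block per detected device and that each block carries an `<opencl_device_index>` tag (as the files in this thread do; the exact layout may differ between BOINC versions):

```python
import re

def dedupe_coproc(xml_text):
    """Keep only the first <ati_opencl> block seen for each
    opencl_device_index; duplicated indexes are the phantom GPUs."""
    seen = set()
    kept = []
    for block in re.findall(r"<ati_opencl>.*?</ati_opencl>", xml_text, re.S):
        m = re.search(r"<opencl_device_index>\s*(\d+)\s*</opencl_device_index>",
                      block)
        idx = m.group(1) if m else None
        if idx in seen:
            continue  # same physical GPU enumerated a second time - drop it
        seen.add(idx)
        kept.append(block)
    return kept
```

You would still need to mark the rewritten file read-only afterwards, as described above, or BOINC recreates it at startup.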
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
Many thanks for that. I'm on UTC+1, so it's ten minutes to midnight - on my way to bed. I'll have a proper look in the morning.
Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3
Left ".xml" off of one of the urls; corrected below.

Here is the 10 GPU coproc_info.xml file. TEXT

Unaccountably, the device numbers go from 0 to 9 instead of 0 to 4. I have no idea how the AMD driver could cause this problem. I originally had the idea of debugging this behavior using VS2017 and the BOINC GitHub sources, but the problem is probably in the OpenCL (or CUDA??) library that BOINC uses to enumerate the display devices. Apparently BOINC cannot directly ask Windows for this info, unlike Device Manager, GPU-Z, or CPUID.
Send message Joined: 8 Dec 08 Posts: 231 Credit: 28,112,547 RAC: 1
Thank you. Yeah, for some reason when you have more than 1 GPU, it acts strange: not showing the stated number of cards, or showing an outright wrong spec for a card.
Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462
I have had that issue, in some sense of the word, on both an Nvidia and an AMD GPU (2400G). My fix was to downgrade to significantly lower driver versions. Since I don't care that I am not running the very latest, it was another reasonable solution. What both of my excess GPUs had in common is that they were under Windows 10.

Tom
A proud member of the OFA (Old Farts Association).
Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3
I have a small BOINC farm, but I generally shut most of them down when summer starts, as it is hot here in south Texas. I was experimenting with an old Core 2 Quad Q9550S (low power) with five of those low power RX 560 GPUs, and put together a web based program that anyone can use to calculate efficiency. I estimate my "breadboard" system as using 45*5 + 65 + 110 (GPUs + CPU + remainder & loss) = 400 watts, for a total of 1,253 watts per credit. Stats can be seen HERE. I was going to let this low power system run all summer. I do have an APC that can show exact power consumption, but it is being used on another system.

[EDIT] For some reason I had to take the www out of the above urls. Some sites require the www; others seem not to. Something does not seem right, since my previous posts used www and they are working, unlike the above in the preview. If the above don't work in your browser, then add www to the address. If I am not calculating the wattage correctly in the program, then PM me. The sources are at GitHub under my name.
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
> Left ".xml" off of one of the urls

Note that each virtual GPU has both a <device_num> and an <opencl_device_index>. The device numbers go from 0 to 9; the device indexes go from 0 to 4 and then restart, 0 to 4 again.

As Juha - who is a very experienced developer - said in the 'previous' discussion, BOINC evaluates devices through software: it doesn't pretend to have direct hardware detection code. My analysis would be that BOINC enumerates the available drivers, and then runs through each driver, enumerating the devices it reports. <device_num> is an internal number, created and used only by BOINC; <opencl_device_index> is an external number, reported by the OpenCL stack component from one or more driver installations.

We can perhaps suggest to the BOINC developers that the enumeration code should watch for and flag duplicated device_index numbers. There is already a complicated process for trying to uniquely identify devices which are both CUDA capable and OpenCL capable, so that BOINC doesn't try to run both a CUDA app and an OpenCL app on the same silicon at the same time.

In the previous thread, Juha suggested that you inspect HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors, but I can't see any reply to that particular question. When I look at that key on my machine here, I see

[HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors]
"IntelOpenCL64.dll"=dword:00000000
"C:\\Windows\\System32\\nvopencl.dll"=dword:00000000

showing how two OpenCL libraries can co-exist. It would be worth checking that, since you seem to have found a workaround for the problem but not yet isolated the root cause.
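The enumeration Richard describes can be sketched to show how 5 physical cards become 10 devices. This is illustrative pseudocode, not actual BOINC source: BOINC assigns a fresh internal device_num to every device each driver stack reports, so the same card reported twice gets two device_nums while its per-driver opencl_device_index repeats.

```python
def enumerate_gpus(platforms):
    """platforms: one list of per-driver device indexes per driver
    installation.  Returns BOINC-style device records, flagging
    duplicated opencl_device_index values as a developer might."""
    devices = []
    device_num = 0            # internal number, assigned only here
    seen_indexes = set()
    for driver_devices in platforms:
        for opencl_device_index in driver_devices:
            duplicate = opencl_device_index in seen_indexes
            seen_indexes.add(opencl_device_index)
            devices.append({
                "device_num": device_num,                    # 0..9 in the bad case
                "opencl_device_index": opencl_device_index,  # 0..4, then 0..4 again
                "duplicate": duplicate,                      # candidate flag
            })
            device_num += 1
    return devices
```

With two driver stacks each reporting indexes 0-4, this yields 10 records, 5 of them flagged - matching the doubled coproc_info.xml in this thread.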
Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3
Thanks for looking, Richard!

I have only the AMD driver at that location:

C:\WINDOWS\System32\DriverStore\FileRepository\c0340998.inf_amd64_4e7ad8ec950b7e37\B340755\amdocl64.dll dword:0
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
It might be worthwhile to run DDU (Display Driver Uninstaller) and then re-install the drivers fresh, on a clean slate, to eliminate the possibility that old drivers are causing this problem.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
> I have a small boinc farm but I generally shut most of them down when summer starts as it is hot here in south Texas. I estimate my "breadboard" system as using 45*5 + 65 + 110 (GPUs + CPU + remainder & loss) = 400 watts for a total of 1,253 watts per credit. I was going to let this low power system run all summer.

There's something strange about that argument, with both the numbers and the units. You seem to have used rated TDP values for the power consumption, measuring the maximum possible power draw (needed to specify a safe power cabling solution), rather than the actual power draw during use. Some time ago, I put together this little table of power consumption, taken from a Kill A Watt meter measuring the mains input to the system case only:

Idle - BOINC not running: 22 watts
Running NumberFields on 4 cores: 55 watts
Running SETI x64 AVX on 4 cores: 69 watts
ditto at VHAR: 71 watts

That's a full i5-6500 CPU @ 3.20GHz system with SSD and HDD. Currently, that meter is showing 125 watts maximum, with the 4 cores, the Intel GPU, and an NVidia GTX 1050 Ti all running.

To discuss watts and credits in the same breath, you have to take the time dimension into account: 'watts' is an instantaneous measurement, 'credit' is earned over a period of time. Your utility company will bill you in kilowatt-hours; you'd probably measure credits in watt-seconds, aka joules.

> for some reason I had to take the www out of the above urls. Some sites require the www others seem not to.

That would depend on how many versions of the address have been registered on the DNS servers for the domain.
Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3
> There's something strange about that argument, with both the numbers and the units. You seem to have used rated TDP values for the power consumption, measuring the maximum possible power draw (needed to specify a safe power cabling solution), rather than the actual power draw during use. Some time ago, I put together this little table of power consumption, taken from a Kill A Watt meter measuring the mains input to the system case only:

Bear with me for a sec. Your i5-6500 is at http://setiathome.berkeley.edu/results.php?hostid=8121358&offset=0&show_names=0&state=4&appid=

Your wattage of 125 is nice; I was just guessing on my system. Putting the above url into my program calculates 11.94 seconds for a single credit, which gives 1,492 joules expended during the 12 or so seconds. Lemme know what you think?
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
Ah - I see what you've done. 125 watts of power x 11.94 seconds per credit = 1,492.5 watt-seconds, or joules, of energy. But if I come upstairs and suspend SETI (which was running on the 1050 Ti), the meter only drops from 125 watts to 80 watts - so the 1050 Ti alone is drawing 45 watts (below rated TDP), and the marginal cost of a SETI credit is only 537.3 joules.
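The arithmetic in this exchange is just power times time. Spelled out with the thread's own numbers (125 W whole-system draw, 45 W marginal draw of the 1050 Ti, 11.94 seconds per credit):

```python
def joules_per_credit(watts, seconds_per_credit):
    """Energy per credit: watts x seconds = watt-seconds = joules."""
    return watts * seconds_per_credit

# Whole-system figure (counts CPU, drives, idle overhead):
whole_system = joules_per_credit(125, 11.94)   # ~1,492.5 J per credit
# Marginal figure (only the extra draw SETI adds on the 1050 Ti):
marginal_gpu = joules_per_credit(45, 11.94)    # ~537.3 J per credit
```

The gap between the two numbers is the point Richard is making: attributing the whole meter reading to one GPU nearly triples the apparent energy cost of a credit.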
Send message Joined: 8 Dec 08 Posts: 231 Credit: 28,112,547 RAC: 1
On video cards: I have 3 sets of dual video card setups, and the power draw referenced on the box / spec site is wrong under load at times. My 580 from PowerColor draws differently than what is spec'd on the site.
Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
The TDP power spec that graphics card companies publish is for graphics loads in games. It's a different animal when the card is doing compute loads.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Send message Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537
Oooh, nice find.. This is my 2080 Ti host. I took 230W of usage, as I'm using a CPU core to feed the GPU.

Avg: 31.7 30.0 63.4
STD: 54.4 54.2 119.4

0.50 seconds per credit from the above info, one device
1.9998 credits per second for one device
Times shown above were divided by the number of concurrent tasks (1)
7,199 credits in an hour on this system
170 total watts used by a single producing device (avg each work unit)
7,199 credits per hour for exactly one device (1 task)
A kilowatt-hour will theoretically produce a maximum of 42,348 credits per device on this PC

Use the above kWh credits to compare this device with any other device, as the overhead (idle) has been removed. Actual credit production is less because the GPU has idle time between tasks.

_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
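The credits-per-kilowatt-hour figure above follows from the measured rate and wattage. A quick check using the post's own numbers (7,199 credits/hour, 170 W per device; small rounding in the measured averages accounts for the difference from the quoted 42,348):

```python
credits_per_hour = 7_199        # measured rate from the post
device_watts = 170              # average draw of one producing device

# One kWh powers a 170 W device for 1000/170 hours of crunching...
hours_per_kwh = 1000 / device_watts          # ~5.88 hours
# ...so the theoretical ceiling per kWh is:
credits_per_kwh = credits_per_hour * hours_per_kwh   # ~42,347
```

Dividing credits by energy this way normalizes out run time, which is what makes the number usable for comparing devices.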
Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156
V0.99 is coming. You can run 2 at a time to fill the initialisation and post-processing gap; the GPU part is run one at a time.

To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
> V0.99 is coming. You can run 2 at a time to fill the initialisation and post processing gap. GPU part is run one at a time.
Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
> V0.99 is coming. You can run 2 at a time to fill the initialisation and post processing gap. GPU part is run one at a time.

What Juan said. WoW! I guess you managed to wrangle the code snippet oddbjornik threw in here for pre-initialization.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.