cuda 90 app not using as much GPU as it should?
Author | Message |
---|---|
Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
OS: Linux. This started as a discussion between me and another person about why my i9-9900K was not as productive as my i7-3960X even though the i9 was running at a higher frequency. That turned out to be due to a memory module placed in the wrong slot (2 modules, 4 slots). Anyway, in the course of things I sent him a message showing my nvidia state while crunching, and he mentioned that he thought the GPU wasn't performing as well as it should. So I am posting it here in the hope that someone can point out how I can get full use of it.

The i9 is running its nvidia card at 48% and my i7 is running its nvidia card at 28% (this varies up and down over time, but it always seems to be below 50%). Both systems have the same nvidia card, a GeForce GTX 1660 Ti, and use the same driver (440.26) and kernel (5.3.7-1-default). The app is setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90.

The PCIe slot in both machines is running at PCIe 3.0 x16, as it should, according to lspci: "LnkSta: Speed 8GT/s (ok), Width x16 (ok)", with a total "possible" throughput of 15.75GB/s when I look up the spec for that PCIe version/slot width combination.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0 Off |                  N/A |
| 40%   60C    P2   107W / 130W |   1464MiB /  5941MiB |     48%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1479      G   /usr/bin/X                                    85MiB |
|    0      1946      G   /usr/bin/kwin_x11                             22MiB |
|    0      1953      G   /usr/bin/plasmashell                          35MiB |
|    0      2885      G   /usr/bin/krunner                               8MiB |
|    0     25576      C   ...x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90  1305MiB |
+-----------------------------------------------------------------------------+ |
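For readers wondering where the 15.75GB/s figure comes from, here is a minimal sketch of the arithmetic. It assumes the standard PCIe 3.0 parameters (8 GT/s per lane, 128b/130b line encoding); the function name is made up for illustration, and the result is a theoretical one-direction ceiling, not a measured transfer rate.

```python
def pcie3_bandwidth_gb_s(lanes, gt_per_s=8.0):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s.

    Each transfer carries one bit per lane, and 128b/130b encoding
    means 128 of every 130 bits on the wire are payload.
    """
    payload_bits_per_s = gt_per_s * 1e9 * lanes * 128 / 130
    return payload_bits_per_s / 8 / 1e9  # bits -> bytes -> GB


# x16 link at 8 GT/s, as reported by lspci in the post above.
print(f"{pcie3_bandwidth_gb_s(16):.2f} GB/s")  # prints "15.75 GB/s"
```

A link running at full width and speed only rules out the slot as a limit; it says nothing about whether the application keeps the GPU busy.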
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Well, first I would add the -nobs parameter that has been suggested to you multiple times: <cmdline>-nobs</cmdline> Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
In addition to the -nobs flag, what % of the CPU are you using? You need to reserve some spare CPU resources to feed the GPU. If you allow BOINC to use 100% of the CPU, you will bottleneck the GPU. Set the CPU resource to 80-85% in the compute settings. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
Ok, so then what is the purpose of these in app_info.xml? <avg_ncpus>0.1</avg_ncpus> <max_ncpus>0.1</max_ncpus> Another question: if you set 80-85% in computing preferences, doesn't that end up reducing overall performance? When I try that, the nvidia load bounces all over the place; it's not steady at some particular load level (within a few points) as I would expect. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Ok, so then what is the purpose of these in app_info.xml?" They are settings that tell the scheduler how much work can be processed when your host contacts it. From the client configuration document: avg_ncpus <ncpus>N</ncpus> https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Another question: if you set 80-85% in computing preferences doesn't that end up reducing overall performance?" No, you need to reserve some CPU to handle the desktop and any other background processes. The GPU task is underutilized because you have committed too much CPU to CPU tasks and the desktop, so there is not enough CPU left to support the GPU task. The GPU task's thread is being starved of resources and keeps missing timeslices because the CPU threads are busy with CPU work and desktop maintenance. You need to allocate at least one CPU thread for each GPU task. You do that by setting: <avg_ncpus>1</avg_ncpus> <ngpus>1</ngpus> You tell the GPU application to use a full CPU core to support the GPU thread by adding the -nobs parameter to the cmdline entry in the app_info or app_config file: <cmdline>-nobs</cmdline> |
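The advice above can be turned into a quick sanity check. The sketch below is only an illustration: it parses an app_config.xml-style fragment (modelled on the settings quoted in this thread) and flags any app_version that reserves less than one CPU thread or lacks -nobs. The function name and the warning wording are hypothetical, and -nobs is specific to the special CUDA app discussed here, so other applications would not need it.

```python
import xml.etree.ElementTree as ET

# Example fragment modelled on the settings quoted in the thread.
APP_CONFIG = """
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>-nobs</cmdline>
  </app_version>
</app_config>
"""


def check_gpu_app_versions(xml_text):
    """Return warnings for app_version blocks that may starve the GPU task."""
    warnings = []
    root = ET.fromstring(xml_text)
    for version in root.iter("app_version"):
        plan_class = version.findtext("plan_class", default="?")
        avg_ncpus = float(version.findtext("avg_ncpus", default="0"))
        cmdline = version.findtext("cmdline", default="")
        if avg_ncpus < 1:
            warnings.append(f"{plan_class}: avg_ncpus={avg_ncpus} reserves less than one CPU thread")
        if "-nobs" not in cmdline.split():
            warnings.append(f"{plan_class}: cmdline lacks -nobs")
    return warnings


print(check_gpu_app_versions(APP_CONFIG))  # prints []
```

Running the same check against a fragment with `<avg_ncpus>0.1</avg_ncpus>`, as in the original poster's file, would report the under-reserved CPU thread.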
Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
Ok, this is what I have now. Look good?
<app>
  <name>setiathome_v8</name>
</app>
<file_info>
  <name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</name>
  <executable/>
</file_info>
<app_version>
  <app_name>setiathome_v8</app_name>
  <platform>x86_64-pc-linux-gnu</platform>
  <version_num>801</version_num>
  <plan_class>cuda90</plan_class>
  <cmdline>-nobs</cmdline>
  <coproc>
    <type>NVIDIA</type>
    <count>1</count>
  </coproc>
  <avg_ncpus>1</avg_ncpus>
  <ngpus>1</ngpus>
  <file_ref>
    <file_name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</file_name>
    <main_program/>
  </file_ref>
</app_version> |
Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
Watching nvidia-smi (watch -n 1) for a few minutes now shows my GPU usage at roughly 98%, give or take 1-2%. Thanks! I had also set computing preferences to 85% in BOINC Manager as you recommended earlier. I can check 'Host Average' in a day or two and see how things are affected.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| 56%   63C    P2   115W / 130W |   1482MiB /  5941MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1479      G   /usr/bin/X                                    93MiB |
|    0      1946      G   /usr/bin/kwin_x11                             22MiB |
|    0      1953      G   /usr/bin/plasmashell                          35MiB |
|    0      2885      G   /usr/bin/krunner                               8MiB |
|    0     21797      C   ...x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90  1307MiB |
+-----------------------------------------------------------------------------+

asus-isa-0000
Adapter: ISA adapter
cpu_fan:        0 RPM

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +64.0°C  (high = +86.0°C, crit = +100.0°C)
Core 0:        +63.0°C  (high = +86.0°C, crit = +100.0°C)
Core 1:        +61.0°C  (high = +86.0°C, crit = +100.0°C)
Core 2:        +64.0°C  (high = +86.0°C, crit = +100.0°C)
Core 3:        +62.0°C  (high = +86.0°C, crit = +100.0°C)
Core 4:        +64.0°C  (high = +86.0°C, crit = +100.0°C)
Core 5:        +62.0°C  (high = +86.0°C, crit = +100.0°C)
Core 6:        +62.0°C  (high = +86.0°C, crit = +100.0°C)
Core 7:        +62.0°C  (high = +86.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)

Frequency: 4.182Ghz |
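Because utilization was reported earlier in the thread as bouncing up and down, a single snapshot can mislead. The following is a rough sketch of sampling it over time instead of eyeballing `watch -n 1`. It relies on the `--query-gpu=utilization.gpu` and `--format=csv,noheader,nounits` options of nvidia-smi; the helper names, sample count and interval are arbitrary choices for illustration.

```python
import subprocess
import time


def parse_utilization(csv_text):
    """Parse nvidia-smi CSV output (one integer percentage per GPU, per line)."""
    return [int(line) for line in csv_text.splitlines() if line.strip()]


def sample_gpu_utilization(samples=60, interval_s=1.0):
    """Poll nvidia-smi and return utilization readings for GPU 0.

    Requires an NVIDIA driver with nvidia-smi on the PATH.
    """
    readings = []
    for _ in range(samples):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        readings.append(parse_utilization(out)[0])
        time.sleep(interval_s)
    return readings


def summarize(readings):
    """Return (min, mean, max) of the readings."""
    return min(readings), sum(readings) / len(readings), max(readings)


# Made-up readings, only to show the summary format.
print(summarize([48, 52, 44]))  # prints (44, 48.0, 52)
```

On a machine with the card installed, `summarize(sample_gpu_utilization())` would give the spread over one minute, which is easier to compare before and after a configuration change.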
Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
What's the difference between setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90 and setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101? Is one faster than the other? |
rob smith Joined: 7 Mar 03 Posts: 22652 Credit: 416,307,556 RAC: 380 |
The "....90" app uses CUDA version 9.0, while the "...101" app uses CUDA version 10.1. Which is faster really depends on how old your hardware is: reports suggest that with the very latest GPUs CUDA 10.1 is faster, while for many of the oldest, CUDA 9.0 is faster. But be aware that not all of the very oldest GPUs are supported by the latest versions of CUDA. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Joined: 28 Apr 00 Posts: 35 Credit: 128,746,856 RAC: 230 |
Do not run more CPU tasks simultaneously than your CPU has cores. The i9-9900K has 8 cores. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The app_info looks fine, as does your reported GPU utilization. Your CPU times look reasonable. There is not too much difference between cpu_time and run_time, only about 5 minutes, so your 85% CPU utilization shows you are not overcommitting your CPU resources. Your setup looks good now. Your CPU times and GPU times look reasonable for your hardware and applications. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14686 Credit: 200,643,578 RAC: 874 |
You actually have a lot of (invisible) trailing spaces on the </coproc> line, which are causing the thread to stretch and become hard to read. Too late to do anything about it now, but if you clean up the file now, it'll be ready if you ever feel the need to post it again. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
"Whats the difference between" The 101 app will be slightly faster on your 1660 Ti. |
Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
I just switched one of my systems to cuda101. If it pans out as a bit faster, I'll switch the other too. I didn't know what to put for version_num, so I just used 801 as before, and it seems to work. I also checked for trailing spaces. In vim, use this:
:highlight ExtraWhitespace ctermbg=red guibg=red
:match ExtraWhitespace /\s\+$/
and any trailing spaces show up in red. |
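For anyone who prefers to check outside an editor, the same test (the `\s\+$` pattern in the vim commands above) can be sketched in a few lines of Python. The helper names are made up for illustration, and the sample text is a stand-in for a real app_info.xml.

```python
def trailing_whitespace_lines(text):
    """Return 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def strip_trailing_whitespace(text):
    """Return the text with trailing spaces and tabs removed from every line."""
    return "\n".join(line.rstrip(" \t") for line in text.splitlines()) + "\n"


# Stand-in for a file with stray spaces after </coproc>.
sample = "<coproc>\n  <count>1</count>\n</coproc>      \n"
print(trailing_whitespace_lines(sample))  # prints [3]
print(trailing_whitespace_lines(strip_trailing_whitespace(sample)))  # prints []
```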
Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238 |
"Whats the difference between" Hi Ian, [edit] Duh! Doh!!! Belay that. I just looked at my app_info.xml file and see that 10.1 is assigned to setiathome_v8. [/edit] [edit2] Just shows to go ya how long it's been since I upgraded to 10.1. I forgot how I did it. ;) [/edit2] I'm running a GTX 1660 Ti on my i7 8086K PC. I was just looking at my app_config.xml file and I have a question about it:
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>-nobs -pfb 32</cmdline>
  </app_version>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <plan_class>opencl_nvidia_100</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>
Have a great day! :) Siran
CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |
Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
So after running a while with the new settings, I see the GPUs are working as expected but the CPU performance isn't. The i9 is lagging behind the i7, even though the i9 has more threads. I gathered some data and it seems to confirm my visual observations. The average time to process MB WUs on the i9 is 4184 seconds each, compared to 3790 on the i7. This isn't averaged over a lot of MB WUs, but nevertheless it doesn't seem right to me, especially given that the i9 is running at 4.2GHz vs the i7 at 3.7GHz.
Average GPU time on ERB1: 100 GPU WUs - average of 73.3 seconds each
Average GPU time on ERB2: 100 GPU WUs - average of 61.4 seconds each
Average CPU time on ERB1: 19 CPU WUs - average of 3790.4 seconds each
Average CPU time on ERB2: 33 CPU WUs - average of 4184.6 seconds each
Both PCs are using MBv8_8.05r3345_avx_linux64. I have two other MB apps I can try (if people think they would be better):
MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu
MBv8_8.22r4008_avx2_intel_x86_64-pc-linux-gnu
avx2 seems a step up from avx, but it probably depends heavily on the app and whether it can take advantage of it. |
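To put the averages above on a common scale, here is a small sketch that expresses each gap as a percentage. Matching the quoted numbers implies ERB1 is the i7 (3790.4 s) and ERB2 is the i9 (4184.6 s). The function and variable names are made up for illustration, and with only 19 and 33 CPU work units behind the averages, the CPU gap should be treated as indicative at best.

```python
# Per-work-unit averages quoted in the post above (seconds).
cpu_avg_s = {"ERB1": 3790.4, "ERB2": 4184.6}
gpu_avg_s = {"ERB1": 73.3, "ERB2": 61.4}


def percent_slower(slow_seconds, fast_seconds):
    """How much longer the slower host takes per work unit, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


cpu_gap = percent_slower(cpu_avg_s["ERB2"], cpu_avg_s["ERB1"])
gpu_gap = percent_slower(gpu_avg_s["ERB1"], gpu_avg_s["ERB2"])

print(f"CPU: ERB2 takes {cpu_gap:.1f}% longer per WU than ERB1")  # 10.4%
print(f"GPU: ERB1 takes {gpu_gap:.1f}% longer per WU than ERB2")  # 19.4%
```

Per-WU time is not the same as throughput: a host that runs more CPU tasks at once can take longer per task and still finish more work per hour, so the number of concurrent tasks on each machine matters for the comparison.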
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
The only way to be sure is to run a few rounds with each app and see which is best on your particular host. I did that some time ago, and in my case the best performance was with MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu. So YMMV applies: test all the sauces to be sure. Just don't forget to compare similar types of WU with similar AR, or your test is not valid. Also be advised that there is a sweet spot in the number of CPU WUs running at the same time on the CPU; above or below it, the total number of WUs crunched per hour/day is lower. Again, testing is the only way to be sure on a particular host. On my old 12-thread i7-6850K CPU driving 4 GPUs, the best point is 4 GPU WUs + 6 CPU WUs at a time, leaving 2 threads free. |
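The sweet-spot idea can be sketched as a thread budget plus a throughput comparison. The 12-thread, 4 GPU task, 2 free thread figures come from the post above; the per-task timings in `measurements` are invented placeholders that would have to be replaced by averages measured on the host in question, and the function names are hypothetical.

```python
def cpu_task_slots(total_threads, gpu_tasks, free_threads):
    """CPU work units to run at once, reserving one thread per GPU task plus spares."""
    return max(total_threads - gpu_tasks - free_threads, 0)


def tasks_per_hour(concurrent_tasks, avg_seconds_per_task):
    """Throughput when `concurrent_tasks` run side by side."""
    return concurrent_tasks * 3600.0 / avg_seconds_per_task


def best_concurrency(avg_seconds_by_concurrency):
    """Pick the concurrency level with the highest measured throughput."""
    return max(
        avg_seconds_by_concurrency,
        key=lambda n: tasks_per_hour(n, avg_seconds_by_concurrency[n]),
    )


# The i7-6850K example from the post: 12 threads, 4 GPU tasks, 2 left free.
print(cpu_task_slots(12, 4, 2))  # prints 6

# Hypothetical measurements: concurrent CPU tasks -> average seconds per task.
measurements = {4: 3000.0, 6: 3300.0, 8: 4800.0}
print(best_concurrency(measurements))  # prints 6
```

In the made-up numbers, running 8 tasks takes so much longer per task that it completes fewer per hour than running 6, which is the pattern the post describes.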
Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
Well, the same app AND the same WU, which brings me back to my original question: is there some way to load the same WU on 2 different machines (multiple instances, 1 per BOINC/SETI CPU thread) so I can properly compare things? I'd be using it as a benchmark WU. There need to be safeguards so that it is never uploaded to a SETI server. |
Joined: 6 Nov 99 Posts: 716 Credit: 8,032,827 RAC: 62 |
There are some benchmark tools here: http://lunatics.kwsn.info/index.php?action=downloads;cat=45 |