Back after a 10 year hiatus. Catching up with the eGPU community, have some issues.
Author | Message |
---|---|
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
So, my main system is a Mac mini 2018 with 2 eGPUs, a Vega 56 and an RX 580, and I'm running 3 instances on each. I have them pretty well configured and stable, pushing out around 30K credits per day or more. It's a decent little system. Here's the issue. The internal intel_gpu is far too small to run 3 instances, and takes far too many processor resources to run, so I've excluded it in cc_config.xml, and run 4 instances on the intel CPUs. This keeps odd things from happening. Here is the machine: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8704998 About 4 or 5 days ago, I started getting intel workunits, completely bypassing my exclusion. This has led to me aborting 102 work units every day since. I have tried my best to deal with this by every method I could research: command line options in the mb_blah.txt files, and attempting to localize the intel_gpu in the app_config.xml file by using plan_class tags, which failed badly... I'm not averse to running them if I can limit it to one single instance, but apparently the "-total_GPU_instances_num 7 -instances_per_device 1" options don't work as expected, though the rest of the command line is being used in the configuration. For whatever reason, this machine hasn't downloaded the astropulse program since my last project reset, so at least I don't have to deal with that in any way, but I have a COA for that if it comes back. 
Here's the configs, I have them all hard linked to a Configs directory in my home directory (a way to keep a good backup if I have to reset the project): The exclusion in cc_config: <exclude_gpu> <url>http://setiathome.berkeley.edu/</url> <type>intel_gpu</type> <app>setiathome_v8</app> </exclude_gpu> mac-mini-2018:~ guyallgood$ cat Configs/mb* (intel mb_cmdline-8.00-opencl_intel_gpu_sah.txt) -total_GPU_instances_num 7 -instances_per_device 1 -period_iterations_num 80 -sbs 512 -spike_fft_thresh 1024 -tune 1 2 1 16 (AMD mb_cmdline-8.20-opencl_ati5_mac.txt) -total_GPU_instances_num 7 -instances_per_device 3 -hp -high_perf -period_iterations_num 20 -sbs 1536 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 mac-mini-2018:~ guyallgood$ cat Configs/app_config.xml <app_config> <app> <name>setiathome_v8</name> <max_concurrent>10</max_concurrent> <gpu_versions> <gpu_usage>.33</gpu_usage> <cpu_usage>.666666</cpu_usage> </gpu_versions> </app> </app_config> Any ideas? |
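The numbers in the `app_config.xml` above can be sanity-checked with a little arithmetic. This is only a sketch: it assumes the commonly described BOINC behaviour that the client places tasks on a GPU until their summed `gpu_usage` would exceed 1.0, and that each GPU task reserves `cpu_usage` of a CPU. The function names are made up for illustration.

```python
import math

def tasks_per_gpu(gpu_usage: float) -> int:
    """Tasks that fit on one GPU if the summed gpu_usage may not exceed 1.0 (assumed rule)."""
    # The small epsilon guards against floating-point error for values like 0.5 or 0.25.
    return math.floor(1.0 / gpu_usage + 1e-9)

def plan(gpu_usage: float, cpu_usage: float, gpu_count: int, cpu_tasks: int) -> dict:
    """Summarise how many tasks run and how much CPU the GPU tasks reserve."""
    per_gpu = tasks_per_gpu(gpu_usage)
    gpu_tasks = per_gpu * gpu_count
    return {
        "per_gpu": per_gpu,
        "gpu_tasks": gpu_tasks,
        "cpu_reserved": gpu_tasks * cpu_usage,
        "total_tasks": gpu_tasks + cpu_tasks,
    }

# Values from the post: 2 eGPUs, gpu_usage .33, cpu_usage .666666, 4 CPU tasks.
result = plan(gpu_usage=0.33, cpu_usage=0.666666, gpu_count=2, cpu_tasks=4)
print(result)
```

Under these assumptions, 3 tasks per eGPU plus 4 CPU tasks adds up to the `max_concurrent` of 10. Because the `gpu_versions` block in this file is not tied to a particular GPU type, the same 0.33 would presumably apply to the intel_gpu as well, which may be why it tries to run three instances.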
Joined: 24 Jan 00 Posts: 37564 Credit: 261,360,520 RAC: 489 |
Why not just go to your online preferences, https://setiathome.berkeley.edu/prefs.php?subset=project, and deselect the use of the iGPU? That way you'll certainly get none of them. ;-) Cheers. |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
Why not just go to your online preferences, https://setiathome.berkeley.edu/prefs.php?subset=project, and deselect the use of the iGPU? I'd actually like to have the coprocessor working if there is a way to use it. Before I got the eGPUs, it was responsible for most of the work on the machine. It is fairly capable, but if I cannot isolate it like I want, that may be the next COA. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
I'd actually like to have the coprocessor working if there is a way to use it. Before I got the eGPUs, it was responsible for most of the work on the machine. It is fairly capable, but if I cannot isolate it like I want, that may be the next COA. The problem with the iGPU is that it shares resources with the CPU: cache, memory bandwidth, power, thermal limits. Where you have a low clock speed/low core count CPU, the iGPU can provide enough work to offset the reduced production of the CPU. But when you have plenty of CPU cores, the output of the iGPU is generally less than the lost output of the CPU cores from using the iGPU, even with lower clock speed CPUs. With laptops, since they tend to have smaller caches and much lower thermal & power limits, the point at which using the iGPU becomes a losing proposition comes a lot later than for a desktop. In the case of a desktop, if you've got more than 2 cores & better than 2GHz clock speed, using the iGPU just isn't worth it*. But for laptops it really is a case of try it without the iGPU & see how many WUs per hour you're getting, then try it with the iGPU and see if you get more or fewer WUs per hour through the system. The last few weeks are probably the best time possible to be doing this, as the work is pretty much all the same (apart from the odd resend) & all the WUs run within a few seconds of each other. Edit- * With the current AMD iGPUs even on desktops it may be worth running them as they have much, much greater capabilities than the Intel iGPUs, although that will most likely only be the case for the lower core count (less than 6) CPUs. Grant Darwin NT |
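The "try it without, then try it with" comparison comes down to whole-system work units per hour over two comparable runs. A minimal sketch, with made-up sample figures and illustrative function names:

```python
def wus_per_hour(completed: int, hours: float) -> float:
    """Whole-system throughput: validated work units divided by wall-clock hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return completed / hours

def compare(without_igpu: tuple, with_igpu: tuple) -> tuple:
    """Each argument is (completed work units, wall-clock hours) for one run."""
    base = wus_per_hour(*without_igpu)
    test = wus_per_hour(*with_igpu)
    change_pct = (test - base) / base * 100.0
    return base, test, change_pct

# Hypothetical figures: a 24-hour run with the iGPU disabled, then one with it enabled.
base, test, change = compare(without_igpu=(120, 24.0), with_igpu=(114, 24.0))
verdict = "keep the iGPU" if change > 0 else "leave the iGPU off"
print(f"without: {base:.2f}/h, with: {test:.2f}/h, change: {change:.1f}% -> {verdict}")
```

Count every task the host completes (CPU, eGPU and iGPU together), since the point is the effect on the whole system rather than on the iGPU alone.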
Joined: 29 Jun 99 Posts: 11449 Credit: 29,581,041 RAC: 66 |
The bottom line is: do not use iGPUs; they are a waste of resources. |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
Well...I guess that sums it all up then. Kill the intel_gpu. I was just getting greedy. Hate to see a resource go to waste. I can't complain, I'm putting out more in a single day now, than I did during the entire 2001 to 2006 period combined. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
Prior to adding the eGPUs, the iGPU was operating about 10 times faster than any single instance running on the CPUs so it was fairly capable. But did you check how the CPU performed without the iGPU running at all? Running the iGPU clobbers the CPU performance. An iGPU 10 times faster than a single CPU thread is good. But if the CPU is capable of 12 threads then it will output more than the iGPU, and with the iGPU not running its output will be even greater still. Grant Darwin NT |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
But did you check how the CPU performed without the iGPU running at all? Running the iGPU clobbers the CPU performance. I did not. I wouldn't even have considered it at the time. That's exactly what piqued my interest in running the eGPUs to begin with, so that part was good. :) An iGPU 10 times faster than a single CPU thread is good. But if the CPU is capable of 12 threads then it will output more than the iGPU, and with the iGPU not running its output will be even greater still. That's exactly what caused me to exclude it to begin with. I saw that clobbering when I made my initial attempts at running multiple instances, and it didn't take long at all to realize something had gone afoul. I'll keep it off, and exclude it on the website as suggested to keep this anomaly from happening again. Next step will be to get another eGPU card, so the Radeon VII will be the next step up. That should eat up the remaining cores of the CPU so this little box will be pretty well maxed out. |
Joined: 24 Jan 00 Posts: 37564 Credit: 261,360,520 RAC: 489 |
I certainly wouldn't even consider using either of my iGPUs in a pink fit (it just hasn't been worth it IMHO). ;-) Cheers. |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
I certainly wouldn't even consider to use either of my iGPU's in a pink fit (it just hasn't been worth it IMHO). ;-) Cheers to you as well mate! Thanks for your advice. The better option for me is to be patient, and upgrade & add GPUs as money permits, and perhaps put one on a smaller/slower/older Mac mini for giggles. |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
As a side note, and having been gone for so long, I wasn't around when astropulse came out, or for the other changes that happened. Not really sure what happened during the last reset (brought on by a mixture of bad things), but the astropulse program didn't download. If I'm reading the server status correctly, it doesn't appear that there even are any AP files to work on, which might help explain what I'm perceiving as odd behavior. Is this a common thing? I was considering how to approach this when there were work units available, and I'd like to approach it like this: cc_config: <exclude_gpu> <url>http://setiathome.berkeley.edu/</url> <type>amd</type> <device_num>0</device_num> <app>setiathome_v8</app> </exclude_gpu> <exclude_gpu> <url>http://setiathome.berkeley.edu/</url> <type>amd</type> <device_num>1</device_num> <app>astropulse_v7</app> </exclude_gpu> app_config: <app_config> <app> <name>setiathome_v8</name> <max_concurrent>7</max_concurrent> <gpu_versions> <gpu_usage>.33</gpu_usage> <cpu_usage>.666666</cpu_usage> </gpu_versions> </app> <app> <name>astropulse_v7</name> <max_concurrent>2</max_concurrent> <gpu_versions> <gpu_usage>.5</gpu_usage> <cpu_usage>2</cpu_usage> </gpu_versions> </app> </app_config> I'm willing to dedicate a GPU to crunching these, but if they're not going to be fed in a steady fashion, I'd be forever modifying config files via script, probably running a script to check on the status of the actual processes running to determine when to swap configs. I'm already doing something like this to renice the BOINC manager to -10, which gives each setiathome instance a nice value of -10 when it starts. ps axl | head -n 1; ps axl | grep boinc | grep -v grep UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND 502 453 269 0 31 -10 4406280 33504 - S< ?? 
0:34.42 /Applications/BOINCManager.app/Contents/Resources/boinc --redirectio --launched_by_manager ps axl | head -n 1; ps axl | grep seti | grep -v grep UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND 503 7054 453 0 31 -10 4512772 101404 - R< ?? 51:01.57 setiathome_8.05_x86_64-apple-darwin 503 7166 453 0 31 -10 4538268 127032 - R< ?? 43:12.69 setiathome_8.05_x86_64-apple-darwin 503 7452 453 0 31 -10 4527696 117696 - R< ?? 19:59.57 setiathome_8.05_x86_64-apple-darwin 503 7505 453 0 31 -10 6639036 77308 - U< ?? 1:11.94 setiathome_8.20_x86_64-apple-darwin__opencl_ati5_mac 503 7536 453 0 31 -10 6612632 57344 - U< ?? 0:55.96 setiathome_8.20_x86_64-apple-darwin__opencl_ati5_mac 503 7556 453 0 31 -10 6620612 66076 - U< ?? 1:03.94 setiathome_8.20_x86_64-apple-darwin__opencl_ati5_mac 503 7593 453 0 31 -10 6610056 55464 - U< ?? 0:44.00 setiathome_8.20_x86_64-apple-darwin__opencl_ati5_mac 503 7612 453 0 31 -10 4508728 97080 - R< ?? 7:36.78 setiathome_8.05_x86_64-apple-darwin 503 7645 453 0 31 -10 6614732 55172 - U< ?? 0:40.05 setiathome_8.20_x86_64-apple-darwin__opencl_ati5_mac 503 7715 453 0 31 -10 6594304 49444 - U< ?? 0:10.58 setiathome_8.20_x86_64-apple-darwin__opencl_ati5_mac Any other little tricks, or considerations? |
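Since BOINC config files are plain XML, a quick well-formedness check before restarting the client can catch a mistyped closing tag such as `</exclue_gpu>`. The sketch below uses Python's standard `xml.etree.ElementTree`. The `<cc_config><options>` wrapper is where `<exclude_gpu>` blocks are normally placed, and the helper names are invented for illustration. It only checks XML syntax and lists what it finds; it does not verify that the `type` or `app` values are ones the client accepts.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML, False on any syntax error (e.g. mismatched tags)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def list_exclusions(xml_text: str) -> list:
    """Return (type, device_num, app) for every <exclude_gpu> element found."""
    root = ET.fromstring(xml_text)
    return [
        (ex.findtext("type"), ex.findtext("device_num"), ex.findtext("app"))
        for ex in root.iter("exclude_gpu")
    ]

# Values copied from the post; they are not validated here.
GOOD = """<cc_config><options>
  <exclude_gpu>
    <url>http://setiathome.berkeley.edu/</url>
    <type>amd</type>
    <device_num>0</device_num>
    <app>setiathome_v8</app>
  </exclude_gpu>
  <exclude_gpu>
    <url>http://setiathome.berkeley.edu/</url>
    <type>amd</type>
    <device_num>1</device_num>
    <app>astropulse_v7</app>
  </exclude_gpu>
</options></cc_config>"""

# Same text with the first closing tag misspelled, to show the parser rejecting it.
BAD = GOOD.replace("</exclude_gpu>", "</exclue_gpu>", 1)

print(is_well_formed(GOOD), is_well_formed(BAD))
for entry in list_exclusions(GOOD):
    print(entry)
```

Running the same check against the real `cc_config.xml` and `app_config.xml` (read with `open(path).read()`) before each config swap would make a scripted swap a little safer.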
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
As a side note, and having been gone for so long, I wasn't around when astropulse came out, and other changes which happened. Not really sure what happened during the last reset (brought on by a mixture of bad things), but the astropulse program didn't download. If I'm reading the server status correctly, it doesn't appear that there even are any AP files to work on, which might help explain what I'm perceiving as odd behavior. Is this a common thing? AP work only comes from Arecibo data, which is sporadic at best. When there is some AP work available, as long as you enable it in your settings, once a system downloads an AP WU the Manager will realise it doesn't have the application, so will download it then. A few years ago the logic for which application to run (AP, MB or both) got broken. I only processed MB; others processed AP, but would do MB if there was no AP available. When the mechanism crapped out, those that had a preference of AP, with MB ok if there was no AP, just couldn't get any MB unless they specifically selected that application. And those of us that did just MB would have periods where we wouldn't get any work allocated even though there was plenty available. Enabling AP (even though I didn't have the application for it) allowed me to get MB work more regularly, but in the end I installed the AP application & will now do either. On my video cards with AP, 2 WUs at a time gives the best results. For MB, I'm still finding that 1 WU at a time gives the most work per hour overall. On extreme high-end cards, 2 at a time is meant to be best. Of course, your mileage may vary. Grant Darwin NT |
Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
Edit- I am finding that running GPU tasks with the AMD APUs is worth it to one degree or another. I've been tinkering with my settings a bit lately. I have another computer with Intel HD630 that I have disabled from SETI due to it slowing everything else down. I found that BOINC would suspend and resume constantly with the iGPU crunching. However, if I have time this weekend I might give it a spin on another project like Einstein just to see (I'm not holding my breath). Seti@home classic: 1,456 results, 1.613 years CPU time |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
I found that BOINC would suspend and resume constantly with the iGPU crunching. That would be a result of your BOINC computing, "Usage limits" & "When to suspend" settings. Grant Darwin NT |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.