Developing AMD GPU Utilities
Author | Message |
---|---|
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Master branch. Can you try the development branch? https://github.com/Ricks-Lab/amdgpu-utils/tree/v2.1.0-Features Also, try amdgpu-ls, which should be a good way to verify that the utility can read the relevant device files. I think this is the first time the app has seen a Radeon VII, so I'm not sure whether it behaves differently. GitHub: Ricks-Lab Instagram: ricks_labs |
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Master branch initially. I've now switched to the v2.1.0 branch and the results appear the same. Are you on GitHub? It may be easier to troubleshoot using the issue feature there. I will add some statements indicating issues reading device files. Also, which Linux distribution are you using? |
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Master branch initially. I've now switched to the v2.1.0 branch and the results appear the same. Hi Sean, I found this post from someone having the same issue reading p-states on a Radeon VII in Mint: https://github.com/RadeonOpenCompute/ROC-smi/issues/55 Perhaps the same issue is happening on your system. I will make some changes to handle this situation more gracefully. |
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Master branch initially. I've now switched to the v2.1.0 branch and the results appear the same. I added more error checking, messages for missing device files, and a check for whether DPM is enabled. These changes are on the v2.1.0 branch. Can you give amdgpu-ls a try and let me know what error messages you get? |
Joined: 10 Aug 00 Posts: 33 Credit: 125,775,158 RAC: 199 |
amdgpu-ls output is now:

./amdgpu-ls
AMD Wattman features enabled: 0xffffffff
amdgpu version: 18.50-725072
2 AMD GPUs detected
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/power1_cap_max
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/power1_cap
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/power1_average
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/temp1_input
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/temp1_crit
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/fan1_enable
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/fan1_target
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/fan1_input
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/fan1_max
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/pwm1_enable
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/pwm1
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/pwm1_max
Error: HW file doesn't exist: /sys/class/drm/card1/device/hwmon/hwmon1/in0_label
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap_max
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/power1_average
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/temp1_input
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/temp1_crit
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/fan1_enable
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/fan1_target
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/fan1_input
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/fan1_max
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/pwm1_enable
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/pwm1
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/pwm1_max
Error: HW file doesn't exist: /sys/class/drm/card0/device/hwmon/hwmon0/in0_label
2 are Compatible
Error: Invalid pstate entry: /sys/class/drm/card0/device/pp_od_clk_voltage
Error: Invalid pstate entry: /sys/class/drm/card0/device/pp_od_clk_voltage
Error: Invalid pstate entry: /sys/class/drm/card0/device/pp_od_clk_voltage

UUID: 5d20111fb1d24b97a38ea653c57c55af
Card Model: Vega 10 XT [Radeon RX Vega 64]
Short Card Model: RX Vega 64
Card Number: 1
Card Path: /sys/class/drm/card1/device/
PCIe ID: 06:00.0
Driver: amdgpu
HWmon: /sys/class/drm/card1/device/hwmon/hwmon1/
Current Power (W): -1
Power Cap (W): -1
Power Cap Range (W): [-1, -1]
Fan Enable: -1
Fan PWM Mode: [-1, 'UNK']
Current Fan PWM (%): -1
Current Fan Speed (rpm): -1
Fan Target Speed (rpm): -1
Fan Speed Range (rpm): [-1, -1]
Fan PWM Range (%): [-1, -1]
Current Temp (C): -1
Critical Temp (C): -1
Current VddGFX (mV): -1
Vddc Range: ['800mV', '1200mV']
Current Loading (%): 83
Link Speed: 8 GT/s
Link Width: 16
vBIOS Version: 113-D0500300-101
Current SCLK P-State: 7
Current SCLK: 1590Mhz
SCLK Range: ['852MHz', '2400MHz']
Current MCLK P-State: 3
Current MCLK: 800Mhz
MCLK Range: ['167MHz', '1500MHz']
Power Performance Mode: 2-VIDEO
Power Force Performance Level: auto

UUID: 2ffbc1178e06458783b121e71dc487bd
Card Model: Device 081e
Short Card Model: Device 081e
Card Number: 0
Card Path: /sys/class/drm/card0/device/
PCIe ID: 03:00.0
Driver: amdgpu
HWmon: /sys/class/drm/card0/device/hwmon/hwmon0/
Current Power (W): -1
Power Cap (W): -1
Power Cap Range (W): [-1, -1]
Fan Enable: -1
Fan PWM Mode: [-1, 'UNK']
Current Fan PWM (%): -1
Current Fan Speed (rpm): -1
Fan Target Speed (rpm): -1
Fan Speed Range (rpm): [-1, -1]
Fan PWM Range (%): [-1, -1]
Current Temp (C): -1
Critical Temp (C): -1
Current VddGFX (mV): -1
Vddc Range: ['', '']
Current Loading (%): 97
Link Speed: 8 GT/s
Link Width: 16
vBIOS Version: 113-D3600200-105
Current SCLK P-State: -1
Current SCLK:
SCLK Range: ['808Mhz', '2200Mhz']
Current MCLK P-State: -1
Current MCLK:
MCLK Range: ['351Mhz', '1200Mhz']
Power Performance Mode: 2-VIDEO
Power Force Performance Level: auto

Running Ubuntu 18.04 LTS. I am on GitHub, although I will say this is the first time I have actually used it. The thread you linked has the GRUB setting as "0xffffffff". Is there a difference between that and "0xffff7fff"? |
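The "HW file doesn't exist" messages and the -1 values in the output above suggest that the utility checks for each hwmon file before reading it and reports a placeholder when the file is missing. A minimal sketch of that pattern follows. The function name `read_hw_file` and the -1 default are illustrative assumptions, not the actual amdgpu-utils code.

```python
import os
import tempfile


def read_hw_file(path, missing=-1):
    """Return the stripped contents of a sysfs-style file, or `missing` if absent.

    Hypothetical helper for illustration; not taken from amdgpu-utils.
    """
    if not os.path.isfile(path):
        print("Error: HW file doesn't exist: {}".format(path))
        return missing
    with open(path, "r") as hw_file:
        return hw_file.read().strip()


if __name__ == "__main__":
    # A temporary directory stands in for a real hwmon directory.
    with tempfile.TemporaryDirectory() as hwmon_dir:
        temp_path = os.path.join(hwmon_dir, "temp1_input")
        with open(temp_path, "w") as sample:
            sample.write("45000\n")
        print(read_hw_file(temp_path))                        # file present
        print(read_hw_file(os.path.join(hwmon_dir, "pwm1")))  # file missing
```

Returning a placeholder instead of raising keeps the listing going when only some files are absent, which matches how the output above still reports loading and link speed.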
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Running Ubuntu 18.04 LTS. Near the top of the GitHub repository page is a set of tabs (Code, Issues, Pull requests, ...). Select "Issues" and then select "New Issue" at the right side near the top. You can also review all other issues from this page. The difference between the two GRUB settings is that "0xffffffff" enables everything and "0xffff7fff" enables what you need for this utility. I have tested with both. I noticed you are running a different glibc, which I suspect may be part of the issue. I have tested with an RX Vega 64 with no issues using the same driver and kernel as you, and glibc seems to be the only notable difference. No user has reported back after trying it with a Radeon VII, though. |
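To see exactly how the two settings differ, the masks can be compared bit by bit. The sketch below only does the arithmetic; it makes no claim about what the differing bit controls in the driver.

```python
def differing_bits(mask_a, mask_b):
    """Return the bit positions (0 = least significant) where two masks differ."""
    diff = mask_a ^ mask_b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


if __name__ == "__main__":
    everything = 0xffffffff
    utility = 0xffff7fff
    print(differing_bits(everything, utility))  # [15]
    print(hex(everything ^ utility))            # 0x8000
```

The two values differ in a single bit, so "0xffff7fff" is "0xffffffff" with bit 15 cleared.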
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I have just released a new version of amdgpu-utils: https://github.com/Ricks-Lab/amdgpu-utils/releases/tag/v2.1.0 This release includes significant stability improvements and the following new features:
- Added fan monitor and control features.
- Implemented --no_fan option across all tools. This eliminates the reading and display of fan parameters and is useful for those who have installed GPU waterblocks.
- Implemented P-state masking, which limits available P-states to those specified. Useful for power management.
- Fixed implementation of global variables that broke with implementation of modules in library.
- Added more validation checks before writing parameters to cards.
|
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Sorry didn't follow whole thread so have question- does your utility support old HD69xx family of ATi GPU cards? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Sorry didn't follow whole thread so have question- does your utility support old HD69xx family of ATi GPU cards? The utility requires the new AMD open source drivers. I'm not sure whether that package supports HD69xx, but I don't see it listed as compatible: https://www.amd.com/en/support/kb/release-notes/rn-rad-lin-18-50-unified |
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I have just released a new version of amdgpu-utils: https://github.com/Ricks-Lab/amdgpu-utils/releases/tag/v2.2.0 This version includes a major bug fix and new features:
- Implemented logging option --log for amdgpu-monitor. A red indicator will indicate active logging and the target filename.
- Implemented energy meter in amdgpu-monitor.
- Implemented the ability to check the GPU extracted ID in a pci.ids file for correct model name.
- Implemented a function to extract only AMD information from the pci.ids file and store it in the file amd_pci_id.txt, which is included in this distribution.
- Optimized long, short, and decoded GPU model names.
- Alpha release of a utility to update device decode data from the pci.ids website.
|
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I have used the new logging function and energy metric in both amdgpu-monitor and benchMT, with the following results. I used benchMT to run 4 instances of a WU on all 4 GPUs simultaneously, with the 4 cards configured as described in the plot annotations (specified p-states, power cap, ppm mode). The results show that processing time changes by only a few percent, while energy consumption varies by almost 30%. I'm not sure of an explanation for this, other than that processing a WU is gated by something other than GPU clock frequency. Perhaps it is mclk. I also ran all 15 test WUs included in benchMT on the same GPU, with conditions and results as below:
GPU_Energy_card_0_default: time=3654.65s Energy=0.17086kWh
GPU_Energy_card_0_opt_ps_mc_63_sclk_1530_pcap_150: time=3713.35s Energy=0.135071kWh
Which is a 1.6% increase in processing time and 21% reduction in energy. |
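The quoted percentages can be reproduced from the two result lines. This is just the arithmetic, with the values copied from the post; the function name is illustrative.

```python
def percent_change(baseline, value):
    """Return the change from baseline to value as a percentage of baseline."""
    return (value - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Values from the default and optimized card_0 runs quoted above.
    time_change = percent_change(3654.65, 3713.35)
    energy_change = percent_change(0.17086, 0.135071)
    print("time: {:+.1f}%".format(time_change))      # time: +1.6%
    print("energy: {:+.1f}%".format(energy_change))  # energy: -20.9%
```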
Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
Which is a 1.6% increase in processing time and 21% reduction in energy. Which is a great "bang for the buck" finding!!! Tom A proud member of the OFA (Old Farts Association). |
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
More data shows that mclk is unlikely to be the bottleneck:
RunName                       real_time  user_time  sys_time  energy
GPU_Energy_m1000_s1630_pc220  185.0      25.3       51.1      0.008418
GPU_Energy_m945_s1630_pc220   186.3      26.8       51.0      0.008496
GPU_Energy_m945_s1530_pc130   189.0      26.0       51.3      0.006261
GPU_Energy_m1000_s1630_pc130  189.7      26.7       50.9      0.006307
GPU_Energy_m945_s1630_pc130   190.0      26.6       51.4      0.006370
GPU_Energy_m945_s1401_pc130   192.1      26.4       51.2      0.006327
GPU_Energy_m945_s1200_pc130   201.4      25.9       53.0      0.006314
GPU_Energy_m945_s1138_pc130   203.8      27.0       51.9      0.006164
|
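One way to read the table is to express each run relative to the first one. The sketch below assumes the run names encode the memory clock (`m`), shader clock (`s`), and power cap (`pc`); that reading is inferred from the names and is not confirmed in the thread. Only the real_time and energy columns are used.

```python
import re

# Assumed naming scheme: ..._m<mclk>_s<sclk>_pc<power cap>
RUN_NAME = re.compile(r"_m(?P<mclk>\d+)_s(?P<sclk>\d+)_pc(?P<pcap>\d+)$")

# (RunName, real_time, energy) copied from the table above.
RUNS = [
    ("GPU_Energy_m1000_s1630_pc220", 185.0, 0.008418),
    ("GPU_Energy_m945_s1630_pc220", 186.3, 0.008496),
    ("GPU_Energy_m945_s1530_pc130", 189.0, 0.006261),
    ("GPU_Energy_m1000_s1630_pc130", 189.7, 0.006307),
    ("GPU_Energy_m945_s1630_pc130", 190.0, 0.006370),
    ("GPU_Energy_m945_s1401_pc130", 192.1, 0.006327),
    ("GPU_Energy_m945_s1200_pc130", 201.4, 0.006314),
    ("GPU_Energy_m945_s1138_pc130", 203.8, 0.006164),
]


def parse_run_name(name):
    """Split a run name into integer settings under the assumed naming scheme."""
    match = RUN_NAME.search(name)
    if match is None:
        raise ValueError("unrecognized run name: {}".format(name))
    return {key: int(value) for key, value in match.groupdict().items()}


def relative_to_baseline(runs):
    """Yield (settings, time change %, energy change %) relative to the first run."""
    _, base_time, base_energy = runs[0]
    for name, real_time, energy in runs:
        yield (parse_run_name(name),
               (real_time / base_time - 1.0) * 100.0,
               (energy / base_energy - 1.0) * 100.0)


if __name__ == "__main__":
    for settings, d_time, d_energy in relative_to_baseline(RUNS):
        print("{} time {:+.1f}% energy {:+.1f}%".format(settings, d_time, d_energy))
```

Under that reading, the two runs that differ only in the `m` value (1000 versus 945 at s1630, pc130) finish within a fraction of a second of each other, which is consistent with the conclusion in the post.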
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I have just released a new version of amdgpu-utils: https://github.com/Ricks-Lab/amdgpu-utils/releases/tag/v2.3.0 This version includes a major bug fix and new features:
|
Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
I have just released a new version of amdgpu-utils: +1 :) A proud member of the OFA (Old Farts Association). |
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I just did a quick comparison of the RX Vega64 and the Radeon VII using default and optimized conditions:
GPU         Condition     real_time  Energy
Radeon VII  140W_COMPUTE  131.15     0.0047
Radeon VII  DEFAULTS      140.02     0.004698
RX Vega64   140W_COMPUTE  186.84     0.005827
RX Vega64   DEFAULTS      184.3      0.007512
|
Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
I just did a quick comparison of RX Vega64 and the Radeon VII using default and optimized conditions: If I am understanding the results right, you can now make a serious case for reducing the operating cost and heat of the Vega 64 and probably Vega 56 GPUs? Which should make them more SETI friendly. Tom A proud member of the OFA (Old Farts Association). |
Joined: 8 Jan 01 Posts: 15 Credit: 5,947,861 RAC: 15 |
Hi Rick, thanks for creating these tools, very useful so far! I tried Ubuntu 19.04 this weekend; it ships kernel 5.0. This seems to be problematic, at least when it comes to amdgpu-utils:
root@host:~# ./amdgpu-utils/amdgpu-ls
Using Linux Kernel 5.0.0-7-generic but benchMT requires > 4.17.
Error in environment. Exiting...
root@host:~# ./amdgpu-utils/amdgpu-monitor
Unable to init server: Could not connect: Connection refused
Unable to init server: Could not connect: Connection refused
Using Linux Kernel 5.0.0-7-generic but benchMT requires > 4.17.
Error in environment. Exiting...
Most likely easy to fix though :-) Thanks! |
Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Hi Rick, Thanks for reporting the issue! Yes, it was an easy fix. I just updated master with the change. Are you able to install the required amdgpu driver on 19.04? I got errors installing on 18.04.2! |
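The thread does not show what the fix was. As a sketch of a kernel check that accepts 5.0 while still rejecting anything at or below 4.17, the release string can be reduced to a (major, minor) tuple of integers and compared as a whole. The function names here are hypothetical; on a live system the release string could come from `platform.release()`.

```python
import re


def kernel_version_tuple(release):
    """Extract (major, minor) as integers from a string such as '5.0.0-7-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: {}".format(release))
    return int(match.group(1)), int(match.group(2))


def kernel_is_newer_than(release, minimum=(4, 17)):
    """Return True if the release is strictly newer than `minimum`.

    Tuples compare element by element, so (5, 0) > (4, 17) is True even
    though the minor number 0 is smaller than 17.
    """
    return kernel_version_tuple(release) > minimum


if __name__ == "__main__":
    for release in ("5.0.0-7-generic", "4.18.0-16-generic", "4.15.0-46-generic"):
        print(release, kernel_is_newer_than(release))
```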
Joined: 8 Jan 01 Posts: 15 Credit: 5,947,861 RAC: 15 |
Thanks, the message disappeared. Now it complains about the amdgpu package missing:
user@host:~/amdgpu-utils# ./amdgpu-ls
AMD Wattman features enabled: 0xffff7fff
Command '['dpkg', '-l', 'amdgpu']' returned non-zero exit status 1.
Error: amdgpu drivers not installed, exiting...
Inspired by an Arch Linux package build script, I extracted the OpenCL libraries from the 18.50 driver, then packaged these into a DEB file that can be easily installed/removed. https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=opencl-amd
It works: WUs that were downloaded with Mesa/Clover and ROCm showed "coproc missing" after I removed those again. After installing the package/libraries and restarting BOINC, it worked again. The card shows up just like it would with the full AMDGPU-PRO drivers.
Sun 17 Mar 2019 16:14:07 CET | | OpenCL: AMD/ATI GPU 0: Radeon RX 580 Series (driver version 2766.4, device version OpenCL 1.2 AMD-APP (2766.4), 8169MB, 8169MB available, 5161 GFLOPS peak)
The Wattman interface is accessible, so it's just the check against the amdgpu package that is blocking. People not seeking OpenCL support won't need the official driver; the mainline amdgpu is sufficient. Maybe that check can be dropped, or made overridable with a CLI flag. Thanks! |
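The suggestion to make the package check overridable could look roughly like the sketch below. The `--skip-driver-check` flag and both function names are hypothetical and are not options of amdgpu-utils; only the `dpkg -l amdgpu` command is taken from the error message above. The `runner` parameter exists so the logic can be exercised without dpkg installed.

```python
import argparse
import subprocess


def driver_package_installed(package="amdgpu", runner=subprocess.run):
    """Return True if `dpkg -l <package>` exits with status 0."""
    try:
        result = runner(["dpkg", "-l", package],
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # dpkg itself is not available (for example, a non-Debian system).
        return False
    return result.returncode == 0


def environment_ok(argv, runner=subprocess.run):
    """Check the driver package unless the (hypothetical) override flag is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--skip-driver-check", action="store_true",
                        help="hypothetical flag: do not require the amdgpu package")
    args = parser.parse_args(argv)
    if args.skip_driver_check:
        return True
    return driver_package_installed(runner=runner)


if __name__ == "__main__":
    import sys
    if not environment_ok(sys.argv[1:]):
        print("Error: amdgpu drivers not installed, exiting...")
        sys.exit(1)
    print("Environment check passed")
```

Keeping the check on by default and letting the user opt out preserves the current behavior for most users while unblocking setups like the one described, where the mainline driver exposes the needed interface.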
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.