Developing AMD GPU Utilities

Message boards : Number crunching : Developing AMD GPU Utilities

Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1977456 - Posted: 28 Jan 2019, 7:10:38 UTC - in response to Message 1977370.  

I wonder whether it would help to ask some of the Windows monitoring app developers, like Ray Hinchcliffe of SIV or Martin Malik of HwInfo64, for hints about how they poll for an ATI/AMD device's capabilities in their programs.

Seems the developer of GPU-Z would be the perfect person to ask but I don't know who that person is. Might be an effort by the team at TechPowerUp since they host the forums and downloads for the program.


GPU-Z doesn't properly see my 2400G gpu. It leaves the gpu load box empty. :(

Tom
A proud member of the OFA (Old Farts Association).
ID: 1977456 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1977480 - Posted: 28 Jan 2019, 12:06:09 UTC

I just posted a new version on GitHub with a tuned Gtk-based display, available with the --gui option. I wanted to left-justify the first column but was unsuccessful. It seemed simple in the documentation, but I could not make it work. If you are familiar with Gtk, please check out the code and let me know what I am doing wrong. Thanks!
https://github.com/Ricks-Lab/amdgpu-utils
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1977480 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1977652 - Posted: 29 Jan 2019, 12:59:14 UTC

I have just released the initial version of amdgpu-utils:
https://github.com/Ricks-Lab/amdgpu-utils/releases/tag/v1.0.0

So far it includes only 1 of the 3 planned utilities. The monitor utility, amdgpu-monitor, is complete. I probably need to learn a lot more about Gtk before I can finish the utility to modify p-states. I hope someone with AMD GPUs on Linux can test it out and let me know of any issues. One last-minute fix in v1.0.1, available on master, left-justifies the labels. I finally got it working, with some help.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1977652 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1977981 - Posted: 31 Jan 2019, 12:22:16 UTC

I have just released a new version of amdgpu-utils:
https://github.com/Ricks-Lab/amdgpu-utils/releases/tag/v1.1.0

This release includes bug fixes and a new feature: when the --pstates option is specified, it displays the current p-state tables instead of the detailed parameters of each card.
ID: 1977981 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1981056 - Posted: 18 Feb 2019, 22:22:00 UTC

I have made significant progress over the CNY holidays and now have a working version of the utility to modify GPU performance settings! I am still working on a few issues, but it is fully functional. I am hoping to get some others with different setups to test it out. It can be downloaded from the master branch (updated frequently) here:
https://github.com/Ricks-Lab/amdgpu-utils

The new capability is in the amdgpu-pac app. You must be running a recent version of the amdgpu Linux drivers for the utilities to work.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1981056 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1981059 - Posted: 18 Feb 2019, 22:30:36 UTC

Rick, can your utility undervolt the AMD cards? Have you seen anybody using it with the new Radeon VII card? Have you also posted over in the MilkyWay and Einstein forums?

Those projects are heavy users of AMD cards because of their very good floating point performance. Seti tends to favor Nvidia so your prospective field of users will not be as large.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1981059 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1981063 - Posted: 18 Feb 2019, 22:36:10 UTC - in response to Message 1981059.  

Yes, you can customize all p-states to a specific frequency and voltage, but one problem is that the GPU will always use max voltage when fully loaded and in the highest p-state. I read somewhere that you can limit which p-states are available, but I have not figured out how to implement it. The best alternative that I have found is to set a power cap. Also, a big benefit for compute is to use the compute performance mode, which limits downclocking when load is low.
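
For anyone who wants to try the power-cap approach by hand, here is a minimal sketch. It is not the amdgpu-utils implementation. It assumes the amdgpu driver exposes the cap through a hwmon `power1_cap` file that takes a value in microwatts (typically somewhere under /sys/class/drm/card0/device/hwmon/). The helper names are hypothetical, and writing to the real file needs root.

```python
from pathlib import Path

MICROWATTS_PER_WATT = 1_000_000


def watts_to_hwmon(watts: int) -> str:
    """Format a cap in watts as the microwatt string hwmon is assumed to expect."""
    if watts <= 0:
        raise ValueError("power cap must be positive")
    return str(watts * MICROWATTS_PER_WATT)


def set_power_cap(cap_file: Path, watts: int) -> None:
    """Write the cap to cap_file (hypothetical helper; needs root on a real card)."""
    cap_file.write_text(watts_to_hwmon(watts) + "\n")


if __name__ == "__main__":
    import tempfile

    # Dry run against a temporary file instead of the real sysfs entry.
    with tempfile.TemporaryDirectory() as tmp:
        fake_cap = Path(tmp) / "power1_cap"
        set_power_cap(fake_cap, 150)
        print(fake_cap.read_text().strip())
```

The path is passed in rather than hard-coded, so the same function can be pointed at a scratch file first and at the real hwmon entry once the value looks right.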
ID: 1981063 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1981066 - Posted: 18 Feb 2019, 22:39:25 UTC - in response to Message 1981063.  

I asked because the number one request over at MW and Einstein is undervolting to get the power usage down. But if you can cap the p-states or the power limit I guess that achieves the same thing.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1981066 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1981071 - Posted: 18 Feb 2019, 22:47:24 UTC - in response to Message 1981066.  

My main motivation for this project is that I have 2 cards that are unstable after 1.5 years of use, including heavy mining for part of that time. I am running one of the problematic cards now with the power cap reduced from 220W to 150W and no problems so far. Also, overall performance is not drastically reduced. The reduction may be partially offset by using the Compute power/performance mode (ppm).

If any user over there has any questions they can connect with me here or on GitHub.
ID: 1981071 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1981072 - Posted: 18 Feb 2019, 22:54:00 UTC - in response to Message 1981071.  
Last modified: 18 Feb 2019, 22:54:57 UTC

Yes, that is the gist I got out of comments there. It doesn't appear that undervolting impacts the compute performance in any significant way but drastically saves on power budget.

Do you think your years of mining have degraded the silicon? Or could it be the thermal paste on the die/cooler interface is degraded and simply needs replacing?

[Edit] I'm going to post the news there since I think your utility would be of interest to those projects.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1981072 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1981073 - Posted: 18 Feb 2019, 23:07:55 UTC

Here is what the user interface looks like:

There are a couple of issues:
Save All button only saves first card, but all of the card level Save buttons work.
Non-compatible cards will show up in the interface.
Need to figure out how to get the bottom set of buttons to span all columns.
ID: 1981073 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1981074 - Posted: 18 Feb 2019, 23:12:37 UTC - in response to Message 1981072.  

I am not sure about Si degradation. I have put my Fiji cards through a lot worse with no issues. I noticed the default power draw of the 2 problematic cards is higher than that of the other 2. Also, I don't want to jump to the conclusion that the cards are fine now until I have had them running for at least a week. It could still be a platform issue.
ID: 1981074 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1981455 - Posted: 21 Feb 2019, 10:26:15 UTC

I have just released a new version of amdgpu-utils:
https://github.com/Ricks-Lab/amdgpu-utils/releases/tag/v2.0.0

With the help of Craig over at Einstein@Home, amdgpu-utils was tested on 2 more models of GPUs, which identified significant bugs that were not apparent on my development system. This release includes those fixes and additional features as described below:
    First release of amdgpu-pac, which is a utility to set GPU performance parameters.
    Added a check of the amdgpu driver to the environment check for all utilities. Added display of the amdgpu driver version.
    Split the list functions of the original amdgpu-monitor into amdgpu-ls.
    Added --clinfo option to amdgpu-ls, which will list OpenCL platform details for each GPU.
    Added --ppm option to amdgpu-ls, which will display the table of power/performance modes available for each GPU.
    Error messages are now output to stderr instead of stdout.
    Added power cap and power/performance mode to the monitor utilities. I have also included them in the amdgpu-ls display in addition to the power cap limits.


GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1981455 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1981565 - Posted: 21 Feb 2019, 20:20:41 UTC - in response to Message 1981455.  

+1
A proud member of the OFA (Old Farts Association).
ID: 1981565 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1981988 - Posted: 24 Feb 2019, 10:03:28 UTC

I am nearly done with the capability I originally planned for amdgpu-utils. I have been working with the latest feature, p-state masking, to reduce power consumption. The default behavior of the card is to use max voltage at high loading in the highest p-state even if you define a lower voltage. With p-state masking, I can specify which p-states are available and limit the card to the second-highest p-state. To measure the effect, I have done a test case using benchMT with the following conditions:
    1) Default conditions - power consumption often hits 220W
    2) Default conditions, but with a power cap of 150W and using compute mode (stays in high p-state more) - spends a lot of time at 150W
    3) Limit to sclk p-state 6 and mclk p-state 3 (2nd highest sclk and highest mclk), power cap of 150W and compute mode (probably no effect, since always in p-state 6) - most of the time at 120W
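
As a rough illustration of what p-state masking looks like at the sysfs level, here is a minimal sketch. It is not the amdgpu-pac code. It assumes the amdgpu driver accepts a space-separated list of allowed level indices written to `pp_dpm_sclk` once `power_dpm_force_performance_level` is set to `manual`. The helper names are hypothetical, and allowing every state from 0 up to the chosen maximum is one interpretation of condition 3.

```python
from pathlib import Path


def pstate_mask(highest: int, lowest: int = 0) -> str:
    """Build the space-separated list of allowed p-state indices."""
    if not 0 <= lowest <= highest:
        raise ValueError("need 0 <= lowest <= highest")
    return " ".join(str(i) for i in range(lowest, highest + 1))


def apply_sclk_mask(device_dir: Path, highest: int) -> None:
    """Write the mask under device_dir (hypothetical helper; needs root on a real card)."""
    # Manual mode is assumed to be required before the driver accepts a mask.
    (device_dir / "power_dpm_force_performance_level").write_text("manual\n")
    (device_dir / "pp_dpm_sclk").write_text(pstate_mask(highest) + "\n")


if __name__ == "__main__":
    import tempfile

    # Dry run in a scratch directory rather than /sys/class/drm/cardN/device.
    with tempfile.TemporaryDirectory() as tmp:
        apply_sclk_mask(Path(tmp), highest=6)
        print((Path(tmp) / "pp_dpm_sclk").read_text().strip())
```

On a card with sclk states 0 to 7, a mask that stops at 6 keeps the card out of the top state, which is where the default behavior forces max voltage.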


Performance results when running the full set of WUs included in benchMT on the same GPU:

    Condition 2 is 1.7% slower than default and Condition 3 is 2.3% slower. I suspect that power/performance is best in Condition 3, but it is difficult to measure with certainty.
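
A back-of-the-envelope way to compare the three conditions is sketched below. It rests on two loud assumptions: that "X% slower" means X% less throughput, and that the nominal figures above (220W, 150W, 120W) stand in for average power, which they were not measured to be. Treat the result as a rough indication only; the function name is hypothetical.

```python
def relative_efficiency(slowdown_pct: float, power_w: float,
                        baseline_power_w: float = 220.0) -> float:
    """Work per watt relative to the default condition (1.0 means no change)."""
    throughput = 1.0 - slowdown_pct / 100.0   # assumed meaning of "slower"
    return throughput * baseline_power_w / power_w


if __name__ == "__main__":
    # Nominal power figures taken from the three test conditions above.
    for name, slowdown, power in [("Condition 2", 1.7, 150.0),
                                  ("Condition 3", 2.3, 120.0)]:
        print(f"{name}: {relative_efficiency(slowdown, power):.2f}x default")
```

Under these assumptions the ratios come out near 1.44 for Condition 2 and 1.79 for Condition 3, which is consistent with the suspicion that Condition 3 is best, but a real answer needs measured average power.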


I plan to release v2.1.0 soon, but the development branch is available for anyone interested in giving it a try.
https://github.com/Ricks-Lab/amdgpu-utils/tree/v2.1.0-Features

ID: 1981988 · Report as offensive
Profile Sean Project Donor
Volunteer tester

Joined: 10 Aug 00
Posts: 33
Credit: 125,775,158
RAC: 199
United States
Message 1982020 - Posted: 24 Feb 2019, 16:23:52 UTC - in response to Message 1981988.  

I really appreciate your efforts in developing this. I haven't installed your utilities just yet, but plan to sometime during the next week. Thank you!
ID: 1982020 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1982032 - Posted: 24 Feb 2019, 18:33:56 UTC - in response to Message 1981988.  

+42

:)
A proud member of the OFA (Old Farts Association).
ID: 1982032 · Report as offensive
Profile Sean Project Donor
Volunteer tester

Joined: 10 Aug 00
Posts: 33
Credit: 125,775,158
RAC: 199
United States
Message 1982089 - Posted: 25 Feb 2019, 5:28:46 UTC

Here are my results with the Vega56 and Radeon VII
amdgpu-monitor outputs:
┌────────────┬────────────┬────────────┐
│Card #      │card1       │card0       │
├────────────┼────────────┼────────────┤
│Model       │ RX Vega 64 │ Device 081e│
│Load %      │88          │97          │
│Power (W)   │-1          │-1          │
│Power Cap (W│-1          │-1          │
│T (C)       │-1          │-1          │
│VddGFX (mV) │-1          │-1          │
│Sclk (MHz)  │1590Mhz     │            │
│Sclk Pstate │7           │-1          │
│Mclk (MHz)  │800Mhz      │            │
│Mclk Pstate │3           │-1          │
│Perf Mode   │2-VIDEO     │2-VIDEO     │
└────────────┴────────────┴────────────┘

amdgpu-pac gives me the following:
~/amdgpu-utils$ sudo ./amdgpu-pac
AMD Wattman features enabled: 0xffff7fff
amdgpu version: 18.50-725072
2 AMD GPUs detected
Traceback (most recent call last):
  File "./amdgpu-pac", line 758, in <module>
    main()
  File "./amdgpu-pac", line 731, in main
    gpu_list.get_pstates()
  File "/home/sean/amdgpu-utils/GPUmodules/GPUmodules.py", line 457, in get_pstates
    v.get_pstates()
  File "/home/sean/amdgpu-utils/GPUmodules/GPUmodules.py", line 320, in get_pstates
    self.sclk_state[lineitems[0]] = [lineitems[1],lineitems[2]]
IndexError: list index out of range

Something I'm doing wrong?
ID: 1982089 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1982090 - Posted: 25 Feb 2019, 5:37:45 UTC - in response to Message 1982089.  
Last modified: 25 Feb 2019, 5:41:52 UTC

Are you running the development branch or master? Also, I haven’t tried it with sudo. It will prompt you for sudo credentials when needed.

Can you also try amdgpu-ls? This may provide more insight into what is happening.
ID: 1982090 · Report as offensive
Profile Sean Project Donor
Volunteer tester

Joined: 10 Aug 00
Posts: 33
Credit: 125,775,158
RAC: 199
United States
Message 1982091 - Posted: 25 Feb 2019, 5:41:04 UTC - in response to Message 1982090.  
Last modified: 25 Feb 2019, 5:49:33 UTC

Master branch initially. I've now switched to the v2.1.0 branch and the results appear the same.
Output is the same with and without using sudo.

amdgpu-ls outputs:
./amdgpu-ls
AMD Wattman features enabled: 0xffff7fff
amdgpu version: 18.50-725072
2 AMD GPUs detected
2 are Compatible

Traceback (most recent call last):
  File "./amdgpu-ls", line 136, in <module>
    main()
  File "./amdgpu-ls", line 124, in main
    gpu_list.get_pstates()
  File "/home/sean/amdgpu-utils/GPUmodules/GPUmodules.py", line 512, in get_pstates
    v.get_pstates()
  File "/home/sean/amdgpu-utils/GPUmodules/GPUmodules.py", line 311, in get_pstates
    self.sclk_state[lineitems[0]] = [lineitems[1],lineitems[2]]
IndexError: list index out of range
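
The failing line in the traceback indexes `lineitems[1]` and `lineitems[2]` unconditionally, so any table line with fewer than three whitespace-separated fields would raise this IndexError. Whether that is the actual cause here is an assumption; the thread does not confirm it. The sketch below shows one defensive way to parse such a table. The function name and the sample lines are hypothetical, and this is not the fix that went into GPUmodules.py.

```python
def parse_clk_table(lines):
    """Return ({state: [clock, voltage]}, skipped_lines) for lines like '0: 300MHz 750mV'."""
    table = {}
    skipped = []
    for line in lines:
        items = line.split()
        state = items[0].rstrip(":") if items else ""
        # Keep only rows that have a numeric state index plus clock and voltage.
        if len(items) < 3 or not state.isdigit():
            skipped.append(line)
            continue
        table[state] = [items[1], items[2]]
    return table, skipped


if __name__ == "__main__":
    # Made-up input: a header, a three-field row, and a two-field row with no voltage.
    sample = ["OD_SCLK:", "0: 300MHz 750mV", "1: 1801Mhz"]
    table, skipped = parse_clk_table(sample)
    print(table)
    print(skipped)
```

Returning the skipped lines instead of silently dropping them makes it easier to report which card produced an unexpected layout.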
ID: 1982091 · Report as offensive


 