# gpuutil

A simple tool for observing GPU status and automatically selecting visible GPUs in Python code.

## How to use
1. Install the package:

```shell
pip install https://git.zmy.pub/zmyme/gpuutil/archive/v0.0.3.tar.gz
```

2. To observe GPU status, run:

```shell
python -m gpuutil <options>
```

Running ```python -m gpuutil``` directly produces output similar to:

```text
+----+------+------+----------+----------+------+----------------+
| ID | Fan  | Temp | Pwr      | Freq     | Util | Vmem           |
+----+------+------+----------+----------+------+----------------+
| 0  | 22 % | 21 C | 9.11 W   | 300 MHz  | 0 %  | 3089/11019 MiB |
| 1  | 22 % | 23 C | 6.28 W   | 300 MHz  | 0 %  | 786/11019 MiB  |
| 2  | 38 % | 59 C | 92.04 W  | 1890 MHz | 6 %  | 3608/11019 MiB |
| 3  | 40 % | 67 C | 246.38 W | 1740 MHz | 93 % | 3598/11019 MiB |
+----+------+------+----------+----------+------+----------------+
|                          Process Info                          |
+----------------------------------------------------------------+
| [26107|0] user1(737 MiB) python                                |
| [34033|0,1] user2(1566 MiB) python                             |
| [37190|0] user2(783 MiB) python                                |
| [37260|0] user2(783 MiB) python                                |
| [30356|2] user3(3605 MiB) python train.py --args --some really |
| long arguments                                                 |
| [34922|3] user3(3595 MiB) python train.py --args --some really |
| long arguments version 2                                       |
+----------------------------------------------------------------+
```

For more options, run ```python -m gpuutil -h```; you will see:

```text
usage: __main__.py [-h] [--profile PROFILE] [--cols COLS] [--style STYLE]
                   [--show-process SHOW_PROCESS] [--vertical VERTICAL] [--save]

optional arguments:
  -h, --help            show this help message and exit
  --profile PROFILE, -p PROFILE
                        profile keyword, corresponding configuration are saved in ~/.gpuutil.conf
  --cols COLS, -c COLS  colums to show.(Availabel cols: ['ID', 'Fan', 'Temp', 'TempMax', 'Pwr',
                        'PwrMax', 'Freq', 'FreqMax', 'Util', 'Vmem', 'UsedMem', 'TotalMem', 'FreeMem',
                        'Users']
  --style STYLE, -sty STYLE
                        column style, format: |c|l:15|r|c:14rl:13|, c,l,r are align methods, | is line
                        and :(int) are width limit.
  --show-process SHOW_PROCESS, -sp SHOW_PROCESS
                        whether show process or not
  --vertical VERTICAL, -v VERTICAL
                        whether show each user in different lines. (show user vertically)
  --save                save config to profile
```

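The `--style` string described in the help packs one alignment token per column between `|` separators: a token is `c`, `l`, or `r`, optionally followed by `:<width>` to limit the column width. As a rough illustration of the idea (this is a sketch, not gpuutil's actual parser, and `parse_style` is a hypothetical name):

```python
def parse_style(style: str):
    """Parse a gpuutil-like style string such as '|c|l:15|r|' into a
    list of (align, width) pairs; width is None when no limit is given."""
    cols = []
    for token in style.strip('|').split('|'):
        if ':' in token:
            # Token carries a width limit, e.g. 'l:15'.
            align, width = token.split(':', 1)
            cols.append((align, int(width)))
        else:
            # Bare align method, e.g. 'c' or 'r'.
            cols.append((token, None))
    return cols

print(parse_style('|c|l:15|r|'))  # → [('c', None), ('l', 15), ('r', None)]
```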
3. To automatically select visible GPUs in your Python code, use:

```python
from gpuutil import auto_set
auto_set(1)
```

The ```auto_set``` function is defined as follows:

```python
# num: the number of GPUs you want to use.
# allow_nonfree: whether to use non-empty GPUs when there are not enough free ones.
# ask: if True, the script asks for confirmation before using a non-empty GPU;
#      if False, it uses the non-empty GPU directly.
# blacklist: a list of ints; GPUs in this list will not be used unless you manually choose them.
# show: if True, print which GPUs are currently selected.
def auto_set(num, allow_nonfree=True, ask=True, blacklist=[], show=True):
    ...  # implementation omitted
```

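Conceptually, this kind of auto-selection ranks GPUs by free memory, skips blacklisted IDs, and exports the winners via `CUDA_VISIBLE_DEVICES`. A minimal self-contained sketch of that idea (not gpuutil's real implementation; `pick_gpus` and the sample memory numbers are made up for illustration):

```python
import os

def pick_gpus(free_mem, num, blacklist=()):
    """Pick the `num` GPUs with the most free memory, skipping
    blacklisted IDs, and return a CUDA_VISIBLE_DEVICES-style string."""
    candidates = [(mem, gpu_id) for gpu_id, mem in enumerate(free_mem)
                  if gpu_id not in blacklist]
    # Sort by free memory, largest first, and keep the top `num` IDs.
    chosen = [gpu_id for _, gpu_id in sorted(candidates, reverse=True)[:num]]
    return ','.join(str(g) for g in chosen)

# Free memory (MiB) per GPU, e.g. parsed from `nvidia-smi -q -x`.
visible = pick_gpus([7930, 10233, 7411, 7421], num=1, blacklist=[1])
os.environ['CUDA_VISIBLE_DEVICES'] = visible
print(visible)  # → '0' (GPU 1 has the most free memory but is blacklisted)
```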
## Using gpuutil inside Docker

Code running inside a Docker container cannot obtain correct information about the processes using the GPU. To work around this, gpuutil can read the output of the `nvidia-smi` and `ps` commands from files, which you generate on the host machine.

To use gpuutil in Docker, follow these steps:

1. Find a way to pass the output of ```nvidia-smi -q -x``` into the container you are using, saved as a text file.

2. Pass the output of a ps-like command into the container. This is a table-like output whose first line is a header, and the header must contain at least user, pid, and command. Below is a valid output generated by running ```ps -axo user,pid,command``` on the host machine:

```text
USER       PID COMMAND
root         1 /bin/bash -c bash /etc/init.docker; /usr/sbin/sshd -D
root         8 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root         9 sshd: user1 [priv]
user1       19 sshd: user1@pts/0
user1       20 -zsh
user1       97 tmux
user1       98 -zsh
```

If your generated output uses different header names (for example, when you use ```docker top``` instead of ```ps```, the ```COMMAND``` column is named ```CMD```), you need to prepare a dict that maps each of your header names to one of ```user, pid, command```. Note that the matching is case-insensitive.

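The case-insensitive header mapping can be sketched like this (an illustration of the idea, not gpuutil's actual code; `normalize_header` is a hypothetical name):

```python
def normalize_header(headers, name_trans):
    """Map raw ps-like header names onto 'user', 'pid' and 'command',
    case-insensitively, using a user-supplied translation dict."""
    trans = {k.lower(): v.lower() for k, v in name_trans.items()}
    # Unmapped names pass through unchanged (lowercased).
    return [trans.get(h.lower(), h.lower()) for h in headers]

# `docker top` style header, translated with cmd=command:
print(normalize_header(['USER', 'PID', 'CMD'], {'cmd': 'command'}))
# → ['user', 'pid', 'command']
```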
3. Run the configuration script:

```shell
python -m gpuutil.set_redirect -nv path/to/your/nvidia/output -ps /path/to/your/ps/output -pst cmd=command,username=user
```

For more information about the script, run ```python -m gpuutil.set_redirect -h```; you will see:

```text
usage: set_redirect.py [-h] [--nvsmi NVSMI] [--ps PS] [--ps_name_trans PS_NAME_TRANS]

optional arguments:
  -h, --help            show this help message and exit
  --nvsmi NVSMI, -nv NVSMI
                        a file indicates real nvidia-smi -q -x output.
  --ps PS, -ps PS       a file indicates real ps-like output.
  --ps_name_trans PS_NAME_TRANS, -pst PS_NAME_TRANS
                        a dict of name trans, format: name1=buildin,name2=buildin, buildin can be choosen from cmd,user,pid
```

> Some advice:
> 1. You can use a script that runs the `nvidia-smi` and `ps` commands and saves their output to a directory, then mount that directory into the container as read-only.
> 2. You could consider mounting the directory as tmpfs.

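A host-side snapshot script along the lines of the advice above might look like this. It is a hedged sketch, not part of gpuutil: the script name, output directory, and file names are all choices you would make yourself; the `|| true` merely lets the sketch run on machines without `nvidia-smi`.

```shell
#!/bin/sh
# Hypothetical host-side helper: snapshot the two commands gpuutil needs
# into a directory that can be mounted read-only into the container.
OUT_DIR="${1:-/tmp/gpuutil-redirect}"
mkdir -p "$OUT_DIR"
# Tolerate a missing nvidia-smi so the sketch also runs on non-GPU hosts.
nvidia-smi -q -x > "$OUT_DIR/nvsmi.xml" 2>/dev/null || true
ps -axo user,pid,command > "$OUT_DIR/ps.txt"
echo "snapshots written to $OUT_DIR"
```

You could then run it periodically (for example from cron) and start the container with something like `-v /tmp/gpuutil-redirect:/redirect:ro`, pointing `gpuutil.set_redirect` at `/redirect/nvsmi.xml` and `/redirect/ps.txt`.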
## Notes

1. You can get more detailed GPU info via the `gpuutil.GPUStat` class; for details, just read the code.
2. Since gpuutil uses the `ps` command to get detailed process info, it fully works only on Linux; on Windows, some information may be missing.
3. If you have any trouble, feel free to open an issue.
4. The code is straightforward, so taking a look at it is also a good option when you run into trouble.