Preface

This article is based on Ubuntu 22.04.
Initial system setup

SSH login

```shell
sudo apt install openssh-server
```
Install a browser

```shell
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
```
zsh

Install zsh
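No command is given for this step. A minimal sketch, assuming zsh comes from the standard Ubuntu repositories and should become the login shell; the function name `install_zsh` is hypothetical:

```shell
# Hypothetical helper: install zsh via apt and make it the login shell.
install_zsh () {
    sudo apt install -y zsh
    # chsh asks for the current user's password; the change applies at next login.
    chsh -s "$(command -v zsh)"
}
```

After calling `install_zsh` once and logging in again, `echo $SHELL` should point at zsh.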
Install oh-my-zsh

```shell
sh -c "$(wget https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh -O -)"
```
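Activate conda (not needed if conda is installed after zsh)

```shell
# Assumes Miniconda was installed to the default location, ~/miniconda3
~/miniconda3/bin/conda init zsh
```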
Toggle the proxy

```shell
# Turn the proxy on for the current shell
proxy () {
    export NO_PROXY=localhost,127.0.0.1
    export HTTP_PROXY="http://127.0.0.1:7890"
    export HTTPS_PROXY="http://127.0.0.1:7890"
    echo "HTTP Proxy on"
}

# Turn the proxy off again
noproxy () {
    unset HTTP_PROXY
    unset HTTPS_PROXY
    echo "HTTP Proxy off"
}
```
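Some tools, curl among them, read the lowercase `http_proxy` rather than `HTTP_PROXY`. A variant sketch that sets both spellings; the names `proxy_all`, `noproxy_all` and `PROXY_URL` are made up for illustration:

```shell
# Same proxy address as in the functions above.
PROXY_URL="http://127.0.0.1:7890"

# Hypothetical variant of proxy(): export upper- and lowercase names.
proxy_all () {
    export NO_PROXY=localhost,127.0.0.1 no_proxy=localhost,127.0.0.1
    export HTTP_PROXY="$PROXY_URL" http_proxy="$PROXY_URL"
    export HTTPS_PROXY="$PROXY_URL" https_proxy="$PROXY_URL"
    echo "HTTP Proxy on"
}

# Hypothetical variant of noproxy(): clear every name set above.
noproxy_all () {
    unset HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy
    echo "HTTP Proxy off"
}
```

With these in `.zshrc`, `proxy_all` followed by `env | grep -i proxy` lists all six variables.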
autojump

```shell
sudo apt install autojump
```
Next, add this line to .zshrc:

```shell
. /usr/share/autojump/autojump.sh
```

Then reload the configuration, for example with `source ~/.zshrc`.
Set a proxy for apt

```shell
sudo nano /etc/apt/apt.conf.d/proxy.conf
```

```
Acquire::http::Proxy "http://127.0.0.1:7890/";
Acquire::https::Proxy "http://127.0.0.1:7890/";
```
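The same file can be written without opening an editor. A small sketch, assuming the proxy address used above; `apt_proxy_conf` is a hypothetical helper that only prints the two lines:

```shell
# Print the apt proxy settings; the first argument overrides the proxy URL.
apt_proxy_conf () {
    proxy_url="${1:-http://127.0.0.1:7890/}"
    printf 'Acquire::http::Proxy "%s";\n' "$proxy_url"
    printf 'Acquire::https::Proxy "%s";\n' "$proxy_url"
}
```

Running `apt_proxy_conf | sudo tee /etc/apt/apt.conf.d/proxy.conf` writes the file, and `apt-config dump | grep -i proxy` should then show both settings.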
The ifconfig command

```shell
sudo apt install net-tools
```
Development environment

conda

```shell
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
# The installer only needs to be executable; 777 and sudo are unnecessary
chmod +x ./Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
```
CUDA

Install the build toolchain:

```shell
sudo apt install gcc-12 make cmake
```
Install CUDA with the runfile installer:

```shell
aria2c -s16 -x16 -k1M --file-allocation=none https://developer.download.nvidia.com/compute/cuda/12.6.1/local_installers/cuda_12.6.1_560.35.03_linux.run
sudo sh cuda_12.6.1_560.35.03_linux.run
```

When it finishes, the installer prints a summary:

```text
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-12.6/

Please make sure that
 -   PATH includes /usr/local/cuda-12.6/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.6/lib64, or, add /usr/local/cuda-12.6/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.6/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
```
cuDNN (official website)

```shell
aria2c -s16 -x16 -k1M --file-allocation=none https://developer.download.nvidia.com/compute/cudnn/9.4.0/local_installers/cudnn-local-repo-ubuntu2404-9.4.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.4.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.4.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt -y install cudnn-cuda-12
```
Finally, add the following to .zshrc:

```shell
export CUDA_HOME=/usr/local/cuda-12.6
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
```
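If `LD_LIBRARY_PATH` starts out empty, the export above leaves a leading colon, which the dynamic linker treats as the current directory, and re-sourcing `.zshrc` appends the same directories again. A hedged alternative, using a hypothetical `append_path` helper that is not part of the original setup:

```shell
# Append directory $2 to the colon-separated list $1 unless it is already there.
append_path () {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present: unchanged
        "::")     printf '%s\n' "$2" ;;      # empty list: no stray colon
        *)        printf '%s\n' "$1:$2" ;;   # otherwise append
    esac
}

export CUDA_HOME=/usr/local/cuda-12.6
export PATH="$(append_path "$PATH" "$CUDA_HOME/bin")"
export LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")"
```

In a new shell, `nvcc --version` should then report release 12.6 if the toolkit was installed to the path shown in the installer summary.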
After a Linux kernel update, a driver installed this way stops working. In that case, first remove the leftover driver files with the following commands:

```shell
sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
  "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"
sudo apt-get remove --purge "*nvidia-driver*" "libxnvctrl*"
sudo apt-get autoremove --purge -V
```

Then run the installation again.
docker

```shell
sudo apt update
sudo apt install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo wget -O /etc/apt/keyrings/docker.asc https://download.docker.com/linux/ubuntu/gpg
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
```

```shell
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
Add the current user to the docker group

```shell
sudo gpasswd -a "$USER" docker
# Pick up the new group in the current session
newgrp docker
```
Configure the proxy

```shell
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo nano /etc/systemd/system/docker.service.d/proxy.conf
```

Set its contents to:

```ini
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:7890/"
Environment="HTTPS_PROXY=http://127.0.0.1:7890/"
Environment="NO_PROXY=localhost,127.0.0.1"
```
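The drop-in is only read when systemd reloads its units and Docker restarts. A sketch of that follow-up step; the function name `reload_docker_proxy` is hypothetical:

```shell
# Apply the proxy drop-in and show what the service ended up with.
reload_docker_proxy () {
    sudo systemctl daemon-reload
    sudo systemctl restart docker
    # Print the environment the Docker service now runs with.
    sudo systemctl show --property=Environment docker
}
```

The last command should list the three variables from the `[Service]` section.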
Install the NVIDIA Container Toolkit so that containers can use the GPU:

```shell
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
```
Configure Docker

```shell
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
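Move the Docker data directory to a data disk

```shell
# Check the current data directory
sudo docker info | grep "Docker Root Dir"
sudo systemctl stop docker
sudo systemctl stop docker.socket
sudo mv /var/lib/docker /mnt/data01/docker
sudo ln -sf /mnt/data01/docker /var/lib/docker
# Start Docker again once the symlink is in place
sudo systemctl start docker
```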
mysql

Install MySQL

```shell
sudo apt update
sudo apt install mysql-server
sudo systemctl start mysql.service
```
Set the password (run this statement inside the `sudo mysql` shell):

```sql
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
```

```shell
sudo mysql_secure_installation
```
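Miscellaneous

Switch between the graphical interface and the text console

```shell
# Boot into the graphical desktop by default
sudo systemctl set-default graphical.target
# Boot into the text console by default
sudo systemctl set-default multi-user.target
```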
Create a new user

```shell
sudo useradd -m -s /usr/bin/zsh user
```
Switch the GCC version

```shell
# Point gcc at gcc-12
sudo ln -s -f /usr/bin/gcc-12 /usr/bin/gcc
# Or point gcc at gcc-11
sudo ln -s -f /usr/bin/gcc-11 /usr/bin/gcc
```
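Re-pointing the symlink by hand works, but Debian-based systems also ship `update-alternatives` for this kind of switch. A sketch, assuming both `gcc-11` and `gcc-12` are installed; the function name and the priorities 110 and 120 are arbitrary choices:

```shell
# Register both compilers as alternatives for /usr/bin/gcc.
register_gcc_alternatives () {
    # In automatic mode the entry with the higher priority is selected.
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 110
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 120
}
```

After `register_gcc_alternatives`, `sudo update-alternatives --config gcc` offers an interactive choice and `gcc --version` confirms the result.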
Install a LaTeX environment

```shell
wget https://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz
zcat < install-tl-unx.tar.gz | tar xf -
cd install-tl-*
sudo ./install-tl
```
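During installation, deselect all other languages.

Add the following to your environment variables:

```shell
# Prepend the TeX Live directories instead of overwriting the existing values
export MANPATH=/usr/local/texlive/2024/texmf-dist/doc/man:$MANPATH
export INFOPATH=/usr/local/texlive/2024/texmf-dist/doc/info:$INFOPATH
export PATH=/usr/local/texlive/2024/bin/x86_64-linux:$PATH
```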
Enable NVIDIA persistence mode

```shell
cd /usr/share/doc/NVIDIA_GLX-1.0/samples
sudo tar -xvf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo ./install.sh
```
Monitoring

Prometheus

Install the server

Download the package from the download page:

```shell
wget https://github.com/prometheus/prometheus/releases/download/v3.0.0-beta.1/prometheus-3.0.0-beta.1.linux-amd64.tar.gz
tar -zxvf prometheus-3.0.0-beta.1.linux-amd64.tar.gz
sudo mv prometheus-3.0.0-beta.1.linux-amd64 /usr/local/prometheus
```
Set up the systemd service:

```shell
sudo nano /etc/systemd/system/prometheus.service
```

with the following contents:

```ini
[Unit]
Description=Prometheus demo
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/usr/local/prometheus/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
node_exporter

Download the package in the same way:

```shell
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar -zxvf node_exporter-1.8.2.linux-amd64.tar.gz
sudo mv node_exporter-1.8.2.linux-amd64 /usr/local/node_exporter
```
Set up the systemd service:

```shell
sudo nano /etc/systemd/system/node_exporter.service
```

with the following contents:

```ini
[Unit]
Description=Prometheus demo
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
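Neither unit file does anything until systemd has reloaded its configuration and the services are enabled. A sketch of that step for the two units defined so far; `enable_units` is a hypothetical helper name:

```shell
# Reload systemd, then start each named unit now and on every boot.
enable_units () {
    sudo systemctl daemon-reload
    for unit in "$@"; do
        sudo systemctl enable --now "$unit"
    done
}
```

Calling `enable_units prometheus node_exporter` covers both services; `systemctl status prometheus` and `curl -s localhost:9100/metrics | head` are quick ways to check that they are running.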
Edit prometheus.yml and add the following:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
```
nvidia_gpu_exporter

Install:

```shell
wget https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v1.2.1/nvidia-gpu-exporter_1.2.1_linux_amd64.deb
sudo dpkg -i nvidia-gpu-exporter_1.2.1_linux_amd64.deb
```
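Edit prometheus.yml and add the following. Each scrape job needs a unique job_name, so this one is called nvidia_gpu rather than node:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: nvidia_gpu
    static_configs:
      - targets: ['localhost:9835']
```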
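Both snippets repeat the `global` and `scrape_configs` keys, so they are meant to be merged into one file rather than pasted one after the other. A sketch of the merged file, printed by a hypothetical helper; the job name `nvidia_gpu` is an assumption, chosen because Prometheus requires job names to be unique:

```shell
# Print a prometheus.yml that scrapes both exporters set up above.
print_prometheus_config () {
    cat <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
  - job_name: nvidia_gpu
    static_configs:
      - targets: ['localhost:9835']
EOF
}
```

`print_prometheus_config | sudo tee /usr/local/prometheus/prometheus.yml` writes the file, and `/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml` validates it before Prometheus is restarted.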
Grafana

Install

```shell
sudo apt install apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install grafana
```
The package creates a systemd unit automatically.
Add a data source
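The data source can be added in the Grafana web UI, or provisioned from a file. A sketch of the file-based route, assuming Prometheus listens on its default port 9090 on the same host; the helper name and the file name `prometheus.yaml` are hypothetical:

```shell
# Print a Grafana data source provisioning file for the local Prometheus.
print_grafana_datasource () {
    cat <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

`print_grafana_datasource | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml` followed by `sudo systemctl restart grafana-server` should make the data source appear without any clicks in the UI.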