当前位置: 首页 > news >正文

ubuntu2204配置anacondacuda4090nvidia驱动

背景

某个机房的几台机器前段时间通过dnat暴露至公网后被入侵挖矿,为避免一些安全隐患将这几台机器执行重装系统操作;

这里主要记录配置nvidia驱动及cuda&anaconda。

步骤

大概分为几个步骤

  1. 禁用nouveau
  2. 配置grub显示菜单
  3. install nvidia-driver
  4. install cuda
  5. install anaconda
  6. 测试

执行

system info

root@xxx:~# cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
root@xxx:~# uname -a 
Linux exai-121 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun  3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

禁用nouveau

echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u

配置grub显示菜单

    sed -i 's/^GRUB_TIMEOUT=.*$/GRUB_TIMEOUT=10/' /etc/default/grubsed -i 's/^GRUB_TIMEOUT_STYLE=hidden/#&/' /etc/default/grub# 验证grep -E "GRUB_TIMEOUT" /etc/default/grub# 更新grubupdate-grub

install nvidia-driver

两种安装驱动的方法

  1. 添加ppa源的方式用apt安装
  2. 官网下载指定版本的离线安全包安装

方法一

使用ubuntu带的硬件扫描配合ppa源安装

# 扫描硬件
ubuntu-drivers devices== /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0 ==
modalias : pci:v000010DEd00002684sv000010DEsd0000167Cbc03sc00i00
vendor   : NVIDIA Corporation
manual_install: True
driver   : nvidia-driver-535-server-open - distro non-free
driver   : nvidia-driver-535 - distro non-free recommended
driver   : nvidia-driver-545-open - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-545 - distro non-free
driver   : nvidia-driver-535-open - distro non-free
driver   : nvidia-driver-530 - third-party non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
# apt安装需要的驱动版本即可 例如
apt install nvidia-driver-535 -y

方法二

编译器下载

apt update && apt install gcc make -y

官网下载535.183.01版本对应的离线安装run包
这里直接使用curl+proxy下载了

chmod +x NVIDIA-Linux-x86_64-535.183.01.run ./NVIDIA-Linux-x86_64-535.183.01.run

过程中遇到的选项根据提示选择即可

root@xxx:~# nvidia-smi 
Thu Jul 14 09:xx:01 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
| 32%   62C    P2             224W / 450W |  19736MiB / 24564MiB |     96%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off | 00000000:81:00.0 Off |                  Off |
| 31%   49C    P2             144W / 450W |  11430MiB / 24564MiB |     38%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        Off | 00000000:C1:00.0 Off |                  Off |
| 37%   63C    P2             140W / 450W |  11430MiB / 24564MiB |     39%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        Off | 00000000:C2:00.0 Off |                  Off |
| 32%   58C    P2             136W / 450W |  11430MiB / 24564MiB |     40%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

其他常用命令
更详细的卸载请参考https://blog.csdn.net/qq_43652666/article/details/134705794

# 卸载nvidia驱动
nvidia-uninstall

如果系统没有nvidia-uninstall可以使用dpkg包管理器列出系统中的nvidia关键字的包进行卸载

dpkg -l |grep nvidia*
apt remove --purge '^nvidia-.*'

install cuda

https://developer.nvidia.com/cuda-toolkit-archive这里选择要下载的版本
下载对应的离线run包
在这里插入图片描述
执行安装的时候要取消自带安装的530版本驱动,因为本地已经有驱动了。
注意这里cuda后缀标识代表会安装530.30.02版本的驱动,而我们本地已经安装了535的驱动,cuda版本是向下兼容的。

root@xxx:~# chmod +x cuda_12.1.1_530.30.02_linux.run 
# 安装完成
root@exai-121:~# sh cuda_12.1.1_530.30.02_linux.run
===========
= Summary =
===========Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.1/Please make sure that-   PATH includes /usr/local/cuda-12.1/bin-   LD_LIBRARY_PATH includes /usr/local/cuda-12.1/lib64, or, add /usr/local/cuda-12.1/lib64 to /etc/ld.so.conf and run ldconfig as rootTo uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.1/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 530.00 is required for CUDA 12.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:sudo <CudaInstaller>.run --silent --driverLogfile is /var/log/cuda-installer.log# 验证
root@xxx:~# /usr/local/cuda-12.1/bin/nvcc --version 
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

ubuntu2204上不推荐使用530版本的驱动,安装会有一些依赖和GCC问题,尝试解决无果。推荐使用高版本驱动来向下兼容cuda版本。

问题:
make: *** [Makefile:82: modules] Error 2
-> Checking to see whether the nvidia kernel module was successfully built
executing: ‘cd ./kernel; /usr/bin/make -k -j32 NV_EXCLUDE_KERNEL_MODULES=“” SYSSRC=“/lib/modules/6.5.0-41-generic/build” SYSOUT=“/lib/modules/6.5.0-41-generic/build” NV_KERNEL_MODULES=“nvidia”’…
make[1]: Entering directory ‘/usr/src/linux-headers-6.5.0-41-generic’
warning: the compiler differs from the one used to build the kernel
The kernel was built by: x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
You are using: cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Warning: Compiler version check failed:

The major and minor number of the compiler used to
compile the kernel:

x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38

does not match the compiler used here:

cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright © 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It is recommended to set the CC environment variable
to the compiler that was used to compile the kernel.

To skip the test and silence this warning message, set
the IGNORE_CC_MISMATCH environment variable to “1”.
However, mixing compiler versions between the kernel
and kernel modules can result in subtle bugs that are
difficult to diagnose.
Failed CC version check.

The kernel was built by: x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 You are using: cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
make[3]: *** [scripts/Makefile.build:251: /tmp/selfgz2901/NVIDIA-Linux-x86_64-530.30.02/kernel/nvidia/i2c_nvswitch.o] Error 1
make[3]: Target ‘/tmp/selfgz2901/NVIDIA-Linux-x86_64-530.30.02/kernel/’ not remade because of errors.
make[2]: *** [/usr/src/linux-headers-6.5.0-41-generic/Makefile:2039: /tmp/selfgz2901/NVIDIA-Linux-x86_64-530.30.02/kernel] Error 2
make[2]: Target ‘modules’ not remade because of errors.
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Target ‘modules’ not remade because of errors.
make[1]: Leaving directory ‘/usr/src/linux-headers-6.5.0-41-generic’
make: *** [Makefile:82: modules] Error 2
ERROR: The nvidia kernel module was not created.
ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
/tmp/selfgz2901/NVIDIA-Linux-x86_64-530.30.02/kernel/common/inc/nv-mm.h:88:60: warning: passing argument 4 of ‘get_user_pages’ makes pointer from integer without a cast [-Wint-conversion]
/tmp/selfgz2901/NVIDIA-Linux-x86_64-530.30.02/kernel/nvidia/nv-mmap.c:673:23: error: assignment of read-only member ‘vm_flags’

尝试通过更换gcc版本解决:

update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 11
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 12
update-alternatives --config gcc
# 弹出来的选项中选择12版本

这是离线安装驱动方案下的问题,使用apt安装530版本未测试过,不确定包管理器是否会自动解决编译依赖问题

后续

使用sudo定制权限管理
https://blog.51cto.com/154773488/2449180
https://blog.csdn.net/Field_Yang/article/details/51547804

参考

https://www.cnblogs.com/tarsss/p/17433419.html
https://blog.csdn.net/jiexijihe945/article/details/131517630
cuda下载地址
https://developer.nvidia.com/cuda-toolkit-archive

相关文章:

  • 北京网站建设多少钱?
  • 辽宁网页制作哪家好_网站建设
  • 高端品牌网站建设_汉中网站制作
  • 【C#】| 与 及其相关例子
  • [Doris]阿里云搭建Doris,测试环境1FE 1BE
  • k8s学习笔记——dashboard安装
  • KAFKA搭建教程
  • 国产麒麟、UOS在线打开pdf加盖印章
  • C语言:键盘录入案例
  • Android 视频音量图标
  • 视觉巡线小车——STM32+OpenMV
  • MySQL8的备份方案——差异备份(CentOS)
  • 最新 Docker 下载镜像超时解决方案:Docker proxy
  • 【Python数据分析】数据分析三剑客:NumPy、SciPy、Matplotlib中常用操作汇总
  • R语言学习笔记10-向量-矩阵-数组-数据框-列表对比
  • 神经网络中如何优化模型和超参数调优(案例为tensor的预测)
  • 【HarmonyOS开发】弹窗交互(promptAction )
  • opencv,连续拍摄多张图像求平均值减少噪点
  • Consul Config 使用Git做版本控制的实现
  • es6(二):字符串的扩展
  • If…else
  • Java-详解HashMap
  • js继承的实现方法
  • MySQL常见的两种存储引擎:MyISAM与InnoDB的爱恨情仇
  • NLPIR语义挖掘平台推动行业大数据应用服务
  • Objective-C 中关联引用的概念
  • tensorflow学习笔记3——MNIST应用篇
  • vue-loader 源码解析系列之 selector
  • 开源地图数据可视化库——mapnik
  • 如何打造100亿SDK累计覆盖量的大数据系统
  • 入职第二天:使用koa搭建node server是种怎样的体验
  • 问:在指定的JSON数据中(最外层是数组)根据指定条件拿到匹配到的结果
  • 物联网链路协议
  • 一天一个设计模式之JS实现——适配器模式
  • 在Unity中实现一个简单的消息管理器
  • 3月7日云栖精选夜读 | RSA 2019安全大会:企业资产管理成行业新风向标,云上安全占绝对优势 ...
  • LevelDB 入门 —— 全面了解 LevelDB 的功能特性
  • ​软考-高级-信息系统项目管理师教程 第四版【第23章-组织通用管理-思维导图】​
  • ​香农与信息论三大定律
  • # SpringBoot 如何让指定的Bean先加载
  • ######## golang各章节终篇索引 ########
  • (2024)docker-compose实战 (9)部署多项目环境(LAMP+react+vue+redis+mysql+nginx)
  • (6)STL算法之转换
  • (C语言)二分查找 超详细
  • (delphi11最新学习资料) Object Pascal 学习笔记---第2章第五节(日期和时间)
  • (done) 两个矩阵 “相似” 是什么意思?
  • (Python) SOAP Web Service (HTTP POST)
  • (创新)基于VMD-CNN-BiLSTM的电力负荷预测—代码+数据
  • (二)换源+apt-get基础配置+搜狗拼音
  • (附源码)计算机毕业设计SSM教师教学质量评价系统
  • (亲测有效)解决windows11无法使用1500000波特率的问题
  • (太强大了) - Linux 性能监控、测试、优化工具
  • (最新)华为 2024 届秋招-硬件技术工程师-单板硬件开发—机试题—(共12套)(每套四十题)
  • .NET Core、DNX、DNU、DNVM、MVC6学习资料
  • .net dataexcel winform控件 更新 日志
  • .net on S60 ---- Net60 1.1发布 支持VS2008以及新的特性
  • .Net 中Partitioner static与dynamic的性能对比
  • .Net调用Java编写的WebServices返回值为Null的解决方法(SoapUI工具测试有返回值)