it's crazy that this thread came up today.
i've been struggling with this for the past few days.
same gpu, same drivers - well, my gpu is the notebook version...
tl;dr: skip to the arrow (-->) - i'll try to keep it brief and to the point while including as much as i can...
i was trying to get cuda working on my GPU, and it told me i had the wrong driver for cuda, or some similar error. i had the 550 driver, so i tried to go down to the 470 or something like that, but it was next to impossible to clear out the remnants of the pre-installed nvidia drivers. my laptop has an integrated intel GPU, so i was using that in the meantime, but frick if nvidia doesn't have its tentacles in everything. i got the 470 series driver installed but could still see the 550 remnants when i swept around with lsmod, ps aux, modprobe, etc.
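for hunting down leftovers like this, something along these lines can help. it's only a sketch: `list_nvidia_pkgs` is a made-up helper name, and it assumes a Debian/Ubuntu system where `dpkg -l` is available. it keeps only the packages that are fully installed (status `ii`) and have nvidia in the name.

```shell
# Hypothetical helper: list installed packages whose name mentions nvidia.
# It reads `dpkg -l` output on stdin, so it also works on saved output.
list_nvidia_pkgs() {
    # column 1 is the package status, column 2 the name, column 3 the version
    awk '$1 ~ /^ii/ && $2 ~ /nvidia/ { print $2, $3 }'
}

# Typical use on the machine itself:
#   dpkg -l | list_nvidia_pkgs
#   lsmod | grep -i nvidia              # kernel modules currently loaded
#   cat /proc/driver/nvidia/version     # only exists while the module is loaded
```

packages that show up with status `rc` in the plain `dpkg -l` output were removed but left their config files behind, which is one way an old driver version can seem to linger.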
according to my google-fu, people said don't use the 470 series drivers, even though it was updated as recently as may 2, 2024. so that left the 535 and the 550. well, if the 550 wasn't jibing with cuda, i went with the 535...
up to this point nvidia-smi would report that it couldn't detect the gpu, or some similar error.
--> the latest driver in the 535 series is 535.179
i installed it from the .run file i downloaded from nvidia, and the install errored out, so i checked the logs.
the warning was about the compiler differing from the one used to build the kernel:
The kernel was built by: x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
You are using: cc (Ubuntu 13.2.0-4ubuntu3) 13.2.0
i'm going to paste the other glitches/errors/warnings that came up in the logs.
/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-mmap.c:303:5: warning: conflicting types for 'nv_encode_cach>
int-mismatch]
303 | int nv_encode_caching(
| ^~~~~~~~~~~~~~~~~
In file included from /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/common/inc/nv-linux.h:1761,
from /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-mmap.c:27:
/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/common/inc/nv-proto.h:46:13: note: previous declaration of 'nv_encode_>
46 | int nv_encode_caching (pgprot_t *, NvU32, NvU32);
| ^~~~~~~~~~~~~~~~~
then....
/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm/uvm_perf_events_test.c: In function 'test_events':
/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm/uvm_perf_events_test.c:83:1: warning: the frame size of 104>
83 | }
| ^
then....
LD [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm.o
ld -r -o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-interface.o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535>
179/kernel/nvidia/nv-pat.o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-procfs.o /tmp/selfgz4017/NVIDIA-L>
/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-report-err.o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/ker>
mp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/libspdm_hkdf_sha.o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/ke>
ld -r -o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-modeset/nv-modeset-interface.o /tmp/selfgz4017/NVIDIA->
LD [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-modeset.o
then...
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-peermem.ko due to unavailability of vmlinux
BTF [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-modeset.ko
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-modeset.ko due to unavailability of vmlinux
BTF [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-drm.ko
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-drm.ko due to unavailability of vmlinux
BTF [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm.ko
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm.ko due to unavailability of vmlinux
BTF [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia.ko
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia.ko due to unavailability of vmlinux
make[1]: Leaving directory '/usr/src/linux-headers-6.5.0-35-generic'
-> done.
-> Kernel module compilation complete.
-> Kernel messages:
[ 15.445181] audit: type=1400 audit(1716411728.604:113): apparmor="DENIED" operation="open" class="file" profile="snap-update-ns.firefox" name="/var/lib/" pid=2609 comm="5" requested_mas>
[ 16.964690] audit: type=1107 audit(1716411730.124:114): pid=792 uid=101 auid=4294967295 ses=4294967295 subj=unconfined msg='apparmor="DENIED" operation="dbus_method_call" bus="system" >
exe="/usr/bin/dbus-daemon" sauid=101 hostname=? addr=? terminal=?'
[ 16.964701] audit: type=1107 audit(1716411730.124:115): pid=792 uid=101 auid=4294967295 ses=4294967295 subj=unconfined msg='apparmor="DENIED" operation="dbus_method_call" bus="system" >
exe="/usr/bin/dbus-daemon" sauid=101 hostname=? addr=? terminal=?'
[ 91.774484] wlp0s20f3: deauthenticating from ba:5e:71:06:d9:33 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 341.413026] VFIO - User Level meta-driver version: 0.3
[ 341.471389] nvidia: loading out-of-tree module taints kernel.
[ 341.471397] nvidia: module license 'NVIDIA' taints kernel.
[ 341.471398] Disabling lock debugging due to kernel taint
[ 341.471400] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 341.471400] nvidia: module license taints kernel.
[ 341.524685] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
.....
[ 341.525437] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 341.571784] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 535.179 Fri Apr 26 21:43:18 UTC 2024
[ 341.583770] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 341.608126] nvidia-uvm: Loaded the UVM driver, major device number 507.
[ 341.611558] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 535.179 Fri Apr 26 21:35:21 UTC 2024
[ 341.613641] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 341.613643] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[ 341.616209] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 341.652378] nvidia-modeset: Unloading
[ 341.689298] nvidia-uvm: Unloaded the UVM driver.
[ 341.729639] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
then...
WARNING: Your driver installation has been altered since it was initially installed; this may happen, for example, if you have since installed the NVIDIA driver through a mechanism other than "default ubuntu blah blah"
-> Uninstallation of existing driver: NVIDIA Accelerated Graphics Driver for Linux-x86_64 (470.239.06) is complete.
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if your kernel changes later. (Answer: Yes)
-> Registering the kernel modules with DKMS:
-> done.
Looking for install checker script at ./libglvnd_install_checker/check-libglvnd-install.sh
executing: '/bin/sh ./libglvnd_install_checker/check-libglvnd-install.sh'...
Found libglvnd libraries: libGLESv2.so.2 libGLESv1_CM.so.1 libOpenGL.so.0 libEGL.so.1 libGLX.so.0 libGL.so.1
Found non-libglvnd libraries:
Missing libraries:
libglvnd appears to be installed. (i installed these before the driver)
Will not install libglvnd libraries.
Will install libEGL vendor library config file to /usr/local/share/glvnd/egl_vendor.d
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (535.179):
executing: '/usr/sbin/ldconfig'...
executing: '/usr/sbin/depmod -a '...
executing: '/usr/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
-> Running post-install sanity check:
-> done.
-> Post-install sanity check passed.
-> Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X confi>
-> Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 535.179) is now complete. Please update your xorg.conf file as appropriate; see the file /usr/share/do>
because of the errors, i figured it would be best to rebuild through dkms, since the driver source had been registered with dkms in case of emergency, but i couldn't figure out the arguments to do that... and given that i had fugged around for days (earlier i had to learn that secure boot needs to be disabled to get these GTX 1650s working), i was mentally exhausted and just wanted to get on with it... i'll also admit i didn't even check whether the driver had installed fully, or even in a functional capacity... i just assumed it was going to be busted and needed to be fixed, even though it said it was complete...
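for reference, the dkms arguments look roughly like this. a sketch only: `dkms_rebuild_cmds` is a hypothetical helper that prints the commands instead of running them, and it assumes the module is registered as nvidia/<version>, which `sudo dkms status` should confirm first.

```shell
# Hypothetical helper: print (not run) the DKMS commands that rebuild and
# install a registered module for one kernel.
# Assumption: the module name is "nvidia"; check `sudo dkms status`.
dkms_rebuild_cmds() {
    version="$1"                   # e.g. 535.179
    kernel="${2:-$(uname -r)}"     # defaults to the running kernel
    echo "sudo dkms build -m nvidia -v $version -k $kernel"
    echo "sudo dkms install -m nvidia -v $version -k $kernel"
}

# Example:
#   dkms_rebuild_cmds 535.179 6.5.0-35-generic
```

printing the commands first makes it easy to double-check the version and kernel strings before anything touches the system.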
i conceded and installed nvidia-driver-535 and nvidia-dkms-535 with sudo apt install, but that gave me 535.171.04, not 535.179...
what's more... now i've got a mismatch of libraries and drivers?
sudo dkms status
nvidia/535.171.04, 6.5.0-35-generic, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
rtl8814au/5.8.5.1, 6.5.0-28-generic, x86_64: installed
rtl8814au/5.8.5.1, 6.5.0-35-generic, x86_64: installed
but nvidia-smi shows a mismatch...
550.78 IS STILL AROUND...
nvidia-smi
Wed May 22 16:17:28 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 Driver Version: 550.78 CUDA Version: 12.4 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1650 Off | 00000000:01:00.0 Off | N/A |
| N/A 37C P0 6W / 50W | 0MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
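to make the mismatch in that header explicit, here is a small sketch. `smi_versions` is a hypothetical helper that pulls both numbers out of the nvidia-smi header line; as far as can be told from the output, the first is the version of the nvidia-smi tool and the second is what the running driver reports.

```shell
# Hypothetical helper: extract "<nvidia-smi version> <driver version>"
# from nvidia-smi output read on stdin.
smi_versions() {
    sed -n 's/.*NVIDIA-SMI \([0-9.]*\).*Driver Version: \([0-9.]*\).*/\1 \2/p'
}

# Typical use, plus two ways to see which kernel module is involved:
#   nvidia-smi | smi_versions
#   cat /proc/driver/nvidia/version   # module that is loaded right now
#   modinfo -F version nvidia         # module on disk for the running kernel
```

if the loaded module and the one on disk report different versions, the old module may simply still be loaded, so a reboot is worth trying before reinstalling anything. that is a guess from the output above, not a diagnosis.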
so i don't know what to do...
thoughts?
should i go back and install 535.179 - the one that glitched out - and see if it will run the cuda libraries i need? or try the cuda libraries now? should i be worried about the mismatched libraries and driver?
the 550 drivers are out. i just have to decide whether to go back and try 535.179 for functionality... but what do i do about the persistent 550.78 driver... THAT WAS THE PROBLEMATIC DRIVER in the first place... and after all this purging and autocleaning and autoremoving... it's still there...
i'm losing it.
i'm gonna have a smoke and a shower and make some late, late lunch... but if y'all have ideas... lemme know... cause i might be coming back here with a jerry can...