Solved Confused about Nvidia driver versions

Solved issue
Thanks for sharing and for all the help. I'm adding all the links to my bookmarks, but for now I'll stay with the manual procedure.
Together with DKMS, I would assume it saves you trouble when you get kernel updates?
 


Together with DKMS, I would assume it saves you trouble when you get kernel updates?
It certainly would, and I find it attractive, but my mindset is "manual"; that's the problem.

I even prefer cars with a manual transmission over an automatic because it lets me push the accelerator further and feel the power as much as I want to hear it :p

The only thing manual I don't like is work in real life lol
 
It certainly would, and I find it attractive, but my mindset is "manual"; that's the problem.
Computers are there to make things easier and to automate them. If you think about it, using DKMS is only semi-automatic: it just builds the kernel module for each new kernel, while you still have to download and install new Nvidia driver releases manually if you use the installer from the Nvidia website.
 
Computers are there to make things easier and to automate them. If you think about it, using DKMS is only semi-automatic: it just builds the kernel module for each new kernel, while you still have to download and install new Nvidia driver releases manually if you use the installer from the Nvidia website.
I'll certainly study it and might use it. One case where it's especially useful is when a newer kernel gets installed through apt-get and I miss that the kernel was upgraded; in that case DKMS would save me a lot of trouble.
So I'll likely use DKMS, but I'll keep installing the driver itself manually, because the versions in the Nvidia repos are out of date.
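To see the semi-automatic part in action, you can check whether DKMS has already built and installed the nvidia module for a given kernel, for example right after a kernel update. This is a minimal sketch: `nvidia_installed_for_kernel` is a made-up helper name, and it assumes the `module/version, kernel, arch: status` line format that `dkms status` prints later in this thread (older DKMS releases print `module, version, kernel, arch: status`, which this won't match).

```bash
# Hypothetical helper: reads `dkms status` output on stdin and succeeds (exit 0)
# if an nvidia module is listed as installed for the kernel given as $1.
# Assumes the "nvidia/<version>, <kernel>, <arch>: installed" line format.
nvidia_installed_for_kernel() {
  local kernel="$1"
  awk -F', ' -v k="$kernel" '
    $1 ~ /^nvidia\// && $2 == k && $3 ~ /: installed/ { found = 1 }
    END { exit !found }
  '
}

# Usage:
#   dkms status | nvidia_installed_for_kernel 6.5.0-35-generic && echo "module ready"
#   dkms status | nvidia_installed_for_kernel "$(uname -r)" || echo "no nvidia module for the running kernel"
```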
 
It's crazy that this thread came up today.
I've been struggling for the past few days.
Same GPU, same drivers. Well, my GPU is the notebook version...

tl;dr: skip to the arrow. I'll try to keep it brief and to the point while including as much as I can...

I was trying to get CUDA functionality out of my GPU, and it told me I had the wrong driver for CUDA (or some similar error). I had the 550 driver, so I tried to go down to 470 or something like that, but it was next to impossible to clear out the remnants of the pre-installed Nvidia drivers. My laptop has an integrated Intel GPU, so I was using that, but frick if Nvidia doesn't have its tentacles in everything. I got the 470-series driver installed but could still see the 550 remnants when I swept around with lsmod, ps aux, modprobe, etc.

Using google-fu, I found people saying not to use the 470-series drivers, even though that branch was updated around May 2, 2024. That left 535 and 550, and since 550 wasn't jiving with CUDA, I went with 535...

Up to this point, nvidia-smi would report that it couldn't detect the GPU (or some similar error).

--> The most up-to-date 535 driver is 535.179

I installed the .run file I downloaded from Nvidia, and it errored out during the install, so I checked the logs.
The warning was about the compiler differing from the one used to build the kernel:

The kernel was built by: x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
You are using: cc (Ubuntu 13.2.0-4ubuntu3) 13.2.0
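Those two lines name the compiler differently (`x86_64-linux-gnu-gcc-13` vs `cc`) but report the same GCC build, `(Ubuntu 13.2.0-4ubuntu3) 13.2.0`, so this particular warning is most likely harmless here. A small sketch for comparing only the build/version part; `compiler_id` is a hypothetical helper and assumes the `name (vendor build) version` shape shown above.

```bash
# Hypothetical helper: drops the program name from a compiler ID string such as
# "cc (Ubuntu 13.2.0-4ubuntu3) 13.2.0", keeping "(vendor build) version".
compiler_id() {
  printf '%s\n' "$1" | sed -n 's/^[^(]*\((.*)\) *\([0-9][0-9.]*\).*$/\1 \2/p'
}

kernel_cc='x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0'  # from the log above
your_cc='cc (Ubuntu 13.2.0-4ubuntu3) 13.2.0'                          # e.g. from: cc --version | head -n1

if [ "$(compiler_id "$kernel_cc")" = "$(compiler_id "$your_cc")" ]; then
  echo "same compiler build: $(compiler_id "$your_cc")"
else
  echo "compiler builds differ"
fi
```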


I'm going to paste the other glitches/errors/warnings that came up in the logs.


/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-mmap.c:303:5: warning: conflicting types for 'nv_encode_cach>
int-mismatch]
303 | int nv_encode_caching(
| ^~~~~~~~~~~~~~~~~
In file included from /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/common/inc/nv-linux.h:1761,
from /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-mmap.c:27:
/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/common/inc/nv-proto.h:46:13: note: previous declaration of 'nv_encode_>
46 | int nv_encode_caching (pgprot_t *, NvU32, NvU32);
| ^~~~~~~~~~~~~~~~~

Then...

tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm/uvm_perf_events_test.c: In function 'test_events':
/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm/uvm_perf_events_test.c:83:1: warning: the frame size of 104>
83 | }
| ^

Then...

LD [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm.o
ld -r -o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-interface.o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535>
179/kernel/nvidia/nv-pat.o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-procfs.o /tmp/selfgz4017/NVIDIA-L>
/tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/nv-report-err.o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/ker>
mp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia/libspdm_hkdf_sha.o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/ke>
ld -r -o /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-modeset/nv-modeset-interface.o /tmp/selfgz4017/NVIDIA->
LD [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-modeset.o

Then...

Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-peermem.ko due to unavailability of vmlinux
BTF [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-modeset.ko
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-modeset.ko due to unavailability of vmlinux
BTF [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-drm.ko
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-drm.ko due to unavailability of vmlinux
BTF [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm.ko
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia-uvm.ko due to unavailability of vmlinux
BTF [M] /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia.ko
Skipping BTF generation for /tmp/selfgz4017/NVIDIA-Linux-x86_64-535.179/kernel/nvidia.ko due to unavailability of vmlinux
make[1]: Leaving directory '/usr/src/linux-headers-6.5.0-35-generic'
-> done.
-> Kernel module compilation complete.

-> Kernel messages:
[ 15.445181] audit: type=1400 audit(1716411728.604:113): apparmor="DENIED" operation="open" class="file" profile="snap-update-ns.firefox" name="/var/lib/" pid=2609 comm="5" requested_mas>
[ 16.964690] audit: type=1107 audit(1716411730.124:114): pid=792 uid=101 auid=4294967295 ses=4294967295 subj=unconfined msg='apparmor="DENIED" operation="dbus_method_call" bus="system" >
exe="/usr/bin/dbus-daemon" sauid=101 hostname=? addr=? terminal=?'
[ 16.964701] audit: type=1107 audit(1716411730.124:115): pid=792 uid=101 auid=4294967295 ses=4294967295 subj=unconfined msg='apparmor="DENIED" operation="dbus_method_call" bus="system" >
exe="/usr/bin/dbus-daemon" sauid=101 hostname=? addr=? terminal=?'
[ 91.774484] wlp0s20f3: deauthenticating from ba:5e:71:06:d9:33 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 341.413026] VFIO - User Level meta-driver version: 0.3
[ 341.471389] nvidia: loading out-of-tree module taints kernel.
[ 341.471397] nvidia: module license 'NVIDIA' taints kernel.
[ 341.471398] Disabling lock debugging due to kernel taint
[ 341.471400] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 341.471400] nvidia: module license taints kernel.
[ 341.524685] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
.....
[ 341.525437] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 341.571784] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 535.179 Fri Apr 26 21:43:18 UTC 2024
[ 341.583770] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 341.608126] nvidia-uvm: Loaded the UVM driver, major device number 507.
[ 341.611558] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 535.179 Fri Apr 26 21:35:21 UTC 2024
[ 341.613641] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 341.613643] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[ 341.616209] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 341.652378] nvidia-modeset: Unloading
[ 341.689298] nvidia-uvm: Unloaded the UVM driver.
[ 341.729639] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
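The `module verification failed: signature and/or required key missing - tainting kernel` line means the freshly built module is unsigned and was loaded anyway, which is what you'd expect with Secure Boot turned off; with Secure Boot enforcing, an unsigned module is normally rejected instead. `mokutil --sb-state` reports the current state, and the sketch below just interprets that output. `secure_boot_state` is a hypothetical name, and it assumes mokutil's usual `SecureBoot enabled` / `SecureBoot disabled` wording.

```bash
# Hypothetical helper: classifies `mokutil --sb-state` output read on stdin.
# Prints "enabled", "disabled", or "unknown" (e.g. on a non-EFI system).
secure_boot_state() {
  local out
  out=$(cat)
  case "$out" in
    *"SecureBoot enabled"*)  echo enabled ;;
    *"SecureBoot disabled"*) echo disabled ;;
    *)                       echo unknown ;;
  esac
}

# Usage:
#   mokutil --sb-state 2>&1 | secure_boot_state
```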

Then...


WARNING: Your driver installation has been altered since it was initially installed; this may happen, for example, if you have since installed the NVIDIA driver through a mechanism other than "default ubuntu blah blah"

-> Uninstallation of existing driver: NVIDIA Accelerated Graphics Driver for Linux-x86_64 (470.239.06) is complete.
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if your kernel changes later. (Answer: Yes)
-> Registering the kernel modules with DKMS:
-> done.
Looking for install checker script at ./libglvnd_install_checker/check-libglvnd-install.sh
executing: '/bin/sh ./libglvnd_install_checker/check-libglvnd-install.sh'...
Found libglvnd libraries: libGLESv2.so.2 libGLESv1_CM.so.1 libOpenGL.so.0 libEGL.so.1 libGLX.so.0 libGL.so.1
Found non-libglvnd libraries:
Missing libraries:
libglvnd appears to be installed. (I installed these before the driver)
Will not install libglvnd libraries.

Will install libEGL vendor library config file to /usr/local/share/glvnd/egl_vendor.d
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (535.179):
executing: '/usr/sbin/ldconfig'...
executing: '/usr/sbin/depmod -a '...
executing: '/usr/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
-> Running post-install sanity check:
-> done.
-> Post-install sanity check passed.

-> Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X confi>
-> Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 535.179) is now complete. Please update your xorg.conf file as appropriate; see the file /usr/share/do>

Because of the errors, I figured it would be best to run dkms, since the driver source had been registered with DKMS in case of emergency, but I couldn't figure out the arguments to do that... And given that I had fugged around for days (previously I had to learn that Secure Boot needed to be disabled to work with these GTX 1650s), I was mentally exhausted and just wanted to get on with the get on... I'll also admit I didn't even check whether the driver had installed fully, or was even functional... I just assumed it was busted and needed to be fixed, even though it said it was complete...
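For the record, rebuilding a module that the `.run` installer registered with DKMS is normally a `dkms build` followed by a `dkms install`, naming the module, its version, and the target kernel. This sketch only prints the commands so they can be checked before running them; `dkms_rebuild_plan` is a hypothetical helper, and the module name and version should be copied from what `dkms status` actually lists (for the `.run` install above, presumably `nvidia` and `535.179`).

```bash
# Hypothetical helper: prints (does not run) the DKMS commands that rebuild and
# install an already-registered module for one kernel (default: running kernel).
dkms_rebuild_plan() {
  local module="$1" version="$2" kernel="${3:-$(uname -r)}"
  if [ -z "$module" ] || [ -z "$version" ]; then
    echo "usage: dkms_rebuild_plan MODULE VERSION [KERNEL]" >&2
    return 2
  fi
  printf 'sudo dkms build -m %s -v %s -k %s\n' "$module" "$version" "$kernel"
  printf 'sudo dkms install -m %s -v %s -k %s\n' "$module" "$version" "$kernel"
}

# Usage: check what is registered, review the printed lines, then run them.
#   dkms status
#   dkms_rebuild_plan nvidia 535.179
```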

I conceded and used sudo apt install nvidia-driver-535 and nvidia-dkms-535, but it installed 535.171.04, not 535.179...
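That gap is expected: apt installs whatever version the configured Ubuntu repositories carry, which can trail the newest `.run` release on Nvidia's site. `apt-cache policy <package>` shows that version on its `Candidate:` line before you install anything. A minimal sketch that extracts it; `apt_candidate` is a hypothetical name.

```bash
# Hypothetical helper: prints the "Candidate:" version from `apt-cache policy`
# output read on stdin (the version apt would install from your configured repos).
apt_candidate() {
  awk '$1 == "Candidate:" { print $2; exit }'
}

# Usage:
#   apt-cache policy nvidia-driver-535 | apt_candidate
```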

What's more... now I've got a mismatch between libraries and drivers?

sudo dkms status
nvidia/535.171.04, 6.5.0-35-generic, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
rtl8814au/5.8.5.1, 6.5.0-28-generic, x86_64: installed
rtl8814au/5.8.5.1, 6.5.0-35-generic, x86_64: installed

But nvidia-smi shows a mismatch... 550.78 IS STILL AROUND...


nvidia-smi

Wed May 22 16:17:28 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 550.78       CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1650        Off | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P0               6W /  50W |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
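The header line is the clue: `NVIDIA-SMI 535.171.04` is the version of the user-space tool and libraries, while `Driver Version: 550.78` is most likely the kernel module that is actually loaded, i.e. the old 550.78 module is still what's running. The sketch below pulls both numbers out of that line and compares them; `check_smi_versions` is a hypothetical helper that assumes the header layout shown above. On a live system, `cat /proc/driver/nvidia/version` shows the loaded module's version, and `modinfo -F version nvidia` shows the version of the module file `modprobe` would pick up.

```bash
# Hypothetical helper: reads nvidia-smi output on stdin and compares the tool
# version ("NVIDIA-SMI x") with the loaded driver ("Driver Version: y").
check_smi_versions() {
  local line smi drv
  line=$(grep -m1 'Driver Version:')
  smi=$(printf '%s\n' "$line" | sed -n 's/.*NVIDIA-SMI *\([0-9.]*\).*/\1/p')
  drv=$(printf '%s\n' "$line" | sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p')
  if [ -z "$smi" ] || [ -z "$drv" ]; then
    echo "no nvidia-smi header found" >&2
    return 2
  fi
  if [ "$smi" = "$drv" ]; then
    echo "match: $smi"
  else
    echo "mismatch: nvidia-smi $smi, kernel driver $drv"
    return 1
  fi
}

# Usage:
#   nvidia-smi | check_smi_versions
```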


So I don't know what to do...
Thoughts?
Should I go back and install 535.179 (the one that said it glitched out) and see if it will run the CUDA libraries I need? Or try the CUDA libraries now? Should I be worried about the mismatched libraries and driver?

The 550 drivers are out. I just have to decide whether to go back and try 535.179 for functionality... but what do I do about the persistent 550.78 driver? THAT WAS THE PROBLEMATIC DRIVER in the first place... and after all this purging and autocleaning and autoremoving... it's still there...

I'm losing it.

I'm gonna have a smoke and a shower and make some late, late lunch... but if y'all have ideas, lemme know... 'cause I might be coming back here with a jerry can...
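One way to hunt a kernel module that survives `apt purge` is to look at the files `modprobe` can actually load rather than at the package list: anything named `nvidia*.ko*` under `/lib/modules/<kernel>/` is a candidate, and `dpkg -S <file>` tells you which package, if any, owns it. A leftover owned by a package whose name doesn't start with `nvidia` would explain why a `'^nvidia'` purge missed it, but treat that as a guess to verify, not a diagnosis. `list_nvidia_modules` is a hypothetical helper and assumes the standard `/lib/modules/$(uname -r)` layout.

```bash
# Hypothetical helper: lists every nvidia*.ko* file under a kernel's module tree
# (default: the running kernel). These are the files modprobe can load,
# regardless of what apt believes is installed.
list_nvidia_modules() {
  local dir="${1:-/lib/modules/$(uname -r)}"
  find "$dir" -type f -name 'nvidia*.ko*' 2>/dev/null | sort
}

# Usage: print each module's version and its owning package, if any.
#   list_nvidia_modules | while read -r f; do
#     echo "$(modinfo -F version "$f")  $f"
#     dpkg -S "$f" 2>/dev/null || echo "    (not owned by any package)"
#   done
```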
 

Oh, and...

The apt install of nvidia-driver-535 and nvidia-dkms-535 echoed this back:

nvidia.ko.zst:
Running module version sanity check.
Error! Module version 535.171.04 for nvidia.ko.zst
is not newer than what is already found in kernel 6.5.0-35-generic (550.78).
You may override by specifying --force.

nvidia-modeset.ko.zst:
Running module version sanity check.
Error! Module version 535.171.04 for nvidia-modeset.ko.zst
is not newer than what is already found in kernel 6.5.0-35-generic (550.78).
You may override by specifying --force.

nvidia-drm.ko.zst:
Running module version sanity check.
Error! Module version 535.171.04 for nvidia-drm.ko.zst
is not newer than what is already found in kernel 6.5.0-35-generic (550.78).
You may override by specifying --force.

nvidia-uvm.ko.zst:
Running module version sanity check.
Error! Module version 535.171.04 for nvidia-uvm.ko.zst
is not newer than what is already found in kernel 6.5.0-35-generic (550.78).
You may override by specifying --force.

nvidia-peermem.ko.zst:
Running module version sanity check.
Error! Module version 535.171.04 for nvidia-peermem.ko.zst
is not newer than what is already found in kernel 6.5.0-35-generic (550.78).
You may override by specifying --force.
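DKMS refuses here because it found nvidia modules for 6.5.0-35-generic that report version 550.78, and it won't replace them with an older 535.171.04 build unless told to (`--force`, as the message says). The sketch below only illustrates the kind of version comparison involved, using GNU `sort -V`; `version_newer` is a hypothetical helper, and DKMS's own check is implemented differently.

```bash
# Hypothetical helper: succeeds if version $1 is strictly newer than version $2,
# using GNU sort's version ordering (sort -V).
version_newer() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

if version_newer 535.171.04 550.78; then
  echo "535.171.04 would replace 550.78"
else
  echo "535.171.04 is not newer than 550.78"   # the situation in the log above
fi
```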
 
@curtis_connors
Playing with non-distro Nvidia drivers is tricky, so don't follow my advice blindly, and make a backup in case you need to reinstall the system.

Basically, you need to enable the nouveau driver to make sure you can boot into a graphical session, and then completely purge all Nvidia drivers and related Nvidia software from your PC.

For example:

First, make sure nouveau is not blacklisted in /etc/modprobe.d.
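A quick way to do that check is to grep the `.conf` files there for active `blacklist nouveau` or `options nouveau modeset=0` lines; commented-out lines don't count. `find_nouveau_blacklist` is a hypothetical helper. `modprobe` also reads configuration from other directories such as `/lib/modprobe.d` and `/run/modprobe.d` (see `modprobe.d(5)`), so it's worth pointing it at those too.

```bash
# Hypothetical helper: prints (file:line:text) every active nouveau blacklist or
# modeset=0 option in the *.conf files of a modprobe.d directory.
# Exit status is 0 if something was found, non-zero otherwise.
find_nouveau_blacklist() {
  local dir="${1:-/etc/modprobe.d}"
  grep -HnE '^[[:space:]]*(blacklist[[:space:]]+nouveau|options[[:space:]]+nouveau[[:space:]].*modeset=0)' \
    "$dir"/*.conf 2>/dev/null
}

# Usage:
#   for d in /etc/modprobe.d /lib/modprobe.d /run/modprobe.d; do find_nouveau_blacklist "$d"; done
```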

Then purge Nvidia completely:

Bash:
# List installed nvidia drivers and programs
dpkg -l | grep nvidia
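# Uninstall all Nvidia drivers and programs
# (quote the patterns so the shell can't expand them against files in the current directory)
sudo apt purge 'nvidia-driver-*'
sudo apt purge 'nvidia-*'
# Nvidia libraries such as libnvidia-compute-* don't start with "nvidia-"
sudo apt purge 'libnvidia-*'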
# Remove unused packages
sudo apt autoremove
# This command must not return any nvidia remnants
dpkg -l | grep nvidia

Next, rebuild the initramfs so that nouveau takes effect for your current kernel:

Bash:
# Rebuild the initramfs so the change to the nouveau blacklist takes effect
sudo update-initramfs -u
# Reboot so the nouveau driver actually gets loaded
systemctl reboot

At this point, you should be in a clean state.
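After the reboot, one way to confirm that clean state is to check which GPU kernel module is loaded: `nouveau` should appear and `nvidia` should not. `lsmod` formats the contents of `/proc/modules`, whose first column is the module name, so the sketch reads that file directly. `gpu_modules_loaded` is a hypothetical helper.

```bash
# Hypothetical helper: prints which of nouveau / nvidia appear in a module list
# (default /proc/modules, the data lsmod formats). Name is the first column.
gpu_modules_loaded() {
  local modules_file="${1:-/proc/modules}"
  awk '$1 == "nouveau" || $1 == "nvidia" { print $1 }' "$modules_file"
}

# Usage:
#   gpu_modules_loaded     # expect "nouveau" and no "nvidia" at this point
```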
The next step is to upgrade the kernel. If you're using Debian, do it from backports, because only that kernel will work on Debian with the latest Nvidia driver.
When done, follow the regular procedure to disable nouveau and install the latest Nvidia driver.
 
@curtis_connors
Playing with non-distro Nvidia drivers is tricky, so don't follow my advice blindly, and make a backup in case you need to reinstall the system.

My laptop has an integrated Intel GPU, so as long as I unload the Nvidia modules, I can firebomb them with the full GUI on screen.

When I try to install the Nvidia drivers, they tell me they aren't compatible with nouveau, so I have to blacklist it. But I was going into a virtual console (Ctrl+Alt+F3) to take the GUI right out of the equation.

I'll give your cleaning methodology a rip as well, though. I think I did something similar, but in an ADHD, spazmatic fashion.

I'm using Kubuntu, which is built on Ubuntu 23.10 (fun).

I will give your protocol a shot and let you know how it goes.

I'm not entirely sure, but Nvidia staffers might be cr@ckheads... I started to have this premonition when they arbitrarily switched the pinout on the Jetson Nanos from one iteration to the next... Like... OK, I've got 3 chips that fit into the SO-DIMM slot and give me no issues, but the one we just ordered... does not jive... And it's not like they released this information to the public... It took returning the SoM... twice... and multiple emails back and forth with support until someone said... "oh ya, we changed the pinout, hehehe, whoops"... So our proprietary carrier boards weren't digging it...

I digress, though... cr@ckheads... think about it...
 
@curtis_connors
The errors you listed are most likely related to the kernel not supporting the latest driver; that's why the Nvidia drivers need to be purged, the kernel upgraded, and then the drivers reinstalled.

To reinstall the drivers after the kernel upgrade, you of course need to blacklist nouveau, but not before the kernel is upgraded and not before you're ready to install the driver.

That's how I made it work, but my procedure might not be exactly the same as yours; I was googling the errors and figured out it was an unsupported kernel.
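When that point comes, blacklisting nouveau usually means a modprobe config file with the two lines below, followed by rebuilding the initramfs and rebooting. This is a sketch: the file name `blacklist-nouveau.conf` and the helper `write_nouveau_blacklist` are assumptions for illustration, not something from this thread, and the target directory is a parameter so you can try it somewhere harmless first.

```bash
# Hypothetical helper: writes a modprobe config that blacklists nouveau and
# disables its modesetting. Default target needs root; pass another dir to test.
write_nouveau_blacklist() {
  local dir="${1:-/etc/modprobe.d}"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
    > "$dir/blacklist-nouveau.conf"
}

# Usage (as root), then rebuild the initramfs and reboot:
#   write_nouveau_blacklist && update-initramfs -u && systemctl reboot
```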
 
@CaffeineAddict :-

I know it ain't what'cha wanna hear, but I'm with @Brickwizard on this one. If it works, leave it the HELL alone...!

As for the version numbers, well....yeah. With Nvidia, it's perfectly possible to have a higher version on an older driver. It usually indicates a 'bug-fixed' older driver, but Nvidia's build-bots just automatically allocate whatever the next new driver number happens to be in the sequence. They're pretty dumb, TBH!

In my case, my GPU is no longer supported by the newest drivers. For 32-bitzers, I'm stuck on 390.148. For 64-bitzers, 470.199.02 is where Nvidia gave me the two finger salute. On the 2008 Dell Latitude, for the mobile Quadro GPU I'm stuck even further back; 340.108. And you know what?

I really couldn't care less..!!

The only reason I treated myself to a discrete GPU in the first place was simple; when I had to upgrade to the current HP desktop in 2020 (after the old Compaq tower finally gave up the ghost.....15 yrs & counting? I wasn't complaining!), in over 35 yrs with these damned black magic boxes, a discrete GPU was about the only thing left that I'd never tried.

I went with a GT 710 (yeah, I know; the gamers all take the p**s outta them summat rotten) for 2 reasons:-

  • It only draws 19W via the slot.....and this HP Pavilion has a weird, slimline, low-power PSU of 180W that's nigh on impossible to upgrade, so I had to watch out for limited power supply. And I am NOT splashing out on another, more expensive, newer machine just to be able to run a newer, more powerful GPU. What, d'you think I've got money to burn?
  • I went with the passive-cooler version from Asus.....for the simple reason that I did NOT want one of those tiny wee fans buzzing away in my lughole like a demented hornet all day long. I like a quiet life.....and I'm no gamer anyway. The hardest this thing works is on the odd occasions when I "offload" video rendering to it from Openshot.

It was an 'experiment' as far as I was concerned. More than anything else, I wanted to be able to tick the checkbox to say I'd "been there & done that".... FWIW. :p


Mike. ;)
 
Yeah, I went a little more verbose with:

sudo apt purge '^nvidia'
sudo apt purge "^.*cublas"
sudo apt purge "^.*cuda"
---also purged nvidia-compute & libnvidia
sudo mv /etc/X11/xorg.conf /etc/X11/xorg.conf.old
sudo apt autoremove
sudo apt autoclean
sudo apt-get autoremove
sudo apt-get autoclean

Then checking:

lsmod | grep nvidia / cuda / cublas / nvidia-compute /libnvidia
modprobe | grep nvidia / cuda /cublas / nvidia-compute /libnvidia
modinfo | grep nvidia / cuda /cublas / nvidia-compute /libnvidia
nvidia-detect
lsof | grep nvidia / cuda / cublas / nvidia-compute /libnvidia
apt-cache search nvidia / cuda / cublas / nvidia-compute /libnvidia
locate nvidia / cuda / cublas / nvidia-compute /libnvidia
dpkg -l | grep nvidia / cuda / cublas / nvidia-compute /libnvidia
ps aux | grep nvidia or whatever associated modules were running
sudo apt purge nvidia / cuda / cublas / nvidia-compute /libnvidia

(The / is just shorthand; grep wasn't run with it. Each keyword was searched for individually.)

...and sniping any remnants I could find with "sudo rm".
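Searching for each keyword individually works, but one extended-regex pattern covers them all in a single pass per command (`libnvidia` and `nvidia-compute` are already caught by `nvidia`). A sketch; `REMNANT_RE` and `remnants` are hypothetical names.

```bash
# Hypothetical pattern/helper: one case-insensitive regex for all the keywords
# above, so each check is a single grep instead of one per keyword.
REMNANT_RE='nvidia|cuda|cublas'

remnants() {
  grep -iE "$REMNANT_RE"
}

# Usage, one line per check from the list above:
#   lsmod   | remnants    # loaded kernel modules
#   dpkg -l | remnants    # installed or config-only ("rc") packages
#   ps aux  | remnants    # running processes (the grep itself will also show up)
```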
 
@CaffeineAddict :-

As far as the different branches go, you can think of it like this:-

The 'production' branch is sorta like the Ubuntu LTS releases; guaranteed stable AND supported for a long period, if you don't mind sticking with a slightly older feature-set.

The 'New Feature' branch can be likened to the 6-monthly interim releases that Canonical put out between LTS releases (the '9-month wonders'). These are to let those who are convinced they NEED the latest 'cutting-edge' features get a good look at what's going on......and which eventually makes its way into the next LTS release.

(Rather like a 'sneak preview', I'd say. Well, it's ONE way of putting it...! :p )


Mike. ;)
 
So it's basically similar to how firefox-esr (production) vs. firefox (new feature) works? One is older and stable, the other is cutting-edge?
 
@CaffeineAddict :-

Pretty much, yeah. The decision is down to the individual, of course; either you're the sort who's happy with long-term relative stability, and isn't bothered with new features, OR you're the kind that always makes a beeline for the new product shelf in your local electronics store.

Me, I couldn't care less HOW things work, so long as they DO. (And just keep on chugging away, of course...)


Mike. :D
 
@MikeWalsh
I get it, marking this as solved.
I prefer new features over stability, but not always: only for programs that are essential, like the web browser, GPU driver, gaming setup, dev tools, and anything else I use most often.

Debian and the rest of the software are fine being out of date/stable, but I think going all in on either new features or stability is not a good balance.
But like you said it's personal choice, thanks for hints.
 
