Do you have free space on your root / boot dir?
Yes.
CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot
Maybe a hardware issue ... heat being a possible consideration. How's the airflow around that 10G Ethernet card? How's the dust environment?
I know when the error occurred. Can I control the temperature value via ILO?
I've never seen an ILO where that was an option.
Can you install lm_sensors?
Then run sensors -f
Most NICs don't report their own temperature, but it will at least
give you a ballpark idea of whether everything else on that system is running hot.
Are most of the other PCI devices running hot?
Of course, it's also possible that the problem is on the remote switch.
Do you have more than one NIC? Can you try another one on that same port?
bnxt_en-pci-1001
Adapter: PCI adapter
temp1: +174.2°F
coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +138.2°F (high = +199.4°F, crit = +217.4°F)
Core 0: +127.4°F (high = +199.4°F, crit = +217.4°F)
Core 1: +123.8°F (high = +199.4°F, crit = +217.4°F)
Core 2: +122.0°F (high = +199.4°F, crit = +217.4°F)
Core 3: +125.6°F (high = +199.4°F, crit = +217.4°F)
Core 4: +122.0°F (high = +199.4°F, crit = +217.4°F)
Core 5: +120.2°F (high = +199.4°F, crit = +217.4°F)
Core 6: +118.4°F (high = +199.4°F, crit = +217.4°F)
Core 7: +138.2°F (high = +199.4°F, crit = +217.4°F)
i350bb-pci-4800
Adapter: PCI adapter
loc1: +129.2°F (high = +248.0°F, crit = +230.0°F)
bnxt_en-pci-1000
Adapter: PCI adapter
temp1: +174.2°F
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +145.4°F (high = +199.4°F, crit = +217.4°F)
Core 0: +129.2°F (high = +199.4°F, crit = +217.4°F)
Core 1: +125.6°F (high = +199.4°F, crit = +217.4°F)
Core 2: +132.8°F (high = +199.4°F, crit = +217.4°F)
Core 3: +131.0°F (high = +199.4°F, crit = +217.4°F)
Core 4: +127.4°F (high = +199.4°F, crit = +217.4°F)
Core 5: +125.6°F (high = +199.4°F, crit = +217.4°F)
Core 6: +145.4°F (high = +199.4°F, crit = +217.4°F)
Core 7: +145.4°F (high = +199.4°F, crit = +217.4°F)
power_meter-acpi-0
Adapter: ACPI interface
power1: 0.00 W (interval = 300.00 s)
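For anyone who finds Celsius easier to compare against vendor specs, here is a minimal Python sketch that parses text in the `sensors -f` layout shown above and converts each reading. The sample is a shortened copy of the output in this thread; the names `parse_sensors_f`, `f_to_c`, and `SAMPLE` are made up for illustration and are not part of lm_sensors. Note that the output lists no high/crit limits for the bnxt_en chips, so the script only reports values and does not judge whether 174.2°F is too hot for this card.

```python
import re

# Shortened copy of the `sensors -f` output posted above.
SAMPLE = """\
bnxt_en-pci-1001
Adapter: PCI adapter
temp1: +174.2°F
coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +138.2°F (high = +199.4°F, crit = +217.4°F)
i350bb-pci-4800
Adapter: PCI adapter
loc1: +129.2°F (high = +248.0°F, crit = +230.0°F)
"""

# Matches lines such as "temp1: +174.2°F" or "Core 0: +127.4°F (high = ...)".
READING = re.compile(r"^(?P<label>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)°F")


def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


def parse_sensors_f(text):
    """Return a list of (chip, label, temp_c) tuples, temp_c rounded to 0.1."""
    chip = None
    readings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = READING.match(line)
        if match:
            temp_c = round(f_to_c(float(match.group("temp"))), 1)
            readings.append((chip, match.group("label").strip(), temp_c))
        elif ":" not in line:
            # A line without a colon is treated as a chip name,
            # e.g. "bnxt_en-pci-1001".
            chip = line
    return readings


if __name__ == "__main__":
    for chip, label, temp_c in parse_sensors_f(SAMPLE):
        print(f"{chip:20s} {label:14s} {temp_c:5.1f} °C")
```

Lines that carry no Fahrenheit value (the "Adapter:" lines, the power meter) are simply skipped, so the full output can be pasted in unchanged.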
systemd[20382]: Stopped target Timers.
systemd[20382]: Closed D-Bus User Message Bus Socket.
systemd[20382]: Stopped target Paths.
systemd[20382]: Reached target Shutdown.
systemd[20382]: Started Exit the Session.
systemd[20382]: Reached target Exit the Session.
systemd[1]: user@0.service: Succeeded.
systemd[1]: Stopped User Manager for UID 0.
systemd[1]: Stopping User runtime directory /run/user/0...
systemd[1]: run-user-0.mount: Succeeded.
systemd[1]: user-runtime-dir@0.service: Succeeded.
systemd[1]: Stopped User runtime directory /run/user/0.
systemd[1]: Removed slice User Slice of UID 0.
crond[20288]: postdrop: warning: unable to look up public/pickup: No such file or directory
systemd[1]: session-202.scope: Succeeded.
systemd[1]: Stopping User Manager for UID 1008...
systemd[20229]: Stopping D-Bus User Message Bus...
systemd[20229]: Stopped target Default.
systemd[20229]: Stopped D-Bus User Message Bus.
systemd[20229]: Stopped target Basic System.
systemd[20229]: Stopped target Timers.
systemd[20229]: Stopped Mark boot as successful after the user session has run 2 minutes.
systemd[20229]: Stopped target Paths.
systemd[20229]: Stopped target Sockets.
systemd[20229]: Closed Sound System.
systemd[20229]: Closed Multimedia System.
systemd[20229]: Closed D-Bus User Message Bus Socket.
systemd[20229]: Reached target Shutdown.
systemd[20229]: Started Exit the Session.
systemd[20229]: Reached target Exit the Session.
systemd[1]: user@1008.service: Succeeded.
systemd[1]: Stopped User Manager for UID 1008.
systemd[1]: Stopping User runtime directory /run/user/1008...
systemd[1]: run-user-1008.mount: Succeeded.
systemd[1]: user-runtime-dir@1008.service: Succeeded.
systemd[1]: Stopped User runtime directory /run/user/1008.
systemd[1]: Removed slice User Slice of UID 1008.
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
smad[1712]: [INFO ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18012)
smad[1712]: [NOTICE]: IML received: 171 bytes
smad[1712]: [ALERT ]: CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot 1, port 1
smad[1712]: [INFO ]: Log the IML info to syslog
NetworkManager[1417]: <info> [1685482012.4493] device (ens1f0np0): carrier: link connected
smad[1712]: [INFO ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18011)
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
kernel: bnxt_en 0000:10:00.0 ens1f0np0: EEE is not active
kernel: bnxt_en 0000:10:00.0 ens1f0np0: FEC autoneg off encodings: None
smad[1712]: [NOTICE]: IML received: 177 bytes
smad[1712]: [ALERT ]: NOTICE: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to OK for adapter in slot 1, port 1 has been repaired
smad[1712]: [INFO ]: Log the IML info to syslog
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
smad[1712]: [INFO ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18012)
smad[1712]: [NOTICE]: IML received: 171 bytes
smad[1712]: [ALERT ]: CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot 1, port 1
smad[1712]: [INFO ]: Log the IML info to syslog
smad[1712]: [NOTICE]: IML received: 138 bytes
smad[1712]: [ALERT ]: CRITICAL: All links are down in adapter Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter in slot 1
smad[1712]: [INFO ]: Log the IML info to syslog
NetworkManager[1417]: <info> [1685482015.9492] device (ens1f0np0): carrier: link connected
smad[1712]: [INFO ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18011)
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
kernel: bnxt_en 0000:10:00.0 ens1f0np0: EEE is not active
My suspicions about hardware are aroused by the messages:
... Link Failure ...
... has been repaired ...
... Link Failure ...
It fits the scenario of a contact failing, say from heat expansion which breaks the contact ("Link Failure"), then cooling allowing the contact to be made again ("has been repaired"), then heat again breaking the contact. Just a theory.
When I checked, the temperature was normal.
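To put numbers on the flapping, a rough Python sketch like the one below could count the kernel's "NIC Link is Down" / "NIC Link is Up" messages per interface and collect the speeds the link came back at. The sample lines are copied from the log in this thread; the function name `summarize_link_events` and the regular expression are illustrative assumptions, and the pattern only matches the bnxt_en message format seen here. In the posted log the link comes back at 1000 Mbps; whether that is expected depends on the switch port, which the thread does not state.

```python
import re

# Kernel lines copied from the syslog excerpt posted in this thread.
LOG = """\
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
kernel: bnxt_en 0000:10:00.0 ens1f0np0: EEE is not active
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
"""

# Assumed line shape: "kernel: <driver> <pci-address> <iface>: NIC Link is Up|Down[, <n> Mbps ...]"
LINK = re.compile(
    r"kernel: \S+ \S+ (?P<iface>\S+): NIC Link is (?P<state>Up|Down)"
    r"(?:, (?P<mbps>\d+) Mbps)?"
)


def summarize_link_events(text):
    """Count link up/down events per interface and collect negotiated speeds."""
    summary = {}
    for line in text.splitlines():
        match = LINK.search(line)
        if not match:
            continue  # not a link state message
        entry = summary.setdefault(
            match.group("iface"), {"down": 0, "up": 0, "speeds": set()}
        )
        entry[match.group("state").lower()] += 1
        if match.group("mbps"):
            entry["speeds"].add(int(match.group("mbps")))
    return summary


if __name__ == "__main__":
    for iface, info in summarize_link_events(LOG).items():
        speeds = sorted(info["speeds"])
        print(f"{iface}: down={info['down']} up={info['up']} speeds_mbps={speeds}")
```

Run against a full day of syslog, the counts can be lined up with temperature readings taken at the same times to see whether the drops really coincide with heat.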
Is this your personal server?
No, it's a company server.