Solved Socket still ESTABLISHED after process is killed

JoeZhang

Hello,
I have a strange problem. Please help me, thanks a lot.

Process 491 was killed, but the socket created by process 491 is still in the ESTABLISHED state. Here is the information:
root@midea:/userdata# ps -ef | grep node
443 root 327m S ./node_1
463 root 941m S ./node_2
472 root 635m S ./node_3
491 root 1197m S ./node_4 (this process listens on port 12127)
517 root 669m S ./node_5
562 root 831m S< ./node_6
567 root 581m S ./node_7
592 root 1226m S ./node_8
849 root 2368 S grep node
root@midea:/userdata# netstat -t | grep 12127
tcp 0 0 localhost.localdomain:12127 localhost.localdomain:52278 ESTABLISHED
tcp 0 0 localhost.localdomain:52278 localhost.localdomain:12127 ESTABLISHED
root@midea:/userdata# kill -9 491
root@midea:/userdata# netstat -t | grep 12127
tcp 0 0 localhost.localdomain:12127 localhost.localdomain:52278 ESTABLISHED(should be closed)
tcp 0 0 localhost.localdomain:52278 localhost.localdomain:12127 ESTABLISHED
root@midea:/userdata# ps -ef | grep node
443 root 327m S ./node_1
463 root 941m S ./node_2
472 root 635m S ./node_3
517 root 669m S ./node_5
562 root 831m S< ./node_6
567 root 581m S ./node_7
592 root 1226m S ./node_8
849 root 2368 S grep node
root@midea:/userdata# netstat -t | grep 12127
tcp 0 0 localhost.localdomain:12127 localhost.localdomain:52278 ESTABLISHED
tcp 0 0 localhost.localdomain:52278 localhost.localdomain:12127 ESTABLISHED

root@midea:~# uname -a
Linux midea 4.19.219+ #1 SMP Wed May 17 16:36:15 CST 2023 aarch64 GNU/Linux

Probably because you sent the -9 (SIGKILL) signal, which kills the process immediately, without giving it a chance to perform any cleanup of its own.

When using kill, SIGKILL is the last-ditch option, to be used only if everything else fails.

It’s better to try killing processes by sending the -15 (SIGTERM) signal first.
That way, the process should gracefully shut down, closing all pipes, sockets and/or files it has opened.

Some processes might ignore the SIGTERM signal and fail to shut down. If a process is still running after you have sent SIGTERM with kill, run kill again and send SIGKILL.

So killing processes with SIGTERM is safer, because it allows processes time to finish writing data to disk, before closing any open resources.

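Whereas SIGKILL cannot be caught or ignored and will stop any process dead in its tracks right there and then. The process gets no chance to flush its buffers or shut its connections down cleanly, so any data it is writing to disk at the time may be incomplete or even corrupt. The kernel does close the dead process's file descriptors, but a file, socket, or pipe stays open if another process holds a copy of the same descriptor.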
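The "SIGTERM first, SIGKILL only as a fallback" sequence can also be done from C. This is only a rough sketch: stop_child and its 10 ms polling interval are made-up names and values for illustration, and because it relies on waitpid it only works for a direct child of the calling process.

```c
#define _POSIX_C_SOURCE 200809L  /* expose kill(), nanosleep() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: ask child `pid` to exit with SIGTERM, poll for up to
 * timeout_ms, then fall back to SIGKILL.
 * Returns 0 if the child went away without SIGKILL, 1 if SIGKILL was needed,
 * -1 on error. */
int stop_child(pid_t pid, int timeout_ms)
{
    int status;

    assert(pid > 0);  /* pid <= 0 would signal a whole process group */

    if (kill(pid, SIGTERM) == -1)
        return -1;

    /* Poll in 10 ms steps so the child has time to clean up and exit. */
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;                      /* child has terminated */
        if (r == -1 && errno != EINTR)
            return -1;
        struct timespec ts = { 0, 10 * 1000 * 1000 };
        nanosleep(&ts, NULL);
    }

    /* Still running: SIGKILL cannot be caught or ignored. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 1 : 0;
}
```

Calling stop_child(pid, 1000), for example, would give the child about one second to shut down on its own before it is killed.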
 
Hello @JasKinasis
I solved this problem after adding this code:
fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC);
But I do not know why it works.
I'm assuming this code was C code, added to the source code of whatever program/process you were running, which was leaving sockets open?

So from the looks of it, you have a couple of nested calls to fcntl. I'm a bit hazy on the exact details of fcntl, but I remember that F_SETFD sets the flags of a file descriptor (fd) and F_GETFD gets its current flags. FD_CLOEXEC is the constant for the close-on-exec flag, i.e. the bit you set or test in that flags value.

The first/outer call to fcntl is attempting to set the flags for fd to whatever value is returned by bitwise OR-ing the result of the second/inner fcntl call (which gets the current flags for fd) with FD_CLOEXEC.

So the integer returned by fcntl's F_GETFD operation must be a bitfield/bit-mask, where each binary digit in the value represents a particular flag.

The second/inner fcntl call gets the current flags for fd, and that value is then bitwise OR-ed with the FD_CLOEXEC constant.

Because FD_CLOEXEC always has its bit set, the OR always turns that bit on in the result, whether or not it was already set for fd (false OR true is true, and true OR true is true).

All the other flag bits keep whatever value F_GETFD returned, because OR-ing a bit with 0 leaves it unchanged.

And then the first/outer call to fcntl sets fd's flags to whatever the result of the bitwise OR operation was.

So you're effectively ensuring the FD_CLOEXEC flag is set for the file descriptor, without disturbing any other descriptor flags it already has.

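If FD_CLOEXEC is set for a file descriptor, the kernel closes that descriptor automatically when the process successfully calls one of the exec functions, so a program it launches (via fork plus exec, which is also how system() behaves) does not inherit it. It makes no difference to what happens when the process exits, crashes, or is killed: the kernel closes all of a process's descriptors at that point anyway. A socket only stays open afterwards if some other process still holds a copy of the descriptor.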
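For what it's worth, the one-liner can be split into its two fcntl calls so that each one can be checked for errors. A minimal sketch, where set_cloexec, is_cloexec and cloexec_selfcheck are names I made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L  /* expose fcntl(), pipe() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on the close-on-exec flag for fd, keeping its other descriptor flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);          /* inner call: read current flags */
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)  /* outer call: write back */
        return -1;
    return 0;
}

/* Returns 1 if close-on-exec is set for fd, 0 if not, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Small self-check on a throwaway pipe. */
void cloexec_selfcheck(void)
{
    int p[2];
    assert(pipe(p) == 0);
    assert(is_cloexec(p[0]) == 0);   /* pipe() descriptors start without the flag */
    assert(set_cloexec(p[0]) == 0);
    assert(is_cloexec(p[0]) == 1);
    close(p[0]);
    close(p[1]);
}
```

Note that setting the flag after the descriptor already exists leaves a short window in which another thread could fork and exec. On Linux, creating the socket with SOCK_CLOEXEC (or opening files with O_CLOEXEC) sets the flag at creation time and avoids that window.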
 
Yes, this is C code. I found the actual problem with this command:
root@midea:/userdata# ./lsof -i:12127
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
./node_4 491 root 473u inet 18247 0t0 TCP *:12127 (LISTEN)
./node_4 491 root 474u inet 21556 0t0 TCP localhost.localdomain:12127->localhost.localdomain:51172 (ESTABLISHED)
./node_5 573 root 4u inet 21853 0t0 TCP localhost.localdomain:51172->localhost.localdomain:12127 (ESTABLISHED)
telnetd 672 root 473u inet 18247 0t0 TCP *:12127 (LISTEN)
telnetd 672 root 474u inet 21556 0t0 TCP localhost.localdomain:12127->localhost.localdomain:51172 (ESTABLISHED)

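The actual problem: telnetd (PID 672) holds the same sockets as node_4 (fds 473 and 474 in the lsof output above), because it is started from this code and inherits node_4's open descriptors: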
if (receive_turn_on_telnet) {
    system("telnetd &");
}

After kill -9 491; kill -9 672, port 12127 is released.
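To see the inheritance for yourself, here is a small sketch that asks a shell started by system() whether it still has a given descriptor open, before and after FD_CLOEXEC is set. It assumes Linux with /proc mounted, uses a pipe as a stand-in for the socket, and the function names shell_child_sees_fd and inheritance_demo are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fcntl(), pipe() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the shell started by system() has descriptor number `fd` open,
 * 0 if it does not, -1 if the shell could not be run.
 * Assumption: Linux, where /proc/self/fd lists a process's open descriptors. */
int shell_child_sees_fd(int fd)
{
    char cmd[64];
    int n = snprintf(cmd, sizeof cmd, "test -e /proc/self/fd/%d", fd);
    assert(n > 0 && (size_t)n < sizeof cmd);

    int status = system(cmd);   /* runs the command through /bin/sh -c */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}

/* Returns 0 if the descriptor was inherited without FD_CLOEXEC and was not
 * inherited once the flag was set, -1 otherwise. */
int inheritance_demo(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int before = shell_child_sees_fd(p[0]);      /* no flag: child gets a copy */

    if (fcntl(p[0], F_SETFD, FD_CLOEXEC) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    int after = shell_child_sees_fd(p[0]);       /* flag set: closed at exec */

    close(p[0]);
    close(p[1]);
    return (before == 1 && after == 0) ? 0 : -1;
}
```

This matches what lsof showed: without the flag, the child started by system() keeps its own copy of the socket, so the connection survives the parent being killed.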

@JasKinasis
Thanks a lot. Have a nice day.
 
