Recv-Q hung state or full

linbeg (Guest)
Recently we have been facing an issue where the Recv-Q gets hung: the receive buffer gets stuck at some point, which in turn increases the CPU usage of the process that uses that socket. Please help me figure out what needs to be checked at this point.
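For reference, a rough way to watch the queue and the CPU together as this happens (the peer address and <PID> below are placeholders, and pidstat comes from the sysstat package, which may not be installed everywhere):

watch -n 5 "ss -ntm dst <peer-address>"    # does Recv-Q keep growing?
pidstat -u -p <PID> 5                      # CPU usage of the process holding the socket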
 


Do you have error logs, or messages in the logs that point to this conclusion?
 

Hi Grim,

I used the ss command to find this:

ss dst 110.160.23.11
State Recv-Q Send-Q Local Address:port Peer Address:port
ESTAB 0 0 10.50.1.11:48298 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:52109 110.160.23.11:1110
ESTAB 1181589 0 10.50.1.11:40343 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:48362 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:40529 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:52101 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:35219 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:52122 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:52113 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:54278 110.160.23.11:1110
ESTAB 30788 0 10.50.1.11:60268 110.160.23.11:1110
ESTAB 0 0 10.50.1.11:40528 110.160.23.11:1110
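To see which process owns the stuck socket and how much memory it is sitting on, something along these lines should work (-p needs root; the address is the one from the output above):

sudo ss -tnpm dst 110.160.23.11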
 
What command exactly are you trying?? Perhaps adding --verbose can show you what is going on.
 
I was trying to run "ss dst IP". The occurrence of this issue is random; I will check with the --verbose option when it happens again. However, can you give me a couple more things to check if this issue re-occurs?
 
Also, we have not tuned any TCP parameters on our servers, since Linux is capable of auto-tuning them itself. I did find one configuration for the receive buffer size:

net.core.rmem_max = 131071
net.core.rmem_default = 124928
net.ipv4.tcp_rmem = 4096 87380 4194304

After going through several websites, I found that the net.ipv4.tcp_rmem (max) value should not be greater than the net.core.rmem_max value. However, if you look at the settings above:

tcp_rmem max value is 4194304
rmem_max value is 131071

Will this have any impact?
 
I do not believe it will cause any issues, however I am not sure.
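For what it's worth, the current values can be checked directly with sysctl. On modern kernels the tcp_rmem maximum is what TCP auto-tuning actually uses, while net.core.rmem_max mainly caps buffers set explicitly via setsockopt(SO_RCVBUF), so a mismatch like the one above is common on stock systems:

sysctl net.core.rmem_max net.core.rmem_default net.ipv4.tcp_rmem
sudo sysctl -w net.core.rmem_max=4194304    # raise it to match, if desired (not persistent across reboots)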
 
Well, I'm stuck, with nowhere to go. I guess I need to check more if the issue reoccurs. What would be the best TCP configuration for a 16G server, or would the default tuning done by Linux be better?
 
The Send-Q and Recv-Q can be high while the system is transferring data or streaming data to a server. You can make adjustments to the buffers, but keep in mind that it may add some latency to network communications.

Tuning is an art and you are going to have to play with the settings to get them right for what you are doing. This is the hard part of being a sysadmin.
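Purely as an illustration (not a recommendation), tuning guides for machines with plenty of RAM often show /etc/sysctl.conf entries along these lines; the right numbers depend on bandwidth, round-trip time and connection count, so treat them only as a starting point to experiment with:

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Apply with "sudo sysctl -p" and keep an eye on latency and memory use afterwards.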
 
The issue re-occurred again. Below is the output for the ephemeral port on which the Recv-Q is high; it looks like it got stuck and the application is not able to read.

~$ sudo ss -emoi src 112.213.11.100:59511
State Recv-Q Send-Q Local Address:port Peer Address:port
ESTAB 1510228 0 112.213.11.100:59511 109.160.55.100:1110 uid:700 ino:37515225 sk:ffff8800148dc780
mem:(r1657624,w0,f1256,t0) ts sack cubic wscale:2,7 rto:270 rtt:42.5/7.5 ato:100 cwnd:4 ssthresh:3 send 1.0Mbps rcv_rtt:308.75 rcv_space:1149054
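One way to confirm whether the application has simply stopped reading from that socket is to find the owning process and watch its network-related system calls; <PID> below is a placeholder for whatever the first command reports:

sudo ss -tnp 'sport = :59511'
sudo strace -f -p <PID> -e trace=network

If no read/recv calls show up for that socket while the Recv-Q stays high, the process is blocked or busy elsewhere, which would also fit the high CPU usage mentioned earlier.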
 
I think you need to look at more than your recv-Q. You are going to want to look at your system overall. You could have problems with Disk I/O, not enough RAM, bad application code, wrongly configured nic teaming, or a litany of other things.

You are going to want to start at what is the application doing when this happens and work your way back until you find the problem. This problem may also be a symptom of problems on the other side of the connection.

More information about the problem would be a good thing. Right now we don't have much to work with.

Here are some things that will be helpful:
  1. Amount of RAM in the machine
  2. Is the machine virtual or physical
  3. Does the machine host any other applications
  4. Amount of hard drive space in the machine
  5. How is that hard drive space configured (File system, LVM, etc....)
  6. Recent changes to the system (Patching, updates, new software, etc....)
  7. Recent changes to the application (Patching, updates, new versions...)
  8. Was the application ever working properly
  9. If the application was working properly then what changed right before you noticed it wasn't working.
  10. Does the system show a high load during normal operations

This is just a list of questions to get started with.

I apologize if this post comes off a bit rough. Sometimes asking the tough questions is the only way to come to a solution.
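
Most of the items in the list above can be gathered quickly from a shell, for example (systemd-detect-virt exists only on systemd-based systems):

free -h                  # RAM
systemd-detect-virt      # virtual or physical
df -hT && lsblk          # disk space and how it is laid out
uptime                   # load averages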
 
Hi Grim,

I can understand :) . However, I was backtracking, and there were a few changes made at the application end over the last couple of months. I'm going to make a few changes on the application front and will let you know if the issue still exists.

Also, these applications have been running on the same configuration for a long time, yet these issues have cropped up only in the last couple of months, which makes me wonder what changes were carried out during this period.

Thanks Grim will get back to you soon.

Cheers :) .
 
I am getting the same issue after we shifted our servers from physical machines to virtual machines. The Send-Q is not a problem, but the Recv-Q size is not decreasing, and as a result we are losing a lot of hits from the other side; our application is not accepting any data.
 
Please start a new thread so that your issue can properly be addressed.

 
