We have an issue where a WebSocket application (call it app1) logs errors and suffers severe performance degradation after we put it behind an Azure LB. The Azure LB performs a TCP health check every 15 seconds (against app1 on serverA).

I captured the Azure health check with Wireshark (on serverB) and am trying to replay it against app1 on serverA to simulate the same behaviour as the LB health check.

TCP flags from the health check (captured in Wireshark):

53236 -> 8212 [SYN, ECE, CWR] Seq=0 Win=64240 Len=0 MSS=1440 WS=256 SACK_PERM

ECE and CWR are the ECN flags (RFC 3168).
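To make the target concrete, here is a minimal sketch of how the flag set in the captured SYN maps onto the TCP flags byte. The bit values are the standard ones from RFC 793 and RFC 3168; the helper names `encode_flags` and `decode_flags` are made up for illustration.

```python
# Low 8 bits of the TCP flags field (RFC 793, plus ECE/CWR from RFC 3168).
TCP_FLAGS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}

def encode_flags(names):
    """Combine flag names into the flags byte."""
    value = 0
    for name in names:
        value |= TCP_FLAGS[name]
    return value

def decode_flags(value):
    """List the flag names set in a flags byte, lowest bit first."""
    return [name for name, bit in TCP_FLAGS.items() if value & bit]

if __name__ == "__main__":
    probe = encode_flags(["SYN", "ECE", "CWR"])
    print(hex(probe), decode_flags(probe))  # 0xc2 ['SYN', 'ECE', 'CWR']
```

So any tool that reproduces the probe has to emit a SYN whose flags byte is 0xC2. As far as I know, hping3 exposes the 0x40 and 0x80 bits as `-X` and `-Y` (its `-E` and `-C` options mean something else), but that is worth checking against `hping3 --help` on the installed version.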

  • serverA IP: 20.99.99.99 (public) ; 172.18.16.6 (internal)
  • serverB IP: 20.99.99.101 (public) ; 172.18.16.101 (internal)
  • AzureLB IP: 168.63.129.16
  • cmd client IP: 96.99.99.99 (public)

Here are my attempts with nc, hping3, traceroute, tcpreplay and nmap.

Does anybody know how to properly play back the recorded TCP traffic, or otherwise simulate the TCP health check?

### flags from tcp health check (captured in wireshark)
#  53236 -> 8212 [SYN, ECE, CWR] Seq=0 Win=64240 Len=0 MSS=1440 WS=256 SACK_PERM

# ECE and CWR are the ECN flags (RFC 3168)

## missing ECE, CWR, has TS
nc -z -v -w5 20.99.99.99 8212
timeout 1 bash -c "</dev/tcp/20.99.99.99/8212" && echo Port open. || echo Port closed.
 
##chatGPT suggestion not working
sudo hping3 -S -E -C -p 8212 20.99.99.99
##modification, no ECN
sudo hping3 -S -D -p 8212 20.99.99.99
##modification with --tos, no ECN
sudo hping3 -S --tos 50 -D -p 8212 20.99.99.99

# traceroute
### pretty close 53472 → 8212 [SYN, ECE, CWR] Seq=0 Win=5456 Len=0 MSS=1364 SACK_PERM WS=4
##can't change the WS number it seems
sudo traceroute -T --options=ecn,sack,window_scaling,reuse,info -p 8212 20.99.99.99 

# tcpreplay   
  
##works ; MAC is of linux system (WSL) which runs the CMD. 
sudo tcpliveplay eth0 wshark.pcapng 20.99.99.99 00:15:5d:48:zz:zz random

##rewrite first, port and also add again the ECN flag (tos=50)
tcprewrite --portmap=8099:8212 --tos=50 --infile=wshark.pcapng --outfile=wsharkmodport.pcapng
## no ECN    53475 → 8212 [SYN] Seq=0 Win=64240 Len=0 MSS=1440 WS=256 SACK_PERM
sudo tcpliveplay eth0 wsharkmodport.pcapng 20.99.99.99 00:15:5d:48:zz:zz random

##debugs works but need source IP and such so nothing arrives at server. 
sudo tcpreplay --dbug=5 --intf1=eth0 wsharkmodport.pcapng

##debug doesn't work 
sudo tcpliveplay --dbug=5 eth0 wsharkmodport.pcapng 20.99.99.99 00:15:5d:48:zz:zz random  

###working but nothing arrives at server on 20.99.99.99
sudo tcpreplay-edit --intf1=eth0 --verbose --portmap=8099:8212 wshark.pcapng

##try 2 but nothing arrives at server on 20.99.99.99
sudo tcpreplay-edit --intf1=eth0 --verbose --portmap=8099:8212 --dstipmap=172.18.16.6:20.99.99.99   wshark.pcapng

##try 3 but nothing arrives at server on 20.99.99.99
sudo tcpreplay-edit --intf1=eth0 --verbose --portmap=8099:8212 --dstipmap=172.18.16.6:20.99.99.99 --srcipmap=168.63.129.16:96.99.99.99 --enet-smac=00:15:5d:48:zz:zz wshark.pcapng

##working but no ECN and TS  53487 → 8212 [SYN] Seq=0 Win=31337 Len=0 WS=1024 MSS=265 TSval=4294967295 TSecr=0 SACK_PERM
sudo nmap -O --osscan-guess -v -p 8212 20.99.99.99 -d 

###working but nothing arrives at wireshark
sudo nmap -v -p 8212 20.99.99.99 -d --scanflags SYNECECWRSACK

##working but no ECN and TS    53392 → 8212 [SYN] Seq=0 Win=65472 Len=0 MSS=1364 SACK_PERM TSval=2277356213 TSecr=0 WS=128
sudo nmap -v -sT -p 8212 20.99.99.99 -d --scanflags SYNECECWR

###working ; no ECN and TS  but app1 logs same error; but performance still good. 
sudo nmap -v -sT -p 8212 20.99.99.99 -d
  

Error that app1 logs during the health check:

ExecutionException during WebSocket handshake, msg: 
  java.util.concurrent.ExecutionException: 
     java.io.IOException: An existing connection was forcibly closed by the remote host

Edit: the HAProxy TCP health check sends this: 53462 → 8212 [SYN] Seq=0 Win=65472 Len=0 MSS=1364 SACK_PERM TSval=2279910830 TSecr=0 WS=128
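One way to get full control over the flags, window and options is to build the SYN segment by hand instead of relying on a scanner or replay tool. The following is only a sketch under several assumptions: the option layout (MSS, NOP, WS, NOP, NOP, SACK_PERM) is guessed from the Wireshark summary line and may not match the capture byte for byte, the function names are made up, and `send_probe` needs root on Linux and a source IP that really belongs to the sending interface. The local kernel knows nothing about this connection, so it will probably answer the server's SYN-ACK with a RST, which may or may not resemble what the Azure probe does.

```python
import socket
import struct

def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def pseudo_header(src_ip: str, dst_ip: str, length: int) -> bytes:
    """IPv4 pseudo header that is included in the TCP checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, length))

def build_probe_syn(src_ip, dst_ip, sport, dport, seq=0) -> bytes:
    """Build a TCP SYN with ECE+CWR, Win=64240, MSS=1440, WS=256, SACK_PERM."""
    options = (
        struct.pack("!BBH", 2, 4, 1440)   # MSS = 1440
        + b"\x01"                          # NOP (padding)
        + struct.pack("!BBB", 3, 3, 8)    # window scale shift 8 -> factor 256
        + b"\x01\x01"                      # NOP, NOP (padding)
        + struct.pack("!BB", 4, 2)        # SACK permitted
    )
    # 20-byte header + 12 bytes of options = 32 bytes -> data offset 8 words.
    offset_flags = (8 << 12) | 0xC2        # SYN | ECE | CWR
    header = struct.pack("!HHLLHHHH", sport, dport, seq, 0,
                         offset_flags, 64240, 0, 0)
    segment = header + options
    csum = checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    # The checksum field sits at bytes 16-17 of the TCP header.
    return segment[:16] + struct.pack("!H", csum) + segment[18:]

def send_probe(src_ip, dst_ip, sport, dport):
    """Send the SYN via a raw socket; the Linux kernel adds the IP header."""
    with socket.socket(socket.AF_INET, socket.SOCK_RAW,
                       socket.IPPROTO_TCP) as sock:
        sock.sendto(build_probe_syn(src_ip, dst_ip, sport, dport),
                    (dst_ip, 0))

if __name__ == "__main__":
    # Addresses taken from the list above; only prints the bytes.
    print(build_probe_syn("172.18.16.101", "172.18.16.6", 53236, 8212).hex())
```

Calling `send_probe("172.18.16.101", "172.18.16.6", 53236, 8212)` from serverB would put the segment on the wire, where it can be compared with the original capture in Wireshark. Through NAT (for example from WSL) the source address and port may be rewritten on the way.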

Tags: ethernet, replay modified tcp recording (to simulate azure health check), Stack Overflow