Network Error Detection & Analysis - Why it is important

Feb 8, 2024

Network Errors Compromise Reliability and Increase Costs

Given the dynamic nature and scale of Kubernetes (K8s), network-layer reliability is paramount for application performance. Yet network issues undermine that reliability, impair performance, and drive up operational costs.

Detecting and diagnosing these issues, especially within microservices architectures where seamless service integration is crucial, becomes a complex and daunting task for DevOps and network engineers.

Consequently, maintaining stable network performance is essential for user satisfaction and application success.

Traditional APM Tools Focus on the Application Layer (and Not on the Network)

Traditional Application Performance Management (APM) tools often fall short in addressing network-related challenges, as they primarily focus on the application layer rather than the underlying network issues.

It's Even More Important in Kubernetes

Network reliability matters even more in Kubernetes because the platform relies so heavily on APIs. Issues like socket creation errors, connection timeouts, unexpected socket closures, protocol errors, and packet loss can drastically affect application performance and reliability.

Addressing these challenges requires a focused approach to network monitoring and troubleshooting, one that is specially tailored to the unique demands of Kubernetes environments.

Kubeshark: Network Observability for Kubernetes

Kubeshark offers network observability tailored for Kubernetes, going beyond traditional monitoring to pinpoint network issues at their core. By inspecting packets, utilizing eBPF for operating system event monitoring, and tracking protocols such as ICMP and TCP, Kubeshark equips DevOps and network engineers with the tools to detect and diagnose network-related errors effectively.

Use Wireshark for Deep Investigation

When an error is detected, Kubeshark provides a PCAP file of the faulty traffic for further investigation with Wireshark. This capability, along with application layer monitoring, lets DevOps and network engineers see for themselves what is happening in every corner of their Kubernetes cluster.

Summary

In the complex world of Kubernetes, network reliability is critical for application performance, yet it's often compromised by network issues that escalate costs and challenge DevOps. Traditional APM tools fall short in addressing these intricate problems. Kubeshark emerges as a vital solution, offering targeted network observability to pinpoint and resolve issues efficiently, ensuring stable network performance and enhancing application success in Kubernetes environments.


TL;DR - Errors That Can Be Detected With Kubeshark

TCP (Connection) Errors

Connection errors, identified as part of the TCP protocol, include:

- SynSent: The client attempted to establish a connection, but the handshake never completed. SYN sent, but SYN-ACK not received.

- CloseWait: One side received a FIN from the peer and acknowledged it, but has not yet closed its own side of the connection.

- LastAck: One side received a FIN from the peer, sent an ACK, sent a FIN, and is waiting for the final ACK from the peer.

- Reset: A RST packet is seen, indicating that one side will neither accept nor send more data.
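As an illustration of how these states can be derived from observed TCP flags, here is a minimal sketch. It is not Kubeshark's implementation: the `classify` function, the `ConnError` enum, and the `(sender, flags)` input shape are all assumptions made for this example, and it ignores retransmissions and simultaneous close. A healthy connection passes through some of these states briefly, so the sketch reports only the state in which the capture ends.

```python
from enum import Enum


class ConnError(Enum):
    SYN_SENT = "SynSent"
    CLOSE_WAIT = "CloseWait"
    LAST_ACK = "LastAck"
    RESET = "Reset"


def classify(segments):
    """Return the error state a captured connection ends in, or None.

    `segments` is an ordered list of (sender, flags) tuples (hypothetical
    input shape), e.g. ("client", {"SYN"}) or ("server", {"FIN", "ACK"}).
    """
    syn_seen = syn_ack_seen = False
    first_fin_from = None    # side that closed first
    second_fin_seen = False  # the other side sent its own FIN
    final_ack_seen = False   # the first closer acknowledged that FIN

    for sender, flags in segments:
        if "RST" in flags:
            return ConnError.RESET
        if "SYN" in flags and "ACK" not in flags:
            syn_seen = True
        elif "SYN" in flags and "ACK" in flags:
            syn_ack_seen = True
        elif "FIN" in flags:
            if first_fin_from is None:
                first_fin_from = sender
            elif sender != first_fin_from:
                second_fin_seen = True
        elif "ACK" in flags and second_fin_seen and sender == first_fin_from:
            final_ack_seen = True

    if syn_seen and not syn_ack_seen:
        return ConnError.SYN_SENT
    if first_fin_from is not None and not second_fin_seen:
        return ConnError.CLOSE_WAIT
    if second_fin_seen and not final_ack_seen:
        return ConnError.LAST_ACK
    return None
```

For example, a capture containing only `("client", {"SYN"})` is classified as `ConnError.SYN_SENT`, while a full handshake followed by a complete four-way close returns `None`.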

Errors From ICMP

Kubeshark intercepts, dissects, and presents ICMP messages. ICMP is a network-layer (L3) protocol used for error reporting and network diagnostics. For instance, if a router cannot forward a packet because the destination is unreachable, an ICMP message is sent back to the sender indicating the problem.

Partial list of errors reported by ICMP messages:

- Destination Unreachable: Sent when a packet cannot be delivered to its destination for various reasons, with subcodes including:

  - Network Unreachable
  - Host Unreachable
  - Protocol Unreachable
  - Port Unreachable
  - Fragmentation Needed and DF (Don’t Fragment) set
  - Source Route Failed
  - Destination Network Unknown
  - Destination Host Unknown
  - Source Host Isolated
  - Network Administratively Prohibited
  - Host Administratively Prohibited
  - Network Unreachable for Type of Service
  - Host Unreachable for Type of Service

- Time Exceeded: Indicates that the Time to Live (TTL) field of the packet has reached zero, necessitating discarding the packet. This is commonly used in network route tracing.

- Source Quench: A deprecated message requesting the sender to decrease the message sending rate due to router or host congestion.

- Redirect Message: Sent by routers to indicate a more efficient packet routing path.

- Parameter Problem: Indicates an error in the IP packet header fields, preventing processing.

- Router Advertisement and Router Solicitation: Used for routing information discovery and announcement.
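To show how the messages above map onto the wire format, the following sketch decodes the first bytes of an ICMPv4 message, where byte 0 is the type and byte 1 is the code. The numeric values are the standard ICMPv4 assignments. The `describe_icmp` function is a hypothetical helper written for this example, not part of Kubeshark.

```python
import struct

# Standard ICMPv4 type numbers for the messages listed above.
ICMP_TYPES = {
    3: "Destination Unreachable",
    4: "Source Quench",
    5: "Redirect",
    9: "Router Advertisement",
    10: "Router Solicitation",
    11: "Time Exceeded",
    12: "Parameter Problem",
}

# Standard codes for type 3 (Destination Unreachable).
DEST_UNREACHABLE_CODES = {
    0: "Network Unreachable",
    1: "Host Unreachable",
    2: "Protocol Unreachable",
    3: "Port Unreachable",
    4: "Fragmentation Needed and DF set",
    5: "Source Route Failed",
    6: "Destination Network Unknown",
    7: "Destination Host Unknown",
    8: "Source Host Isolated",
    9: "Network Administratively Prohibited",
    10: "Host Administratively Prohibited",
    11: "Network Unreachable for Type of Service",
    12: "Host Unreachable for Type of Service",
}


def describe_icmp(message: bytes) -> str:
    """Describe an ICMPv4 message from its header (hypothetical helper)."""
    if len(message) < 4:
        raise ValueError("ICMP header needs at least 4 bytes")
    # Header layout: type (1 byte), code (1 byte), checksum (2 bytes).
    icmp_type, code, _checksum = struct.unpack("!BBH", message[:4])
    name = ICMP_TYPES.get(icmp_type, f"Type {icmp_type}")
    if icmp_type == 3:
        detail = DEST_UNREACHABLE_CODES.get(code, f"code {code}")
        return f"{name}: {detail}"
    return name
```

The checksum is read but not verified here; a real dissector would validate it before trusting the type and code.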

Dissection Errors

Dissection errors are reported by Kubeshark protocol parsers:

- Unexpected EOF: The TCP connection closed unexpectedly, before the parser received a complete message.

- Parser error: Kubeshark’s application layer protocol parser reports an invalid payload according to the protocol definition.
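The difference between the two error kinds can be sketched with a toy parser. Everything here is an assumption made for illustration: the frame format (a one-byte magic value, a two-byte length, then the payload), the `MAGIC` constant, and the function names are hypothetical and do not describe any protocol Kubeshark actually parses.

```python
import struct

MAGIC = 0x4B  # hypothetical marker byte for this toy protocol


class UnexpectedEOF(Exception):
    """The stream ended before a complete frame arrived."""


class ParserError(Exception):
    """The bytes are present but violate the protocol definition."""


def parse_frame(data: bytes) -> bytes:
    """Parse one toy frame: magic (1 byte), length (2 bytes), payload."""
    if len(data) < 3:
        raise UnexpectedEOF("stream ended inside the header")
    magic, length = struct.unpack("!BH", data[:3])
    if magic != MAGIC:
        raise ParserError(f"bad magic byte 0x{magic:02x}")
    payload = data[3:3 + length]
    if len(payload) < length:
        raise UnexpectedEOF("stream ended inside the payload")
    return payload


def dissect(data: bytes) -> str:
    """Label the outcome in the terms used in the list above."""
    try:
        parse_frame(data)
    except UnexpectedEOF:
        return "Unexpected EOF"
    except ParserError:
        return "Parser error"
    return "ok"
```

A truncated frame yields "Unexpected EOF" because the data stops short, whereas a frame with the wrong magic byte yields "Parser error" because the data is complete but invalid.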

Timeout Errors and Packet Loss

Kubeshark’s L4 stream capture times out after a configurable period (default: 10s). If an L4 stream capture doesn’t complete within this timeframe (e.g. due to packet loss), the stream is marked stale and dropped. A PCAP file is generated and is available for further investigation with Wireshark.
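The stale-stream rule can be sketched as a periodic sweep. This is a simplified model, not Kubeshark's code: the `Stream` record, the `sweep_stale` function, and the choice to measure the timeout from the start of the stream are assumptions made for this example. Only the 10-second default comes from the description above.

```python
from dataclasses import dataclass

STREAM_TIMEOUT_SECONDS = 10.0  # default stated above; configurable


@dataclass
class Stream:
    started_at: float       # time the first packet was seen (assumption)
    complete: bool = False  # True once the stream was fully captured


def sweep_stale(streams: dict, now: float,
                timeout: float = STREAM_TIMEOUT_SECONDS) -> list:
    """Drop incomplete streams older than `timeout`; return their keys."""
    stale = [
        key for key, stream in streams.items()
        if not stream.complete and now - stream.started_at > timeout
    ]
    for key in stale:
        # A real system would write the stream's packets to a PCAP file
        # at this point, before discarding the stream.
        del streams[key]
    return stale
```

Calling `sweep_stale` on a timer keeps memory bounded while still surfacing every stream that never completed.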

Half Connections

Half connections represent incomplete transactions where either a request or a response is missing.
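One way to picture half connections is as unmatched entries when requests and responses are paired per stream. The sketch below assumes a hypothetical input of `(stream_id, kind)` events and a function name, `find_half_connections`, invented for this example. It is not how Kubeshark represents traffic internally.

```python
from collections import defaultdict


def find_half_connections(events):
    """Return (stream_id, what_is_missing) for every unmatched message.

    `events` is an ordered iterable of (stream_id, kind) tuples, where
    kind is "request" or "response" (hypothetical input shape).
    """
    unanswered = defaultdict(int)  # stream_id -> requests awaiting a response
    halves = []
    for stream_id, kind in events:
        if kind == "request":
            unanswered[stream_id] += 1
        elif kind == "response":
            if unanswered[stream_id] > 0:
                unanswered[stream_id] -= 1
            else:
                # A response arrived with no request to pair it with.
                halves.append((stream_id, "missing request"))
    for stream_id, count in unanswered.items():
        halves.extend([(stream_id, "missing response")] * count)
    return halves
```

A request that never receives a response often points to packet loss or a stalled service, so these entries are a natural starting point for inspecting the matching PCAP.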


Read more in the Network Error Detection & Analysis section in the docs.