Actionable Network Observability

Mar 23, 2023 ~ 5 min read

A Kubernetes network carries information that can be critical for use-cases such as incident investigation, API debugging, threat hunting and threat detection. Yet Kubernetes’ highly dynamic and distributed nature makes the K8s network a blind spot.

Kubeshark is a new open-source tool that provides real-time, protocol-level visibility into the K8s network, capturing and monitoring all traffic and payloads going in, out and across containers, pods, nodes and clusters.

You can think of Kubeshark as a one-click, fully automated installation of Wireshark and tcpdump across all of your K8s pods.

Actionable Automation

Network traffic is vast and never stops. While it is likely to contain clear indications of breaches and problems, it is not realistic to process all of it at all times.

Kubeshark automates the detection of suspicious network behaviors and triggers actions by using JavaScript in conjunction with OSI L4 and L7 hooks.

The following short script example uses an L7 hook to detect a 500 response code in real-time and trigger a Slack alert:

function onItemCaptured(data) {
  if (data.response.status === 500)
    vendor.slack(
      env.SLACK_WEBHOOK,      // Webhook URL
      "Server-side Error",    // Pretext (title)
      JSON.stringify(data),   // Message text
      "#ff0000"               // Color code of the message
    );
}
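
To see how such a hook behaves before deploying it, the logic can be exercised outside Kubeshark. The sketch below is only an illustration: the `vendor` and `env` objects are hypothetical stand-ins for what Kubeshark provides at runtime, the webhook URL is a placeholder, and the sample item shapes are assumed. It also adds a guard for items that carry no response, which the original hook does not have.

```javascript
// Hypothetical stand-ins for the objects Kubeshark injects at runtime.
const sentAlerts = [];
const env = { SLACK_WEBHOOK: "https://hooks.slack.example/T000/B000" }; // placeholder URL
const vendor = {
  // Records the call instead of contacting Slack.
  slack(webhookUrl, pretext, text, color) {
    sentAlerts.push({ webhookUrl, pretext, text, color });
  },
};

// Same logic as the hook above, plus a guard for items without a response.
function onItemCaptured(data) {
  if (data.response && data.response.status === 500) {
    vendor.slack(env.SLACK_WEBHOOK, "Server-side Error", JSON.stringify(data), "#ff0000");
  }
}

// Simulated captured items (shape assumed for illustration).
onItemCaptured({ request: { path: "/ok" }, response: { status: 200 } });
onItemCaptured({ request: { path: "/boom" }, response: { status: 500 } });
onItemCaptured({ request: { path: "/pending" } }); // no response captured

console.log("alerts sent:", sentAlerts.length);
```

Only the item with status 500 produces an alert; the other two are ignored.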

Kubeshark supports three action categories:

Alerts: Send real-time alerts to Slack, the console, the dashboard or use a webhook to send anything anywhere.
PCAPs: Generate custom network traces (PCAPs) and upload to AWS S3.
Telemetry: Stream user-generated and network metrics and schema-free documents (e.g. logs) to InfluxDB, Grafana and Elasticsearch.

Use-cases

Network information combined with actionable automation can help DevOps and security engineers in areas such as incident investigation, API debugging, threat hunting and threat detection.

You can read more about each use-case in the documentation.

TL;DR

Below are some selected examples:

Dashboard Alerts

Use the L7 hook onItemQueried in conjunction with the test.* helpers to detect response code 500 and show alerts in the dashboard:

function onItemQueried(data) {
  if (data.response.status === 500)
    return test.fail(data);
  else
    return test.pass(data);
}

Build Custom Metrics

Write custom-logic scripts in JavaScript in conjunction with the L4/L7 hooks to create your own metrics.

The following example uses the L4 hook onPacketCaptured, a simple JavaScript calculation and the jobs helper to log the L4 packet count and aggregated size per minute:

var packetCount = 0;
var totalKB = 0;

function onPacketCaptured(info) {
  packetCount++;
  totalKB += info.length / 1000;
}

function logPacketCountTotalBytes() {
  console.log("Captured packet count per minute:", packetCount);
  packetCount = 0;
  console.log("Total KB captured per minute:", totalKB);
  totalKB = 0;
}

jobs.schedule("log-packet-count-total-bytes", "0 */1 * * * *", logPacketCountTotalBytes);
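
The accumulate-and-reset pattern above can be tried outside Kubeshark by replacing the `jobs` helper with a stub and firing the scheduled function by hand. In the sketch below, the `jobs` stub, the packet sizes and the returned report object are assumptions for illustration. The six-field cron expression is read here as seconds, minutes, hours, day of month, month and day of week, so `0 */1 * * * *` would fire at second 0 of every minute.

```javascript
// Hypothetical stub: records scheduled jobs instead of running them on a timer.
const scheduled = {};
const jobs = {
  schedule(tag, cron, task) {
    scheduled[tag] = { cron, task };
  },
};

let packetCount = 0;
let totalKB = 0;

// L4 hook: accumulate packet count and size (info.length assumed to be in bytes).
function onPacketCaptured(info) {
  packetCount++;
  totalKB += info.length / 1000;
}

// Return the counters for the elapsed window, then reset them.
function reportAndReset() {
  const report = { packetCount, totalKB };
  packetCount = 0;
  totalKB = 0;
  return report;
}

jobs.schedule("log-packet-count-total-bytes", "0 */1 * * * *", reportAndReset);

// Simulate one window of traffic: three packets of 500, 1500 and 2000 bytes.
[500, 1500, 2000].forEach((length) => onPacketCaptured({ length }));

// Fire the job twice by hand, as a scheduler would on two consecutive minutes.
const firstWindow = scheduled["log-packet-count-total-bytes"].task();
const secondWindow = scheduled["log-packet-count-total-bytes"].task();
console.log(firstWindow, secondWindow);
```

Resetting the counters inside the scheduled function is what turns running totals into per-minute values: the second window reports zero because no packets arrived after the first report.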

Upload PCAPs to AWS S3

Generate custom network traces (PCAPs) based on a rich filtering language and upload to AWS S3. Each network trace can consolidate numerous network fragments into a single custom network snapshot.

Here’s an example of a script that continuously monitors traffic, matching the traffic against two KFL queries:

http and response.status == 500 - HTTP traffic with 500 response code
dns - DNS traffic

Matching L4 streams will be added into a PCAP repository, compressed and uploaded to AWS S3.

var KFL_PCAP_S3_KFL_ARR = [
  "http and response.status == 500",
  "dns",
];

function onItemCaptured(data) {
  wrapper.kflPcapS3(data, {
    kflArr: KFL_PCAP_S3_KFL_ARR, // Mandatory
  });
}
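
To make the selection rule concrete, the sketch below mimics the two queries with plain JavaScript predicates and checks which sample items would be kept. This is not the KFL engine: the predicate functions, the `protocol.name` field, the sample items and the rule that one matching query is enough are all assumptions for illustration only.

```javascript
// Hypothetical predicates that approximate the two KFL queries in plain JavaScript.
const filters = {
  "http and response.status == 500": (item) =>
    item.protocol.name === "http" &&
    Boolean(item.response) &&
    item.response.status === 500,
  "dns": (item) => item.protocol.name === "dns",
};

// Return the queries an item satisfies.
function matchingQueries(item) {
  return Object.keys(filters).filter((query) => filters[query](item));
}

// Sample items (shape assumed for illustration).
const samples = [
  { protocol: { name: "http" }, response: { status: 500 } },
  { protocol: { name: "http" }, response: { status: 200 } },
  { protocol: { name: "dns" } },
];

// Assumed rule: an item is kept if at least one query matches.
const kept = samples.filter((item) => matchingQueries(item).length > 0);
console.log("items kept:", kept.length);
```

Under these assumptions the HTTP 500 item and the DNS item are kept, while the HTTP 200 item is dropped.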

PCAP files matching the KFL queries are uploaded and available in AWS S3: