[Tech Blog] Exploring Kubernetes Observability with groundcover (Part 1: Best Practices)
Blog | Date: 2024-02-29

Overview🍃

Kubernetes observability is like solving a near-impossible puzzle. As layers are added inside the system, the volume of retrievable data proliferates compared to traditional architectures, and observability tools that worked well on traditional servers are no longer sufficient. Moreover, the ephemeral, ever-changing nature of Kubernetes demands the ability to continually monitor events, logs, and more without interruption. This is where Kubernetes observability solutions prove their worth.

Owing to the complex nature of Kubernetes and the resulting digital footprint, many companies use multiple observability tools. In fact, according to a 2023 survey by Grafana Labs, 52% of companies use 6 or more observability tools, and 16% use 16 or more. This suggests that companies are struggling to gain a complete understanding of their Kubernetes environments and the events happening within them.

As we built our own Kubernetes projects, we realised how complex observability can be. Fortunately, we were recently introduced to groundcover, a full-stack observability tool built for Kubernetes. Groundcover covers the three pillars of observability – metrics, traces, and logs – and much more besides. This made us curious to learn more.

How groundcover supports the best practices of observability

1. The importance of golden signals of monitoring for our system

First popularised in Google's SRE book, the golden signals of monitoring are critical metrics for system reliability management, which is paramount in Kubernetes. Discussing every golden signal and how it affects Kubernetes would require an entire blog of its own; in short, these signals – latency, traffic, errors, and saturation – are all revealed by integrating groundcover with Kubernetes clusters. For example, saturation metrics helped us identify overloaded nodes, while latency, error, and traffic metrics helped us pinpoint issues with specific workloads, enabling swift, informed responses to maintain application reliability.
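To make the signals concrete, below is a minimal sketch of how a Go service might expose three of them (latency, traffic, and errors) by hand using the open-source Prometheus client library; the metric names and the /api handler are our own illustrative inventions, and saturation is typically derived from node-level CPU and memory metrics instead. Part of groundcover's appeal is that its eBPF sensor surfaces these signals without this kind of in-application instrumentation.

```go
package main

import (
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Latency: a histogram of request durations.
	requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name: "http_request_duration_seconds",
		Help: "HTTP request latency in seconds.",
	}, []string{"path"})

	// Traffic and errors: a counter labelled by status code.
	requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests by path and status code.",
	}, []string{"path", "status"})
)

func handle(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	status := "200"
	if rand.Intn(10) == 0 { // simulate an occasional server error
		status = "500"
		w.WriteHeader(http.StatusInternalServerError)
	}
	requestsTotal.WithLabelValues(r.URL.Path, status).Inc()
	requestDuration.WithLabelValues(r.URL.Path).Observe(time.Since(start).Seconds())
}

func main() {
	http.HandleFunc("/api", handle)
	http.Handle("/metrics", promhttp.Handler()) // scraped by Prometheus-compatible collectors
	http.ListenAndServe(":8080", nil)
}
```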

2. The benefit of having a centralised observability platform

A centralised observability platform aggregates data from multiple sources into unified tooling, providing full visibility into distributed systems. It also ensures that all kinds of data and metrics, whether structured or semi-structured, are retrievable across the whole Kubernetes deployment. With everything in one platform, it becomes easy to gain end-to-end insights and address issues anywhere in the system.

While many observability tools do their individual jobs well, we found it far more valuable to have context about what caused a particular error. Groundcover exemplifies such a platform: it not only aids debugging by offering detailed insights but also contextualises errors, linking them to user interactions and application traces. This gave us better visibility into the system and faster debugging.
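As a sketch of the underlying correlation pattern (not groundcover's implementation, which works via eBPF rather than application code), here is how a Go service might attach a shared trace ID to structured logs so that a centralised platform can jump from an error log straight to the matching trace; the context key and the logError helper are hypothetical:

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// ctxKey and traceIDKey are hypothetical; in a real setup, a tracing SDK
// such as OpenTelemetry would propagate the trace ID for you.
type ctxKey string

const traceIDKey ctxKey = "trace_id"

// logError emits a structured log line carrying the trace ID, the join key
// that lets a centralised platform correlate this log with its trace.
func logError(ctx context.Context, logger *slog.Logger, msg string, err error) {
	traceID, _ := ctx.Value(traceIDKey).(string)
	logger.Error(msg,
		slog.String("error", err.Error()),
		slog.String("trace_id", traceID),
	)
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	ctx := context.WithValue(context.Background(), traceIDKey, "4bf92f3577b34da6")
	logError(ctx, logger, "payment failed", os.ErrDeadlineExceeded)
}
```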

3. How a proper log management system helps us reduce MTTR

Unlike traditional monolithic architectures, where logs can be retrieved easily, logging in Kubernetes raises challenges rarely seen before: huge volumes of logs from disparate sources, diverse formats from applications and infrastructure, and the need to collect everything in real time. An efficient log management system can transform the Kubernetes operational experience.

While Kubernetes itself has an established methodology and best practices for log collection, it doesn't provide a log aggregation tool. Like it or not, Kubernetes administrators end up using third-party cluster log aggregators, which require deploying an agent for collection. Groundcover avoids this: it again uses the power of eBPF to collect logs at the cluster, node, and pod level. This provides the granularity that log management systems often lack, without any prior customisation. It proved very helpful, since we saw all the node issues in one place, helping us understand and debug issues in both the applications and the nodes.
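For contrast, here is roughly what the do-it-yourself baseline looks like without any aggregator: a sketch that pulls the last few log lines from every pod in a namespace using the client-go library (the namespace and tail length are arbitrary choices). Every pod must be queried individually, the logs disappear with the pod, and nothing correlates them with node or cluster events, which is precisely the gap that agents and eBPF-based collectors fill:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func ptr[T any](v T) *T { return &v }

func main() {
	// Build a client from the local kubeconfig (path assumed for illustration).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Without an aggregator, logs must be pulled pod by pod like this.
	pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		req := clientset.CoreV1().Pods("default").GetLogs(pod.Name,
			&corev1.PodLogOptions{TailLines: ptr(int64(10))})
		stream, err := req.Stream(context.TODO())
		if err != nil {
			continue // e.g. the pod is not running yet
		}
		fmt.Printf("--- %s ---\n", pod.Name)
		io.Copy(os.Stdout, stream)
		stream.Close()
	}
}
```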

Conclusion🍃

At the end of the day, it's both comforting and useful to have a tool that is decisively on your side when managing something as complex as Kubernetes. For too long, engineers have had to rely on makeshift solutions duct-taped together in an attempt to get a full picture of their deployments. Beyond the very real benefits discussed above, we also enjoyed greatly enhanced peace of mind that we're doing things right with this deployment.

There are still many groundcover features to discover, such as service endpoints; integration with OpenTelemetry, Istio, and Prometheus; automating custom metrics scraping; and utilising the new alerting features. We can't wait to explore further!

#observability #groundcover #kubernetes #goldensignals #ebpf #debugging #containers #megazonecloud #aws #awspremierpartner

Written by Theodore Fabian, Associate Cloud Solutions Architect, MegazoneCloud Hong Kong