Memory leaks, dumps, and CPU traces

Overview

Some issues only show up in production, or only when a service runs in the Kubernetes cluster. A typical example is a service that uses too much memory, or where memory never seems to be released, which suggests a memory leak. Other examples are performance issues that are hard to evaluate when running locally.

There are different ways to troubleshoot these issues. One way is to create a dump of the process memory.

kubectl debug

The easiest and best option is kubectl debug. It attaches an ephemeral debug container to the running pod, so there is no need to restart the container; you can connect to a running one and take the dump with dotnet-dump directly.
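A minimal sketch of that flow (the pod, container, and PID values are placeholders, and the SDK image and tool installation steps are assumptions; adjust them to the pod you are debugging):

  • > kubectl debug -it pods/minority-auditing-service-68bfc98d56-2t7zz --image=mcr.microsoft.com/dotnet/sdk:8.0 --target=<app-container> -n minority -- /bin/bash (attach an ephemeral debug container that shares the app container's process namespace)
  • > dotnet tool install --global dotnet-dump && export PATH="$PATH:$HOME/.dotnet/tools" (install dotnet-dump inside the debug container)
  • > dotnet-dump ps (find the PID of the dotnet process)
  • > dotnet-dump collect -p <pid> -o /tmp/full1.dmp (collect a full dump)
  • > kubectl cp -c <debug-container-name> minority/minority-auditing-service-68bfc98d56-2t7zz:/tmp/full1.dmp ./full1.dmp (from a second terminal, copy the dump to your machine)

If dotnet-dump does not see the process, the diagnostic socket probably lives in the app container's /tmp rather than the debug container's; symlinking it across, e.g. > ln -s /proc/<pid>/root/tmp/dotnet-diagnostic-* /tmp/, is a common workaround.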

Obsolete instructions below

Start here: set up dotnet-monitor

dotnet-monitor is a useful tool that can create a memory dump and also expose other statistics of a running .NET process. The steps below use its HTTP API to dump the memory.

  1. Install dotnet-monitor as a sidecar container on the pod with problems.
    • Installation instructions are available here: Kubernetes setup
    • Note that the Microsoft example assumes user and group 1654, but our main containers run as user and group 1000. The easiest fix is to run the monitor container as user and group 1000 as well (see the sketch after this list).
  2. Set up port forwarding from dotnet-monitor to the local machine. (Note that not all developers have permission to port-forward; if you don't, ask the devops team or Victor for help.)
    • > kubectl get pods -n minority (list the pods; note that a pod with a sidecar running shows 2/2 instead of 1/1)
    • > kubectl port-forward pods/minority-auditing-service-68bfc98d56-2t7zz 8085:52323 -n minority (forward a specific pod to localhost; replace the pod name with the one from the previous step)
  3. Check that it is working:
    • > curl http://localhost:8085/info (should show some info about the dotnet-monitor tool)
    • > curl http://localhost:8085/processes (should show the dotnet process, otherwise something in the setup is broken)
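For reference, a trimmed sketch of the sidecar container from the Microsoft example, with the user and group changed to 1000 (the image tag is an assumption, and the diagnostics volume and environment wiring are omitted here; follow the linked Kubernetes setup doc for the full manifest):

  containers:
  - name: dotnet-monitor
    image: mcr.microsoft.com/dotnet/monitor:8   # sidecar next to the main app container
    args: [ "collect", "--no-auth" ]            # serve the HTTP API without API-key auth
    securityContext:
      runAsUser: 1000                           # Microsoft's example uses 1654;
      runAsGroup: 1000                          # match our main containers instead
    ports:
    - containerPort: 52323                      # default dotnet-monitor API port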

Steps to collect a performance trace

  1. Create a trace:
    • > curl "http://localhost:8085/trace?durationSeconds=5&profile=Cpu,Http,Metrics" -o trace.nettrace
    • Note that you can increase durationSeconds to get a longer profile, which may be needed to gather enough data.
    • Also note that the pods run in k8s on nodes shared with many other pods, so the measured timings will be very small, making the trace quite difficult to analyze.
  2. The resulting file can be opened in different tools such as
    • Visual Studio
    • PerfView
    • others... (see also the browser-based alternative sketched after this list)
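If none of those tools are at hand, the trace can also be converted to the speedscope format and browsed as a flame graph (this assumes the dotnet-trace global tool is installed on your machine):

  • > dotnet tool install --global dotnet-trace (one-time install)
  • > dotnet-trace convert trace.nettrace --format Speedscope (produces trace.speedscope.json)
  • Open the resulting file at https://www.speedscope.app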

Steps to dump and analyze memory

  1. Create a dump:
    • > curl "http://localhost:8085/dump?type=Mini" -o mini1.dmp (first try Mini to validate that the setup works)
    • Open the dump in Visual Studio to verify that it can be opened; it should contain at least call stacks and threads.
    • Check which dump type contains the information you need: Dump Types
    • > curl "http://localhost:8085/dump?type=Full" -o fulldump1.dmp (you probably want Full; note that the download takes a long time since the file is the size of the process's entire memory)
  2. The dump file can be analyzed in different tools such as
    • Visual Studio (much the same experience as debugging a locally running program)
    • WinDbg (hardcore, and more advanced than VS)
    • dotMemory
  3. Sometimes it is useful to repeat the process after a while and compare the dumps, e.g. to see which object types grew (see the sketch below).
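That comparison can also be done from the command line with the dotnet-dump analyze REPL (this assumes the dotnet-dump global tool is installed; the object address is a placeholder):

  • > dotnet-dump analyze fulldump1.dmp (opens an interactive SOS session against the dump)
  • > dumpheap -stat (inside the session: object counts and total sizes per type; run it against both dumps and diff the output)
  • > gcroot <address> (inside the session: shows what keeps a suspicious object alive)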