ASP.NET Core Distributed Tracing
Introduction
Distributed tracing is a technique that allows us to detect failures and performance issues in distributed applications. For example, you send a request to a web app and this web app needs, in turn, to talk to one or more microservices, and possibly some of these microservices need also to talk to others. Distributed tracing is a way to correlate all these calls under the same umbrella. This is usually achieved through special request headers that must be present for distributed tracing to take place. ASP.NET Core offers mechanisms to make our lives easier when working with these headers, which is an implementation of the OpenTelemetry standard. OpenTelemetry is an observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs. For now, we'll focus on the building blocks of distributed tracing. You can think of a distributed trace as a transaction with an identifier and many operations, which are ocorring on different machines.
W3C Standard Headers
According to version 2 of the W3C specification, there are two HTTP request headers that can be used to engage in distributed tracing:
- traceparent: contains required information about the incoming request in a distributed tracing system
- tracestate: extends traceparent with a set of arbitrary name/value pairs
One example for setting the traceparent and tracestate headers on an HTTP request:
GET /Some/Endpoint HTTP/2.0
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: version=1.0;foo=bar
...
The traceparent header has a well-defined structure:
<version>-<trace-id>-<parent-id>-<trace-flags>
In this example, we have:
- version: 00 (not used for now)
- trace-id: 0af7651916cd43dd8448eb211c80319c
- parent-id: b7ad6b7169203331 (also known as span-id)
- trace-flags: 01 (sampled flag)
As to each component's meaning:
- trace-id: This is the ID of the whole trace forest and is used to uniquely identify a distributed trace through a system. It is represented as a 16-byte array
- parent-id: This is the ID of this request as known by the caller (in some tracing systems, this is known as the span-id, where a span is the execution of a client request). It is represented as an 8-byte array
- trace-flags: An 8-bit field that controls tracing flags such as sampling, trace level, etc
Now, let's jump to the .NET implementation.
The Activity Class
The Activity class in .NET can be used to automatically parse the distributed trace headers. It is not necessarily bound to a web context, it can be used to track and measure transactions and individual operations inside them in non-web applications, something of which we'll talk on a future post. The Activity has many properties and can be associated with a parent Activity. For now, let's discuss it just in the scope of distributed tracing.
Normally, in a web app context (ASP.NET Core), the Activity.Current static property points to the current activity, which, as mentioned earlier, will have some of its properties set from the incoming headers, unless this behaviour is explictly disabled. Another way to obtain the current Activity is by leveraging the ASP.NET Core request features, of which I talked in a blog post not too long ago, specifically, the feature we're interested in is the IHttpActivityFeature and its Activity property:
var activityFromFeature = HttpContext.Features.Get<IHttpActivityFeature>().Activity;
//or
var currentActivity = Activity.Current;
If the distributed headers mentioned earlier are set, instances obtained using each of these processes will be identical. If the traceparent header is set, the following properties of the Activity instance are automatically filled:
Activity Property | Activity Value |
ActivityTraceFlags | 1 (ActivityTraceFlags.Recorded, which is the sampled flag) |
Id | "00-0af7651916cd43dd8448eb211c80319c-48d281a19ed3f18d-01" |
OperationName | "Microsoft.AspNetCore.Hosting.HttpRequestIn" |
ParentId | "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" |
ParentSpanId | {b7ad6b7169203331} |
RootId | "0af7651916cd43dd8448eb211c80319c" |
SpanId | {48d281a19ed3f18d} |
TraceId | {0af7651916cd43dd8448eb211c80319c} |
TraceStateString | "version=1.0;foo=bar" |
You can see that the ParentId property exactly matches the whole value passed as the traceparent header, the SpanId property is generated at runtime and will be different every time, which means that the Id composite property will change too. RootId and ParentSpanId are the trace-id and parent-id parts of traceparent, trace-flags is mapped to ActivityTraceFlags, and TraceStateString will contain whatever is passed as the tracestate header. version is not mapped at all. The OperationName will have the default value "Microsoft.AspNetCore.Hosting.HttpRequestIn".
Besides holding data as properties, an Activity can be started and stopped. You can also attach events (Events) and arbitrary data (Baggage), tags (TagObject) to it, start and stop it, and see it's time span (Duration). If you need to, you either create an Activity instance directly using the parameterless constructor or from an existing one, which will be the parent Activity, but this will be covered on a future post. For now it is sufficient to know that the Activity instance is started, and can be used as the basis for child activities.
To learn more about activities and distributed tracing in .NET, please read this.
Request Identifier vs Activity Identifier
A feature that is often confused with the trace identifier is the HttpContext.TraceIdentifier property. It can be set directly on the HttpContext instance or through the IHttpRequestIdentifierFeature feature. Essentially, this is a way to identify the current request, from a local point of view, and different web server implementations (Kestrel, Kestrel with HTTP/3, IIS, HTTP.sys, OWIN) use different ID generation strategies, which are essentially opaque and don't mean nothing outside the scope of a request. Not to be confused with the Activity.Id property, as it is not automatically set from the request headers (although can be set manually too). The HttpContext.Connection.Id property features essentially the same value, minus a counter used by Kestrel. Even though they are not correlated in any way, it is common to see code like this in ASP.NET Core to identify a request:
RequestId = Activity.Current?.Id ?? HttpContext.TraceIdentifier;
If for some reason the Activity.Current property is not set, it will fallback to HttpContext.TraceIdentifier. Again, this is not to be confused with distributed tracing IDs, it's just a way to identify the current request uniquely.
Headers Propagation
Because we're talking about distributed tracing with a common correlation id, it is important that we send this id on all requests to other services that we want to enlist in our distributed system. For that, the HttpClient class knows that it must send the traceparent and tracestate headers automatically, getting its values from Activity.Current, if it is set.
This code:
HttpClient httpClient = new();
httpClient.BaseAddress = new Uri("<some url>");
await httpClient.SendAsync("");
Results in these headers being sent transparently:
The headers' values are, of course, coming from the Activity.Current instance.
There are more headers that can be sent automatically, and you have a full list here, and these are actually W3C recommendations for OpenTelemetry.
Now, if for some reason you wish to disable this behaviour, you have a couple options:
- Setting Activity.Current null before making the request:
Activity.Current = null;
- Setting the environment variable DOTNET_SYSTEM_NET_HTTP_ENABLEACTIVITYPROPAGATION to "false" or "0" on the host environment
- Setting switch "System.Net.Http.EnableActivityPropagation" to false in AppContext:
AppContext.SetSwitch("System.Net.Http.EnableActivityPropagation", false);
- You can also disable the logging source "Microsoft.AspNetCore.Hosting.Diagnostics" from configuration, on the target machine:
"Logging": {
"LogLevel": {
"Microsoft.AspNetCore.Hosting.Diagnostics": "None"
}
}
- Finally, if we manually add a SocketsHttpHandler (the default message handler) to the HttpClient, we can configure its ActivityHeadersPropagator property to null
var socketsHttpHandler = new SocketsHttpHandler
{
ActivityHeadersPropagator = null
};
var httpClient = new HttpClient(socketsHttpHandler);
If, of course, you use other class to send HTTP requests to other services, you will need to send the headers yourself.
Trace Information in Logs
Trace information is made available to all logs automatically, as requested by tickets https://github.com/dotnet/runtime/issues/31372 and https://github.com/dotnet/runtime/issues/34305. You need to configure logging so as to include this information, as described here.
builder.Logging.Configure(static options =>
{
options.ActivityTrackingOptions = ActivityTrackingOptions.SpanId |
ActivityTrackingOptions.TraceId |
ActivityTrackingOptions.ParentId |
ActivityTrackingOptions.Baggage |
ActivityTrackingOptions.Tags;
});
See the LoggerFactoryOptions.ActivityTrackingOptions property (ActivityTrackingOptions enumeration).
Conclusion
I think we just touched the surface here in regards to distributed tracing using .NET and ASP.NET Core. I plan to write a few more posts about logging, distributed tracing, OpenTelemetry, and activities very soon, so, if this interests you, stay tuned!
Comments
Post a Comment