OpenTelemetry Python Implementation Deep Dive
OpenTelemetry is a set of libraries, agents and other components that enable the generation and collection of telemetry data. Applications and libraries use the OpenTelemetry API to record information (telemetry) about the operations they perform. The telemetry data usually includes the start and end times, attributes of the operation (the headers of an HTTP request, for instance), the type of operation, its result, etc. Our previous blog post, A Shallow Dive Into Distributed Tracing, presents a nice introduction to distributed tracing and OpenTelemetry.
This blog post shows the different techniques used to implement instrumentation libraries in Python. It’s intended for developers who are curious about the internal details of OpenTelemetry, as well as for new OpenTelemetry contributors who want to get a general idea without having to dig too deeply into the code. It’s not intended for end users; a better place for them to start is the official OpenTelemetry Python documentation. This blog post is specific to OpenTelemetry Python (the language I worked in). The solutions implemented in other languages could be quite different due to the intrinsic characteristics of each language.
Tracing information is generated by the application itself and by third party libraries. A dream goal of OpenTelemetry would be to have all libraries use the OpenTelemetry API (built-in support). At the time of writing, we know of no third party Python library with built-in support for OpenTelemetry, although some library developers are starting to show interest in providing it. We are aware that many libraries will not include built-in support. In those cases, a separate instrumentation library wraps the third party library (the instrumented library) and makes the calls to the OpenTelemetry API.
OpenTelemetry Python Instrumentation Libraries
OpenTelemetry Python provides instrumentation libraries (also called integrations) for some of the most popular third party libraries. A developer can import one and enable the generation of telemetry data without worrying about how it’s implemented. Instrumentation libraries are shipped as independent Python packages. Each integration provides an Instrumentor class that defines the instrument() and uninstrument() methods, which enable and disable the generation of telemetry data. To enable the integration, an application developer should call the instrument() method before performing any operation with the library.
The following code snippet shows an example (exporter configuration is omitted).
import requests
from opentelemetry.ext.requests import RequestsInstrumentor
# TODO: configure exporters
# Enable instrumentation in the requests library
RequestsInstrumentor().instrument()
# This call will generate telemetry data
response = requests.get(url="https://kinvolk.io/")
This approach is intended for application developers who are already using OpenTelemetry in their applications and want to enable trace reporting for the third party libraries they use.
Let's see how it's implemented under the hood. An instrumentation library intercepts the internal calls of the instrumented library and invokes the OpenTelemetry API to generate telemetry data about the ongoing operation. There are two ways to intercept the internal calls of the instrumented library in Python. The first is to use a hook mechanism provided by some libraries. The second is to use monkey patching to modify the runtime behaviour of the library without changing its source code.
Hook Mechanisms
Some libraries provide a hook mechanism that allows a developer to register a set of callbacks that the library invokes when it performs certain operations. In some cases the information about the ongoing operation is passed as arguments to the callbacks; in other cases it has to be accessed through a specific API provided by the library.
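To make the idea concrete, here is a minimal sketch of a hook mechanism and of an integration that uses it. TinyServer and everything in it are hypothetical stand-ins for a real library, and the events list stands in for real telemetry; nothing here is OpenTelemetry or Flask code.

```python
class TinyServer:
    """Hypothetical library that lets callers register hooks."""

    def __init__(self):
        self._before = []
        self._teardown = []

    def before_request(self, func):
        self._before.append(func)
        return func

    def teardown_request(self, func):
        self._teardown.append(func)
        return func

    def handle(self, request, handler):
        # The library, not the integration, decides when hooks run.
        for func in self._before:
            func(request)
        error = None
        try:
            return handler(request)
        except Exception as exc:
            error = exc
            raise
        finally:
            # Teardown hooks always run and receive the error, if any.
            for func in self._teardown:
                func(request, error)


events = []  # stand-in for recorded telemetry
server = TinyServer()

# The "integration": two callbacks registered through the public hook API.
server.before_request(lambda req: events.append(("start", req["path"])))
server.teardown_request(
    lambda req, exc: events.append(("end", req["path"], exc is None))
)

response = server.handle({"path": "/hello"}, lambda req: "hi")
```

In this sketch the operation details arrive as callback arguments; a library could equally expose them through a request-local object that the callbacks read.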
An example of this kind of integration is Flask. This library provides the before_request and teardown_request hooks. The OpenTelemetry Python Flask integration registers two callbacks that record the telemetry data.
The _before_request callback attaches the incoming distributed context, records the attributes of the incoming HTTP request, starts a span and saves it so that the _teardown_request callback can use it.
def _before_request():
    # Get access to the context of this request
    environ = flask.request.environ
    # Attach (enable) distributed tracing context if any
    # This operation is only meaningful in server libraries
    token = context.attach(
        propagators.extract(otel_wsgi.get_header_from_environ, environ)
    )
    # Helper function to collect the attributes from an HTTP request
    attributes = otel_wsgi.collect_request_attributes(environ)
    span_name = flask.request.endpoint or otel_wsgi.get_default_span_name(
        environ
    )
    # Start and activate a span indicating the HTTP request operation handling
    # in the server starts here
    span = tracer.start_span(
        span_name,
        kind=trace.SpanKind.SERVER,
        attributes=attributes,
        start_time=environ.get(_ENVIRON_STARTTIME_KEY),
    )
    activation = tracer.use_span(span, end_on_exit=True)
    activation.__enter__()
    # Use this request context to save these objects until the operation
    # completes
    environ[_ENVIRON_ACTIVATION_KEY] = activation
    environ[_ENVIRON_SPAN_KEY] = span
    environ[_ENVIRON_TOKEN] = token
The _teardown_request callback finishes the span started above and detaches the distributed context.
def _teardown_request(exc):
    activation = flask.request.environ.get(_ENVIRON_ACTIVATION_KEY)
    # Finish the span indicating the handling of the operation is completed
    if exc is None:
        activation.__exit__(None, None, None)
    else:
        activation.__exit__(
            type(exc), exc, getattr(exc, "__traceback__", None)
        )
    # Detach the distributed context
    context.detach(flask.request.environ.get(_ENVIRON_TOKEN))
Those callbacks are registered in the internal _InstrumentedFlask class.
class _InstrumentedFlask(flask.Flask):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._original_wsgi_ = self.wsgi_app
        self.wsgi_app = _rewrapped_app(self.wsgi_app)
        self.before_request(_before_request)
        self.teardown_request(_teardown_request)
Finally, the _instrument() method replaces the flask.Flask class with _InstrumentedFlask, so Flask apps created by the user afterwards are instrumented.
class FlaskInstrumentor(BaseInstrumentor):
    def _instrument(self, **kwargs):
        self._original_flask = flask.Flask
        flask.Flask = _InstrumentedFlask

    def _uninstrument(self, **kwargs):
        flask.Flask = self._original_flask
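Swapping a class on its module only affects code that looks the name up after the swap. The sketch below illustrates this with a hypothetical webframework module built at runtime (it is not Flask); a reference taken before the swap, as `from flask import Flask` would take, keeps pointing at the original class.

```python
import types

# Hypothetical stand-in for the `flask` module.
webframework = types.ModuleType("webframework")


class App:
    instrumented = False


webframework.App = App

# A name bound *before* instrumentation, like `from flask import Flask`.
EarlyApp = webframework.App


class _InstrumentedApp(App):
    instrumented = True


# What an _instrument() method of this style does.
original = webframework.App
webframework.App = _InstrumentedApp

late = webframework.App()   # looked up after the swap: instrumented
early = EarlyApp()          # bound before the swap: not instrumented

# What the matching _uninstrument() does.
webframework.App = original
```

Under these assumptions, calling instrument() before the application imports the class by name is the safer ordering.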
Other examples of integrations using library-specific hooks are pymongo, grpc and aiohttp.
The hooks approach should be preferred when it's available, as it avoids relying on internal implementation details of the instrumented library. A possible drawback is that some of the desired information might not be available through the hook API provided by the library.
Monkey Patching
Unfortunately, hooks are not available in all cases: a library might not provide any, or the ones it provides might not be enough for the instrumentation library to do its work. Fortunately, Python is a very flexible language, and monkey patching can be used to intercept the instrumented library calls and invoke custom callbacks defined in the instrumentation library.
The most frequently used technique is to define wrapper callbacks that intercept the instrumented library calls to gather the tracing information. One example of such an integration is requests. The Session.request() method is wrapped, i.e., each time the user performs a request, the wrapper is called instead. It records all the tracing information and then invokes the original Session.request() method.
def _instrument(tracer_provider=None):
    wrapped = Session.request

    # Define the callback that will record the tracing information
    @functools.wraps(wrapped)
    def instrumented_request(self, method, url, *args, **kwargs):
        # Excerpt: `tracer` and `path` (used as the span name) are defined
        # in code omitted here
        with tracer.start_as_current_span(path, kind=SpanKind.CLIENT) as span:
            # Record some attributes before performing the request
            span.set_attribute("component", "http")
            span.set_attribute("http.method", method.upper())
            span.set_attribute("http.url", url)
            # Propagate distributed context
            headers = kwargs.setdefault("headers", {})
            propagators.inject(type(headers).__setitem__, headers)
            # Call the original Session.request() method
            result = wrapped(self, method, url, *args, **kwargs)
            # Collect some other attributes after performing the request
            span.set_attribute("http.status_code", result.status_code)
            span.set_attribute("http.status_text", result.reason)
            span.set_status(
                Status(_http_status_to_canonical_code(result.status_code))
            )
            return result

    # Monkey patch Session.request to be instrumented_request
    Session.request = instrumented_request
Other examples of this kind of integration are redis, db-api and sqlalchemy.
The biggest drawback of this method is that it relies on internal details of the library: an internal change in the library could break the integration at any time.
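The wrap-and-replace pattern can be tried without any third party package. In this sketch, Client is a hypothetical library class and spans is a plain list standing in for exported telemetry. The `_toy_instrumented` flag and the use of `__wrapped__` (set by functools.wraps) to undo the patch are choices made for this example, not a description of the requests integration.

```python
import functools
import time


class Client:
    """Hypothetical third party class."""

    def send(self, payload):
        """Send a payload and return its length."""
        if payload is None:
            raise ValueError("empty payload")
        return len(payload)


spans = []  # stand-in for exported spans


def instrument():
    wrapped = Client.send
    if getattr(wrapped, "_toy_instrumented", False):
        return  # already patched; avoid wrapping twice

    @functools.wraps(wrapped)
    def instrumented_send(self, payload):
        span = {"name": "Client.send", "ok": True}
        start = time.perf_counter()
        try:
            return wrapped(self, payload)
        except Exception:
            span["ok"] = False
            raise
        finally:
            span["duration"] = time.perf_counter() - start
            spans.append(span)

    instrumented_send._toy_instrumented = True
    Client.send = instrumented_send


def uninstrument():
    # functools.wraps stores the original function in __wrapped__
    original = getattr(Client.send, "__wrapped__", None)
    if original is not None:
        Client.send = original


instrument()
client = Client()
result = client.send("hello")          # traced, succeeds
patched_name = Client.send.__name__    # metadata preserved by wraps
try:
    client.send(None)                  # traced, fails
except ValueError:
    pass
uninstrument()
client.send("not traced")
```

The fragility mentioned above shows up here too: if the hypothetical library renamed send() or changed its signature, the wrapper would silently stop matching.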
Auto-instrumentation
Auto-instrumentation is a mechanism that allows you to get traces from applications that don't contain any instrumentation code. Yes, you read that correctly! The Python Auto-instrumentation with OpenTelemetry blog post presents it with a nice example.
The opentelemetry-auto-instrumentation command automatically detects the installed instrumentation libraries and enables them. The following command will produce tracing information if myapplication.py uses any library that has an instrumentation library installed on the system.
$ opentelemetry-auto-instrumentation python myapplication.py
The idea behind auto-instrumentation is to use the instrumentation libraries described above to make the third party libraries emit tracing information without the user changing the code of the application, i.e., enabling the instrumentation libraries automatically. The problem can be divided in two: (1) how can we detect all the installed instrumentation libraries and (2) how can we execute arbitrary code (invoke instrument() in the different instrumentation libraries) before the user application is run?
The first problem is solved using entry points, a packaging mechanism that lets an installed Python package advertise a named object so that other code can discover and load it. In the OpenTelemetry Python context, each instrumentation library registers its Instrumentor class in the opentelemetry_instrumentor entry point group so it can be invoked from other components.
# snippet from setup.cfg
[options.entry_points]
opentelemetry_instrumentor =
    requests = opentelemetry.ext.requests:RequestsInstrumentor
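Discovery of such entry points can be sketched with the standard library alone. This example uses importlib.metadata rather than the pkg_resources helper that OpenTelemetry Python itself calls, and find_entry_points is a hypothetical helper name. If no instrumentation library is installed, the loop simply prints nothing.

```python
from importlib import metadata


def find_entry_points(group):
    """Return the entry points that installed packages registered in `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later
        return list(eps.select(group=group))
    # Python 3.8 and 3.9: entry_points() returns a dict keyed by group
    return list(eps.get(group, []))


for entry_point in find_entry_points("opentelemetry_instrumentor"):
    # entry_point.load() would import the module and return the object
    # named after the colon, here an Instrumentor class.
    print(entry_point.name, "->", entry_point.value)
```

The group name is the only contract between the packages: anything that registers an object under it is found, without the caller knowing the package names in advance.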
For the second problem, Python provides the site module, which is automatically imported during initialization and allows us to perform arbitrary site-specific customizations by importing a sitecustomize module. OpenTelemetry Python defines its own sitecustomize module that calls instrument() in all the installed integrations by looking for all the available opentelemetry_instrumentor entry points.
for entry_point in iter_entry_points("opentelemetry_instrumentor"):
    try:
        entry_point.load()().instrument()  # type: ignore
        logger.debug("Instrumented %s", entry_point.name)
    except Exception:  # pylint: disable=broad-except
        logger.exception("Instrumenting of %s failed", entry_point.name)
The opentelemetry-auto-instrumentation command just prepends the path of the OpenTelemetry Python sitecustomize module to PYTHONPATH to make sure it's executed before the application.
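The PYTHONPATH trick is easy to reproduce in isolation. The sketch below writes a throwaway sitecustomize.py into a temporary directory, prepends that directory to PYTHONPATH and starts a child interpreter. run_with_sitecustomize and the printed messages are hypothetical; a real sitecustomize would call instrument() instead of printing.

```python
import os
import subprocess
import sys
import tempfile


def run_with_sitecustomize(app_code):
    """Run `app_code` in a child interpreter with a custom sitecustomize."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sitecustomize: a real one would enable instrumentation.
        with open(os.path.join(tmp, "sitecustomize.py"), "w") as handle:
            handle.write('print("sitecustomize: instrumenting")\n')

        env = dict(os.environ)
        # Prepend, so this sitecustomize wins over any other on the path.
        env["PYTHONPATH"] = os.pathsep.join(
            part for part in (tmp, env.get("PYTHONPATH")) if part
        )
        result = subprocess.run(
            [sys.executable, "-c", app_code],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.splitlines()


lines = run_with_sitecustomize('print("application: running")')
```

The application code passed in knows nothing about the customization, which is the property auto-instrumentation relies on.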
What's next?
The auto-instrumentation mechanism currently only enables the instrumentation in the different libraries; it doesn't configure the span exporters. Strictly speaking, it's not currently possible to use auto-instrumentation without touching the code, because no traces are exported. The OpenTelemetry team is aware of this and there is an issue tracking the effort to solve this limitation. Once the team agrees on a configuration format (env variables, configuration file, etc.), the implementation should be straightforward: the sitecustomize module already in use could perform all the span exporter initialization that is needed.
The integration of auto-instrumentation with technologies like Kubernetes could open the door to amazing new possibilities. For instance, a user could enable trace reporting in a containerized Python application by writing a ConfigMap or some env variables in the application's manifest. This could be taken a step further in Lokomotive Kubernetes: the pieces needed for a complete tracing and monitoring infrastructure (exporters, collectors, etc.) could be a Lokomotive component. It could even be possible to enable applications to report traces with almost no user intervention.
We really enjoyed contributing to the OpenTelemetry project and we are looking forward to integrating its amazing features into our products.