
OpenTelemetry is a set of libraries, agents and other components that enable the generation and collection of telemetry data. Applications and libraries use the OpenTelemetry API to record telemetry about the operations they perform. This data usually includes the start and end times, attributes of the operation (the headers of an HTTP request, for instance), the type of operation, its result, etc. Our previous blog post, A Shallow Dive Into Distributed Tracing, is a nice introduction to distributed tracing and OpenTelemetry.

This blog post shows the different techniques used to implement instrumentation libraries in Python. It's intended for developers who are curious about the internal details of OpenTelemetry, as well as for new OpenTelemetry contributors who want to get a general idea without having to dig too deeply into the code. It's not intended for end users; a better place for them to start is the official OpenTelemetry Python documentation. This blog post is specific to OpenTelemetry Python (the language I worked in); the solutions implemented in other languages could be quite different due to the intrinsic characteristics of each language.

Tracing information is generated by the application itself and by third party libraries. A dream goal of OpenTelemetry would be for all libraries to use the OpenTelemetry API directly (built-in support). At the time of writing there is no known Python third party library with built-in support for OpenTelemetry; however, some library developers are starting to show interest in providing it. We are aware that many libraries will never include built-in support. In those cases, a separate instrumentation library wraps the third party library (the instrumented library) and makes the calls to the OpenTelemetry API.

OpenTelemetry Python Instrumentation Libraries

OpenTelemetry Python provides instrumentation libraries (also called integrations) for some of the most popular third party libraries. A developer can import one and enable the generation of telemetry data without worrying about how it's implemented. Instrumentation libraries are shipped as independent Python packages. Each integration provides an Instrumentor class that defines instrument() and uninstrument() methods to enable and disable the generation of telemetry data. To enable an integration, the application developer calls the instrument() method before performing any operation with the library.

The following is a code snippet of a complete example.

import requests
from opentelemetry.ext.requests import RequestsInstrumentor

# TODO: configure exporters

# Enable instrumentation in the requests library
RequestsInstrumentor().instrument()

# This call will generate telemetry data
response = requests.get(url="https://kinvolk.io/")

This approach is intended for application developers who already use OpenTelemetry in their applications and want to enable trace reporting for the third party libraries they use.

Let's see how it's implemented under the hood. An instrumentation library intercepts the internal calls of the instrumented library and invokes the OpenTelemetry API to generate telemetry data about the ongoing operation. There are two ways to intercept these calls in Python: the first is to use a hook mechanism provided by some libraries; the second is to use monkey patching to modify the runtime behaviour of the library without changing its code.

Hook Mechanisms

Some libraries provide a hook mechanism that allows a developer to register a set of callbacks that are invoked by the library when it performs certain operations. In some cases the information about the ongoing operation is passed as an argument to the callbacks; in other cases it has to be accessed through a specific API provided by the library.
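To illustrate the pattern before diving into Flask, here is a minimal sketch (not taken from any real library; ToyLibrary and its hook names are made up) of a library exposing hooks and an instrumentation layer that records timing through them, without touching the library's internals:

```python
import time

class ToyLibrary:
    """A hypothetical library exposing a hook mechanism."""

    def __init__(self):
        self._before_hooks = []
        self._after_hooks = []

    def before_operation(self, callback):
        self._before_hooks.append(callback)

    def after_operation(self, callback):
        self._after_hooks.append(callback)

    def do_work(self, name):
        # The library passes information about the ongoing
        # operation to the registered callbacks.
        for hook in self._before_hooks:
            hook(name)
        result = name.upper()  # the "real" work
        for hook in self._after_hooks:
            hook(name, result)
        return result

# The "instrumentation library": it only uses the public hook API.
spans = []

def _before(name):
    spans.append({"name": name, "start": time.monotonic()})

def _after(name, result):
    spans[-1]["end"] = time.monotonic()

lib = ToyLibrary()
lib.before_operation(_before)
lib.after_operation(_after)
result = lib.do_work("request")
```

A real instrumentation library would start and end OpenTelemetry spans in those callbacks instead of appending to a list, but the shape is the same.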

An example of this kind of integration is Flask. This library provides the before_request and teardown_request hooks. The OpenTelemetry Python Flask integration registers two callbacks that record the telemetry data.

The _before_request callback attaches the incoming distributed context, records the attributes of the incoming HTTP request, starts a span and saves it to be used in the _teardown_request callback.

def _before_request():
    # Get access to the context of this request
    environ = flask.request.environ

    # Attach (enable) distributed tracing context if any
    # This operation is only meaningful in server libraries
    token = context.attach(
        propagators.extract(otel_wsgi.get_header_from_environ, environ)
    )

    # Helper function to collect the attributes from an HTTP request
    attributes = otel_wsgi.collect_request_attributes(environ)

    span_name = flask.request.endpoint or otel_wsgi.get_default_span_name(
        environ
    )

    # Start and activate a span indicating the HTTP request operation handling
    # in the server starts here
    span = tracer.start_span(
        span_name,
        kind=trace.SpanKind.SERVER,
        attributes=attributes,
        start_time=environ.get(_ENVIRON_STARTTIME_KEY),
    )
    activation = tracer.use_span(span, end_on_exit=True)
    activation.__enter__()

    # Use this request context to save these objects until the operation
    # completes
    environ[_ENVIRON_ACTIVATION_KEY] = activation
    environ[_ENVIRON_SPAN_KEY] = span
    environ[_ENVIRON_TOKEN] = token

The _teardown_request callback finishes the span started above and detaches the distributed context.

def _teardown_request(exc):
    activation = flask.request.environ.get(_ENVIRON_ACTIVATION_KEY)

    # Finish the span indicating the handling of the operation is completed
    if exc is None:
        activation.__exit__(None, None, None)
    else:
        activation.__exit__(
            type(exc), exc, getattr(exc, "__traceback__", None)
        )

    # Detach the distributed context
    context.detach(flask.request.environ.get(_ENVIRON_TOKEN))

Those callbacks are registered in the internal _InstrumentedFlask class.

class _InstrumentedFlask(flask.Flask):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self._original_wsgi_ = self.wsgi_app
        self.wsgi_app = _rewrapped_app(self.wsgi_app)

        self.before_request(_before_request)
        self.teardown_request(_teardown_request)

Finally, the instrument() method replaces the flask.Flask class with _InstrumentedFlask, so any Flask apps the user creates afterwards are instrumented.

class FlaskInstrumentor(BaseInstrumentor):
    def _instrument(self, **kwargs):
        self._original_flask = flask.Flask
        flask.Flask = _InstrumentedFlask

    def _uninstrument(self, **kwargs):
        flask.Flask = self._original_flask

Other examples of integrations using library-specific hooks are pymongo, grpc and aiohttp.

The hooks approach should be preferred when it's available, as it avoids relying on internal implementation details of the instrumented library. A possible drawback is that not all the desired information may be available through the hook API provided by the library.

Monkey Patching

Unfortunately, hooks are not available in all cases: some libraries don't provide them at all, and sometimes the ones provided are not enough for the instrumentation library to do its work. Fortunately, Python is a flexible language, and monkey patching can be used to intercept the instrumented library calls and invoke custom callbacks defined in the instrumentation library.
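The general shape of the technique, independent of any real library (the Client class and its fetch() method below are made up for the example), is to save a reference to the original method, install a wrapper in its place, and call the original from inside the wrapper:

```python
import functools

class Client:
    """A hypothetical third party class with no hook support."""

    def fetch(self, url):
        return "response for " + url

# The "instrumentation library" replaces Client.fetch with a
# wrapper that records telemetry around the original call.
recorded = []

def instrument():
    wrapped = Client.fetch

    @functools.wraps(wrapped)
    def instrumented_fetch(self, url):
        recorded.append(("start", url))
        result = wrapped(self, url)  # call the original method
        recorded.append(("end", url))
        return result

    # Monkey patch: rebind the attribute on the class
    Client.fetch = instrumented_fetch

def uninstrument():
    # functools.wraps stores the original callable in __wrapped__
    Client.fetch = Client.fetch.__wrapped__

instrument()
response = Client().fetch("https://example.com/")
```

Using functools.wraps keeps the wrapper's name, docstring and a reference to the original method, which also makes uninstrumenting straightforward.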

The most frequently used technique is to define wrapper callbacks that intercept the instrumented library's calls to gather the tracing information. One example of such an integration is requests. The Session.request() method is wrapped, i.e., each time the user performs a request, the wrapper defined by the integration is called. It records all the tracing information and then invokes the original Session.request() method.

def _instrument(tracer_provider=None):
    wrapped = Session.request

    # Define the callback that will record the tracing information
    @functools.wraps(wrapped)
    def instrumented_request(self, method, url, *args, **kwargs):
        # Use the path of the requested URL as the span name
        # (urlparse comes from urllib.parse)
        path = urlparse(url).path or "/"
        with tracer.start_as_current_span(path, kind=SpanKind.CLIENT) as span:
            # Record some attributes before performing the request
            span.set_attribute("component", "http")
            span.set_attribute("http.method", method.upper())
            span.set_attribute("http.url", url)

            # Propagate distributed context
            headers = kwargs.setdefault("headers", {})
            propagators.inject(type(headers).__setitem__, headers)

            # Call original requests.send() function
            result = wrapped(self, method, url, *args, **kwargs)

            # Collect some other attributes after performing the request
            span.set_attribute("http.status_code", result.status_code)
            span.set_attribute("http.status_text", result.reason)
            span.set_status(
                Status(_http_status_to_canonical_code(result.status_code))
            )

            return result

    # Monkey patch Session.request to be instrumented_request
    Session.request = instrumented_request

Other examples of this kind of integration are redis, db-api and sqlalchemy.

The biggest drawback of this method is that it relies on internal details of the library; an internal change in the library could break the integration at any time.

Auto-instrumentation

Auto-instrumentation is a mechanism that allows you to get traces from applications that don't have any instrumentation. Yes, you read that correctly! The Python Auto-instrumentation with OpenTelemetry blog post presents it with a nice example.

The opentelemetry-auto-instrumentation command automatically detects the installed instrumentation libraries and enables them. The following command produces tracing information if myapplication.py uses any library that has an instrumentation library installed on the system.

$ opentelemetry-auto-instrumentation python myapplication.py

The idea behind auto-instrumentation is to use the instrumentation libraries described above to make the third party libraries emit tracing information without the user changing the application's code, i.e., to enable the instrumentation libraries automatically. The problem can be divided into two parts: (1) how can we detect all the installed instrumentation libraries, and (2) how can we execute arbitrary code (invoking instrument() in the different instrumentation libraries) before the user application is run?

The first problem is solved using entry points, a mechanism to expose a callable object to other code in Python. In the OpenTelemetry Python context, each instrumentation library exposes an opentelemetry_instrumentor entry point so it can be invoked from other components.

# snippet from setup.cfg
opentelemetry_instrumentor =
    requests = opentelemetry.ext.requests:RequestsInstrumentor

For the second problem, Python provides the site module, which is automatically imported during initialization and performs site-specific customizations by importing a sitecustomize module. OpenTelemetry Python defines its own sitecustomize module that calls instrument() on all the installed integrations by looking up all the available opentelemetry_instrumentor entry points.

for entry_point in iter_entry_points("opentelemetry_instrumentor"):
    try:
        entry_point.load()().instrument()  # type: ignore
        logger.debug("Instrumented %s", entry_point.name)
    except Exception:  # pylint: disable=broad-except
        logger.exception("Instrumenting of %s failed", entry_point.name)

The opentelemetry-auto-instrumentation command just prepends the path of the OpenTelemetry Python sitecustomize module to PYTHONPATH to make sure it's imported before the application.
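The trick can be reproduced with the standard library alone. The sketch below (the printed messages are made up for the example) writes a sitecustomize.py into a temporary directory, prepends that directory to PYTHONPATH, and launches a child interpreter: the sitecustomize module runs before the "application" code, exactly as the auto-instrumentation command arranges for its own module:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A stand-in for the OpenTelemetry sitecustomize module;
    # the real one enables the installed integrations here.
    with open(os.path.join(tmp, "sitecustomize.py"), "w") as f:
        f.write('print("sitecustomize ran first")\n')

    # Prepend our directory so this sitecustomize is found first.
    env = dict(os.environ)
    env["PYTHONPATH"] = tmp + os.pathsep + env.get("PYTHONPATH", "")

    # The child's site module imports sitecustomize during startup,
    # before the -c "application" code runs.
    out = subprocess.run(
        [sys.executable, "-c", 'print("application code")'],
        env=env, capture_output=True, text=True,
    ).stdout
```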

What's next?

The auto-instrumentation mechanism currently only enables the instrumentation in the different libraries, but doesn't perform any configuration of the span exporters. Strictly speaking, it's not currently possible to use auto-instrumentation without touching the code, because no traces are exported. The OpenTelemetry team is aware of this, and there is an issue tracking the effort to remove this limitation. Once the team agrees on a configuration format (environment variables, a configuration file, etc.), the implementation should be straightforward: the sitecustomize module already in use could perform all the span exporter initialization that is needed.

The integration of auto-instrumentation with technologies like Kubernetes could open the door to amazing new possibilities. For instance, a user could enable trace reporting in a containerized Python application by writing a ConfigMap or some environment variables in the application's manifest. This could be taken a step further in Lokomotive Kubernetes: the pieces needed for a complete tracing and monitoring infrastructure, such as exporters and collectors, could be shipped as a Lokomotive component. It could even be possible for applications to report traces with almost no user intervention.

We really enjoyed contributing to the OpenTelemetry project, and we are looking forward to integrating its amazing features into our products.
