5  Logging and Monitoring

Jobs fail. Apps mysteriously quit. These things happen.

When they do, I’d much rather find out from an automated email or Slack message than from an angry message from the CTO.

In DevOps lingo, knowing how things are going with your system and being able to reconstruct what happened when something (inevitably) goes awry is generally called observability.

When software engineers think about observability, they’re generally thinking about observing the operational qualities of their app. Errors, to them, are actual code failures that crash the app, and performance is basically synonymous with the app being fast enough for users. As a data scientist, you should care about these things too.

Understanding access patterns for your app is important. It’s useful to know whether your app is getting used and, if so, which parts people are using. You want to make sure they’re having a good experience when they do.

Knowing that people are using your project helps you understand which extensions might be useful and helps you make the case inside your organization that you’re doing valuable work. It’s also useful if your organization requires an audit trail of who read or wrote particular data.

Unlike when you’re running code right in your terminal, you’re not going to be paying close attention to code running in production. If (when, really) there’s an error, you may not get to it right away, and you’ll want enough information to re-create the state of the system just before the error occurred. Knowing that an error happened, and what the system looked like when it did, is extremely useful for figuring out why.

In addition to these operational concerns you share with software engineers, there’s an additional level of complexity to observability related to the actual data science profile of your app. An error doesn’t always mean that your app crashes – it could just return an incorrect result, which is often scarier. For example, your ETL pipeline could complete on time, but data quality could’ve deteriorated so much that the output data is basically meaningless. Or your machine learning model could have fallen out of alignment, so it’s still returning predictions – just wrong ones.

There are two general practices that enable observability of software – logging and monitoring.

Logging is the process of documenting and saving a record of what happened when a piece of software ran, in the form of a log. Logs are useful both for documenting what’s happening inside a software system under normal operations and for reconstructing the sequence of events prior to a failure.

Monitoring is the process of generating and tracking metrics of how the system is doing. Metrics are quantitative measurements of the system, which include things like CPU, RAM, and disk usage as well as numbers of users and response times. Monitoring involves making sure those metrics are within normal operating parameters, and alerting the right people when – or ideally before – something goes wrong.

For both monitoring and logging, there are two separate things to think about: how your system emits logs and metrics, and how you consume those logs and metrics to make them useful.

The good news is that, as a data scientist, you’re probably already very familiar with one form of logging.

A lot of data science work is done in a literate programming format – Jupyter Notebooks, R Markdown documents, and Quarto documents. Used right, these documents are largely self-logging!

In one of my spicier opinions, data scientists should do all jobs in a literate programming format. If you write your code in a literate programming format and include metrics in the document, you’ve got a self-logging job.

Literate ETL Jobs

For ETL jobs in particular, there are two things I find are always essential to log.

The first is the quality of data joins. If you suddenly have a lot of non-matches or duplicate matches, things can get very, very weird.

The second is cross-tabulations before and after recoding the values of a variable. Making sure your recode logic does what you think and that the values coming in aren’t changing over time has saved my butt many times.
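
As a concrete illustration, here’s a minimal sketch of what those two checks might look like in a notebook cell using pandas. The file names and columns (orders.csv, customer_id, state, and so on) are made up for the example; the point is that these tabulations render into the document every time the job runs.

```python
import pandas as pd

# Hypothetical inputs for the sake of the example.
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")

# Check 1: join quality. indicator=True adds a "_merge" column we can
# tabulate to count matches and non-matches, and validate="many_to_one"
# raises an error if the lookup table has duplicate keys.
joined = orders.merge(
    customers, on="customer_id", how="left",
    indicator=True, validate="many_to_one",
)
print(joined["_merge"].value_counts())

# Check 2: cross-tabulate values before and after a recode so the rendered
# document shows exactly how the recode behaved on this run.
recode = {"CA": "West", "NY": "East"}
joined["region"] = joined["state"].map(recode).fillna("<unrecognized>")
print(pd.crosstab(joined["state"], joined["region"]))
```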

But we don’t get to ignore the software engineering concern of giving our users a good service experience either. That extra data science layer on top of the usual operational concerns makes observability even more important in a data science context than in a pure software engineering one. You’ll still need to think about monitoring metrics and about logging in the event of an actual code failure, but at least you’ve documented the data science qualities of your work.

5.1 Working with logs and metrics

I’m assuming you already have a good sense of how to include helpful tabulations in your literate programming formats, so the rest of this chapter is going to be about logging and monitoring as distinct from just including the right output in your Jupyter Notebook, Quarto document, or R Markdown document.

Monitoring and logging are somewhat nascent practices for data scientists. Part of the purpose of this chapter is to convince you they shouldn’t be! And to show you that they’re really not that hard.

Emitting logs can be as simple as just putting print statements throughout your code – but there are much better ways.

And you’re a data scientist – if you’re putting information into your logs, it’s probably not hard to scrape that information into structured metrics should the need ever arise. That said, the barrier to entry for emitting metrics directly is lower than you might realize, and most data scientists should probably pay more attention to it than they do.

Whether you need to emit metrics is a more subtle question. It really depends on the criticality of your system. In many data science contexts, you aren’t going to do monitoring on a real-time basis, so emitting metrics that can be used by a real-time monitoring system is less important than for other types of software.

5.1.1 How to write logs

Python has an excellent standard logging package, and you should use it. R does not have a standard logging package, but there are a number of good CRAN packages for logging. My personal favorite is log4r.

Every time you log something, that’s a log entry. Every log entry has log data: a description of what’s being logged, perhaps something like “Button A Pressed”. Every entry also has a log level, which describes how scary the thing you’re logging is.

Most logging libraries have five to seven levels of logging. Both the Python logging library and log4r use these five, from most to least scary:

  1. Critical: an error so big that the app itself shuts down. For example, if your app cannot run without a connection to an outside service, you might log an inability to connect as a Critical error.
  2. Error: an issue that will make an operation not work, but that won’t bring down your app. In the language of software engineering, you might think of this as a caught exception. An example might be a user submitting invalid input.
  3. Warn/Warning: an unexpected application issue that isn’t fatal. For example, you might include having to retry doing something or noticing that resource usage is high. If something were to go wrong later, these might be helpful breadcrumbs to look at.
  4. Info: something normal happened in the app. These record things like starting and stopping, successfully making database and other connections, and configuration options that are used.
  5. Debug: a deep record of what the app was doing. The debug log is meant for people who understand the app code to be able to understand what functions ran and with what arguments.

The nice thing about using a logger as opposed to just print statements is that they make it easy to consistently use these log levels. They also can automatically append other helpful information to each log record – like the timestamp or the active user.

As the log record is written, it has to be formatted using a formatter or layout. In most cases, the default log format is in plain text.

So if you log “Button pressed” at the info level, your log record might look something like this in the file:

2022-11-18 21:57:50 INFO Button Pressed
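
Here’s a minimal sketch of what this looks like with Python’s standard logging package. The format string is just one reasonable choice; it roughly reproduces the plain-text record shown above.

```python
import logging

# Configure the root logger with a plain-text format and an INFO threshold.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
log = logging.getLogger(__name__)

log.debug("Checking button state")  # below the INFO threshold, so not written
log.info("Button Pressed")          # e.g. 2022-11-18 21:57:50 INFO Button Pressed
log.warning("Request took longer than expected, retrying")
log.error("User submitted invalid input")
log.critical("Cannot connect to the database; shutting down")
```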

But if you’re shipping your logs off to be consumed by some other service, you might prefer a highly structured log format. The most common structured logging format is JSON, though it’s also possible to use something like YAML or XML. If you used JSON logging, the same record might be emitted as:

  {
    "time": "2022-11-18 21:57:50",
    "level": "INFO",
    "data": "Button Pressed"
  }

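If you wanted JSON records from Python’s standard logging package, one option (a sketch, not the only approach; there are also dedicated packages for structured logging) is a small custom formatter:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record, "%Y-%m-%d %H:%M:%S"),
            "level": record.levelname,
            "data": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

log = logging.getLogger("json_example")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("Button Pressed")
# {"time": "2022-11-18 21:57:50", "level": "INFO", "data": "Button Pressed"}
```
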
Once your log record is created, it needs to be put somewhere so you can get it later – via a handler in Python’s logging package or an appender in log4r.

The first (and most common) way to handle logs is just to append them to a file on disk. Many logging consumers are very comfortable watching a file somewhere and aggregating lines of the log as they are written to the file. If you’re not already familiar with how to consume log files, you’ll learn a lot more about this in Chapter 8 on using the command line.

If you are emitting logs to a file, you may also want to consider how long those logs stick around. Log rotation is the practice of keeping logs for a period of time and then deleting the old ones. A common pattern is a 24-hour log file: each day, the system starts a new log file and deletes the oldest one (30 days is a typical retention period). The Python logging library can do log rotation itself. log4r cannot, but there is a Linux utility called logrotate that you can use in concert with log4r.1
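
With Python’s logging package, rotation is handled by one of the built-in handlers. Here’s a sketch of the 24-hour, 30-day pattern described above using TimedRotatingFileHandler; the file name is arbitrary.

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Start a new file at midnight and keep the 30 most recent files.
handler = TimedRotatingFileHandler(
    "my-log.log", when="midnight", backupCount=30
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

log = logging.getLogger("rotating_example")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("App started")
```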

In some cases, you may not want to write to a file on disk. This is most common because you’re running in a Docker container – perhaps in a Kubernetes cluster. As you’ll learn more about in Chapter 9, anything that lives inside a Docker container is ephemeral. This is obviously bad if you’re writing a log that might contain clues for why a Docker container was unexpectedly killed.

In that case, you may want to direct log entries somewhere other than, or in addition to, a log file. Linux provides one facility specifically for receiving normal output, called stdout (usually pronounced “standard out”), and one for errors, called stderr (“standard error”).

In a Docker/Kubernetes context, you’ll usually emit your logs inside the running container to stdout/stderr and then have some sort of more permanent service collecting the logs outside.
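
In Python, that just means pointing a handler at stdout instead of a file. A minimal sketch:

```python
import logging
import sys

# Write log records to stdout so whatever collects container output
# (docker logs, kubectl logs, a log agent) picks them up.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.getLogger(__name__).info("Starting up inside the container")
```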

Lastly, it’s possible you want to do something else completely custom with your logs. This is most common for critical or error logs. For example, you may want to send an email, Slack message, or text message immediately if your system emits a high-level log entry.

Both Python’s logging package and log4r have ways of sending arbitrary messages to other services.
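
For example, Python’s standard library ships an SMTPHandler that emails log records above a chosen level. The mail host and addresses below are placeholders; Slack or text-message alerts usually go through a third-party or custom handler instead.

```python
import logging
from logging.handlers import SMTPHandler

# Placeholder mail settings -- substitute your own SMTP host and addresses.
email_handler = SMTPHandler(
    mailhost="smtp.example.com",
    fromaddr="alerts@example.com",
    toaddrs=["data-team@example.com"],
    subject="Critical error in the nightly ETL job",
)
email_handler.setLevel(logging.ERROR)  # only ERROR and CRITICAL trigger an email

log = logging.getLogger("alerting_example")
log.addHandler(email_handler)

log.critical("Cannot connect to the warehouse; aborting the job")
```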

The last piece of logging is configuring your logging environment. When you initialize your logger, you set the default level at which to log events. In log4r, the default level is INFO; for Python’s logging package, it’s WARNING. This means that, by default, any log entry less severe than those levels won’t be recorded.

This is a great use case for environment variables, as explained in Section 1.4. It’s likely that you’ll want to configure different levels of logging in production vs development apps.
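
A minimal sketch of the pattern in Python, assuming a (hypothetical) LOG_LEVEL environment variable:

```python
import logging
import os

# Default to INFO, but let the deployment environment override it,
# e.g. LOG_LEVEL=DEBUG in development or LOG_LEVEL=WARNING in production.
logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"))
```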

5.1.2 How to emit metrics

Monitoring is a much more nascent practice in data science than logging. The main place metrics currently have a footprint is the MLOps/ModelOps domain, and for many data science use cases, emitting metrics may not be necessary.

These days, most people use the Prometheus/Grafana stack for open source monitoring of servers and their resources.

Prometheus is an open source monitoring tool that makes it easy to store metrics data, query that data, and alert based on it. Grafana is an open source dashboarding tool that sits on top of Prometheus to visualize the metrics. They are usually deployed together.

Monitoring the server your data science asset sits on is straightforward: Prometheus includes a node exporter that makes it easy to track your server’s system resources.

Python has an official Prometheus client you can use for emitting metrics from a Python asset, and the openmetrics package in R makes it easy to emit metrics from a Plumber API or Shiny app.

This means you can instrument your Python or R asset to emit metrics in a format Prometheus recognizes; Prometheus scrapes and stores those metrics and alerts if necessary; and a Grafana dashboard is available for visualizing them as needed.
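
Here’s a sketch of what instrumentation with the official Python client (the prometheus_client package) can look like. The metric names and the toy predict() function are made up; in a real API or app framework you’d usually wire the metrics into your request handlers rather than a bare loop.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metrics for a prediction service.
PREDICTIONS = Counter("predictions_total", "Number of predictions served")
LATENCY = Histogram("prediction_latency_seconds", "Time spent scoring the model")

@LATENCY.time()  # records how long each call takes
def predict():
    time.sleep(random.random() / 10)  # stand-in for real model scoring
    PREDICTIONS.inc()

if __name__ == "__main__":
    # Expose the metrics at http://localhost:8000/metrics for Prometheus to scrape.
    start_http_server(8000)
    while True:
        predict()
```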

For ModelOps in particular, there are also many specialized platforms for monitoring the performance of models over time. The open source vetiver package for R and Python is a model deployment framework with built-in facilities for extracting model metrics and putting them in a dashboard (though it currently has no Prometheus integration).

5.1.3 Consuming logs and metrics

Once you’ve got your asset instrumented and your logging in place, you’ll want to actually use those logs and metrics. If you’re using a professional product like Posit Connect to host your asset, you’ll probably get some degree of support for finding and reading the relevant logs out of the box.

If not, you’ll probably be making good use of the skills we’ll review in Chapter 11 to watch logs and parse them.

You can also configure the Prometheus and Grafana stack to do both metrics monitoring and log aggregation. Grafana offers a generous free tier that lets you use Prometheus and Grafana for monitoring without having to set up your own server; you just set up their service and point your app at it.

There’s a great Get Started with Grafana and Prometheus doc on the Grafana Labs website if you want to try it out.

5.2 Comprehension Questions

  1. What is the difference between monitoring and logging? What are the two halves of the monitoring and logging process?
  2. In general, logging is good, but what are some things you should be careful not to log?
  3. At what level would you log each of the following events:
    1. Someone clicks on a particular tab in your Shiny app.
    2. Someone puts an invalid entry into a text entry box.
    3. An HTTP call your app makes to an external API fails.
    4. The numeric values that are going into your computational function.

5.3 Portfolio Exercise: Adding Logging

Using the appropriate package for whichever language you wrote in, add robust logging to both the API and the App you built.

For an extra challenge, consider trying to emit Prometheus metrics as well and consume them using Prometheus and Grafana.

  1. There are two common naming patterns with rotating log files. The first is to have dated log filenames that look like my-log-20221118.log.

    The other pattern is to keep one file that’s current and have the older ones numbered. So today’s log would be my-log.log, yesterday’s would be my-log.log.1, the day before my-log.log.2, etc. This second pattern works particularly well if you’re using logrotate with log4r, because then log4r doesn’t need to know anything about the log rotation. It’s just always writing to my-log.log.