4  Logging and Monitoring

Say something goes wrong with your ETL job today, or your app fails for some reason. Would you rather find out from an email or Slack message sent by your system, or when wrong numbers end up in front of your company’s CTO? And when you do catch it, wouldn’t it be nice to have some documentation of what happened so you can reconstruct the sequence of events and figure out what actually went wrong?

Obviously, the answer is yes to all of these.

Observability is the DevOps term for whether you can peer inside the operations of your system to see what’s going on and figure out what happened when something goes awry.

In general, people talk about two sides of observability – logging and monitoring.

There are four things you might try to learn from your observations of your app, job, API, or report.

The first is to understand what your job or model is doing. Are your data joins correct? Is the data quality about the same as you expect, or is it deteriorating? Is your model performing as well as you need and as well as it usually does, or is something strange happening?

The second is to understand access patterns. Are people visiting your app or report? Are they visiting particular parts more than others? This is useful both for understanding the dynamics of your app and which parts are most useful, and for making the case inside your organization that you are doing valuable work.

The third is to create a trail of who accessed or did what. In some organizations, creating an audit trail is mandatory if your app actually creates or writes data. In other cases it’s just useful to know if particular users are using the app in particular ways.

And lastly, it’s useful to have logging in your app so you can reconstruct what happened just prior to an error. In general, the only way to fix an error is to re-create the state of the system just before the error occurred. Logging enough information that you can do so is extremely useful. Moreover, having enough instrumentation that you can turn on more aggressive logging and replay that app state is a way to be really kind to future you.

You’re probably familiar with logging, at least in theory. Logging is the process of documenting and saving a record of what happened when a piece of software ran. It is useful both for documenting what’s happening inside a software system under normal operations and for reconstructing the sequence of events prior to a failure.

Monitoring is often more foreign to a data science audience. Monitoring is the process of keeping live metrics of how the system is doing and alerting the appropriate people if something goes awry.

For both monitoring and logging, there are two separate things to think about – how your system emits logs and metrics, and how you consume those logs and metrics to make them useful.

TODO: diagram of monitoring + logging emission vs aggregation

For data scientists, the main concerns you’ll have around observability are emitting logs from any products you’ve created and emitting performance metrics in a machine-learning context. There are industry-standard tools you can use for consuming those logs and metrics.

4.1 Observing data science

When software engineers think about observability, they’re generally thinking about observing the operational qualities of their app. They’re trying to answer questions like: Are people using it? Is the app operational, or are users experiencing errors? Is performance good enough?

As data scientists, we likely have another level of logging to think about, which is the level of the actual data science operations, especially when machine learning models are involved – how are our models performing? Is the level of model performance stable, or is it changing over time? Are our data queries still working or has the data collection process sprouted some new exciting way to mess up our ETL pipeline?

But there’s good news – you’re probably already pretty used to data science logging.

A lot of data science work is done in a literate programming format – Jupyter Notebooks, R Markdown documents, and Quarto documents. If used right, the beauty of these documents is that they are self-logging on the data science side!

And in my opinion, you should do all your jobs in a literate programming format. Some people really like running jobs as .R or .py files – that’s fine, but then you also need to make sure your job is emitting the right logs and metrics. On the other hand, if you write your code in a literate programming format and keep an archive of a few runs, you’ve got a self-logging job.

Now it’s worth spending a few minutes thinking about what you’ll want to note in a job so that it’s useful as a log. The number one thing I log is the quality of data joins. When these go awry, your downstream data can get really confusing. It’s also often worthwhile to do cross-tabulations of recodes, to make sure your recode logic is working the way you think it is. And obviously, if you’re doing modeling, you’ll want to log some metrics around model quality.
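
To make this concrete, here’s the kind of check chunk I might drop into a Quarto or R Markdown job. The data frames and column names (orders, customers, status, and so on) are hypothetical stand-ins for whatever your job actually touches.

library(dplyr)

# Join quality: how many rows went in, how many came out, and how many keys
# failed to match? If these drift, downstream numbers get confusing fast.
nrow(orders)
joined <- left_join(orders, customers, by = "customer_id")
nrow(joined)
sum(is.na(joined$customer_name))

# Recode check: cross-tabulate the raw values against the recoded ones to
# confirm the recode logic does what you think it does.
table(joined$status, joined$status_group, useNA = "ifany")

# Plus whatever model-quality metrics matter for the job (RMSE, AUC, etc.)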

4.2 Observing Software

But we also don’t get to ignore the software engineering concerns of providing a good service experience for our users. Having to watch both the data science layer and the software layer makes observability arguably even more important in a data science context than in a pure software engineering one.

In that context, you’ll need to do more than literate programming. You’ll want to get comfortable with how and what to log from your jobs, apps, and APIs and how to emit metrics in a way that can be consumed by the kinds of services you want.

4.3 Working with logs and metrics

Emitting logs is quite straightforward. It can be as simple as just putting print statements throughout your code. Now, there are much better ways to do logging, but print statements are a start.

Whether you need to emit metrics is a more subtle question. In many data science contexts, you aren’t going to do monitoring on a real-time basis, so emitting metrics that can be used by a real-time monitoring system is less important than for other types of software.

And you’re a data scientist – if you’re putting information into your logs, it’s probably not hard to scrape them into structured metrics data should the need ever arise. That said, the barrier to entry for actually emitting metrics is way lower than you might realize, and most data scientists should probably pay more attention to emitting metrics than they do.
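
If you ever do need numbers out of your logs, a few lines of R will usually get you there. Here’s a rough sketch, assuming a plain-text log file like the one produced by the example later in this chapter (the filename is hypothetical):

log_lines <- readLines("my-log.log")

# Count how many records are button presses and how many are errors
sum(grepl("Button Pressed", log_lines))
sum(grepl("ERROR", log_lines))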

4.3.1 How to write logs

As I mentioned, logging can be as simple as a bunch of print statements in your code. However, there are tools that make it way easier and more pleasant to create and use logs than just a bunch of manual print statements.

Python has an excellent standard logging package, and you should use it. R does not have a standard logging package, but there are a number of excellent CRAN packages for logging. My personal favorite is log4r.

Let’s develop a quick mental model of how logging works inside software.

Let’s say you’ve decided that you should log every time the user presses a particular button. In your code, you’re going to do the following – you’re first going to call a logger. The logger is the runtime object that actually does the logging. When you call the logger, you’re going to specify the log data as well as a log level. In this case, the message would be something like, “Button pressed”. The logger aggregates the data, the level, and anything you’ve already specified should be added to each record – like a timestamp or the user – and turns all of that into a log record.

Now, you’ve got to do something with the log record. The logging software does this via a handler (called an appender in the log4r package). There are three general options for what might happen to your log record. The first (and most common) option is to save the log record to a file on disk. This file is saved somewhere you can access it later. Many logging consumers are very comfortable watching a file somewhere and aggregating lines of the log as they are written to the file.
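
In log4r, that whole chain of logger, level, and appender is only a few lines. Here’s a minimal sketch of a logger whose appender writes to a file on disk (the filename is just an example):

library(log4r)

# Create a logger whose appender (log4r's term for a handler) writes each
# record to a file
log <- logger(
  threshold = "INFO",
  appenders = file_appender("my-log.log")
)

info(log, "Button pressed")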

You may additionally – or instead – want your logs emitted to the system’s standard output streams. In Linux, there are two common places to emit logs: stdout and stderr. Normal messages typically go to stdout and errors to stderr.

Emitting to stdout

These days, the most common reason to emit to the standard outputs rather than save to a file on disk is that you’re running inside a Docker container. As we’ll get into more in Chapter 5, anything that lives inside a Docker container is ephemeral, so a log file written inside the container disappears with it. That’s obviously bad if the things you’re logging will matter in the event the Docker container is unexpectedly killed.

For running inside a Docker container (including in Kubernetes), a common pattern is to emit logs on stdout and stderr and have a service outside the container that consumes them and writes them somewhere more permanent and/or aggregates them.
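
With log4r, that pattern just means using a console appender rather than (or in addition to) a file appender, and letting whatever runs the container capture the output:

library(log4r)

# Write log records to the console so Docker or Kubernetes can collect them
log <- logger(appenders = console_appender())

info(log, "App starting inside the container")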

If you are emitting logs to a file, you may also want to consider how long those logs stay around. Log rotation is the process of taking old logs, storing them for a period of time, and then deleting the oldest ones. A common pattern is a 24-hour log file: each day, the system automatically starts a new log file and deletes the oldest one (30 days is a typical retention period). The Python logging library can do log rotation itself. log4r does not, but there is a Linux utility called logrotate that you can use in concert with log4r.1
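
If you go the logrotate route, the configuration is a short file dropped into /etc/logrotate.d/. Here’s a minimal sketch of the pattern described in the footnote: daily rotation, 30 days of retention, and copytruncate so log4r can keep writing to the same filename. The paths are examples.

# /etc/logrotate.d/my-app
/var/log/my-app/my-log.log {
    daily
    rotate 30
    missingok
    notifempty
    copytruncate
}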

Lastly, it’s possible you want to do something completely different with your logs. This is most typical when you want logs of a certain level of criticality sent to an outside system immediately. For example, maybe you want the log sent as a Slack message or a text in the event a production system goes down.
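
Recent versions of log4r include appenders that send records over the network, including an HTTP appender. Assuming your version provides http_appender() (check the package documentation if not), one sketch is a second, ERROR-only logger pointed at a hypothetical webhook, running alongside your normal console or file logger:

library(log4r)

# Hypothetical alerting endpoint, e.g. a chat webhook
webhook_url <- "https://hooks.example.com/my-alert-channel"

# A second logger that only fires at ERROR and above and sends each record
# to the webhook
alert_log <- logger(
  threshold = "ERROR",
  appenders = http_appender(webhook_url)
)

error(alert_log, "Cannot connect to the database")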

As the log record is written, it has to be formatted using a formatter or layout. In most cases, the default log format is in plain text. So if you log “Button pressed” at the info level, your log record might look something like this in the file:

2022-11-18 21:57:50 INFO Button Pressed

But if you’re shipping your logs off to have them be consumed by some other service, you might prefer to have a highly structured log file. The most common structured logging format is JSON, though it’s also possible to use something like YAML or XML. If you used JSON logging, the same record might be emitted as:

{
  "time": "2022-11-18 21:57:50",
  "level": "INFO", 
  "data": "Button Pressed"
}
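
In log4r, the layout is an argument to the appender, and newer versions of the package ship a JSON layout (which leans on the jsonlite package). Assuming your version has json_log_layout(), switching to structured logs looks something like this:

library(log4r)

# Same logger as before, but each record is now formatted as a JSON line
log <- logger(
  appenders = console_appender(layout = json_log_layout())
)

info(log, "Button Pressed")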

Most logging libraries have 5-7 levels of logging. Both the Python logging library and log4r use the five levels below (log4r labels the top level Fatal rather than Critical); there’s a short log4r sketch of the levels right after this list:

  1. Critical – an error so big that the app itself shuts down. For example, if your app cannot run without a connection to an outside service, you might log an inability to connect as a Critical error.
  2. Error – an issue that will make an operation not work, but that won’t bring down your app. In the language of software engineering, you might think of this as a caught exception. An example might be a user submitting invalid input.
  3. Warn/Warning – an unexpected application issue that isn’t fatal. Examples might include having to retry an operation or noticing that resource usage is high. If something were to go wrong later, these can be helpful breadcrumbs to look at.
  4. Info – something normal happened in the app. These record things like starting and stopping, successfully making database and other connections, and configuration options that are used.
  5. Debug – a deep record of what the app was doing, meant for people who actually understand the app’s internals. Some libraries also offer an even lower Trace level; the difference is that a trace log should let you reconstruct a sequence of events even if you aren’t familiar with the inner workings of the app, while the debug log assumes you are. Where a trace log would record “user pressed the button”, the debug log might record the actual functions invoked and their arguments.
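
Here’s what those levels look like as log4r calls, assuming a logger with its threshold lowered to DEBUG so everything gets through (the messages are made up):

library(log4r)

log <- logger(threshold = "DEBUG")

fatal(log, "Can't reach the database, shutting down") # Critical/Fatal
error(log, "User submitted invalid input")            # Error
warn(log, "Retrying connection, attempt 2 of 3")      # Warn
info(log, "App starting")                             # Info
debug(log, "counter() called with n = 4")             # Debug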

4.3.1.1 A Shiny App that logs

Here’s a silly function. It counts up normally until the count reaches 5; after that, it returns NaN forevermore.

counter <- function(n) {
  # Do some math
  n <- n %% 5 # Do a modulus for no reason
  n <- n + n/n # Add n/n (it's always 1, right?)
  n
}

And here’s a Shiny app that renders a number and a button you can use to “Count to Infinity!” Unfortunately, it uses our fatally flawed counter function, so it will stop working after 5.

You can see in the app that I’ve already implemented some logging. At the beginning, I create a logger using the standard defaults and then log that the app is starting. You can also see that I log every time the button gets pressed.

library(shiny)
library(log4r)

# Create log object
log <- logger()
# Log startup
info(log, "App starting")
n <- reactiveVal(1)

ui <- fluidPage(
  titlePanel(
    "Count to Infinity!"
  ),
  mainPanel(
    actionButton(
      "button", "Next Number"
    ),
    textOutput("number")
  )
)

server <- function(input, output) {
  observeEvent(
    input$button, {
      info(log, "Button Pressed")
      # Increment value of count reactive
      n(counter(n(), log))
    })

  output$number <- renderText(n())
}

# Run the application
shinyApp(ui = ui, server = server)

If you’re eagle-eyed, you’ll also notice that I’m passing the log object into the counter function. I’ve modified the function to accept the logger and do some logging of its own: it logs the value of n at the debug level after taking the modulus, and it logs an error if n isn’t a number.

counter <- function(n, log) {
  # Do some math
  n <- n %% 5 # Do a modulus for no reason
  debug(log, paste("n:", n))
  n <- n + n/n # Add n/n (it's always 1, right?)

  # Log an error if it's not a number
  if (is.nan(n)) error(log, "n is not a number")

  n
}

So if I launch this app, here’s what comes out in the logs. Note that by default, it’s just printing to the console. In a production setting, I might want to configure where the log goes in the call to the logger function.

> runApp()
INFO  [2022-11-19 12:06:43] App starting
INFO  [2022-11-19 12:06:46] Button Pressed
INFO  [2022-11-19 12:06:47] Button Pressed
INFO  [2022-11-19 12:06:48] Button Pressed
INFO  [2022-11-19 12:06:50] Button Pressed
ERROR [2022-11-19 12:06:50] n is not a number

If this app were deployed in production and I got a call that it was consistently returning NaN, these logs would be pretty helpful. I can see that the app started normally, the button got pressed a few times just fine, and then it started returning something that isn’t a number.

You’ll notice that my debug logging of n doesn’t appear anywhere, though. That’s because it’s standard for logging packages to print only down to the INFO level by default. If I want DEBUG logging, I’ll have to turn that on by changing the logger creation line to read

log <- logger("DEBUG")

Note

One nice thing about having a dev environment like we discussed in Chapter 1 is that you can have that environment configured to always do debug logging using a config setting to set the log level.
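
For example, one lightweight way to do that is to read the threshold from an environment variable that your dev environment sets (the LOG_LEVEL name is just a convention I’m making up here):

library(log4r)

# Defaults to INFO unless the environment sets LOG_LEVEL, e.g. to "DEBUG"
log <- logger(Sys.getenv("LOG_LEVEL", "INFO"))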

Now when I run my app, I get the following logs:

> runApp()
INFO  [2022-11-19 12:18:57] App starting
INFO  [2022-11-19 12:19:00] Button Pressed
DEBUG [2022-11-19 12:19:00] n: 2
INFO  [2022-11-19 12:19:00] Button Pressed
DEBUG [2022-11-19 12:19:00] n: 3
INFO  [2022-11-19 12:19:01] Button Pressed
DEBUG [2022-11-19 12:19:01] n: 4
INFO  [2022-11-19 12:19:01] Button Pressed
DEBUG [2022-11-19 12:19:01] n: 0
ERROR [2022-11-19 12:19:01] n is not a number

Aha! Now it might be obvious to me that I’m dividing 0 by 0 in my code and I can go fix it, all without having to actually step through my code line-by-line.

4.3.2 How to emit metrics

Emitting metrics is somewhat more complicated than emitting logs, and for many use cases it won’t be necessary. As I mentioned above, in many cases just scraping logs for any metrics you need is good enough. But if you need to do more complex metrics emission and aggregation, you’ve still got great options.

Metrics for ModelOps

It’s worth noting that the most common reason data scientists need metrics is that they’re running ML models in production. As I’m writing, a variety of frameworks and tools are vying to be the leader in this arena.

Each of the major cloud providers has their own model hosting solution that includes monitoring capabilities, and there are many other paid options as well.

For open source usage, I am (perhaps unsurprisingly) partial to the vetiver package being developed for both R and Python by my colleagues at Posit.

These days, most people use the Prometheus/Grafana stack for open source monitoring. Prometheus is an open source monitoring tool that makes it easy to store metrics data, query that data, and alert based on it. Grafana is an open source dashboarding tool that sits on top of Prometheus to visualize those metrics. The two are almost always used together.

It’s also easy to monitor the server your data science asset sits on: the Prometheus project provides something called the node exporter, which makes it simple to monitor the system resources of your server.

The way this works is that your data science asset exposes metrics in a format that Prometheus recognizes, Prometheus scrapes and stores those metrics and alerts if necessary, and a Grafana dashboard is available for visualizing them as needed.
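
To give a sense of what “a format that Prometheus recognizes” means, here’s a single, hypothetical counter in the plain-text exposition format Prometheus scrapes:

# HELP predictions_total Total predictions served by the model API.
# TYPE predictions_total counter
predictions_total 42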

Python has an official Prometheus client you can use for emitting metrics from a Python asset, and the openmetrics package in R makes it easy to emit metrics from a Plumber API or (experimentally) a Shiny app.
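
As a sketch of what that can look like in R: the metric name is made up, and the counter_metric() and register_plumber_metrics() calls reflect my reading of the openmetrics documentation, so double-check them against the package docs.

library(openmetrics)
library(plumber)

# A custom counter to increment (inside your endpoints) whenever the API
# serves a prediction
predictions <- counter_metric("predictions", "Predictions served.")

pr <- plumb("plumber.R")
pr <- register_plumber_metrics(pr) # exposes a /metrics endpoint for Prometheus
pr$run()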

QUESTION FOR READERS - do I need to do an example that includes emitting metrics?

4.3.3 Consuming Logs and Metrics

Once you’ve got your asset instrumented and your logging going, you’ll want to actually consume those logs and metrics. If you’re using a professional product, like Posit Connect, to host your asset, you will probably get some degree of support for finding and reading relevant logs out of the box.

If not, you’ll probably be making good use of the skills we’ll review in Chapter 8 to watch logs and parse through them.

You can also configure the Prometheus and Grafana stack to do both metrics monitoring and log aggregation. Grafana provides a generous free tier that allows you to use Prometheus and Grafana without having to set up your own server; you can just sign up for their service and point your app at it.

QUESTION FOR READERS – Do I need to show how to set up the monitoring stack?

4.4 Comprehension Questions

  1. What is the difference between monitoring and logging? What are the two halves of the monitoring and logging process?
  2. In general, logging is good, but what are some things you should be careful not to log?
  3. At what level would you log each of the following events:
    1. Someone clicks on a particular tab in your Shiny app.

    2. Someone puts an invalid entry into a text entry box.

    3. An http call your app makes to an external API fails.

    4. The numeric values that are going into your computational function.

4.5 Portfolio Exercise: Adding Logging

Take the app you built for Chapter 3.

Using the appropriate package for whichever language you wrote in, add robust logging to both the API and the App you built.

For an extra challenge, consider trying to emit Prometheus metrics as well and consume them using Prometheus and Grafana.


  1. There are two common naming patterns with rotating log files. The first is to have dated log filenames that look like my-log-20221118.log.

    The other pattern is to keep one file that’s current and have the older ones numbered. So today’s log would be my-log.log, yesterday’s would be my-log.log.1, the day before my-log.log.2, etc. This second pattern works particularly well if you’re using logrotate with log4r, because then log4r doesn’t need to know anything about the log rotation. It’s just always writing to my-log.log.