I skimmed a couple of papers today.

Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold

This paper is about automatically placing log printing statements (LPSes). This does not seem to be a common industry practice, but the approach has potential. The authors use information about the paths in the call graph to deduce the “informativeness” of an LPS placement.

They devise an automated system (Log20) and test it on HDFS, HBase, and a few other large Java projects. They were able to produce logs automatically without much overhead.
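
To make “informativeness” concrete for myself, here is a toy sketch of one way to score a placement: measure how much uncertainty remains about which path ran once we have seen the log output. This is my own simplification, not the paper's algorithm. The path names, block names, and probabilities are all hypothetical, and I assume the path probabilities are already known.

```python
import math
from collections import defaultdict


def remaining_entropy(path_probs, path_blocks, logged_blocks):
    """Expected uncertainty (in bits) about which path ran, given the log output.

    path_probs:    {path_name: probability}, summing to 1
    path_blocks:   {path_name: sequence of block names the path executes}
    logged_blocks: set of block names that contain a log printing statement
    """
    # Paths that print the same sequence of log statements are indistinguishable.
    groups = defaultdict(list)
    for path, p in path_probs.items():
        output = tuple(b for b in path_blocks[path] if b in logged_blocks)
        groups[output].append(p)

    entropy = 0.0
    for probs in groups.values():
        group_p = sum(probs)
        for p in probs:
            if p > 0:
                # Accumulate -p(path) * log2 p(path | output).
                entropy -= p * math.log2(p / group_p)
    return entropy


# Hypothetical program with three paths through blocks A, B, C, D.
path_probs = {"p1": 0.5, "p2": 0.25, "p3": 0.25}
path_blocks = {"p1": ["A", "B"], "p2": ["A", "C"], "p3": ["A", "D"]}

no_logs = remaining_entropy(path_probs, path_blocks, set())        # 1.5 bits
log_b = remaining_entropy(path_probs, path_blocks, {"B"})          # 0.5 bits
log_b_c = remaining_entropy(path_probs, path_blocks, {"B", "C"})   # 0.0 bits
```

In this toy setup, a log statement in block A would be useless because every path passes through it, while a single statement in B removes most of the uncertainty. A real system would also have to weigh each statement's runtime cost against that gain.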

The related work section is quite useful to me. In my current situation, I still need to build up a mental model of automated log printing. What are the main problems users face?

Where Do Developers Log? An Empirical Study on Logging Practices in Industry

This paper was linked from the paper on automated logging.

They identify 5 types of log statements:

  1. Return value.
  2. Assertion check.
  3. Exception handling.
  4. Logic branch.
  5. Observation point (aka uncategorized).
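
To see what these look like in code, here is a minimal Python sketch (the paper studies C# code). The functions `withdraw` and `withdraw_all` and their arguments are hypothetical, made up for illustration. Each log call is labelled with the category I think it falls under.

```python
import logging

logger = logging.getLogger("example")


def withdraw(balances, account, amount):
    """Hypothetical function; each log call illustrates one category."""
    # 2. Assertion check: log when a precondition is violated.
    if amount <= 0:
        logger.error("invalid amount: %r", amount)
        return False

    # 3. Exception handling: log inside the handler.
    try:
        balance = balances[account]
    except KeyError:
        logger.error("unknown account: %r", account)
        return False

    # 4. Logic branch: log which branch was taken.
    if balance < amount:
        logger.warning("insufficient funds: balance=%r amount=%r", balance, amount)
        return False

    balances[account] = balance - amount

    # 5. Observation point: record state at a point of interest.
    logger.info("withdrew %r from %r, new balance %r", amount, account, balances[account])
    return True


def withdraw_all(balances, account):
    # 1. Return value: log after checking the result of a call.
    ok = withdraw(balances, account, balances.get(account, 0))
    if not ok:
        logger.error("withdraw_all failed for %r", account)
    return ok
```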

We can simplify this taxonomy by noticing that log statements serve two purposes:

  • We want to differentiate “instructions”, i.e. which code ran. This narrows down the amount of source code to consider changing.
  • We want to differentiate “data”. We may want to narrow down the shape of the data that triggered the error. That then helps us understand how to alter the code (to handle that form of data).

The authors also observe that many exceptions in the C# code they studied are not logged. This suggests that developers do not find exception logging that useful.

Own thoughts

  • The role of logs is to assist with troubleshooting. Troubleshooting occurs when running unit tests, larger-scoped multi-process tests, and in production environments.
  • If logs contain sufficient information (variable values + logic-branch differentiation), it becomes possible to identify the “cause” of a failure and a solution to that cause.
  • Log statements are currently placed manually. Users are unaware of the usefulness, runtime cost, or storage cost of log statements.
  • Automated log printing can be useful, but only if the runtime cost and storage cost are low, and if the usefulness of the logs is high.

Before considering optimizations, we should first consider whether there is a larger but potentially simpler problem to solve. What data is needed to make it easy for somebody to troubleshoot a production failure? What is the query pattern?

Nevertheless, there are a few performance problems to pay attention to:

  • CPU cost. It takes time to serialize variables into log messages, especially larger objects.
  • I/O and storage cost. Passing many logs over the network and storing them for long periods of time is expensive.
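
The CPU cost can be partly avoided when a log statement is disabled. A small sketch using Python's standard `logging` module: passing arguments separately defers formatting until the record is actually emitted, whereas an f-string is built up front. `BigObject` is a hypothetical stand-in for an object that is expensive to turn into a string.

```python
import logging

logger = logging.getLogger("cost_demo")
logger.setLevel(logging.INFO)  # DEBUG records are dropped


class BigObject:
    """Hypothetical object whose string form is expensive to build."""

    def __init__(self):
        self.format_calls = 0

    def __repr__(self):
        self.format_calls += 1
        return "BigObject(...)"


obj = BigObject()

# Eager: the f-string is built before logger.debug() is even called.
logger.debug(f"state={obj!r}")
eager_calls = obj.format_calls

# Deferred: %-style arguments are only formatted if the record is emitted.
logger.debug("state=%r", obj)
deferred_calls = obj.format_calls - eager_calls
```

Here the eager call pays for `__repr__` even though nothing is logged, while the deferred call does not. This only helps with disabled statements. Statements that are enabled still pay the full serialization, I/O, and storage cost.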