Log scales seem to have caused a lot of debate recently. When should we use them?

I think there are two reasons:

- Your data is exponentially distributed.
- You want to transform your data to make certain regions easier to see.

**Exponentially distributed data**

The first is a bit of a tautology. For independent variables (usually the x-axis) it’s easy. Perhaps you have collected data at points 1, 2, 4, 8, 16 etc. It makes sense to plot these on a log2 scale (0, 1, 2, 3, 4). Collected at 0.1, 1, 10, 100 etc.? Plot on a log10 scale (-1, 0, 1, 2). Examples include minimum inhibitory concentrations (MICs), which are measured in twofold (doubling) dilutions, and parameter searches, which often span several orders of magnitude.
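A quick check of those axis positions with Python’s standard `math` module, using the sample points from the text (the rounding only hides floating-point noise):

```python
import math

# Independent variable sampled at doublings: evenly spaced on a log2 axis
doublings = [1, 2, 4, 8, 16]
log2_positions = [math.log2(x) for x in doublings]

# Independent variable sampled at powers of ten: evenly spaced on a log10 axis
decades = [0.1, 1, 10, 100]
log10_positions = [round(math.log10(x), 12) for x in decades]

print(log2_positions)   # [0.0, 1.0, 2.0, 3.0, 4.0]
print(log10_positions)  # [-1.0, 0.0, 1.0, 2.0]
```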

But what about dependent variables? Anything that follows a geometric progression (multiplied by a constant factor at each step) grows exponentially, and probably needs to be logged for its full range to be visible.

An example is case counts in an epidemic (multiplied, roughly, by R each time interval).

Often when you do this you’ll see a nice linear relationship emerge, which is easier to analyse (visually and analytically).
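A minimal sketch of why the straight line appears, assuming a hypothetical epidemic that starts at 10 cases and is multiplied by R = 1.5 each interval (both numbers invented for illustration). After logging, every step adds the same constant, log(R), so the points fall on a line with slope log(R):

```python
import math

# Hypothetical parameters, chosen only for illustration
initial_cases = 10
R = 1.5  # growth factor per time interval

# Geometric progression: cases at step t = initial_cases * R**t
cases = [initial_cases * R**t for t in range(8)]

# On a log scale, consecutive points differ by the same amount, log(R)
log_cases = [math.log(c) for c in cases]
steps = [b - a for a, b in zip(log_cases, log_cases[1:])]

print(all(math.isclose(s, math.log(R)) for s in steps))  # True
print(round(math.log(R), 6))                              # 0.405465
```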

**Transforming for visualisations**

Another reason you may want to log a scale is purely for visualisation: nothing mechanistically to do with the variables being plotted or how they were collected.

Broadly, *logging a variable expands the range of the smaller values and shrinks the range of the larger values*.

This is very useful when your data span a large range, or when your values are bunched up at the low end.
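To make the expand/shrink effect concrete, the sketch below (plain Python, with four values picked for illustration) compares the gap between 1 and 10 with the gap between 100 and 1000, before and after taking log10:

```python
import math

values = [1, 10, 100, 1000]

# Raw gaps: the jump from 100 to 1000 dwarfs the jump from 1 to 10
raw_gaps = [b - a for a, b in zip(values, values[1:])]

# Log10 gaps: every tenfold jump takes up the same distance on the axis
log_gaps = [round(math.log10(b) - math.log10(a), 12) for a, b in zip(values, values[1:])]

print(raw_gaps)  # [9, 90, 900]
print(log_gaps)  # [1.0, 1.0, 1.0]
```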

Really this has nothing directly to do with logs/exponentials; it’s just a nice function whose shape has this useful property. In fact, you are often free to use any monotonic transformation you want. If you want one that shrinks/grows less than log(x), use sqrt(x). If you want one that does it even more than log(x), use 1/x (bearing in mind that 1/x reverses the order of positive values, which is why -1/x is often used instead).
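One way to compare how strongly each transformation compresses large values: divide the width the 10 to 100 decade occupies after the transform by the width the 1 to 10 decade occupies. The measure is an illustrative choice, not a standard statistic, and the sketch uses -1/x so that larger values still plot higher:

```python
import math

# Candidate transformations, from weakest to strongest compression.
# -1/x is used instead of 1/x so that the order of values is preserved.
transforms = {
    "identity": lambda x: x,
    "sqrt": math.sqrt,
    "log10": math.log10,
    "neg_reciprocal": lambda x: -1 / x,
}

def decade_ratio(f):
    """Width of the 10-100 decade divided by width of the 1-10 decade after transform f."""
    return (f(100) - f(10)) / (f(10) - f(1))

for name, f in transforms.items():
    print(f"{name:>15}: {decade_ratio(f):.3f}")
```

A ratio above 1 means the larger decade still takes up more room. The untransformed data give 10, sqrt gives about 3.16, log10 gives exactly 1 (every decade the same width), and -1/x gives 0.1, squashing the larger decade below the smaller one.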

**More intuition**

To get a sense of what logging looks like, try using log graph paper.

Also very handy: in base 10, multiplying by 3 is a log change of about 0.5, since log10(3) ≈ 0.477 (likewise, doubling is a log change of about 0.3, since log10(2) ≈ 0.301). So if you had a progression 10, 30, 100, 300 etc., these are roughly equally spaced on a log scale.
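Checking the spacing of that progression with Python’s `math.log10`:

```python
import math

progression = [10, 30, 100, 300, 1000]
positions = [math.log10(x) for x in progression]

# Gaps between neighbours alternate between log10(3) and log10(10/3)
gaps = [round(b - a, 3) for a, b in zip(positions, positions[1:])]

print(gaps)                     # [0.477, 0.523, 0.477, 0.523]
print(round(math.log10(2), 3))  # 0.301, the log change for doubling
```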

**Log-log scales**

Log-log scales can be used to analyse power laws, when both variables are exponentially distributed, or when both need spreading out.
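For a power law y = a·x^k, taking logs of both sides gives log y = log a + k·log x: a straight line whose slope is the exponent k. A minimal sketch, with a = 2 and k = 1.5 chosen arbitrarily, that recovers k with an ordinary least-squares fit on the logged data (written out by hand to stay dependency-free):

```python
import math

# Hypothetical power law y = a * x**k (a and k chosen for illustration)
a, k = 2.0, 1.5
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [a * x**k for x in xs]

# Move to log-log space, where the power law becomes a straight line
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

def least_squares(u, v):
    """Return (slope, intercept) of the ordinary least-squares line v = slope*u + intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    var = sum((ui - mean_u) ** 2 for ui in u)
    slope = cov / var
    return slope, mean_v - slope * mean_u

slope, intercept = least_squares(log_x, log_y)
print(round(slope, 6))            # 1.5, the exponent k
print(round(10 ** intercept, 6))  # 2.0, the prefactor a
```

With real, noisy data the fitted slope is only an estimate of k, and fitting a straight line in log-log space is a quick visual check rather than a rigorous test for a power law.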

Useful as always, John (good to see you blogging regularly again). I would add one caveat:

- Your data has no (or very few) zero values.

Even if your two points apply, applying a log scale to data with zero values can give a very misleading visualisation, since log(0) is undefined (the logarithm heads towards minus infinity as values approach zero).
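A small illustration of both problems, using made-up measurements: exact zeros have no log value at all and silently drop out, while values that are all practically zero get stretched far apart. The helper `log10_or_none` is hypothetical, written just for this sketch:

```python
import math

def log10_or_none(v):
    """Return log10(v), or None where the logarithm is undefined (v <= 0)."""
    try:
        return math.log10(v)
    except ValueError:
        return None

# Hypothetical measurements: two exact zeros and three near-zero values
data = [0, 0, 1e-9, 1e-6, 1e-3, 5, 12]
logged = [log10_or_none(v) for v in data]

dropped = sum(v is None for v in logged)
print(f"{dropped} of {len(data)} points have no log value")

# The near-zero values end up about 3 units apart on the log axis,
# even though they all differ from each other by less than 0.001
print([round(v, 3) for v in logged if v is not None and v < -1])
```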

An important caveat!

To deal with this, I’ve previously used matplotlib’s ‘symlog’ scale, which mirrors the log scale for negative values and also lets you set a range around zero that is plotted linearly, avoiding infinities.
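The idea behind a symmetric log scale can be sketched in a few lines. This is a simplification, not matplotlib’s exact formula: it uses the “bi-symmetric log” form sign(x)·log10(1 + |x|/C), where the constant C (the hypothetical `linear_width` parameter below) plays a role similar to matplotlib’s linear threshold. The result is roughly linear for |x| much smaller than C, logarithmic for |x| much larger, and defined at zero and for negative values. In matplotlib itself the equivalent is along the lines of `ax.set_yscale('symlog', linthresh=...)` (versions before 3.3 spell the keyword `linthreshy`).

```python
import math

def symmetric_log10(x, linear_width=1.0):
    """Simplified symmetric log: sign(x) * log10(1 + |x| / linear_width).

    Hypothetical helper for illustration; not matplotlib's SymmetricalLogTransform.
    Approximately linear for |x| << linear_width, logarithmic for |x| >> linear_width.
    """
    return math.copysign(math.log10(1 + abs(x) / linear_width), x)

data = [-1000, -10, -0.1, 0, 0.1, 10, 1000]
print([round(symmetric_log10(v), 3) for v in data])
```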

See e.g.

https://matplotlib.org/stable/gallery/scales/symlog_demo.html

https://stackoverflow.com/a/3513150/4677055