A convolution layer in a CNN is just a special case of a regular "fully connected" linear layer: it computes the same kind of weighted sums, only with shared weights and local connectivity. This means that having convolution layers does not eliminate the need for non-linearities, a.k.a. activation layers.
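To make the "special case" claim concrete, here is a minimal sketch (using NumPy, with a made-up kernel and input) showing that a 1D valid convolution is exactly a matrix multiply by a Toeplitz-structured weight matrix, i.e. a fully connected layer with tied, mostly-zero weights:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # example input
k = np.array([0.5, -1.0, 2.0])           # example kernel of width 3

# Direct "valid" convolution (cross-correlation, as CNN frameworks compute it)
conv_out = np.array([x[i:i + 3] @ k for i in range(len(x) - 2)])

# The equivalent fully connected layer: each row of W holds the same kernel,
# shifted by one position, with zeros elsewhere (weight sharing + sparsity).
W = np.zeros((3, 5))
for i in range(3):
    W[i, i:i + 3] = k

fc_out = W @ x
assert np.allclose(conv_out, fc_out)  # identical outputs
```

The same construction works in 2D, just with a doubly block Toeplitz matrix instead.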

While max pooling is not a linear operation, it is "kind of linear": sums are not mapped to sums, but multiplication by a nonnegative scalar produces a multiplied output, [imath]P(\alpha x) = \alpha P(x)[/imath], where [imath]P[/imath] is the max pooling function, [imath]x[/imath] is its input and [imath]\alpha \ge 0[/imath] is a scalar (for negative [imath]\alpha[/imath] the max turns into a min, so the identity breaks). You can say that max pooling "isn't non-linear enough" to warrant the elimination of standard non-linearities.
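Both halves of that claim can be checked numerically. The sketch below (hypothetical inputs, non-overlapping 1D pooling) verifies that max pooling is homogeneous for a nonnegative scalar but does not preserve sums:

```python
import numpy as np

def max_pool(v, size=2):
    # Non-overlapping 1D max pooling over windows of the given size.
    return v.reshape(-1, size).max(axis=1)

x = np.array([1.0, -2.0, 3.0, 0.5])
y = np.array([0.0, 5.0, -1.0, -1.0])
a = 2.5  # nonnegative scalar

# Homogeneity holds: P(a*x) == a*P(x)
assert np.allclose(max_pool(a * x), a * max_pool(x))

# Additivity fails: P(x + y) != P(x) + P(y) in general,
# because different elements can win the max in each term.
print(max_pool(x + y))              # pooled sum
print(max_pool(x) + max_pool(y))    # sum of pooled outputs: differs
```

So max pooling sits between linear and fully non-linear behavior, which is exactly why it can't replace activation layers.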