Cellular Automata (CA) represent an interesting approach to design Substitution Boxes (S-boxes) having good cryptographic properties and low implementation costs. From the cryptographic perspective, up to now there have been only ad-hoc studies about specific kinds of CA, the best known example being the $\chi$ nonlinear transformation used in Keccak. In this paper, we undertake a systematic investigation of the cryptographic properties of S-boxes defined by CA, proving some upper bounds on their nonlinearity and differential uniformity. Next, we extend some previous published results about the construction of CA-based S-boxes by means of a heuristic technique, namely Genetic Programming (GP). In particular, we propose a ``reverse engineering” method based on De Bruijn graphs to determine whether a specific S-box is expressible through a single CA rule. Then, we use GP to assess if some CA-based S-box with optimal cryptographic properties can be described by a smaller CA. The results show that GP is able to find much smaller CA rules defining the same reference S-boxes up to the size $7\times 7$, suggesting that our method could be used to find more efficient representations of CA-based S-boxes for hardware implementations. Finally, we classify up to affine equivalence all $3\times 3$ and $4\times 4$ CA-based S-boxes.
Note: this publication is an extended version of the following conference papers: