Federated learning is a paradigm that aims to protect data privacy by sharing local models instead of raw data during each iteration of model training. However, these models can be large, with many parameters, incurring a substantial communication cost and a notable environmental impact. Reducing communication overhead is paramount but conflicts with maintaining the model’s accuracy. Most research has treated the different factors influencing communication reduction separately, without addressing their correlations; moreover, most studies do not consider the heterogeneity of clients’ hardware. Finding the optimal configuration that satisfies all these training aspects can become intractable for classical techniques. This work explores the added value that multi-objective evolutionary algorithms can provide for solving the communication overhead problem while achieving high accuracy. We do this by 1) realistically modelling and formulating this task as a multi-objective problem that accounts for the devices’ heterogeneity, 2) including all the communication-triggering aspects, and 3) applying a multi-objective evolutionary algorithm with an intensification operator to solve the problem. A simulated client–server architecture of four devices with four different processing speeds is studied. Both fully connected and convolutional neural network models, with 33,400 and 887,530 weights respectively, are investigated. The experiments are performed using the MNIST and Fashion-MNIST datasets. Three approaches are compared using an extensive set of metrics. The results show that our approach obtains solutions with better accuracy than the full-communication setting and other methods, while reducing communication by around 1,000 times in most cases, and up to 10,000 times in some, compared to the maximum-communication setting.