Convolutional neural networks (CNNs) are commonplace in
high-performing solutions to many real-world problems, such as
audio classification. CNNs have many parameters and filters,
with some having a larger impact on performance than others.
This means that networks may contain many unnecessary filters,
increasing a CNN’s computation and memory requirements while
providing limited performance benefits. To make CNNs more
efficient, we propose a pruning framework that eliminates filters
with the highest “commonality”. We measure this commonality
using the graph-theoretic concept of centrality. We hypothesise
that a filter with high centrality should be eliminated, as it
represents commonality and can be replaced by other filters with
little effect on the network's performance. An experimental
evaluation of the proposed framework is performed on acoustic
scene classification and audio tagging. On the DCASE 2021 Task
1A baseline network, our proposed method reduces computations
per inference by 71% with 50% fewer parameters at less than a
two percentage point drop in accuracy compared to the original
network. For large-scale CNNs such as PANNs designed for audio
tagging, our method reduces computations per inference by 24% with
41% fewer parameters while slightly improving performance.
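
As a minimal sketch of the idea, and not the paper's actual implementation, the snippet below scores the filters of a single convolutional layer by degree centrality on a pairwise-similarity graph and flags the most central (most "common") ones for removal. The cosine similarity measure, the edge threshold, the degree-centrality variant, and the pruning count are all assumptions chosen purely for illustration.

```python
# Illustrative sketch only: the abstract does not specify the similarity
# measure, the centrality variant, or the pruning ratio, so cosine
# similarity, a fixed edge threshold, and degree centrality are
# assumptions made here for demonstration.
import numpy as np

def centrality_based_pruning(filters, threshold=0.5, n_prune=2):
    """Rank convolutional filters by degree centrality of a similarity
    graph and return the indices of the filters to remove.

    filters: array of shape (num_filters, k, k) or (num_filters, c, k, k)
    """
    num_filters = filters.shape[0]
    flat = filters.reshape(num_filters, -1)

    # Cosine similarity between every pair of filters (assumed measure).
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T

    # Unweighted graph: connect filters whose similarity exceeds the threshold.
    adj = (np.abs(sim) > threshold).astype(int)
    np.fill_diagonal(adj, 0)

    # Degree centrality: fraction of other filters each filter connects to.
    centrality = adj.sum(axis=1) / (num_filters - 1)

    # Filters with the highest centrality are treated as the most replaceable.
    return np.argsort(centrality)[::-1][:n_prune]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conv_weights = rng.standard_normal((8, 3, 3))  # eight toy 3x3 filters
    print("prune filter indices:", centrality_based_pruning(conv_weights))
```

In a full pruning framework this scoring would be applied layer by layer, the selected filters (and their dependent channels) removed, and the slimmed network fine-tuned to recover accuracy.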