Factorized layers revisited: Compressing deep networks without playing the lottery
From BiT (928 million parameters) to GPT-3 (175 billion parameters), state-of-the-art machine learning models are rapidly growing in size. With the greater expressivity and easier trainability of these models come skyrocketing training costs, deployment difficulties, and even climate impact. As a result, we’re seeing exciting new research into compressing these models to make them less expensive to run, small enough to store on any device, and more energy efficient. Perhaps the most popular approach to model compression is pruning, in which redundant model parameters are removed, leaving only a small subset of parameters, or a subnetwork. A major drawback of pruning, though, is that it requires training a large model first, which is expensive and resource intensive…
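To make the idea of pruning concrete, here is a minimal sketch of magnitude pruning using PyTorch’s built-in pruning utilities. This is only an illustration of the general technique, not the method discussed in this post; the layer size and the 90% sparsity level are arbitrary choices for the example.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy linear layer standing in for one layer of a large trained model.
layer = nn.Linear(1024, 1024)

# Magnitude (L1) pruning: zero out the 90% of weights with the smallest
# absolute value, leaving a sparse "subnetwork" of surviving parameters.
prune.l1_unstructured(layer, name="weight", amount=0.9)

# After pruning, layer.weight is computed as weight_orig * weight_mask,
# where the mask records which entries were kept.
sparsity = (layer.weight == 0).float().mean().item()
print(f"fraction of weights pruned: {sparsity:.2f}")  # ~0.90
```

Note that the sketch starts from an already-instantiated (and, in practice, already-trained) layer, which is exactly the drawback mentioned above: the large model must exist before it can be pruned.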