Anime character recognition/classification using PyTorch
“Our best model, ViT L-16 with image size 128×128 and batch size 64 achieves to get 85.95% and 94.23% top-1 and top-5 classification accuracies, among 3263 characters, compared to the best CNN model (ResNet-18) that only achieved 69.09% and 84.64%, respectively.
We hope that this work inspires other researchers to follow and build upon this path. ViT models have interesting properties for domain transfer that haven’t been studied, and their big jump in terms of performance compared to CNNs suggest that they may be more suitable for drawn, sketched character recognition. This is due to the fact that CNNs are biased towards texture, and not shapes…”
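For readers unfamiliar with the metrics quoted above: top-1 accuracy counts a prediction as correct only when the highest-scoring class is the true character, while top-5 accuracy counts it as correct when the true character is anywhere among the five highest-scoring classes. The sketch below is a dependency-free illustration of that definition, not the evaluation code used by the authors. The function name `topk_accuracy` and the toy scores are hypothetical, and it uses k=2 rather than k=5 only because the toy example has three classes. In a PyTorch pipeline the same ranking step would typically be done with `torch.topk` on the logits tensor.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one per sample (hypothetical toy input)
    labels: list of integer class indices, one per sample
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)


# Toy example: 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # best guess: class 1 (true label 1)
    [0.5, 0.3, 0.2],  # best guess: class 0, second guess: class 1 (true label 1)
    [0.2, 0.3, 0.5],  # best guess: class 2, second guess: class 1 (true label 0)
    [0.6, 0.3, 0.1],  # best guess: class 0 (true label 0)
]
labels = [1, 1, 0, 0]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Top-k accuracy can never decrease as k grows, which is why the reported top-5 figures are higher than the top-1 figures for both the ViT and the ResNet-18 models.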