Distillation of CLIP model and other experiments
“CLIP is a model released by OpenAI earlier this year. It was trained to learn ‘visual concepts from natural language supervision’ on more than 400 million image-text pairs, using an impressive amount of compute (256 GPUs for 2 weeks).
At PicCollage we have been researching ways to combine text and images, so CLIP came in handy and we tested its performance on some of our content. It was VERY impressive, better than anything we had tried before. However, we soon began to notice a quirk of the model: it seemed to prioritize textual similarity over semantic similarity for a search query.
Given how powerful the model was, we also wanted to reduce its size and explore the possibility of deploying it on edge devices. Considering the magnitude of the dataset and the compute required, it seemed like a daunting task, but we wanted to give it a shot anyway…”
Source: tech.pic-collage.com/distillation-of-clip-model-and-other-experiments-f8394b7321ce
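The text-to-image search described in the excerpt boils down to embedding a query and a set of images with CLIP and ranking by cosine similarity. Below is a minimal sketch using OpenAI's open-source `clip` package; the model variant, image paths, and query string are placeholder assumptions, not details from the article.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical content and query, for illustration only.
image_paths = ["sticker_001.png", "sticker_002.png", "sticker_003.png"]
images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
text = clip.tokenize(["a birthday party collage"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(images)
    text_features = model.encode_text(text)
    # Cosine similarity: L2-normalize both sides, then take dot products.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    scores = (text_features @ image_features.T).squeeze(0)

# Rank images by similarity to the query, best match first.
for idx in scores.argsort(descending=True).tolist():
    print(image_paths[idx], round(scores[idx].item(), 4))
```

Because both encoders map into the same embedding space, the quirk the authors describe can show up here: an image containing the query's words rendered as text can score higher than an image that merely depicts the concept.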
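As for shrinking the model, one common way to distill an encoder like CLIP's is to train a smaller student network to reproduce the frozen teacher's embeddings. The sketch below illustrates that general recipe with a hypothetical student architecture and an MSE objective; the article's actual student and loss may differ.

```python
import torch
import torch.nn as nn

class StudentEncoder(nn.Module):
    """Hypothetical lightweight image encoder producing CLIP-sized embeddings."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):
        return self.proj(self.backbone(x))

def distillation_step(teacher, student, images, optimizer):
    """One training step: match the frozen teacher's image embeddings."""
    with torch.no_grad():
        target = teacher.encode_image(images).float()
    pred = student(images)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (hypothetical):
#   teacher, preprocess = clip.load("ViT-B/32")
#   student = StudentEncoder()
#   opt = torch.optim.Adam(student.parameters(), lr=1e-4)
#   loss = distillation_step(teacher, student, image_batch, opt)
```

Matching embeddings rather than class logits keeps the student compatible with the teacher's text encoder, so the distilled image encoder can still be paired with CLIP's text embeddings for search.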