Tag: CCMatrix

CCMatrix: A billion-scale bitext data set for training translation models

CCMatrix is the largest data set of high-quality, web-based bitexts for training translation models. With more than 4.5 billion parallel sentences in 576 language pairs pu …
