Make Every feature Binary: A 135B parameter sparse neural network for massively improved search relevance

“One reason MEB works so well as a complement to Transformer-based deep learning models for search relevance is that it can map single facts to features, allowing MEB to gain a more nuanced understanding of individual facts. For example, many deep neural network (DNN) language models might overgeneralize when filling in the blank in this sentence: “(blank) can fly.” Since the majority of DNN training cases result in “birds can fly,” DNN language models might only fill the blank with the word “birds.”

MEB avoids this by assigning each fact to a feature, so it can assign weights that distinguish between the ability to fly in, say, a penguin and a puffin. It can do this for each of the characteristics that make a bird—or any entity or object for that matter—singular. Instead of saying “birds can fly,” MEB paired with Transformer models can take this to another level of classification, saying “birds can fly, except ostriches, penguins, and these other birds.”

There’s also an element of improving the method for using data more efficiently as scale increases. The ranking of web results in Bing is a machine learning problem that benefits from learning over huge amounts of user data. A traditional approach for leveraging click data is to extract thousands of handcrafted numeric features for each impressed query/document pair and to train a gradient boosted decision tree (GBDT) model…”

Source: www.microsoft.com/en-us/research/blog/make-every-feature-binary-a-135b-parameter-sparse-neural-network-for-massively-improved-search-relevance/

August 9, 2021
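
To illustrate the idea in the first part of the quote, here is a minimal sketch of a model in which every feature is binary: each query-term/document-term pair becomes its own feature with its own learned weight, so a specific fact about penguins is never averaged into a general rule about birds. The class name, feature scheme, and update rule below are illustrative assumptions only, not Microsoft's MEB implementation.

```python
from collections import defaultdict

# Sketch of a sparse scorer over binary features: every (query term,
# document term) co-occurrence is its own feature with its own weight,
# so individual facts keep individual parameters instead of being
# blended into one dense representation. Names and the update rule
# are illustrative, not the MEB production system.

class SparseBinaryScorer:
    def __init__(self, learning_rate=0.1):
        self.weights = defaultdict(float)   # one weight per binary feature
        self.lr = learning_rate

    @staticmethod
    def features(query, doc):
        # One binary feature per query-term / doc-term pair.
        return {f"{q}|{d}" for q in query.split() for d in doc.split()}

    def score(self, query, doc):
        # Active binary features contribute their weight; all others are 0.
        return sum(self.weights[f] for f in self.features(query, doc))

    def update(self, query, doc, clicked):
        # Simple perceptron-style update from a click / no-click label.
        target = 1.0 if clicked else -1.0
        error = target - self.score(query, doc)
        for f in self.features(query, doc):
            self.weights[f] += self.lr * error


scorer = SparseBinaryScorer()
scorer.update("can penguins fly", "penguins are flightless birds", clicked=True)
print(scorer.score("can penguins fly", "penguins are flightless birds"))
```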
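
For contrast, the "traditional approach" mentioned at the end of the quote can be sketched as a gradient boosted decision tree trained on a handful of handcrafted numeric features per query/document pair. The feature columns and click labels below are synthetic placeholders, not Bing's actual feature set or data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Sketch of the traditional pipeline: handcrafted numeric features per
# query/document pair, trained against click labels with a GBDT.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.random(n),           # e.g. a BM25-style text-match score
    rng.integers(0, 10, n),  # e.g. query terms matched in the title
    rng.random(n),           # e.g. a document popularity score
])
# Synthetic click label loosely correlated with the features above.
y = (0.6 * X[:, 0] + 0.05 * X[:, 1] + 0.35 * X[:, 2]
     + 0.1 * rng.standard_normal(n) > 0.6).astype(int)

model = GradientBoostingClassifier(n_estimators=100, max_depth=3)
model.fit(X, y)
print(model.predict_proba(X[:5])[:, 1])  # predicted click probabilities
```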