Question
In programming, not in words:
Text data typically have high-frequency words such as "the", "a", and "in": they may even occur billions of times in very large corpora. However, these words often co-occur with many different words in context windows, providing little useful signal. For instance, consider the word "chip" in a context window: intuitively, its co-occurrence with the low-frequency word "intel" is more useful for training than its co-occurrence with the high-frequency word "a". Moreover, training with vast numbers of (high-frequency) words is slow. Thus, when training word embedding models, high-frequency words can be subsampled (Mikolov et al., 2013b). Specifically, each indexed word $w_i$ in the dataset is discarded with probability

$$P(w_i) = \max\left(1 - \sqrt{\frac{t}{f(w_i)}},\ 0\right) \tag{14.3.1}$$

where $f(w_i)$ is the ratio of the number of occurrences of $w_i$ to the total number of words in the dataset, and the constant $t$ is a hyperparameter ($10^{-4}$ in the experiment). We can see that only when the relative frequency $f(w_i) > t$ can the (high-frequency) word $w_i$ be discarded, and the higher the relative frequency of the word, the greater its probability of being discarded.
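
Since the question asks for code rather than prose, here is a minimal Python sketch of this subsampling rule. The function name `subsample`, the input format (a corpus as a list of sentences, each a list of token strings), and the default `t = 1e-4` are illustrative assumptions, not part of the quoted text.

```python
import math
import random
from collections import Counter

def subsample(sentences, t=1e-4):
    """Drop each token w_i with probability
    P(w_i) = max(1 - sqrt(t / f(w_i)), 0), Eq. (14.3.1),
    where f(w_i) is the token's relative frequency in the corpus."""
    counter = Counter(token for sentence in sentences for token in sentence)
    num_tokens = sum(counter.values())

    def keep(token):
        f = counter[token] / num_tokens           # relative frequency f(w_i)
        p_discard = max(1 - math.sqrt(t / f), 0)  # Eq. (14.3.1)
        return random.random() >= p_discard       # keep with prob. 1 - P(w_i)

    return [[token for token in sentence if keep(token)]
            for sentence in sentences]
```

Note how the `max(..., 0)` clamp realizes the claim in the passage: whenever $f(w_i) \leq t$, the square root is at least 1, so the discard probability is 0 and low-frequency words are always kept, while the discard probability grows toward 1 as $f(w_i)$ increases.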