Reasoning with word vectors

Date: 2022-01-23 Author: hisi-tech

The amazing power of word vectors | the morning paper (域名)

What is a word vector?

At one level, it’s simply a vector of weights. In a simple 1-of-N (or ‘one-hot’) encoding every element in the vector is associated with a word in the vocabulary. The encoding of a given word is simply the vector in which the corresponding element is set to one, and all other elements are zero.

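Put another way, a word vector is just a vector of weights that represents a word. In one-hot encoding, each element of the vector corresponds to one specific word in the vocabulary. Every word gets its own one-hot vector, in which the element for that word is set to 1 and all other elements are set to 0.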

Suppose our vocabulary has only five words: King, Queen, Man, Woman, and Child. We could encode the word ‘Queen’ as:

With the five-word vocabulary King, Queen, Man, Woman, Child (in that order), "Queen" is the second entry, so it is encoded as [0, 1, 0, 0, 0].
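The encoding can be sketched in a few lines of Python. The vocabulary order is the one listed in the text; the helper name `one_hot` is only illustrative.

```python
# Vocabulary in the order given in the text.
VOCAB = ["King", "Queen", "Man", "Woman", "Child"]

def one_hot(word, vocab=VOCAB):
    """Return a list with 1 at the word's position and 0 everywhere else."""
    index = vocab.index(word)  # raises ValueError for out-of-vocabulary words
    return [1 if i == index else 0 for i in range(len(vocab))]

print(one_hot("Queen"))  # [0, 1, 0, 0, 0]
```

Note that the vector length equals the vocabulary size, so a realistic vocabulary would give very long, almost entirely zero vectors.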

Using such an encoding, there’s no meaningful comparison we can make between word vectors other than equality testing.

With such an encoding, however, comparing two vectors tells us nothing beyond whether they are equal.
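One way to see this is to compute dot products between one-hot vectors: every pair of distinct words scores 0 and every word scores 1 with itself, so "King" is exactly as far from "Queen" as it is from "Child". A small sketch, with helper names chosen for illustration:

```python
VOCAB = ["King", "Queen", "Man", "Woman", "Child"]

def one_hot(word):
    """One-hot vector for a word of the five-word vocabulary."""
    return [1 if w == word else 0 for w in VOCAB]

def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# Each printed row is the word's own one-hot vector:
# similarity is 1 with itself and 0 with every other word.
for a in VOCAB:
    print(a, [dot(one_hot(a), one_hot(b)) for b in VOCAB])
```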

In word2vec, a distributed representation of a word is used. Take a vector with several hundred dimensions (say 1000). Each word is represented by a distribution of weights across those elements. So instead of a one-to-one mapping between an element in the vector and a word, the representation of a word is spread across all of the elements in the vector, and each element in the vector contributes to the definition of many words.

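In word2vec, each word gets a distributed representation, meaning its nonzero values are spread across the whole vector. Suppose the vector has several hundred dimensions (say 1000). Each word is then represented by 1000 weights, most of which are likely to be nonzero and which can be read as how strongly the word relates to other words. A word's representation therefore involves every element of the 1000-dimensional vector, and each element contributes more or less to the word's definition. This is what replaces the one-to-one mapping of one-hot encoding when representing words or phrases.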

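Dimension labels in the figure: Royalty (how strongly the word relates to royalty); Masculinity (how strongly it relates to masculine traits); Femininity (how strongly it relates to feminine traits); Age.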

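The figure shows that King and Queen are both strongly associated with Royalty, which is what we would expect, so both should score high on that dimension. The same reasoning applies to the other words and dimensions.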
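To make the idea concrete, here is a minimal sketch with four labelled dimensions (Royalty, Masculinity, Femininity, Age). The numbers are invented for illustration and are not taken from the figure or from a trained model; real word2vec dimensions are also not individually interpretable like this. Unlike one-hot vectors, these dense vectors give graded similarities.

```python
import math

# Hypothetical scores on (royalty, masculinity, femininity, age).
# These values are made up for illustration only.
VECTORS = {
    "King":  [0.95, 0.90, 0.05, 0.70],
    "Queen": [0.95, 0.05, 0.90, 0.60],
    "Man":   [0.05, 0.90, 0.05, 0.50],
    "Woman": [0.05, 0.05, 0.90, 0.50],
    "Child": [0.05, 0.40, 0.40, 0.10],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

for word in ("Queen", "Man", "Woman", "Child"):
    print(f"King vs {word}: {cosine(VECTORS['King'], VECTORS[word]):.3f}")
```

With these made-up values, King comes out closer to Queen (shared royalty) and to Man (shared masculinity) than to Woman, which is the kind of comparison one-hot vectors cannot express.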

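Here is another way to think about it: V_Queen - V_Woman = […, …, -…, 0.1] (call this Vt). What does this vector represent? What does V_Man + Vt equal? Is it approximately V_King? The answer is given below.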

Reasoning with word vectors

We find that the learned word representations in fact capture meaningful syntactic and semantic regularities in a very simple way. Specifically, the regularities are observed as constant vector offsets between pairs of words sharing a particular relationship. For example, if we denote the vector for word i as x_i, and focus on the singular/plural relation, we observe that x_apple - x_apples ≈ x_car - x_cars, x_family - x_families ≈ x_car - x_cars, and so on. Perhaps more surprisingly, we find that this is also the case for a variety of semantic relations, as measured by the SemEval 2012 task of measuring relation similarity.

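We find that the learned word representations (word vectors) capture syntactic regularities (for example Kings vs. King) and semantic regularities in a very simple way. More precisely, a regularity shows up as the difference between the two vectors of a word pair that shares a particular relationship. For example, writing V_i for the vector of word i, the singular/plural relation gives V_apples - V_apple ≈ V_cars - V_car ≈ V_families - V_family. Perhaps more surprisingly, the same property holds for a wide variety of semantic relations (semantic: what a word means), as measured by the SemEval 2012 relation-similarity task.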

The vectors are very good at answering analogy questions of the form a is to b as c is to ?. For example, man is to woman as uncle is to ? (aunt) using a simple vector offset method based on cosine distance.

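Such vectors are well suited to analogy questions of the form "a is to b as c is to what?". For example, if man corresponds to woman, what does uncle correspond to? (aunt)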
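The vector offset method can be sketched as follows: to answer "a is to b as c is to ?", form the target vector V_b - V_a + V_c and return the vocabulary word whose vector has the highest cosine similarity to it. The four-dimensional vectors below are hypothetical values invented for this example, not trained embeddings, and excluding the three query words from the candidates is a choice made in this sketch.

```python
import math

# Hypothetical vectors on (royalty, masculinity, femininity, age);
# the numbers are invented for illustration only.
VECTORS = {
    "King":  [0.95, 0.90, 0.05, 0.70],
    "Queen": [0.95, 0.05, 0.90, 0.60],
    "Man":   [0.05, 0.90, 0.05, 0.50],
    "Woman": [0.05, 0.05, 0.90, 0.50],
    "Child": [0.05, 0.40, 0.40, 0.10],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vectors=VECTORS):
    """Answer 'a is to b as c is to ?' with the vector offset method."""
    # Target vector: V_b - V_a + V_c, computed component by component.
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Assumption of this sketch: the query words themselves are not valid answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("Man", "Woman", "King"))   # Queen
print(analogy("Woman", "Queen", "Man"))  # King
```

The second call corresponds to the earlier question about V_Man + (V_Queen - V_Woman): with these illustrative values, the nearest remaining word is King.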