Lexical Semantics

  • Updated: 2018-09-20
  • Source: cs.nyu.edu
In our discussion of semantics up to now, we focused on structural issues:  how to represent the relations between predicates and events and their arguments and modifiers;  how to represent quantification;  how to convert syntactic structure into semantic structure.   As our predicates, we used words, but this is really problematic for a semantic representation:  one word may have several meanings (polysemy) and several words may have the same or nearly the same meaning (synonymy).  In this section we take a closer look at word meanings.

Terminology [J&M 19.1, 2]

- multiple senses of a word
- polysemy for multiple related senses (and homonymy for totally unrelated senses, e.g. "bank")
- metonymy for certain types of regular, productive polysemy ("the White House", "Washington")
- zeugma (conjunction combining distinct senses) as test for polysemy ("serve")
- synonymy:  when two words mean (more-or-less) the same thing
- hyponymy:  X is the hyponym of Y if X denotes a more specific subclass  of Y
(X is the hyponym, Y is the hypernym)

WordNet [J&M 19.3]

- large-scale database of lexical relations
- freely available for interactive use or download
- organized as a graph whose nodes are synsets (synonym sets)
- each synset consists of 1 or more word senses which are considered synonymous
- primary relation:  hyponym / hypernym
- very fine sense distinctions
- sense-annotated corpus (SemCor, subset of Brown corpus)
- similar wordnets developed for many foreign languages: Global WordNet Association
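The core structure above (synsets linked by the hypernym relation) can be sketched with a toy graph. The synset names and links here are illustrative stand-ins, not real WordNet data:

```python
# A toy fragment of a WordNet-style graph: nodes are synsets (here, plain
# strings) and the primary relation links each synset to its hypernym.
# Synset names and structure are invented for illustration.
hypernym = {
    "dog.n.01": "canine.n.01",
    "canine.n.01": "carnivore.n.01",
    "carnivore.n.01": "mammal.n.01",
    "cat.n.01": "feline.n.01",
    "feline.n.01": "carnivore.n.01",
    "mammal.n.01": "animal.n.01",
}

def hypernym_chain(synset):
    """Follow hypernym links from a synset up to the root."""
    chain = [synset]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

print(hypernym_chain("dog.n.01"))
# ['dog.n.01', 'canine.n.01', 'carnivore.n.01', 'mammal.n.01', 'animal.n.01']
```

(The real database can be queried the same way, e.g. through NLTK's WordNet interface.)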

Word Sense Disambiguation [J&M 20.1]

- process of identifying the sense of a word in context
- WSD evaluation:  either using WordNet or coarser senses (e.g., main senses from a dictionary)
- local cues (Weaver):  train a classifier using nearby words as features
- either treat words at specific positions relative to target word as separate features
- or put all words within a given window (e.g., 10 words wide) as a 'bag of words'
- simple demo for 'interest'
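Both feature schemes above can be sketched in a few lines; the feature-name format and window size are arbitrary choices for illustration:

```python
def wsd_features(tokens, i, window=5):
    """Context features for the target word at position i:
    positional features (word at a fixed offset from the target)
    plus a bag of words drawn from a window around the target."""
    feats = {}
    # positional features: words at specific offsets relative to the target
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"pos[{offset}]={tokens[j]}"] = 1
    # bag-of-words features within the window (position ignored)
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            feats[f"bow={tokens[j]}"] = 1
    return feats

toks = "he paid interest on the bank loan".split()
f = wsd_features(toks, toks.index("interest"))
```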

Simple supervised WSD algorithm:  naive Bayes [J&M 20.2.2]

selected sense s' = argmax(s) P(s | F)
where F = (f[1], ..., f[n]) is the set of n context features
By Bayes' rule:
s' = argmax(s) P(F | s) P(s) / P(F)
   = argmax(s) P(F | s) P(s)      (P(F) does not depend on s)
If we now assume the features are conditionally independent given the sense
P(F | s) = product(i) P(f[i] | s)
giving
s' = argmax(s) P(s) product(i) P(f[i] | s)
Maximum likelihood estimates for P(s) and P(f[i] | s) can be easily obtained by counting
- some smoothing (e.g., add-one smoothing) is needed
Works quite well at selecting best sense (not at estimating probabilities)
But needs substantial annotated training data for each word
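The whole algorithm fits in a short sketch. The sense-annotated examples for 'interest' below are invented toy data; log probabilities are used to avoid underflow:

```python
from collections import Counter, defaultdict
from math import log

def train_nb(examples):
    """examples: list of (sense, feature list).
    Collect the counts needed for add-one-smoothed estimates."""
    sense_count = Counter()
    feat_count = defaultdict(Counter)
    vocab = set()
    for sense, feats in examples:
        sense_count[sense] += 1
        for f in feats:
            feat_count[sense][f] += 1
            vocab.add(f)
    return sense_count, feat_count, vocab

def classify_nb(model, feats):
    """s' = argmax(s) log P(s) + sum(i) log P(f[i] | s),
    with add-one smoothing of P(f[i] | s)."""
    sense_count, feat_count, vocab = model
    total = sum(sense_count.values())
    best, best_score = None, float("-inf")
    for s, n in sense_count.items():
        score = log(n / total)
        denom = sum(feat_count[s].values()) + len(vocab)
        for f in feats:
            score += log((feat_count[s][f] + 1) / denom)
        if score > best_score:
            best, best_score = s, score
    return best

# toy sense-annotated contexts for 'interest' (invented, for illustration)
train = [
    ("money", ["bank", "rate", "loan"]),
    ("money", ["rate", "percent"]),
    ("attention", ["showed", "great", "topic"]),
    ("attention", ["lost", "topic"]),
]
model = train_nb(train)
print(classify_nb(model, ["loan", "rate"]))  # → money
```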

Semi-supervised WSD algorithm [J&M 20.5]

Based on Gale / Yarowsky's "one sense per discourse" observation
(generally true for coarse word senses)
Allows bootstrapping from a small set of sense-annotated seeds
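The bootstrapping loop can be sketched as follows. This is a simplification of Yarowsky's method: instead of his decision-list classifier, each sense is represented by a growing set of cue words, and the confidence rule (overlap of at least `threshold` cues) is an invented stand-in:

```python
def bootstrap(seeds, unlabeled, threshold=2, rounds=10):
    """Yarowsky-style bootstrapping sketch.
    seeds: dict mapping each sense to a set of strongly indicative
    context words (a toy stand-in for a trained classifier).
    Each round, label any unlabeled context sharing at least
    `threshold` cue words with one sense, fold its words into that
    sense's cue set, and repeat until nothing more can be labeled."""
    cues = {s: set(ws) for s, ws in seeds.items()}
    pool = [set(ctx) for ctx in unlabeled]
    labeled = []
    for _ in range(rounds):
        progress = False
        remaining = []
        for ctx in pool:
            scores = {s: len(ctx & ws) for s, ws in cues.items()}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                labeled.append((best, ctx))
                cues[best] |= ctx   # grow this sense's cue set
                progress = True
            else:
                remaining.append(ctx)
        pool = remaining
        if not progress:
            break
    return labeled, pool

# toy seeds and unlabeled contexts for 'interest' (invented)
seeds = {"money": {"bank", "rate"}, "attention": {"topic", "showed"}}
contexts = [["bank", "rate", "loan"], ["loan", "percent"],
            ["topic", "showed", "great"]]
labeled, leftover = bootstrap(seeds, contexts)
```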

Identifying similar words

Distance metric for Wordnet [J&M 20.6]

Simplest metrics just use path length in WordNet
More sophisticated metrics take account of the fact that going 'up' (to a hypernym) may represent different degrees of generalization in different cases
Resnik introduced P(c):  for each concept (synset), P(c) = probability that a word in a corpus is an instance of the concept (matches the synset c or one of its hyponyms)
Information content of a concept
IC(c) = -log P(c)
If LCS(c1, c2) is the lowest common subsumer of c1 and c2, the JC distance between c1 and c2 is
IC(c1) + IC(c2) - 2 IC(LCS(c1, c2))
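These definitions can be worked through on a toy taxonomy. The concept names and corpus counts below are invented for illustration:

```python
from math import log

# Toy taxonomy and corpus counts (invented, not real frequencies).
hypernym = {"dog": "mammal", "cat": "mammal", "mammal": "animal",
            "snake": "animal"}
count = {"dog": 6, "cat": 2, "snake": 2, "mammal": 1, "animal": 1}
total = sum(count.values())

def ancestors(c):
    """The concept itself plus all concepts above it, lowest first."""
    chain = [c]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

def p(c):
    """P(c): probability that a corpus word is an instance of concept c,
    i.e. matches c or one of its hyponyms."""
    return sum(n for w, n in count.items() if c in ancestors(w)) / total

def ic(c):
    """Information content: IC(c) = -log P(c)."""
    return -log(p(c))

def lcs(c1, c2):
    """Lowest common subsumer: the lowest shared ancestor of c1 and c2."""
    a1 = ancestors(c1)
    return next(c for c in ancestors(c2) if c in a1)

def jc_distance(c1, c2):
    """JC distance: IC(c1) + IC(c2) - 2 IC(LCS(c1, c2))."""
    return ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
```

On this data jc_distance("dog", "cat") comes out smaller than jc_distance("dog", "snake"): dog and cat share the more informative subsumer mammal, while dog and snake meet only at the root, whose IC is 0.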

Similarity metric from corpora [J&M 20.7]

Basic idea:  characterize words by their contexts;  words sharing more contexts are more similar
Contexts can either be defined in terms of adjacency or dependency (syntactic relations)
Given a word w and a context feature f, define pointwise mutual information PMI
PMI(w,f) = log ( P(w,f) / P(w) P(f))
Given a list of contexts (words left and right) we can compute a context vector for each word.
The similarity of two vectors (representing two words) can be computed in many ways;  a standard way is using the cosine (normalized dot product).
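The PMI vectors and cosine can be computed directly from co-occurrence counts. The three-sentence corpus below is invented, and contexts are simply the words one position to the left and right:

```python
from collections import Counter
from math import log, sqrt

# Toy corpus (invented); context features = immediate left/right neighbors.
sents = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the dog ate the bone",
]

pair = Counter()     # (word, context feature) co-occurrence counts
w_count = Counter()  # word occurrence count (as left element of a pair)
f_count = Counter()  # feature occurrence count
for s in sents:
    toks = s.split()
    for i, w in enumerate(toks):
        for j in (i - 1, i + 1):
            if 0 <= j < len(toks):
                pair[(w, toks[j])] += 1
                w_count[w] += 1
                f_count[toks[j]] += 1

N = sum(pair.values())

def pmi(w, f):
    """PMI(w, f) = log( P(w, f) / (P(w) P(f)) ), with probabilities
    estimated from the pair counts."""
    return log(pair[(w, f)] * N / (w_count[w] * f_count[f]))

def vector(w):
    """Context vector for w: PMI weight for each observed feature."""
    return {f: pmi(w, f) for (x, f) in pair if x == w}

def cosine(u, v):
    """Normalized dot product of two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)
```

In practice negative PMI values are often clipped to zero (positive PMI), but the raw values are kept here to match the formula above.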
See the Thesaurus demo by Patrick Pantel.
