Role of Syntax Analysis

  • Updated: 2018-09-20
  • Source: cs.nyu.edu

  • determining and regularizing structure -- relations between words
    • figuring out the 'who did what to whom'
  • captures generalizations about language

Basic Syntactic Structures of English (J&M 12.3)

  • parts of speech
    • include noun, verb, adjective, adverb, pronoun, preposition, conjunction, article
    • nouns
      • to distinguish nouns from adjectives:  nouns can form possessive or plural or both
      • countable nouns:  singular form must appear with determiner ("Cats sleep."  *"Cat sleep.")
    • verbs occur in different (inflected) forms:
      • base or infinitive ("be", "eat", "sleep")
      • present tense ("is", "am", "are";  "eats", "eat"; "sleeps", "sleep")
      • past tense ("was", "were";  "ate"; "slept")
      • present participle ("being"; "eating"; "sleeping")
      • past participle ("been"; "eaten"; "slept")
    • pronouns occur in nominative ("I") and accusative ("me") forms ["cases"]
  • phrases: classifying them by part of speech of main word or by syntactic role
    • subject and predicate; noun phrase and verb phrase
      In "The young cats drink milk.", "The young cats" is a noun phrase and the subject;
      "drink milk" is a verb phrase and the predicate
    • the main word is the head of the phrase:  "cats" in "the young cats"
  • verb complements and modifiers
    • types of complements ... noun phrases, adjective phrases, prepositional phrases, particles
      noun phrase:  I served a brownie.
      adjective phrase:  I remained very rich.
      prepositional phrase:  I looked at Fred.
      particles:  He looked up the number.
    • clauses; clausal complements 
      I dreamt that I won a million brownies.
    • tenses: simple past, present, future;  progressive, perfect
      simple present:  John bakes cookies.
      present progressive:  John is baking cookies.
      present perfect:  John has baked cookies.
    • active vs. passive
      active:  Bernie ate the banana.
      passive:  The banana was eaten by Bernie.
  • noun phrase structure
    • left modifiers:  determiner, quantifier, adjective, noun
      the five shiny tin cans
    • right modifiers:  prepositional phrases and apposition
      prepositional phrase:  the man in the moon
      apposition:  Scott, the Arctic explorer,
    • relative clauses
      the man who ate the popcorn
      the popcorn which the man ate
      the man who is eating the popcorn
      the tourist who was eaten by a lion
    • reduced relative clauses
      the man eating the popcorn
      the man eaten by a lion
  • coordinating and subordinating conjunctions

Comparison with other Languages

  • word segmentation (required for Japanese and Chinese)
  • inflectional and derivational morphology
  • fixed vs. free word order

Context-free grammar (J&M 12.2)

  • consists of non-terminal symbols (including a start symbol), terminal symbols, and productions
  • rewrite operation
  • derivation
  • language defined by a CFG
  • CFG as a device for
    • generating sentences
    • recognizing sentences
    • parsing sentences
  • natural language grammars typically treat POS as terminal (or pre-terminal) and treat lexical insertion or look-up as a separate process
  • more powerful than regular expressions / finite-state automata
    • some languages which can be captured by CFG cannot be captured by regular expressions (J&M 12.6, 16.2)
      • regular expressions can't capture center embedding
    • even if the language can be captured in principle by a regular expression, it may not be convenient for expressing relations among constituents
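
The point about center embedding can be made concrete with the language a^n b^n, a common formal stand-in for nested structure: it is not regular, yet the two-production CFG S --> a S b | (empty) defines it. The following is a minimal sketch of a recognizer for that grammar; the function name `matches_anbn` is made up for this illustration.

```python
def matches_anbn(tokens):
    """Recognize the language of the CFG  S --> 'a' S 'b' | (empty)."""

    def parse_s(i):
        # Try S --> 'a' S 'b' starting at position i.
        if i < len(tokens) and tokens[i] == "a":
            j = parse_s(i + 1)          # the embedded S
            if j < len(tokens) and tokens[j] == "b":
                return j + 1            # matched the closing 'b'
        # Otherwise use S --> (empty), which consumes nothing.
        return i

    # Accept only if S covers the whole input.
    return parse_s(0) == len(tokens)


print(matches_anbn(list("aaabbb")))  # True
print(matches_anbn(list("aabbb")))   # False
```

Each call to `parse_s` remembers one pending "b" on the call stack, which is exactly the unbounded memory a finite-state automaton lacks.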

A small context-free English grammar

sentence := np vp;
np := n | art n | art adj n;
vp := v | v np;

Including auxiliaries

vp := v | v np | v vp;

Including PPs

sentence := np vp;
np := ngroup | ngroup pp;
ngroup := n | art n | art adj n;
vp := v | v np | v vp | v np pp;
pp := p np;
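
To illustrate the use of a CFG as a generating device, here is a small sketch that stores the last grammar above (the version with PPs) as a Python dictionary and rewrites the start symbol until only part-of-speech tags remain. The names `GRAMMAR`, `generate` and `max_depth` are choices made for this example, and the depth cut-off is an assumption added only to keep the output short.

```python
import random

# The grammar with PPs; the POS tags n, art, adj, v, p act as terminals.
GRAMMAR = {
    "sentence": [["np", "vp"]],
    "np": [["ngroup"], ["ngroup", "pp"]],
    "ngroup": [["n"], ["art", "n"], ["art", "adj", "n"]],
    "vp": [["v"], ["v", "np"], ["v", "vp"], ["v", "np", "pp"]],
    "pp": [["p", "np"]],
}


def generate(symbol="sentence", rng=None, depth=0, max_depth=6):
    """Return a list of POS tags derived from `symbol` by random rewriting."""
    if rng is None:
        rng = random.Random()
    if symbol not in GRAMMAR:
        return [symbol]                      # terminal: nothing to rewrite
    alternatives = GRAMMAR[symbol]
    # Beyond max_depth always take the first alternative; in this grammar
    # the first alternative of every symbol leads to a finite expansion.
    rhs = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    tags = []
    for child in rhs:
        tags.extend(generate(child, rng, depth + 1, max_depth))
    return tags


print(" ".join(generate(rng=random.Random(0))))
```

Turning the tag sequence into words would be a separate lexical insertion step, in line with the treatment of POS as terminals noted earlier.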

Parsers

Parsing as search (J&M 13.1) 
Top-down recognizer / parser (J&M 13.1.1)
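
A minimal sketch of a top-down recognizer with backtracking, assuming the grammar with PPs given above and an input that has already been reduced to POS tags. The names `GRAMMAR`, `expand` and `recognize` are made up for this illustration, and this is one simple way to organize the search, not necessarily the formulation in J&M. It relies on the grammar having no left recursion; with a left-recursive rule the expansion would not terminate.

```python
# The grammar with PPs; the POS tags n, art, adj, v, p act as terminals.
GRAMMAR = {
    "sentence": [["np", "vp"]],
    "np": [["ngroup"], ["ngroup", "pp"]],
    "ngroup": [["n"], ["art", "n"], ["art", "adj", "n"]],
    "vp": [["v"], ["v", "np"], ["v", "vp"], ["v", "np", "pp"]],
    "pp": [["p", "np"]],
}


def expand(symbols, tags, pos):
    """Yield every position reachable by matching `symbols` against tags[pos:]."""
    if not symbols:
        yield pos                            # nothing left to match
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try each production in turn (the search step).
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, tags, pos)
    elif pos < len(tags) and tags[pos] == first:
        # Terminal: it must equal the next input tag.
        yield from expand(rest, tags, pos + 1)


def recognize(tags):
    """True if the tag sequence can be derived from the start symbol."""
    return any(end == len(tags) for end in expand(["sentence"], tags, 0))


print(recognize(["art", "adj", "n", "v", "art", "n"]))  # True
print(recognize(["art", "n"]))                          # False (no verb phrase)
```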

Bottom-up (immediate-constituent) parser (Grishman 2.4.2, J&M 13.1.2)

Uses tree nodes with components
  root (a non-terminal grammar symbol),
  start and end (token numbers), and
  constituents (a vector of parse tree nodes)

For i = 1, …, number of words in sentence
  Create a node with root = part of speech of word i, start = i, end = i+1
  (if the word has several parts of speech, create one node for each P.O.S.)
  Put this node on list todo
While todo is not empty,
  Remove node n from todo
  If there exists a production A --> a1 a2 … aj such that
    root(n) = aj
    and there exist nodes n1 … nj-1 such that
      root(nk) = ak and end(nk) = start(nk+1)  (k = 1, …, j-1; where nj = n),
  then create a new node with root = A, start = start(n1), end = end(n) and add it to todo.
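
The procedure above can be sketched in Python as follows, assuming the grammar with PPs and an input given as one collection of possible POS tags per word. The names `Node`, `left_neighbours`, `parse` and `bracket` are choices made for this example. One deliberate assumption: the sketch feeds the words in one at a time and empties todo before moving to the next word, so that every possible left neighbour already exists when a node is processed.

```python
from dataclasses import dataclass, field

GRAMMAR = [  # (left-hand side, right-hand side) for the grammar with PPs
    ("sentence", ("np", "vp")),
    ("np", ("ngroup",)), ("np", ("ngroup", "pp")),
    ("ngroup", ("n",)), ("ngroup", ("art", "n")), ("ngroup", ("art", "adj", "n")),
    ("vp", ("v",)), ("vp", ("v", "np")), ("vp", ("v", "vp")), ("vp", ("v", "np", "pp")),
    ("pp", ("p", "np")),
]


@dataclass
class Node:
    root: str
    start: int
    end: int
    constituents: list = field(default_factory=list)


def left_neighbours(symbols, end, nodes):
    """Yield lists of adjacent nodes whose roots spell `symbols`, ending at `end`."""
    if not symbols:
        yield []
        return
    for m in nodes:
        if m.root == symbols[-1] and m.end == end:
            for prefix in left_neighbours(symbols[:-1], m.start, nodes):
                yield prefix + [m]


def parse(tag_sets):
    """Return all `sentence` nodes that span the whole input."""
    nodes = []                                   # every node processed so far
    for i, tags in enumerate(tag_sets, start=1):
        todo = [Node(tag, i, i + 1) for tag in tags]
        while todo:
            n = todo.pop()
            nodes.append(n)
            for lhs, rhs in GRAMMAR:
                if rhs[-1] != n.root:            # n must match the last symbol aj
                    continue
                for prefix in left_neighbours(rhs[:-1], n.start, nodes):
                    start = prefix[0].start if prefix else n.start
                    todo.append(Node(lhs, start, n.end, prefix + [n]))
    last = len(tag_sets) + 1
    return [m for m in nodes
            if m.root == "sentence" and m.start == 1 and m.end == last]


def bracket(node):
    """Render a parse tree as a bracketed string."""
    if not node.constituents:
        return node.root
    return "(" + node.root + " " + " ".join(bracket(c) for c in node.constituents) + ")"


for tree in parse([["n"], ["v"]]):
    print(bracket(tree))   # (sentence (np (ngroup n)) (vp v))
```

For the tag sequence art n v art n p art n the sketch returns two trees, one attaching the PP to the object noun phrase (np := ngroup pp) and one attaching it to the verb phrase (vp := v np pp).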
