chore: mostly annotating verbs and nouns (#1874)

* chore: mostly annotating verbs and nouns

Fixes #1873

* fix: added `.vscode/settings.json` instead of `dictionary.dict`

* chore: verb work, POS snapshot fmt

* fix: "repurpose"

Fixes #1874

* chore: add and fix noun properties
This commit is contained in:
Andrew Dunbar 2025-09-08 19:20:49 +07:00 committed by GitHub
parent 7d13327f0f
commit 8664ecb580
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
15 changed files with 15386 additions and 15396 deletions

View file

@ -1,3 +0,0 @@
{
"typescript.tsdk": "node_modules/typescript/lib"
}

File diff suppressed because it is too large Load diff

View file

@ -44,7 +44,7 @@
//! - The `L` suffix means linking verb. //! - The `L` suffix means linking verb.
//! - The `X` suffix means auxiliary verb. //! - The `X` suffix means auxiliary verb.
//! - The `B` suffix means base (lemma) form. //! - The `B` suffix means base (lemma) form.
//! - The `P` suffix means regular past tense & past participle. //! - The `P` suffix means simple past tense & past participle.
//! - The `Pr` suffix means progressive form. //! - The `Pr` suffix means progressive form.
//! - The `Pt` suffix means simple past tense. //! - The `Pt` suffix means simple past tense.
//! - The `Pp` suffix means past participle. //! - The `Pp` suffix means past participle.
@ -147,17 +147,30 @@ fn format_word_tag(word: &WordMetadata) -> String {
add_bool(&mut tag, "L", verb.is_linking); add_bool(&mut tag, "L", verb.is_linking);
add_bool(&mut tag, "X", verb.is_auxiliary); add_bool(&mut tag, "X", verb.is_auxiliary);
if let Some(forms) = verb.verb_forms { if let Some(forms) = verb.verb_forms {
if forms.contains(VerbFormFlags::LEMMA) { // If Lemma flag is explicity set; or if no verb forms are set Lemma is the default.
tag.push_str("B"); match (
forms.contains(VerbFormFlags::LEMMA),
forms.contains(VerbFormFlags::PAST),
forms.contains(VerbFormFlags::PAST_PARTICIPLE),
forms.contains(VerbFormFlags::PRETERITE),
forms.contains(VerbFormFlags::PROGRESSIVE),
forms.contains(VerbFormFlags::THIRD_PERSON_SINGULAR),
) {
(true, _, _, _, _, _) | (false, false, false, false, false, false) => {
tag.push_str("B")
}
_ => {}
} }
if forms.contains(VerbFormFlags::PAST) { // Regular verbs set both together; Irregular verbs can set them separately.
tag.push_str("P"); match (
} forms.contains(VerbFormFlags::PAST),
if forms.contains(VerbFormFlags::PRETERITE) { forms.contains(VerbFormFlags::PRETERITE),
tag.push_str("Pt"); forms.contains(VerbFormFlags::PAST_PARTICIPLE),
} ) {
if forms.contains(VerbFormFlags::PAST_PARTICIPLE) { (true, _, _) | (_, true, true) => tag.push_str("P"),
tag.push_str("Pp"); (false, true, false) => tag.push_str("Pt"),
(false, false, true) => tag.push_str("Pp"),
_ => {}
} }
if forms.contains(VerbFormFlags::PROGRESSIVE) { if forms.contains(VerbFormFlags::PROGRESSIVE) {
tag.push_str("g"); tag.push_str("g");
@ -165,6 +178,8 @@ fn format_word_tag(word: &WordMetadata) -> String {
if forms.contains(VerbFormFlags::THIRD_PERSON_SINGULAR) { if forms.contains(VerbFormFlags::THIRD_PERSON_SINGULAR) {
tag.push_str("3"); tag.push_str("3");
} }
} else {
tag.push_str("B");
} }
add(&tag, &mut tags); add(&tag, &mut tags);
} }

View file

@ -1619,18 +1619,6 @@ Suggest:
Lint: Agreement (31 priority)
Message: |
1484 | begin lessons: youd only have to whisper a hint to Time, and round goes the
1485 | clock in a twinkling! Half-past one, time for dinner!”
| ^~~~~~~~~~~ `twinkling` is a mass noun.
Suggest:
- Replace with: “twinkling”
- Replace with: “some twinkling”
- Replace with: “a piece of twinkling”
Lint: WordChoice (63 priority) Lint: WordChoice (63 priority)
Message: | Message: |
1498 | March—just before he went mad, you know—” (pointing with his tea spoon at the 1498 | March—just before he went mad, you know—” (pointing with his tea spoon at the

View file

@ -1430,18 +1430,6 @@ Message: |
Lint: Spelling (63 priority)
Message: |
610 | the State wherein they reside. No State shall make or enforce any law which
611 | shall abridge the privileges or immunities of citizens of the United States;
| ^~~~~~~~~~ Did you mean to spell `immunities` this way?
Suggest:
- Replace with: “immunity's”
- Replace with: “immensities”
- Replace with: “immunizes”
Lint: Readability (127 priority) Lint: Readability (127 priority)
Message: | Message: |
616 | The right of citizens of the United States, who are eighteen years of age or 616 | The right of citizens of the United States, who are eighteen years of age or

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@ -6,304 +6,304 @@
# Unlintable Unlintable # Unlintable Unlintable
> --> > -->
# Unlintable Unlintable # Unlintable Unlintable
> Part - of - speech tagging > Part - of - speech tagging
# Unlintable NSg/V/J+ . P . N🅪Sg/V+ NSg/V # Unlintable NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg
> >
# #
> In corpus linguistics , part - of - speech tagging ( POS tagging or PoS tagging or > In corpus linguistics , part - of - speech tagging ( POS tagging or PoS tagging or
# NPr/J/P NSg+ Nᴹ+ . NSg/V/J+ . P . N🅪Sg/V+ NSg/V . NSg+ NSg/V NPr/C NSg+ NSg/V NPr/C # NPr/J/P NSg+ Nᴹ+ . NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg . NSg+ NSg/Vg NPr/C NSg+ NSg/Vg NPr/C
> POST ) , also called grammatical tagging is the process of marking up a word in a > POST ) , also called grammatical tagging is the process of marking up a word in a
# NPr🅪/V/P+ . . R/C VP/J J NSg/V VL3 D NSg/V P Nᴹ/Vg/J NSg/V/J/P D/P NSg/V+ NPr/J/P D/P # NPr🅪/VB/P+ . . R/C VP/J J NSg/Vg VL3 D NSg/VB P Nᴹ/Vg/J NSg/VB/J/P D/P NSg/VB+ NPr/J/P D/P
> text ( corpus ) as corresponding to a particular part of speech , based on both its > text ( corpus ) as corresponding to a particular part of speech , based on both its
# N🅪Sg/V+ . NSg+ . NSg/R Nᴹ/Vg/J P D/P NSg/J NSg/V/J P N🅪Sg/V+ . VP/J J/P I/C/Dq ISg/D$+ # N🅪Sg/VB+ . NSg+ . NSg/R Nᴹ/Vg/J P D/P NSg/J NSg/VB/J P N🅪Sg/VB+ . VP/J J/P I/C/Dq ISg/D$+
> definition and its context . A simplified form of this is commonly taught to > definition and its context . A simplified form of this is commonly taught to
# NSg V/C ISg/D$+ N🅪Sg/V+ . D/P VP/J N🅪Sg/V P I/Ddem+ VL3 R V P # NSg VB/C ISg/D$+ N🅪Sg/VB+ . D/P VP/J N🅪Sg/VB P I/Ddem+ VL3 R VB P
> school - age children , in the identification of words as nouns , verbs , adjectives , > school - age children , in the identification of words as nouns , verbs , adjectives ,
# N🅪Sg/V . N🅪Sg/V+ NPl+ . NPr/J/P D Nᴹ P NPl/V3+ NSg/R NPl/V3 . NPl/V3+ . NPl/V3 . # N🅪Sg/VB . N🅪Sg/VB+ NPl+ . NPr/J/P D Nᴹ P NPl/V3+ NSg/R NPl/V3 . NPl/V3+ . NPl/V3 .
> adverbs , etc. > adverbs , etc.
# NPl/V3 . + # NPl/V3 . +
> >
# #
> Once performed by hand , POS tagging is now done in the context of computational > Once performed by hand , POS tagging is now done in the context of computational
# NSg/C VP/J NSg/J/P NSg/V+ . NSg+ NSg/V VL3 NPr/V/J/C NSg/VPp/J NPr/J/P D N🅪Sg/V P J # NSg/C VP/J NSg/J/P NSg/VB+ . NSg+ NSg/Vg VL3 NPr/VB/J/C NSg/VPp/J NPr/J/P D N🅪Sg/VB P J
> linguistics , using algorithms which associate discrete terms , as well as hidden > linguistics , using algorithms which associate discrete terms , as well as hidden
# Nᴹ+ . Nᴹ/Vg/J NPl+ I/C+ NSg/V/J+ J NPl/V3+ . NSg/R NSg/V/J NSg/R V/J # Nᴹ+ . Nᴹ/Vg/J NPl+ I/C+ NSg/VB/J+ J NPl/V3+ . NSg/R NSg/VB/J NSg/R VB/J
> parts of speech , by a set of descriptive tags . POS - tagging algorithms fall into > parts of speech , by a set of descriptive tags . POS - tagging algorithms fall into
# NPl/V3 P N🅪Sg/V+ . NSg/J/P D/P NPr/V/J P NSg/J NPl/V3+ . NSg+ . NSg/V NPl+ N🅪Sg/VB+ P # NPl/V3 P N🅪Sg/VB+ . NSg/J/P D/P NPr/VB/J P NSg/J NPl/V3+ . NSg+ . NSg/Vg NPl+ N🅪Sg/VB+ P
> two distinctive groups : rule - based and stochastic . E. Brill's tagger , one of the > two distinctive groups : rule - based and stochastic . E. Brill's tagger , one of the
# NSg NSg/J NPl/V3+ . NSg/V+ . VP/J V/C J . ? ? NSg . NSg/I/V/J P D # NSg NSg/J NPl/V3+ . NSg/VB+ . VP/J VB/C J . ? ? NSg . NSg/I/VB/J P D
> first and most widely used English POS - taggers , employs rule - based algorithms . > first and most widely used English POS - taggers , employs rule - based algorithms .
# NSg/V/J V/C NSg/I/J/R/Dq R VPPtPp/J NPr🅪/V/J+ NSg+ . NPl . NPl/V3 NSg/V+ . VP/J NPl+ . # NSg/VB/J VB/C NSg/I/J/R/Dq R VP/J NPr🅪/VB/J+ NSg+ . NPl . NPl/V3 NSg/VB+ . VP/J NPl+ .
> >
# #
> Principle > Principle
# N🅪Sg/V+ # N🅪Sg/VB+
> >
# #
> Part - of - speech tagging is harder than just having a list of words and their > Part - of - speech tagging is harder than just having a list of words and their
# NSg/V/J+ . P . N🅪Sg/V+ NSg/V VL3 JC C/P V/J Nᴹ/Vg/J D/P NSg/V P NPl/V3+ V/C D$+ # NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg VL3 JC C/P VB/J Nᴹ/Vg/J D/P NSg/VB P NPl/V3+ VB/C D$+
> parts of speech , because some words can represent more than one part of speech > parts of speech , because some words can represent more than one part of speech
# NPl/V3 P N🅪Sg/V+ . C/P I/J/R/Dq NPl/V3+ NPr/VX V NPr/I/V/J/R/Dq C/P NSg/I/V/J NSg/V/J P N🅪Sg/V+ # NPl/V3 P N🅪Sg/VB+ . C/P I/J/R/Dq NPl/V3+ NPr/VXB VB NPr/I/VB/J/R/Dq C/P NSg/I/VB/J NSg/VB/J P N🅪Sg/VB+
> at different times , and because some parts of speech are complex . This is not > at different times , and because some parts of speech are complex . This is not
# NSg/P NSg/J NPl/V3+ . V/C C/P I/J/R/Dq NPl/V3 P N🅪Sg/V+ V NSg/V/J . I/Ddem+ VL3 NSg/C # NSg/P NSg/J NPl/V3+ . VB/C C/P I/J/R/Dq NPl/V3 P N🅪Sg/VB+ VB NSg/VB/J . I/Ddem+ VL3 NSg/C
> rare — in natural languages ( as opposed to many artificial languages ) , a large > rare — in natural languages ( as opposed to many artificial languages ) , a large
# NSg/V/J . NPr/J/P NSg/J+ NPl/V3+ . NSg/R VP/J P NSg/I/J/Dq+ J+ NPl/V3+ . . D/P NSg/J # NSg/VB/J . NPr/J/P NSg/J+ NPl/V3+ . NSg/R VP/J P NSg/I/J/Dq+ J+ NPl/V3+ . . D/P NSg/J
> percentage of word - forms are ambiguous . For example , even " dogs " , which is > percentage of word - forms are ambiguous . For example , even " dogs " , which is
# N🅪Sg P NSg/V+ . NPl/V3+ V J . C/P NSg/V+ . NSg/V/J . NPl/V3+ . . I/C+ VL3 # N🅪Sg P NSg/VB+ . NPl/V3+ VB J . C/P NSg/VB+ . NSg/VB/J . NPl/V3+ . . I/C+ VL3
> usually thought of as just a plural noun , can also be a verb : > usually thought of as just a plural noun , can also be a verb :
# R N🅪Sg/VPtPp P NSg/R V/J D/P+ NSg/J+ NSg/V+ . NPr/VX R/C NSg/VXB D/P+ NSg/V+ . # R N🅪Sg/VP P NSg/R VB/J D/P+ NSg/J+ NSg/VB+ . NPr/VXB R/C NSg/VXB D/P+ NSg/VB+ .
> >
# #
> The sailor dogs the hatch . > The sailor dogs the hatch .
# D+ NSg+ NPl/V3+ D+ NSg/V+ . # D+ NSg+ NPl/V3+ D+ NSg/VB+ .
> >
# #
> Correct grammatical tagging will reflect that " dogs " is here used as a verb , not > Correct grammatical tagging will reflect that " dogs " is here used as a verb , not
# NSg/V/J J NSg/V NPr/VX V NSg/I/C/Ddem+ . NPl/V3+ . VL3 NSg/J/R VPPtPp/J NSg/R D/P NSg/V+ . NSg/C # NSg/VB/J J NSg/Vg NPr/VXB VB NSg/I/C/Ddem+ . NPl/V3+ . VL3 NSg/J/R VP/J NSg/R D/P NSg/VB+ . NSg/C
> as the more common plural noun . Grammatical context is one way to determine > as the more common plural noun . Grammatical context is one way to determine
# NSg/R D NPr/I/V/J/R/Dq NSg/V/J NSg/J NSg/V+ . J+ N🅪Sg/V+ VL3 NSg/I/V/J NSg/J P V # NSg/R D NPr/I/VB/J/R/Dq NSg/VB/J NSg/J NSg/VB+ . J+ N🅪Sg/VB+ VL3 NSg/I/VB/J NSg/J P VB
> this ; semantic analysis can also be used to infer that " sailor " and " hatch " > this ; semantic analysis can also be used to infer that " sailor " and " hatch "
# I/Ddem+ . NSg/J+ N🅪Sg+ NPr/VX R/C NSg/VXB VPPtPp/J P V NSg/I/C/Ddem+ . NSg+ . V/C . NSg/V . # I/Ddem+ . NSg/J+ N🅪Sg+ NPr/VXB R/C NSg/VXB VP/J P VB NSg/I/C/Ddem+ . NSg+ . VB/C . NSg/VB .
> implicate " dogs " as 1 ) in the nautical context and 2 ) an action applied to the > implicate " dogs " as 1 ) in the nautical context and 2 ) an action applied to the
# NSg/V . NPl/V3+ . NSg/R # . NPr/J/P D J N🅪Sg/V+ V/C # . D/P N🅪Sg/V/J+ VP/J P D # NSg/VB . NPl/V3+ . NSg/R # . NPr/J/P D J N🅪Sg/VB+ VB/C # . D/P N🅪Sg/VB/J+ VP/J P D
> object " hatch " ( in this context , " dogs " is a nautical term meaning " fastens ( a > object " hatch " ( in this context , " dogs " is a nautical term meaning " fastens ( a
# NSg/V+ . NSg/V . . NPr/J/P I/Ddem N🅪Sg/V+ . . NPl/V3+ . VL3 D/P J NSg/V/J+ N🅪Sg/Vg/J+ . V3 . D/P # NSg/VB+ . NSg/VB . . NPr/J/P I/Ddem N🅪Sg/VB+ . . NPl/V3+ . VL3 D/P J NSg/VB/J+ N🅪Sg/Vg/J+ . V3 . D/P
> watertight door ) securely " ) . > watertight door ) securely " ) .
# J NSg/V+ . R . . . # J NSg/VB+ . R . . .
> >
# #
> Tag sets > Tag sets
# NSg/V+ NPl/V3 # NSg/VB+ NPl/V3
> >
# #
> Schools commonly teach that there are 9 parts of speech in English : noun , verb , > Schools commonly teach that there are 9 parts of speech in English : noun , verb ,
# NPl/V3+ R NSg/V NSg/I/C/Ddem + V # NPl/V3 P N🅪Sg/V NPr/J/P NPr🅪/V/J . NSg/V+ . NSg/V+ . # NPl/V3+ R NSg/VB NSg/I/C/Ddem + VB # NPl/V3 P N🅪Sg/VB NPr/J/P NPr🅪/VB/J . NSg/VB+ . NSg/VB+ .
> article , adjective , preposition , pronoun , adverb , conjunction , and interjection . > article , adjective , preposition , pronoun , adverb , conjunction , and interjection .
# NSg/V+ . NSg/V/J+ . NSg/V . NSg/V+ . NSg/V+ . NSg/V+ . V/C N🅪Sg+ . # NSg/VB+ . NSg/VB/J+ . NSg/VB . NSg/VB+ . NSg/VB+ . NSg/VB+ . VB/C N🅪Sg+ .
> However , there are clearly many more categories and sub - categories . For nouns , > However , there are clearly many more categories and sub - categories . For nouns ,
# C . + V R NSg/I/J/Dq+ NPr/I/V/J/R/Dq+ NPl+ V/C NSg/V/P . NPl+ . C/P NPl/V3 . # C . + VB R NSg/I/J/Dq+ NPr/I/VB/J/R/Dq+ NPl+ VB/C NSg/VB/P . NPl+ . C/P NPl/V3 .
> the plural , possessive , and singular forms can be distinguished . In many > the plural , possessive , and singular forms can be distinguished . In many
# D NSg/J . NSg/J . V/C NSg/J NPl/V3+ NPr/VX NSg/VXB VP/J . NPr/J/P NSg/I/J/Dq+ # D NSg/J . NSg/J . VB/C NSg/J NPl/V3+ NPr/VXB NSg/VXB VP/J . NPr/J/P NSg/I/J/Dq+
> languages words are also marked for their " case " ( role as subject , object , > languages words are also marked for their " case " ( role as subject , object ,
# NPl/V3+ NPl/V3+ V R/C VP/J C/P D$+ . NPr🅪/V+ . . NSg NSg/R NSg/V/J+ . NSg/V+ . # NPl/V3+ NPl/V3+ VB R/C VP/J C/P D$+ . NPr🅪/VB+ . . NSg NSg/R NSg/VB/J+ . NSg/VB+ .
> etc. ) , grammatical gender , and so on ; while verbs are marked for tense , aspect , > etc. ) , grammatical gender , and so on ; while verbs are marked for tense , aspect ,
# + . . J+ N🅪Sg/V/J+ . V/C NSg/I/J/C J/P . NSg/V/C/P NPl/V3+ V VP/J C/P NSg/V/J . NSg/V+ . # + . . J+ N🅪Sg/VB/J+ . VB/C NSg/I/J/C J/P . NSg/VB/C/P NPl/V3+ VB VP/J C/P NSg/VB/J . NSg/VB+ .
> and other things . In some tagging systems , different inflections of the same > and other things . In some tagging systems , different inflections of the same
# V/C NSg/V/J+ NPl+ . NPr/J/P I/J/R/Dq NSg/V NPl+ . NSg/J NPl P D I/J # VB/C NSg/VB/J+ NPl+ . NPr/J/P I/J/R/Dq NSg/Vg NPl+ . NSg/J NPl P D I/J
> root word will get different parts of speech , resulting in a large number of > root word will get different parts of speech , resulting in a large number of
# NPr/V+ NSg/V+ NPr/VX NSg/V NSg/J NPl/V3 P N🅪Sg/V+ . Nᴹ/Vg/J NPr/J/P D/P NSg/J N🅪Sg/V/JC P # NPr/VB+ NSg/VB+ NPr/VXB NSg/VB NSg/J NPl/V3 P N🅪Sg/VB+ . Nᴹ/Vg/J NPr/J/P D/P NSg/J N🅪Sg/VB/JC P
> tags . For example , NN for singular common nouns , NNS for plural common nouns , NP > tags . For example , NN for singular common nouns , NNS for plural common nouns , NP
# NPl/V3+ . C/P NSg/V+ . ? C/P NSg/J NSg/V/J NPl/V3 . ? C/P NSg/J NSg/V/J NPl/V3 . NPr # NPl/V3+ . C/P NSg/VB+ . ? C/P NSg/J NSg/VB/J NPl/V3 . ? C/P NSg/J NSg/VB/J NPl/V3 . NPr
> for singular proper nouns ( see the POS tags used in the Brown Corpus ) . Other > for singular proper nouns ( see the POS tags used in the Brown Corpus ) . Other
# C/P NSg/J NSg/J NPl/V3 . NSg/V D NSg+ NPl/V3+ VPPtPp/J NPr/J/P D NPr🅪/V/J NSg+ . . NSg/V/J # C/P NSg/J NSg/J NPl/V3 . NSg/VB D NSg+ NPl/V3+ VP/J NPr/J/P D NPr🅪/VB/J NSg+ . . NSg/VB/J
> tagging systems use a smaller number of tags and ignore fine differences or > tagging systems use a smaller number of tags and ignore fine differences or
# NSg/V NPl+ N🅪Sg/VB D/P NSg/JC N🅪Sg/V/JC P NPl/V3+ V/C V NSg/V/J NPl/V NPr/C # NSg/Vg NPl+ N🅪Sg/VB D/P NSg/JC N🅪Sg/VB/JC P NPl/V3+ VB/C VB NSg/VB/J NPl/VB NPr/C
> model them as features somewhat independent from part - of - speech . > model them as features somewhat independent from part - of - speech .
# NSg/V/J+ NSg/IPl+ NSg/R NPl/V3+ NSg/I/R NSg/J P NSg/V/J+ . P . N🅪Sg/V+ . # NSg/VB/J+ NSg/IPl+ NSg/R NPl/V3+ NSg/I/R NSg/J P NSg/VB/J+ . P . N🅪Sg/VB+ .
> >
# #
> In part - of - speech tagging by computer , it is typical to distinguish from 50 to > In part - of - speech tagging by computer , it is typical to distinguish from 50 to
# NPr/J/P NSg/V/J+ . P . N🅪Sg/V+ NSg/V NSg/J/P NSg/V+ . NPr/ISg+ VL3 NSg/J P V P # P # NPr/J/P NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg NSg/J/P NSg/VB+ . NPr/ISg+ VL3 NSg/J P VB P # P
> 150 separate parts of speech for English . Work on stochastic methods for tagging > 150 separate parts of speech for English . Work on stochastic methods for tagging
# # NSg/V/J NPl/V3 P N🅪Sg/V C/P NPr🅪/V/J+ . N🅪Sg/V J/P J NPl/V3+ C/P NSg/V # # NSg/VB/J NPl/V3 P N🅪Sg/VB C/P NPr🅪/VB/J+ . N🅪Sg/VB J/P J NPl/V3+ C/P NSg/Vg
> Koine Greek ( DeRose 1990 ) has used over 1 , 000 parts of speech and found that > Koine Greek ( DeRose 1990 ) has used over 1 , 000 parts of speech and found that
# ? NPr/V/J . ? # . V3 VPPtPp/J NSg/J/P # . # NPl/V3 P N🅪Sg/V+ V/C NSg/V NSg/I/C/Ddem # ? NPr/VB/J . ? # . V3 VP/J NSg/J/P # . # NPl/V3 P N🅪Sg/VB+ VB/C NSg/VB NSg/I/C/Ddem
> about as many words were ambiguous in that language as in English . A > about as many words were ambiguous in that language as in English . A
# J/P NSg/R NSg/I/J/Dq NPl/V3+ NSg/VPt J NPr/J/P NSg/I/C/Ddem N🅪Sg/V+ NSg/R NPr/J/P NPr🅪/V/J+ . D/P # J/P NSg/R NSg/I/J/Dq NPl/V3+ NSg/VPt J NPr/J/P NSg/I/C/Ddem N🅪Sg/VB+ NSg/R NPr/J/P NPr🅪/VB/J+ . D/P
> morphosyntactic descriptor in the case of morphologically rich languages is > morphosyntactic descriptor in the case of morphologically rich languages is
# ? NSg NPr/J/P D NPr🅪/V P ? NPr/V/J NPl/V3+ VL3 # ? NSg NPr/J/P D NPr🅪/VB P ? NPr/VB/J NPl/V3+ VL3
> commonly expressed using very short mnemonics , such as Ncmsan for Category = Noun , > commonly expressed using very short mnemonics , such as Ncmsan for Category = Noun ,
# R VP/J Nᴹ/Vg/J J/R NPr/V/J/P NPl . NSg/I NSg/R ? C/P NSg+ . NSg/V+ . # R VP/J Nᴹ/Vg/J J/R NPr/VB/J/P NPl . NSg/I NSg/R ? C/P NSg+ . NSg/VB+ .
> Type = common , Gender = masculine , Number = singular , Case = accusative , Animate > Type = common , Gender = masculine , Number = singular , Case = accusative , Animate
# NSg/V+ . NSg/V/J . N🅪Sg/V/J+ . NSg/J . N🅪Sg/V/JC+ . NSg/J . NPr🅪/V+ . NSg/J . V/J # NSg/VB+ . NSg/VB/J . N🅪Sg/VB/J+ . NSg/J . N🅪Sg/VB/JC+ . NSg/J . NPr🅪/VB+ . NSg/J . VB/J
> = no . > = no .
# . NPr/P . # . NPr/P .
> >
# #
> The most popular " tag set " for POS tagging for American English is probably the > The most popular " tag set " for POS tagging for American English is probably the
# D NSg/I/J/R/Dq NSg/J . NSg/V NPr/V/J . C/P NSg+ NSg/V C/P NPr/J NPr🅪/V/J+ VL3 R D # D NSg/I/J/R/Dq NSg/J . NSg/VB NPr/VB/J . C/P NSg+ NSg/Vg C/P NPr/J NPr🅪/VB/J+ VL3 R D
> Penn tag set , developed in the Penn Treebank project . It is largely similar to > Penn tag set , developed in the Penn Treebank project . It is largely similar to
# NPr+ NSg/V+ NPr/V/J . VP/J NPr/J/P D NPr+ ? NSg/V+ . NPr/ISg+ VL3 R NSg/J P # NPr+ NSg/VB+ NPr/VB/J . VP/J NPr/J/P D NPr+ ? NSg/VB+ . NPr/ISg+ VL3 R NSg/J P
> the earlier Brown Corpus and LOB Corpus tag sets , though much smaller . In > the earlier Brown Corpus and LOB Corpus tag sets , though much smaller . In
# D JC NPr🅪/V/J NSg V/C NSg/V NSg+ NSg/V+ NPl/V3 . V/C NSg/I/J/R/Dq NSg/JC . NPr/J/P # D JC NPr🅪/VB/J NSg VB/C NSg/VB NSg+ NSg/VB+ NPl/V3 . VB/C NSg/I/J/R/Dq NSg/JC . NPr/J/P
> Europe , tag sets from the Eagles Guidelines see wide use and include versions > Europe , tag sets from the Eagles Guidelines see wide use and include versions
# NPr+ . NSg/V+ NPl/V3 P D NPl/V3 NPl+ NSg/V NSg/J N🅪Sg/VB+ V/C NSg/V NPl/V3+ # NPr+ . NSg/VB+ NPl/V3 P D NPl/V3 NPl+ NSg/VB NSg/J N🅪Sg/VB+ VB/C NSg/VB NPl/V3+
> for multiple languages . > for multiple languages .
# C/P NSg/J/Dq NPl/V3+ . # C/P NSg/J/Dq NPl/V3+ .
> >
# #
> POS tagging work has been done in a variety of languages , and the set of POS > POS tagging work has been done in a variety of languages , and the set of POS
# NSg+ NSg/V N🅪Sg/V+ V3 NSg/VPp NSg/VPp/J NPr/J/P D/P NSg P NPl/V3+ . V/C D NPr/V/J P NSg+ # NSg+ NSg/Vg N🅪Sg/VB+ V3 NSg/VPp NSg/VPp/J NPr/J/P D/P NSg P NPl/V3+ . VB/C D NPr/VB/J P NSg+
> tags used varies greatly with language . Tags usually are designed to include > tags used varies greatly with language . Tags usually are designed to include
# NPl/V3+ VPPtPp/J NPl/V3 R P N🅪Sg/V+ . NPl/V3+ R V VP/J P NSg/V # NPl/V3+ VP/J NPl/V3 R P N🅪Sg/VB+ . NPl/V3+ R VB VP/J P NSg/VB
> overt morphological distinctions , although this leads to inconsistencies such as > overt morphological distinctions , although this leads to inconsistencies such as
# NSg/J+ J+ NPl+ . C I/Ddem NPl/V3 P NPl NSg/I NSg/R # NSg/J+ J+ NPl+ . C I/Ddem NPl/V3 P NPl NSg/I NSg/R
> case - marking for pronouns but not nouns in English , and much larger > case - marking for pronouns but not nouns in English , and much larger
# NPr🅪/V+ . Nᴹ/Vg/J C/P NPl/V3 NSg/C/P NSg/C NPl/V3 NPr/J/P NPr🅪/V/J+ . V/C NSg/I/J/R/Dq JC # NPr🅪/VB+ . Nᴹ/Vg/J C/P NPl/V3 NSg/C/P NSg/C NPl/V3 NPr/J/P NPr🅪/VB/J+ . VB/C NSg/I/J/R/Dq JC
> cross - language differences . The tag sets for heavily inflected languages such as > cross - language differences . The tag sets for heavily inflected languages such as
# NPr/V/J/P+ . N🅪Sg/V+ NPl/V+ . D+ NSg/V+ NPl/V3 C/P R VP/J NPl/V3+ NSg/I NSg/R # NPr/VB/J/P+ . N🅪Sg/VB+ NPl/VB+ . D+ NSg/VB+ NPl/V3 C/P R VP/J NPl/V3+ NSg/I NSg/R
> Greek and Latin can be very large ; tagging words in agglutinative languages such > Greek and Latin can be very large ; tagging words in agglutinative languages such
# NPr/V/J V/C NPr/J NPr/VX NSg/VXB J/R NSg/J . NSg/V NPl/V3+ NPr/J/P ? NPl/V3+ NSg/I # NPr/VB/J VB/C NPr/J NPr/VXB NSg/VXB J/R NSg/J . NSg/Vg NPl/V3+ NPr/J/P ? NPl/V3+ NSg/I
> as Inuit languages may be virtually impossible . At the other extreme , Petrov et > as Inuit languages may be virtually impossible . At the other extreme , Petrov et
# NSg/R NPr/J NPl/V3+ NPr/VX NSg/VXB R NSg/J . NSg/P D NSg/V/J NSg/J . ? ? # NSg/R NPr/J NPl/V3+ NPr/VXB NSg/VXB R NSg/J . NSg/P D NSg/VB/J NSg/J . ? ?
> al. have proposed a " universal " tag set , with 12 categories ( for example , no > al. have proposed a " universal " tag set , with 12 categories ( for example , no
# ? NSg/VX VP/J D/P . NSg/J . NSg/V+ NPr/V/J . P # NPl+ . C/P NSg/V+ . NPr/P # ? NSg/VXB VP/J D/P . NSg/J . NSg/VB+ NPr/VB/J . P # NPl+ . C/P NSg/VB+ . NPr/P
> subtypes of nouns , verbs , punctuation , and so on ) . Whether a very small set of > subtypes of nouns , verbs , punctuation , and so on ) . Whether a very small set of
# NPl P NPl/V3 . NPl/V3+ . Nᴹ+ . V/C NSg/I/J/C J/P . . I/C D/P J/R NPr/V/J NPr/V/J P # NPl P NPl/V3 . NPl/V3+ . Nᴹ+ . VB/C NSg/I/J/C J/P . . I/C D/P J/R NPr/VB/J NPr/VB/J P
> very broad tags or a much larger set of more precise ones is preferable , depends > very broad tags or a much larger set of more precise ones is preferable , depends
# J/R NSg/J NPl/V3+ NPr/C D/P NSg/I/J/R/Dq JC NPr/V/J P NPr/I/V/J/R/Dq V/J+ NPl/V3+ VL3 W? . NPl/V3 # J/R NSg/J NPl/V3+ NPr/C D/P NSg/I/J/R/Dq JC NPr/VB/J P NPr/I/VB/J/R/Dq VB/J+ NPl/V3+ VL3 W? . NPl/V3
> on the purpose at hand . Automatic tagging is easier on smaller tag - sets . > on the purpose at hand . Automatic tagging is easier on smaller tag - sets .
# J/P D N🅪Sg/V+ NSg/P NSg/V+ . NSg/J NSg/V VL3 NSg/JC J/P NSg/JC NSg/V+ . NPl/V3 . # J/P D N🅪Sg/VB+ NSg/P NSg/VB+ . NSg/J NSg/Vg VL3 NSg/JC J/P NSg/JC NSg/VB+ . NPl/V3 .
> >
# #
> History > History
# N🅪Sg+ # N🅪Sg+
> >
# #
> The Brown Corpus > The Brown Corpus
# D+ NPr🅪/V/J+ NSg+ # D+ NPr🅪/VB/J+ NSg+
> >
# #
> Research on part - of - speech tagging has been closely tied to corpus linguistics . > Research on part - of - speech tagging has been closely tied to corpus linguistics .
# Nᴹ/V J/P NSg/V/J+ . P . N🅪Sg/V+ NSg/V V3 NSg/VPp R VP/J P NSg Nᴹ+ . # Nᴹ/VB J/P NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg V3 NSg/VPp R VP/J P NSg Nᴹ+ .
> The first major corpus of English for computer analysis was the Brown Corpus > The first major corpus of English for computer analysis was the Brown Corpus
# D NSg/V/J NPr/V/J NSg P NPr🅪/V/J C/P NSg/V+ N🅪Sg+ VPt D NPr🅪/V/J NSg # D NSg/VB/J NPr/VB/J NSg P NPr🅪/VB/J C/P NSg/VB+ N🅪Sg+ VPt D NPr🅪/VB/J NSg
> developed at Brown University by Henry Kučera and W. Nelson Francis , in the > developed at Brown University by Henry Kučera and W. Nelson Francis , in the
# VP/J NSg/P NPr🅪/V/J NSg+ NSg/J/P NPr+ ? V/C ? NPr+ NPr+ . NPr/J/P D # VP/J NSg/P NPr🅪/VB/J NSg+ NSg/J/P NPr+ ? VB/C ? NPr+ NPr+ . NPr/J/P D
> mid - 1960s . It consists of about 1 , 000 , 000 words of running English prose text , > mid - 1960s . It consists of about 1 , 000 , 000 words of running English prose text ,
# NSg/J/P+ . #d . NPr/ISg+ NPl/V3 P J/P # . # . # NPl/V3 P Nᴹ/V/J/P NPr🅪/V/J+ Nᴹ/V N🅪Sg/V+ . # NSg/J/P+ . #d . NPr/ISg+ NPl/V3 P J/P # . # . # NPl/V3 P Nᴹ/Vg/J/P NPr🅪/VB/J+ Nᴹ/VB N🅪Sg/VB+ .
> made up of 500 samples from randomly chosen publications . Each sample is 2 , 000 > made up of 500 samples from randomly chosen publications . Each sample is 2 , 000
# V NSg/V/J/P P # NPl/V3+ P R Nᴹ/V/J NPl+ . Dq+ NSg/V+ VL3 # . # # VB NSg/VB/J/P P # NPl/V3+ P R Nᴹ/VPp/J NPl+ . Dq+ NSg/VB+ VL3 # . #
> or more words ( ending at the first sentence - end after 2 , 000 words , so that the > or more words ( ending at the first sentence - end after 2 , 000 words , so that the
# NPr/C NPr/I/V/J/R/Dq NPl/V3+ . Nᴹ/Vg/J NSg/P D NSg/V/J NSg/V+ . NSg/V+ P # . # NPl/V3+ . NSg/I/J/C NSg/I/C/Ddem D # NPr/C NPr/I/VB/J/R/Dq NPl/V3+ . Nᴹ/Vg/J NSg/P D NSg/VB/J NSg/VB+ . NSg/VB+ P # . # NPl/V3+ . NSg/I/J/C NSg/I/C/Ddem D
> corpus contains only complete sentences ) . > corpus contains only complete sentences ) .
# NSg+ V3 J/R/C NSg/V/J NPl/V3+ . . # NSg+ V3 J/R/C NSg/VB/J NPl/V3+ . .
> >
# #
> The Brown Corpus was painstakingly " tagged " with part - of - speech markers over > The Brown Corpus was painstakingly " tagged " with part - of - speech markers over
# D+ NPr🅪/V/J+ NSg+ VPt R . V/J . P NSg/V/J+ . P . N🅪Sg/V+ NPl/V3 NSg/J/P # D+ NPr🅪/VB/J+ NSg+ VPt R . VP/J . P NSg/VB/J+ . P . N🅪Sg/VB+ NPl/V3 NSg/J/P
> many years . A first approximation was done with a program by Greene and Rubin , > many years . A first approximation was done with a program by Greene and Rubin ,
# NSg/I/J/Dq+ NPl+ . D/P+ NSg/V/J+ N🅪Sg+ VPt NSg/VPp/J P D/P+ NPr/V+ NSg/J/P NPr V/C NPr . # NSg/I/J/Dq+ NPl+ . D/P+ NSg/VB/J+ N🅪Sg+ VPt NSg/VPp/J P D/P+ NPr/VB+ NSg/J/P NPr VB/C NPr .
> which consisted of a huge handmade list of what categories could co - occur at > which consisted of a huge handmade list of what categories could co - occur at
# I/C+ VP/J P D/P J NSg/J NSg/V P NSg/I+ NPl+ NSg/VX NPr/I/V+ . V NSg/P # I/C+ VP/J P D/P J NSg/J NSg/VB P NSg/I+ NPl+ NSg/VXB NPr/I/VB+ . VB NSg/P
> all . For example , article then noun can occur , but article then verb ( arguably ) > all . For example , article then noun can occur , but article then verb ( arguably )
# NSg/I/J/C/Dq . C/P NSg/V+ . NSg/V+ NSg/J/C NSg/V+ NPr/VX V . NSg/C/P NSg/V+ NSg/J/C NSg/V+ . R . # NSg/I/J/C/Dq . C/P NSg/VB+ . NSg/VB+ NSg/J/C NSg/VB+ NPr/VXB VB . NSg/C/P NSg/VB+ NSg/J/C NSg/VB+ . R .
> cannot . The program got about 70 % correct . Its results were repeatedly reviewed > cannot . The program got about 70 % correct . Its results were repeatedly reviewed
# NSg/V . D+ NPr/V+ V J/P # . NSg/V/J . ISg/D$+ NPl/V3+ NSg/VPt R VP/J # NSg/VB . D+ NPr/VB+ VB J/P # . NSg/VB/J . ISg/D$+ NPl/V3+ NSg/VPt R VP/J
> and corrected by hand , and later users sent in errata so that by the late 70 s > and corrected by hand , and later users sent in errata so that by the late 70 s
# V/C VP/J NSg/J/P NSg/V+ . V/C JC NPl+ NSg/V NPr/J/P NSg NSg/I/J/C NSg/I/C/Ddem+ NSg/J/P D NSg/J # ? # VB/C VP/J NSg/J/P NSg/VB+ . VB/C JC NPl+ NSg/VB NPr/J/P NSg NSg/I/J/C NSg/I/C/Ddem+ NSg/J/P D NSg/J # ?
> the tagging was nearly perfect ( allowing for some cases on which even human > the tagging was nearly perfect ( allowing for some cases on which even human
# D NSg/V VPt R NSg/V/J . Nᴹ/Vg/J C/P I/J/R/Dq NPl/V3+ J/P I/C+ NSg/V/J NSg/V/J+ # D NSg/Vg VPt R NSg/VB/J . Nᴹ/Vg/J C/P I/J/R/Dq NPl/V3+ J/P I/C+ NSg/VB/J NSg/VB/J+
> speakers might not agree ) . > speakers might not agree ) .
# + Nᴹ/VX/J NSg/C V . . # + Nᴹ/VXB/J NSg/C VB . .
> >
# #
> This corpus has been used for innumerable studies of word - frequency and of > This corpus has been used for innumerable studies of word - frequency and of
# I/Ddem+ NSg+ V3 NSg/VPp VPPtPp/J C/P J NPl/V3 P NSg/V+ . NSg V/C P # I/Ddem+ NSg+ V3 NSg/VPp VP/J C/P J NPl/V3 P NSg/VB+ . NSg VB/C P
> part - of - speech and inspired the development of similar " tagged " corpora in many > part - of - speech and inspired the development of similar " tagged " corpora in many
# NSg/V/J+ . P . N🅪Sg/V+ V/C VP/J D N🅪Sg P NSg/J . V/J . NPl+ NPr/J/P NSg/I/J/Dq # NSg/VB/J+ . P . N🅪Sg/VB+ VB/C VP/J D N🅪Sg P NSg/J . VP/J . NPl+ NPr/J/P NSg/I/J/Dq
> other languages . Statistics derived by analyzing it formed the basis for most > other languages . Statistics derived by analyzing it formed the basis for most
# NSg/V/J NPl/V3+ . NPl/V3+ VP/J NSg/J/P Nᴹ/Vg/J NPr/ISg+ VP/J D+ NSg+ C/P NSg/I/J/R/Dq # NSg/VB/J NPl/V3+ . NPl/V3+ VP/J NSg/J/P Nᴹ/Vg/J NPr/ISg+ VP/J D+ NSg+ C/P NSg/I/J/R/Dq
> later part - of - speech tagging systems , such as CLAWS and VOLSUNGA . However , by > later part - of - speech tagging systems , such as CLAWS and VOLSUNGA . However , by
# JC NSg/V/J+ . P . N🅪Sg/V+ NSg/V NPl+ . NSg/I NSg/R NPl/V3+ V/C ? . C . NSg/J/P # JC NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg NPl+ . NSg/I NSg/R NPl/V3+ VB/C ? . C . NSg/J/P
> this time ( 2005 ) it has been superseded by larger corpora such as the 100 > this time ( 2005 ) it has been superseded by larger corpora such as the 100
# I/Ddem+ N🅪Sg/V/J+ . # . NPr/ISg+ V3 NSg/VPp VP/J NSg/J/P JC NPl+ NSg/I NSg/R D # # I/Ddem+ N🅪Sg/VB/J+ . # . NPr/ISg+ V3 NSg/VPp VP/J NSg/J/P JC NPl+ NSg/I NSg/R D #
> million word British National Corpus , even though larger corpora are rarely so > million word British National Corpus , even though larger corpora are rarely so
# NSg NSg/V+ NPr/J NSg/J NSg+ . NSg/V/J V/C JC NPl+ V R NSg/I/J/C # NSg NSg/VB+ NPr/J NSg/J NSg+ . NSg/VB/J VB/C JC NPl+ VB R NSg/I/J/C
> thoroughly curated . > thoroughly curated .
# R VP/J . # R VP/J .
> >
# #
> For some time , part - of - speech tagging was considered an inseparable part of > For some time , part - of - speech tagging was considered an inseparable part of
# C/P I/J/R/Dq N🅪Sg/V/J+ . NSg/V/J+ . P . N🅪Sg/V+ NSg/V VPt VP/J D/P NSg/J NSg/V/J P # C/P I/J/R/Dq N🅪Sg/VB/J+ . NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg VPt VP/J D/P NSg/J NSg/VB/J P
> natural language processing , because there are certain cases where the correct > natural language processing , because there are certain cases where the correct
# NSg/J N🅪Sg/V+ Nᴹ/Vg/J+ . C/P + V I/J NPl/V3+ NSg/C D NSg/V/J # NSg/J N🅪Sg/VB+ Nᴹ/Vg/J+ . C/P + VB I/J NPl/V3+ NSg/C D NSg/VB/J
> part of speech cannot be decided without understanding the semantics or even the > part of speech cannot be decided without understanding the semantics or even the
# NSg/V/J P N🅪Sg/V+ NSg/V NSg/VXB NSg/VP/J C/P N🅪Sg/Vg/J+ D NPl+ NPr/C NSg/V/J D # NSg/VB/J P N🅪Sg/VB+ NSg/VB NSg/VXB NSg/VP/J C/P N🅪Sg/Vg/J+ D NPl+ NPr/C NSg/VB/J D
> pragmatics of the context . This is extremely expensive , especially because > pragmatics of the context . This is extremely expensive , especially because
# NPl P D N🅪Sg/V+ . I/Ddem+ VL3 R J . R C/P # NPl P D N🅪Sg/VB+ . I/Ddem+ VL3 R J . R C/P
> analyzing the higher levels is much harder when multiple part - of - speech > analyzing the higher levels is much harder when multiple part - of - speech
# Nᴹ/Vg/J D+ NSg/JC+ NPl/V3+ VL3 NSg/I/J/R/Dq JC NSg/I/C NSg/J/Dq NSg/V/J . P . N🅪Sg/V+ # Nᴹ/Vg/J D+ NSg/JC+ NPl/V3+ VL3 NSg/I/J/R/Dq JC NSg/I/C NSg/J/Dq NSg/VB/J . P . N🅪Sg/VB+
> possibilities must be considered for each word . > possibilities must be considered for each word .
# NPl+ NSg/V NSg/VXB VP/J C/P Dq+ NSg/V+ . # NPl+ NSg/VB NSg/VXB VP/J C/P Dq+ NSg/VB+ .
> >
# #
> Use of hidden Markov models > Use of hidden Markov models
# N🅪Sg/VB P V/J NPr NPl/V3+ # N🅪Sg/VB P VB/J NPr NPl/V3+
> >
# #
> In the mid - 1980s , researchers in Europe began to use hidden Markov models ( HMMs ) > In the mid - 1980s , researchers in Europe began to use hidden Markov models ( HMMs )
# NPr/J/P D NSg/J/P+ . #d . NPl NPr/J/P NPr+ VPt P N🅪Sg/VB V/J NPr NPl/V3+ . ? . # NPr/J/P D NSg/J/P+ . #d . NPl NPr/J/P NPr+ VPt P N🅪Sg/VB VB/J NPr NPl/V3+ . ? .
> to disambiguate parts of speech , when working to tag the Lancaster - Oslo - Bergen > to disambiguate parts of speech , when working to tag the Lancaster - Oslo - Bergen
# P V NPl/V3 P N🅪Sg/V+ . NSg/I/C Nᴹ/Vg/J P NSg/V D NPr . NPr+ . NPr+ # P VB NPl/V3 P N🅪Sg/VB+ . NSg/I/C Nᴹ/Vg/J P NSg/VB D NPr . NPr+ . NPr+
> Corpus of British English . HMMs involve counting cases ( such as from the Brown > Corpus of British English . HMMs involve counting cases ( such as from the Brown
# NSg P NPr/J NPr🅪/V/J+ . ? V Nᴹ/Vg/J NPl/V3+ . NSg/I NSg/R P D NPr🅪/V/J # NSg P NPr/J NPr🅪/VB/J+ . ? VB Nᴹ/Vg/J NPl/V3+ . NSg/I NSg/R P D NPr🅪/VB/J
> Corpus ) and making a table of the probabilities of certain sequences . For > Corpus ) and making a table of the probabilities of certain sequences . For
# NSg+ . V/C Nᴹ/Vg/J D/P NSg/V P D NPl P I/J NPl/V3+ . C/P # NSg+ . VB/C Nᴹ/Vg/J D/P NSg/VB P D NPl P I/J NPl/V3+ . C/P
> example , once you've seen an article such as ' the ' , perhaps the next word is a > example , once you've seen an article such as ' the ' , perhaps the next word is a
# NSg/V+ . NSg/C W? NSg/VPp D/P NSg/V+ NSg/I NSg/R . D . . NSg/R D NSg/J/P NSg/V+ VL3 D/P # NSg/VB+ . NSg/C W? NSg/VPp D/P NSg/VB+ NSg/I NSg/R . D . . NSg/R D NSg/J/P NSg/VB+ VL3 D/P
> noun 40 % of the time , an adjective 40 % , and a number 20 % . Knowing this , a > noun 40 % of the time , an adjective 40 % , and a number 20 % . Knowing this , a
# NSg/V+ # . P D N🅪Sg/V/J+ . D/P NSg/V/J+ # . . V/C D/P N🅪Sg/V/JC+ # . . NSg/V/J/P I/Ddem+ . D/P+ # NSg/VB+ # . P D N🅪Sg/VB/J+ . D/P NSg/VB/J+ # . . VB/C D/P N🅪Sg/VB/JC+ # . . NSg/VB/J/P I/Ddem+ . D/P+
> program can decide that " can " in " the can " is far more likely to be a noun than > program can decide that " can " in " the can " is far more likely to be a noun than
# NPr/V+ NPr/VX V NSg/I/C/Ddem+ . NPr/VX . NPr/J/P . D+ NPr/VX . VL3 NSg/V/J NPr/I/V/J/R/Dq NSg/J P NSg/VXB D/P NSg/V C/P # NPr/VB+ NPr/VXB VB NSg/I/C/Ddem+ . NPr/VXB . NPr/J/P . D+ NPr/VXB . VL3 NSg/VB/J NPr/I/VB/J/R/Dq NSg/J P NSg/VXB D/P NSg/VB C/P
> a verb or a modal . The same method can , of course , be used to benefit from > a verb or a modal . The same method can , of course , be used to benefit from
# D/P+ NSg/V+ NPr/C D/P NSg/J . D+ I/J+ NSg/V+ NPr/VX . P NSg/V+ . NSg/VXB VPPtPp/J P NSg/V P # D/P+ NSg/VB+ NPr/C D/P NSg/J . D+ I/J+ NSg/VB+ NPr/VXB . P NSg/VB+ . NSg/VXB VP/J P NSg/VB P
> knowledge about the following words . > knowledge about the following words .
# Nᴹ+ J/P D+ Nᴹ/Vg/J/P NPl/V3+ . # Nᴹ+ J/P D+ Nᴹ/Vg/J/P NPl/V3+ .
> >
# #
> More advanced ( " higher - order " ) HMMs learn the probabilities not only of pairs > More advanced ( " higher - order " ) HMMs learn the probabilities not only of pairs
# NPr/I/V/J/R/Dq VP/J . . NSg/JC . NSg/V . . ? NSg/V D NPl+ NSg/C J/R/C P NPl/V3+ # NPr/I/VB/J/R/Dq VP/J . . NSg/JC . NSg/VB . . ? NSg/VB D NPl+ NSg/C J/R/C P NPl/V3+
> but triples or even larger sequences . So , for example , if you've just seen a > but triples or even larger sequences . So , for example , if you've just seen a
# NSg/C/P NPl/V3 NPr/C NSg/V/J JC NPl/V3+ . NSg/I/J/C . C/P NSg/V+ . NSg/C W? V/J NSg/VPp D/P # NSg/C/P NPl/V3 NPr/C NSg/VB/J JC NPl/V3+ . NSg/I/J/C . C/P NSg/VB+ . NSg/C W? VB/J NSg/VPp D/P
> noun followed by a verb , the next item may be very likely a preposition , > noun followed by a verb , the next item may be very likely a preposition ,
# NSg/V+ VP/J NSg/J/P D/P NSg/V+ . D NSg/J/P NSg/V+ NPr/VX NSg/VXB J/R NSg/J D/P NSg/V . # NSg/VB+ VP/J NSg/J/P D/P NSg/VB+ . D NSg/J/P NSg/VB+ NPr/VXB NSg/VXB J/R NSg/J D/P NSg/VB .
> article , or noun , but much less likely another verb . > article , or noun , but much less likely another verb .
# NSg/V+ . NPr/C NSg/V+ . NSg/C/P NSg/I/J/R/Dq V/J/R/C/P NSg/J I/D NSg/V+ . # NSg/VB+ . NPr/C NSg/VB+ . NSg/C/P NSg/I/J/R/Dq VB/J/R/C/P NSg/J I/D NSg/VB+ .
> >
# #
> When several ambiguous words occur together , the possibilities multiply . > When several ambiguous words occur together , the possibilities multiply .
# NSg/I/C J/Dq+ J+ NPl/V3+ V J . D+ NPl+ NSg/V . # NSg/I/C J/Dq+ J+ NPl/V3+ VB J . D+ NPl+ NSg/VB .
> However , it is easy to enumerate every combination and to assign a relative > However , it is easy to enumerate every combination and to assign a relative
# C . NPr/ISg+ VL3 NSg/V/J P V Dq+ N🅪Sg+ V/C P NSg/V D/P NSg/J # C . NPr/ISg+ VL3 NSg/VB/J P VB Dq+ N🅪Sg+ VB/C P NSg/VB D/P NSg/J
> probability to each one , by multiplying together the probabilities of each > probability to each one , by multiplying together the probabilities of each
# NSg+ P Dq NSg/I/V/J+ . NSg/J/P Nᴹ/Vg/J J D NPl P Dq # NSg+ P Dq NSg/I/VB/J+ . NSg/J/P Nᴹ/Vg/J J D NPl P Dq
> choice in turn . The combination with the highest probability is then chosen . The > choice in turn . The combination with the highest probability is then chosen . The
# N🅪Sg/J+ NPr/J/P NSg/V . D N🅪Sg P D+ JS+ NSg+ VL3 NSg/J/C Nᴹ/V/J . D+ # N🅪Sg/J+ NPr/J/P NSg/VB . D N🅪Sg P D+ JS+ NSg+ VL3 NSg/J/C Nᴹ/VPp/J . D+
> European group developed CLAWS , a tagging program that did exactly this and > European group developed CLAWS , a tagging program that did exactly this and
# NSg/J+ NSg/V+ VP/J NPl/V3+ . D/P NSg/V NPr/V+ NSg/I/C/Ddem+ VPt R I/Ddem V/C # NSg/J+ NSg/VB+ VP/J NPl/V3+ . D/P NSg/Vg NPr/VB+ NSg/I/C/Ddem+ VPt R I/Ddem VB/C
> achieved accuracy in the 93 95 % range . > achieved accuracy in the 93 95 % range .
# VP/J N🅪Sg+ NPr/J/P D # . # . N🅪Sg/V+ . # VP/J N🅪Sg+ NPr/J/P D # . # . N🅪Sg/VB+ .
> >
# #
> Eugene Charniak points out in Statistical techniques for natural language > Eugene Charniak points out in Statistical techniques for natural language
# NPr+ ? NPl/V3+ NSg/V/J/R/P NPr/J/P J NPl C/P NSg/J+ N🅪Sg/V+ # NPr+ ? NPl/V3+ NSg/VB/J/R/P NPr/J/P J NPl C/P NSg/J+ N🅪Sg/VB+
> parsing ( 1997 ) that merely assigning the most common tag to each known word and > parsing ( 1997 ) that merely assigning the most common tag to each known word and
# Nᴹ/Vg/J . # . NSg/I/C/Ddem+ R Nᴹ/Vg/J D NSg/I/J/R/Dq NSg/V/J NSg/V+ P Dq VPp/J NSg/V+ V/C # Nᴹ/Vg/J . # . NSg/I/C/Ddem+ R Nᴹ/Vg/J D NSg/I/J/R/Dq NSg/VB/J NSg/VB+ P Dq VPp/J NSg/VB+ VB/C
> the tag " proper noun " to all unknowns will approach 90 % accuracy because many > the tag " proper noun " to all unknowns will approach 90 % accuracy because many
# D NSg/V+ . NSg/J NSg/V+ . P NSg/I/J/C/Dq NPl/V3+ NPr/VX NSg/V+ # . N🅪Sg+ C/P NSg/I/J/Dq # D NSg/VB+ . NSg/J NSg/VB+ . P NSg/I/J/C/Dq NPl/V3+ NPr/VXB NSg/VB+ # . N🅪Sg+ C/P NSg/I/J/Dq
> words are unambiguous , and many others only rarely represent their less - common > words are unambiguous , and many others only rarely represent their less - common
# NPl/V3+ V J . V/C NSg/I/J/Dq NPl/V3+ J/R/C R V D$+ V/J/R/C/P . NSg/V/J # NPl/V3+ VB J . VB/C NSg/I/J/Dq NPl/V3+ J/R/C R VB D$+ VB/J/R/C/P . NSg/VB/J
> parts of speech . > parts of speech .
# NPl/V3 P N🅪Sg/V+ . # NPl/V3 P N🅪Sg/VB+ .
> >
# #
> CLAWS pioneered the field of HMM - based part of speech tagging but was quite > CLAWS pioneered the field of HMM - based part of speech tagging but was quite
# NPl/V3+ VP/J D NSg/V P V . VP/J NSg/V/J P N🅪Sg/V+ NSg/V NSg/C/P VPt R # NPl/V3+ VP/J D NSg/VB P VB . VP/J NSg/VB/J P N🅪Sg/VB+ NSg/Vg NSg/C/P VPt R
> expensive since it enumerated all possibilities . It sometimes had to resort to > expensive since it enumerated all possibilities . It sometimes had to resort to
# J C/P NPr/ISg+ VP/J NSg/I/J/C/Dq NPl+ . NPr/ISg+ R V P NSg/V P # J C/P NPr/ISg+ VP/J NSg/I/J/C/Dq NPl+ . NPr/ISg+ R VB P NSg/VB P
> backup methods when there were simply too many options ( the Brown Corpus > backup methods when there were simply too many options ( the Brown Corpus
# NSg/J NPl/V3+ NSg/I/C + NSg/VPt R W? NSg/I/J/Dq NPl/V3 . D+ NPr🅪/V/J+ NSg+ # NSg/J NPl/V3+ NSg/I/C + NSg/VPt R W? NSg/I/J/Dq NPl/V3 . D+ NPr🅪/VB/J+ NSg+
> contains a case with 17 ambiguous words in a row , and there are words such as > contains a case with 17 ambiguous words in a row , and there are words such as
# V3 D/P NPr🅪/V+ P # J NPl/V3 NPr/J/P D/P+ NSg/V+ . V/C + V NPl/V3+ NSg/I NSg/R # V3 D/P NPr🅪/VB+ P # J NPl/V3 NPr/J/P D/P+ NSg/VB+ . VB/C + VB NPl/V3+ NSg/I NSg/R
> " still " that can represent as many as 7 distinct parts of speech . > " still " that can represent as many as 7 distinct parts of speech .
# . NSg/V/J . NSg/I/C/Ddem+ NPr/VX V NSg/R NSg/I/J/Dq NSg/R # V/J NPl/V3 P N🅪Sg/V+ . # . NSg/VB/J . NSg/I/C/Ddem+ NPr/VXB VB NSg/R NSg/I/J/Dq NSg/R # VB/J NPl/V3 P N🅪Sg/VB+ .
> >
# #
> HMMs underlie the functioning of stochastic taggers and are used in various > HMMs underlie the functioning of stochastic taggers and are used in various
# ? V D Nᴹ/Vg/J+ P J NPl V/C V VPPtPp/J NPr/J/P J # ? VB D Nᴹ/Vg/J+ P J NPl VB/C VB VP/J NPr/J/P J
> algorithms one of the most widely used being the bi - directional inference > algorithms one of the most widely used being the bi - directional inference
# NPl+ NSg/I/V/J P D NSg/I/J/R/Dq R VPPtPp/J N🅪Sg/Vg/J/C D NSg/J . NSg/J NSg+ # NPl+ NSg/I/VB/J P D NSg/I/J/R/Dq R VP/J N🅪Sg/Vg/J/C D NSg/J . NSg/J NSg+
> algorithm . > algorithm .
# NSg . # NSg .
> >
@ -312,129 +312,129 @@
# NSg/J+ Nᴹ/Vg/J+ NPl/V3+ # NSg/J+ Nᴹ/Vg/J+ NPl/V3+
> >
# #
> In 1987 , Steven DeRose and Kenneth W. Church independently developed dynamic > In 1987 , Steven DeRose and Kenneth W. Church independently developed dynamic
# NPr/J/P # . NPr+ ? V/C NPr+ ? NPr🅪/V+ R VP/J NSg/J # NPr/J/P # . NPr+ ? VB/C NPr+ ? NPr🅪/VB+ R VP/J NSg/J
> programming algorithms to solve the same problem in vastly less time . Their > programming algorithms to solve the same problem in vastly less time . Their
# Nᴹ/Vg/J+ NPl+ P NSg/V D I/J NSg/J+ NPr/J/P R V/J/R/C/P N🅪Sg/V/J+ . D$+ # Nᴹ/Vg/J+ NPl+ P NSg/VB D I/J NSg/J+ NPr/J/P R VB/J/R/C/P N🅪Sg/VB/J+ . D$+
> methods were similar to the Viterbi algorithm known for some time in other > methods were similar to the Viterbi algorithm known for some time in other
# NPl/V3+ NSg/VPt NSg/J P D ? NSg VPp/J C/P I/J/R/Dq N🅪Sg/V/J+ NPr/J/P NSg/V/J # NPl/V3+ NSg/VPt NSg/J P D ? NSg VPp/J C/P I/J/R/Dq N🅪Sg/VB/J+ NPr/J/P NSg/VB/J
> fields . DeRose used a table of pairs , while Church used a table of triples and a > fields . DeRose used a table of pairs , while Church used a table of triples and a
# NPrPl/V3+ . ? VPPtPp/J D/P NSg/V P NPl/V3+ . NSg/V/C/P NPr🅪/V+ VPPtPp/J D/P NSg/V P NPl/V3 V/C D/P # NPrPl/V3+ . ? VP/J D/P NSg/VB P NPl/V3+ . NSg/VB/C/P NPr🅪/VB+ VP/J D/P NSg/VB P NPl/V3 VB/C D/P
> method of estimating the values for triples that were rare or nonexistent in the > method of estimating the values for triples that were rare or nonexistent in the
# NSg/V P Nᴹ/Vg/J D NPl/V3+ C/P NPl/V3 NSg/I/C/Ddem+ NSg/VPt NSg/V/J NPr/C NSg/J NPr/J/P D # NSg/VB P Nᴹ/Vg/J D NPl/V3+ C/P NPl/V3 NSg/I/C/Ddem+ NSg/VPt NSg/VB/J NPr/C NSg/J NPr/J/P D
> Brown Corpus ( an actual measurement of triple probabilities would require a much > Brown Corpus ( an actual measurement of triple probabilities would require a much
# NPr🅪/V/J NSg+ . D/P NSg/J N🅪Sg P NSg/V/J NPl+ VX NSg/V D/P NSg/I/J/R/Dq # NPr🅪/VB/J NSg+ . D/P NSg/J N🅪Sg P NSg/VB/J NPl+ VXB NSg/VB D/P NSg/I/J/R/Dq
> larger corpus ) . Both methods achieved an accuracy of over 95 % . DeRose's 1990 > larger corpus ) . Both methods achieved an accuracy of over 95 % . DeRose's 1990
# JC NSg+ . . I/C/Dq NPl/V3+ VP/J D/P N🅪Sg+ P NSg/J/P # . . ? # # JC NSg+ . . I/C/Dq NPl/V3+ VP/J D/P N🅪Sg+ P NSg/J/P # . . ? #
> dissertation at Brown University included analyses of the specific error types , > dissertation at Brown University included analyses of the specific error types ,
# NSg+ NSg/P NPr🅪/V/J NSg+ VP/J NPl/V3/Au/Br P D NSg/J NSg/V+ NPl/V3+ . # NSg+ NSg/P NPr🅪/VB/J NSg+ VP/J NPl/V3/Au/Br P D NSg/J NSg/VB+ NPl/V3+ .
> probabilities , and other related data , and replicated his work for Greek , where > probabilities , and other related data , and replicated his work for Greek , where
# NPl+ . V/C NSg/V/J J N🅪Pl+ . V/C VP/J ISg/D$+ N🅪Sg/V+ C/P NPr/V/J . NSg/C # NPl+ . VB/C NSg/VB/J J N🅪Pl+ . VB/C VP/J ISg/D$+ N🅪Sg/VB+ C/P NPr/VB/J . NSg/C
> it proved similarly effective . > it proved similarly effective .
# NPr/ISg+ VP/J R NSg/J . # NPr/ISg+ VP/J R NSg/J .
> >
# #
> These findings were surprisingly disruptive to the field of natural language > These findings were surprisingly disruptive to the field of natural language
# I/Ddem+ NSg+ NSg/VPt R J P D NSg/V P NSg/J+ N🅪Sg/V+ # I/Ddem+ NSg+ NSg/VPt R J P D NSg/VB P NSg/J+ N🅪Sg/VB+
> processing . The accuracy reported was higher than the typical accuracy of very > processing . The accuracy reported was higher than the typical accuracy of very
# Nᴹ/Vg/J+ . D+ N🅪Sg+ VP/J VPt NSg/JC C/P D NSg/J N🅪Sg P J/R # Nᴹ/Vg/J+ . D+ N🅪Sg+ VP/J VPt NSg/JC C/P D NSg/J N🅪Sg P J/R
> sophisticated algorithms that integrated part of speech choice with many higher > sophisticated algorithms that integrated part of speech choice with many higher
# VP/J+ NPl+ NSg/I/C/Ddem+ VP/J NSg/V/J P N🅪Sg/V+ N🅪Sg/J+ P NSg/I/J/Dq NSg/JC # VP/J+ NPl+ NSg/I/C/Ddem+ VP/J NSg/VB/J P N🅪Sg/VB+ N🅪Sg/J+ P NSg/I/J/Dq NSg/JC
> levels of linguistic analysis : syntax , morphology , semantics , and so on . CLAWS , > levels of linguistic analysis : syntax , morphology , semantics , and so on . CLAWS ,
# NPl/V3 P J N🅪Sg . Nᴹ+ . Nᴹ+ . NPl+ . V/C NSg/I/J/C J/P . NPl/V3+ . # NPl/V3 P J N🅪Sg . Nᴹ+ . Nᴹ+ . NPl+ . VB/C NSg/I/J/C J/P . NPl/V3+ .
> DeRose's and Church's methods did fail for some of the known cases where > DeRose's and Church's methods did fail for some of the known cases where
# ? V/C NSg$ NPl/V3+ VPt NSg/V/J C/P I/J/R/Dq P D VPp/J NPl/V3+ NSg/C # ? VB/C NSg$ NPl/V3+ VPt NSg/VB/J C/P I/J/R/Dq P D VPp/J NPl/V3+ NSg/C
> semantics is required , but those proved negligibly rare . This convinced many in > semantics is required , but those proved negligibly rare . This convinced many in
# NPl+ VL3 VP/J . NSg/C/P I/Ddem VP/J R NSg/V/J . I/Ddem VP/J NSg/I/J/Dq NPr/J/P # NPl+ VL3 VP/J . NSg/C/P I/Ddem VP/J R NSg/VB/J . I/Ddem VP/J NSg/I/J/Dq NPr/J/P
> the field that part - of - speech tagging could usefully be separated from the other > the field that part - of - speech tagging could usefully be separated from the other
# D+ NSg/V+ NSg/I/C/Ddem+ NSg/V/J+ . P . N🅪Sg/V+ NSg/V NSg/VX R NSg/VXB VP/J P D NSg/V/J # D+ NSg/VB+ NSg/I/C/Ddem+ NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg NSg/VXB R NSg/VXB VP/J P D NSg/VB/J
> levels of processing ; this , in turn , simplified the theory and practice of > levels of processing ; this , in turn , simplified the theory and practice of
# NPl/V3 P Nᴹ/Vg/J+ . I/Ddem+ . NPr/J/P NSg/V . VP/J D N🅪Sg V/C NSg/V P # NPl/V3 P Nᴹ/Vg/J+ . I/Ddem+ . NPr/J/P NSg/VB . VP/J D N🅪Sg VB/C NSg/VB P
> computerized language analysis and encouraged researchers to find ways to > computerized language analysis and encouraged researchers to find ways to
# VP/J N🅪Sg/V+ N🅪Sg+ V/C VP/J NPl+ P NSg/V NPl+ P # VP/J N🅪Sg/VB+ N🅪Sg+ VB/C VP/J NPl+ P NSg/VB NPl+ P
> separate other pieces as well . Markov Models became the standard method for the > separate other pieces as well . Markov Models became the standard method for the
# NSg/V/J NSg/V/J NPl/V3+ NSg/R NSg/V/J . NPr NPl/V3+ VPt D NSg/J NSg/V+ C/P D # NSg/VB/J NSg/VB/J NPl/V3+ NSg/R NSg/VB/J . NPr NPl/V3+ VPt D NSg/J NSg/VB+ C/P D
> part - of - speech assignment . > part - of - speech assignment .
# NSg/V/J+ . P . N🅪Sg/V+ NSg+ . # NSg/VB/J+ . P . N🅪Sg/VB+ NSg+ .
> >
# #
> Unsupervised taggers > Unsupervised taggers
# V/J NPl # VB/J NPl
> >
# #
> The methods already discussed involve working from a pre - existing corpus to > The methods already discussed involve working from a pre - existing corpus to
# D+ NPl/V3+ W? VP/J V Nᴹ/Vg/J P D/P+ NSg/V/P+ . Nᴹ/Vg/J NSg+ P # D+ NPl/V3+ W? VP/J VB Nᴹ/Vg/J P D/P+ NSg/VB/P+ . Nᴹ/Vg/J NSg+ P
> learn tag probabilities . It is , however , also possible to bootstrap using > learn tag probabilities . It is , however , also possible to bootstrap using
# NSg/V NSg/V+ NPl+ . NPr/ISg+ VL3 . C . R/C NSg/J P NSg/V Nᴹ/Vg/J # NSg/VB NSg/VB+ NPl+ . NPr/ISg+ VL3 . C . R/C NSg/J P NSg/VB Nᴹ/Vg/J
> " unsupervised " tagging . Unsupervised tagging techniques use an untagged corpus > " unsupervised " tagging . Unsupervised tagging techniques use an untagged corpus
# . V/J . NSg/V . V/J NSg/V NPl+ N🅪Sg/VB D/P J NSg+ # . VB/J . NSg/Vg . VB/J NSg/Vg NPl+ N🅪Sg/VB D/P J NSg+
> for their training data and produce the tagset by induction . That is , they > for their training data and produce the tagset by induction . That is , they
# C/P D$+ Nᴹ/Vg/J+ N🅪Pl+ V/C Nᴹ/V D NSg NSg/J/P N🅪Sg . NSg/I/C/Ddem+ VL3 . IPl+ # C/P D$+ Nᴹ/Vg/J+ N🅪Pl+ VB/C Nᴹ/VB D NSg NSg/J/P N🅪Sg . NSg/I/C/Ddem+ VL3 . IPl+
> observe patterns in word use , and derive part - of - speech categories themselves . > observe patterns in word use , and derive part - of - speech categories themselves .
# NSg/V NPl/V3+ NPr/J/P NSg/V+ N🅪Sg/VB . V/C NSg/V NSg/V/J+ . P . N🅪Sg/V+ NPl+ IPl+ . # NSg/VB NPl/V3+ NPr/J/P NSg/VB+ N🅪Sg/VB . VB/C NSg/VB NSg/VB/J+ . P . N🅪Sg/VB+ NPl+ IPl+ .
> For example , statistics readily reveal that " the " , " a " , and " an " occur in > For example , statistics readily reveal that " the " , " a " , and " an " occur in
# C/P NSg/V+ . NPl/V3+ R NSg/V NSg/I/C/Ddem+ . D . . . D/P . . V/C . D/P . V NPr/J/P # C/P NSg/VB+ . NPl/V3+ R NSg/VB NSg/I/C/Ddem+ . D . . . D/P . . VB/C . D/P . VB NPr/J/P
> similar contexts , while " eat " occurs in very different ones . With sufficient > similar contexts , while " eat " occurs in very different ones . With sufficient
# NSg/J+ NPl/V3+ . NSg/V/C/P . V . V3 NPr/J/P J/R NSg/J+ NPl/V3+ . P J # NSg/J+ NPl/V3+ . NSg/VB/C/P . VB . V3 NPr/J/P J/R NSg/J+ NPl/V3+ . P J
> iteration , similarity classes of words emerge that are remarkably similar to > iteration , similarity classes of words emerge that are remarkably similar to
# N🅪Sg . NSg NPl/V3 P NPl/V3+ NSg/V NSg/I/C/Ddem+ V R NSg/J P # N🅪Sg . NSg NPl/V3 P NPl/V3+ NSg/VB NSg/I/C/Ddem+ VB R NSg/J P
> those human linguists would expect ; and the differences themselves sometimes > those human linguists would expect ; and the differences themselves sometimes
# I/Ddem NSg/V/J NPl+ VX V . V/C D NPl/V+ IPl+ R # I/Ddem NSg/VB/J NPl+ VXB VB . VB/C D NPl/VB+ IPl+ R
> suggest valuable new insights . > suggest valuable new insights .
# V NSg/J NSg/V/J NPl+ . # VB NSg/J NSg/VB/J NPl+ .
> >
# #
> These two categories can be further subdivided into rule - based , stochastic , and > These two categories can be further subdivided into rule - based , stochastic , and
# I/Ddem+ NSg+ NPl+ NPr/VX NSg/VXB V/J VP/J P NSg/V+ . VP/J . J . V/C # I/Ddem+ NSg+ NPl+ NPr/VXB NSg/VXB VB/J VP/J P NSg/VB+ . VP/J . J . VB/C
> neural approaches . > neural approaches .
# J NPl/V3+ . # J NPl/V3+ .
> >
# #
> Other taggers and methods > Other taggers and methods
# NSg/V/J NPl V/C NPl/V3+ # NSg/VB/J NPl VB/C NPl/V3+
> >
# #
> Some current major algorithms for part - of - speech tagging include the Viterbi > Some current major algorithms for part - of - speech tagging include the Viterbi
# I/J/R/Dq NSg/J NPr/V/J NPl C/P NSg/V/J+ . P . N🅪Sg/V+ NSg/V NSg/V D ? # I/J/R/Dq NSg/J NPr/VB/J NPl C/P NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg NSg/VB D ?
> algorithm , Brill tagger , Constraint Grammar , and the Baum - Welch algorithm ( also > algorithm , Brill tagger , Constraint Grammar , and the Baum - Welch algorithm ( also
# NSg . NSg/J NSg . NSg+ N🅪Sg/V+ . V/C D NPr . ? NSg . R/C # NSg . NSg/J NSg . NSg+ N🅪Sg/VB+ . VB/C D NPr . ? NSg . R/C
> known as the forward - backward algorithm ) . Hidden Markov model and visible Markov > known as the forward - backward algorithm ) . Hidden Markov model and visible Markov
# VPp/J NSg/R D NSg/V/J . NSg/J NSg . . V/J NPr NSg/V/J+ V/C J NPr # VPp/J NSg/R D NSg/VB/J . NSg/J NSg . . VB/J NPr NSg/VB/J+ VB/C J NPr
> model taggers can both be implemented using the Viterbi algorithm . The > model taggers can both be implemented using the Viterbi algorithm . The
# NSg/V/J+ NPl NPr/VX I/C/Dq NSg/VXB VP/J Nᴹ/Vg/J D ? NSg . D+ # NSg/VB/J+ NPl NPr/VXB I/C/Dq NSg/VXB VP/J Nᴹ/Vg/J D ? NSg . D+
> rule - based Brill tagger is unusual in that it learns a set of rule patterns , and > rule - based Brill tagger is unusual in that it learns a set of rule patterns , and
# NSg/V+ . VP/J NSg/J NSg VL3 NSg/J NPr/J/P NSg/I/C/Ddem NPr/ISg+ NPl/V3 D/P NPr/V/J P NSg/V+ NPl/V3+ . V/C # NSg/VB+ . VP/J NSg/J NSg VL3 NSg/J NPr/J/P NSg/I/C/Ddem NPr/ISg+ NPl/V3 D/P NPr/VB/J P NSg/VB+ NPl/V3+ . VB/C
> then applies those patterns rather than optimizing a statistical quantity . > then applies those patterns rather than optimizing a statistical quantity .
# NSg/J/C V3 I/Ddem NPl/V3+ NPr/V/J/R C/P Nᴹ/Vg/J D/P J N🅪Sg+ . # NSg/J/C V3 I/Ddem NPl/V3+ NPr/VB/J/R C/P Nᴹ/Vg/J D/P J N🅪Sg+ .
> >
# #
> Many machine learning methods have also been applied to the problem of POS > Many machine learning methods have also been applied to the problem of POS
# NSg/I/J/Dq+ NSg/V+ Nᴹ/Vg/J+ NPl/V3+ NSg/VX R/C NSg/VPp VP/J P D NSg/J P NSg+ # NSg/I/J/Dq+ NSg/VB+ Nᴹ/Vg/J+ NPl/V3+ NSg/VXB R/C NSg/VPp VP/J P D NSg/J P NSg+
> tagging . Methods such as SVM , maximum entropy classifier , perceptron , and > tagging . Methods such as SVM , maximum entropy classifier , perceptron , and
# NSg/V . NPl/V3+ NSg/I NSg/R ? . NSg/J NSg NSg . NSg . V/C # NSg/Vg . NPl/V3+ NSg/I NSg/R ? . NSg/J NSg NSg . NSg . VB/C
> nearest - neighbor have all been tried , and most can achieve accuracy above > nearest - neighbor have all been tried , and most can achieve accuracy above
# JS . NSg/V/J/Am+ NSg/VX NSg/I/J/C/Dq NSg/VPp VP/J . V/C NSg/I/J/R/Dq NPr/VX V N🅪Sg+ NSg/J/P # JS . NSg/VB/J/Am+ NSg/VXB NSg/I/J/C/Dq NSg/VPp VP/J . VB/C NSg/I/J/R/Dq NPr/VXB VB N🅪Sg+ NSg/J/P
> 95 % . [ citation needed ] > 95 % . [ citation needed ]
# # . . . NSg+ VP/J . # # . . . NSg+ VP/J .
> >
# #
> A direct comparison of several methods is reported ( with references ) at the ACL > A direct comparison of several methods is reported ( with references ) at the ACL
# D/P V/J NSg P J/Dq+ NPl/V3+ VL3 VP/J . P NPl/V3+ . NSg/P D NSg # D/P VB/J NSg P J/Dq+ NPl/V3+ VL3 VP/J . P NPl/V3+ . NSg/P D NSg
> Wiki . This comparison uses the Penn tag set on some of the Penn Treebank data , > Wiki . This comparison uses the Penn tag set on some of the Penn Treebank data ,
# NSg/V+ . I/Ddem+ NSg+ NPl/V3 D+ NPr+ NSg/V+ NPr/V/J J/P I/J/R/Dq P D NPr+ ? N🅪Pl+ . # NSg/VB+ . I/Ddem+ NSg+ NPl/V3 D+ NPr+ NSg/VB+ NPr/VB/J J/P I/J/R/Dq P D NPr+ ? N🅪Pl+ .
> so the results are directly comparable . However , many significant taggers are > so the results are directly comparable . However , many significant taggers are
# NSg/I/J/C D NPl/V3+ V R/C NSg/J . C . NSg/I/J/Dq NSg/J NPl V # NSg/I/J/C D NPl/V3+ VB R/C NSg/J . C . NSg/I/J/Dq NSg/J NPl VB
> not included ( perhaps because of the labor involved in reconfiguring them for > not included ( perhaps because of the labor involved in reconfiguring them for
# NSg/C VP/J . NSg/R C/P P D NPr🅪/V/Am/Au+ VP/J NPr/J/P Nᴹ/Vg/J NSg/IPl+ C/P # NSg/C VP/J . NSg/R C/P P D NPr🅪/VB/Am/Au+ VP/J NPr/J/P Nᴹ/Vg/J NSg/IPl+ C/P
> this particular dataset ) . Thus , it should not be assumed that the results > this particular dataset ) . Thus , it should not be assumed that the results
# I/Ddem NSg/J NSg . . NSg . NPr/ISg+ VX NSg/C NSg/VXB VP/J NSg/I/C/Ddem D+ NPl/V3+ # I/Ddem NSg/J NSg . . NSg . NPr/ISg+ VXB NSg/C NSg/VXB VP/J NSg/I/C/Ddem D+ NPl/V3+
> reported here are the best that can be achieved with a given approach ; nor even > reported here are the best that can be achieved with a given approach ; nor even
# VP/J NSg/J/R V D NPr/VX/JS NSg/I/C/Ddem+ NPr/VX NSg/VXB VP/J P D/P+ NSg/V/J/P+ NSg/V+ . NSg/C NSg/V/J # VP/J NSg/J/R VB D NPr/VXB/JS NSg/I/C/Ddem+ NPr/VXB NSg/VXB VP/J P D/P+ NSg/VB/J/P+ NSg/VB+ . NSg/C NSg/VB/J
> the best that have been achieved with a given approach . > the best that have been achieved with a given approach .
# D+ NPr/VX/JS+ NSg/I/C/Ddem+ NSg/VX NSg/VPp VP/J P D/P+ NSg/V/J/P+ NSg/V+ . # D+ NPr/VXB/JS+ NSg/I/C/Ddem+ NSg/VXB NSg/VPp VP/J P D/P+ NSg/VB/J/P+ NSg/VB+ .
> >
# #
> In 2014 , a paper reporting using the structure regularization method for > In 2014 , a paper reporting using the structure regularization method for
# NPr/J/P # . D/P+ N🅪Sg/V/J+ Nᴹ/Vg/J Nᴹ/Vg/J D N🅪Sg/V+ N🅪Sg NSg/V C/P # NPr/J/P # . D/P+ N🅪Sg/VB/J+ Nᴹ/Vg/J Nᴹ/Vg/J D N🅪Sg/VB+ N🅪Sg NSg/VB C/P
> part - of - speech tagging , achieving 97.36 % on a standard benchmark dataset . > part - of - speech tagging , achieving 97.36 % on a standard benchmark dataset .
# NSg/V/J+ . P . N🅪Sg/V+ NSg/V . Nᴹ/Vg/J # . J/P D/P NSg/J NSg/V NSg . # NSg/VB/J+ . P . N🅪Sg/VB+ NSg/Vg . Nᴹ/Vg/J # . J/P D/P NSg/J NSg/VB NSg .

View file

@ -1,13 +1,13 @@
> Spell > Spell
# NSg/V # NSg/VB
> >
# #
> This document contains a list of words spelled correctly in some dialects of English , but not American English . This is designed to test the spelling suggestions we give for such mistakes . > This document contains a list of words spelled correctly in some dialects of English , but not American English . This is designed to test the spelling suggestions we give for such mistakes .
# I/Ddem+ NSg/V+ V3 D/P NSg/V P NPl/V3+ VP/J R NPr/J/P I/J/R/Dq NPl P NPr🅪/V/J+ . NSg/C/P NSg/C NPr/J NPr🅪/V/J+ . I/Ddem+ VL3 VP/J P NSg/V D+ Nᴹ/Vg/J+ NPl+ IPl+ NSg/V C/P NSg/I+ NPl/V3+ . # I/Ddem+ NSg/VB+ V3 D/P NSg/VB P NPl/V3+ VP/J R NPr/J/P I/J/R/Dq NPl P NPr🅪/VB/J+ . NSg/C/P NSg/C NPr/J NPr🅪/VB/J+ . I/Ddem+ VL3 VP/J P NSg/VB D+ Nᴹ/Vg/J+ NPl+ IPl+ NSg/VB C/P NSg/I+ NPl/V3+ .
> >
# #
> To achieve this , the filename of this file contains `.US , which will tell the snapshot generator to use the American dialect , rather than trying to use an automatically detected dialect . > To achieve this , the filename of this file contains `.US , which will tell the snapshot generator to use the American dialect , rather than trying to use an automatically detected dialect .
# P V I/Ddem+ . D NSg P I/Ddem NSg/V+ V3 Unlintable . I/C+ NPr/VX NPr/VB D NSg/V+ NSg P N🅪Sg/VB D NPr/J NSg+ . NPr/V/J/R C/P Nᴹ/Vg/J P N🅪Sg/VB D/P W? VP/J NSg+ . # P VB I/Ddem+ . D NSg P I/Ddem NSg/VB+ V3 Unlintable . I/C+ NPr/VXB NPr/VB D NSg/VB+ NSg P N🅪Sg/VB D NPr/J NSg+ . NPr/VB/J/R C/P Nᴹ/Vg/J P N🅪Sg/VB D/P W? VP/J NSg+ .
> >
# #
> Words > Words
@ -22,57 +22,57 @@
# R/Comm . # R/Comm .
> >
# #
> Centre . > Centre .
# NSg/V/Comm+ . # NSg/VB/Comm+ .
> >
# #
> Labelled . > Labelled .
# V/J/Comm . # VB/J/Comm .
> >
# #
> Flavour . > Flavour .
# N🅪Sg/V/Comm+ . # N🅪Sg/VB/Comm+ .
> >
# #
> Favoured . > Favoured .
# VP/J/Comm . # VP/J/Comm .
> >
# #
> Honour . > Honour .
# N🅪Sg/V/Comm+ . # N🅪Sg/VB/Comm+ .
> >
# #
> Grey . > Grey .
# NPr🅪/V/J/Comm . # NPr🅪/VB/J/Comm .
> >
# #
> Quarrelled . > Quarrelled .
# V/Comm . # VB/Comm .
> >
# #
> Quarrelling . > Quarrelling .
# Nᴹ/V/Comm . # Nᴹ/VB/Comm .
> >
# #
> Recognised . > Recognised .
# VP/J/Au/Br . # VP/J/Au/Br .
> >
# #
> Neighbour . > Neighbour .
# NSg/V/J/Comm+ . # NSg/VB/J/Comm+ .
> >
# #
> Neighbouring . > Neighbouring .
# Nᴹ/Vg/J/Comm . # Nᴹ/Vg/J/Comm .
> >
# #
> Clamour . > Clamour .
# NSg/V/Comm . # NSg/VB/Comm .
> >
# #
> Theatre . > Theatre .
# N🅪Sg/Comm+ . # N🅪Sg/Comm+ .
> >
# #
> Analyse . > Analyse .
# V/Au/Br . # VB/Au/Br .

View file

@ -1,26 +1,26 @@
> Spell > Spell
# NSg/V # NSg/VB
> >
# #
> This document contains example sentences with misspelled words that we want to test the spell checker on . > This document contains example sentences with misspelled words that we want to test the spell checker on .
# I/Ddem+ NSg/V+ V3 NSg/V+ NPl/V3+ P VP/J NPl/V3+ NSg/I/C/Ddem+ IPl+ NSg/V P NSg/V D NSg/V NSg/V J/P . # I/Ddem+ NSg/VB+ V3 NSg/VB+ NPl/V3+ P VP/J NPl/V3+ NSg/I/C/Ddem+ IPl+ NSg/VB P NSg/VB D NSg/VB NSg/VB J/P .
> >
# #
> Example Sentences > Example Sentences
# NSg/V+ NPl/V3+ # NSg/VB+ NPl/V3+
> >
# #
> My favourite color is blu . > My favourite color is blu .
# D$+ NSg/V/J/Comm+ N🅪Sg/V/J/Am+ VL3 W? . # D$+ NSg/VB/J/Comm+ N🅪Sg/VB/J/Am+ VL3 W? .
> I must defend my honour ! > I must defend my honour !
# ISg+ NSg/V NSg/V D$+ N🅪Sg/V/Comm+ . # ISg+ NSg/VB NSg/VB D$+ N🅪Sg/VB/Comm+ .
> I recognize that you recognise me . > I recognize that you recognise me .
# ISg+ V NSg/I/C/Ddem ISgPl+ V/Au/Br NPr/ISg+ . # ISg+ VB NSg/I/C/Ddem ISgPl+ VB/Au/Br NPr/ISg+ .
> I analyze how you infantilize me . > I analyze how you infantilize me .
# ISg+ V NSg/C ISgPl+ V NPr/ISg+ . # ISg+ VB NSg/C ISgPl+ VB NPr/ISg+ .
> I analyse how you infantilise me . > I analyse how you infantilise me .
# ISg+ V/Au/Br NSg/C ISgPl+ ? NPr/ISg+ . # ISg+ VB/Au/Br NSg/C ISgPl+ ? NPr/ISg+ .
> Careful , traveller ! > Careful , traveller !
# J . NSg/Comm+ . # J . NSg/Comm+ .
> At the centre of the theatre I dropped a litre of coke . > At the centre of the theatre I dropped a litre of coke .
# NSg/P D NSg/V/Comm P D+ N🅪Sg/Comm+ ISg+ V/J D/P NSg/Comm P NPr🅪/V+ . # NSg/P D NSg/VB/Comm P D+ N🅪Sg/Comm+ ISg+ VP/J D/P NSg/Comm P NPr🅪/VB+ .

View file

@ -3,16 +3,16 @@
> >
# #
> This documents tests that different forms / variations of swears are tagged as such . > This documents tests that different forms / variations of swears are tagged as such .
# I/Ddem+ NPl/V3+ NPl/V3+ NSg/I/C/Ddem NSg/J+ NPl/V3+ . W? P NPl/V3 V V/J NSg/R NSg/I . # I/Ddem+ NPl/V3+ NPl/V3+ NSg/I/C/Ddem NSg/J+ NPl/V3+ . W? P NPl/V3 VB VP/J NSg/R NSg/I .
> >
# #
> Examples > Examples
# NPl/V3+ # NPl/V3+
> >
# #
> One turd , two turds . > One turd , two turds .
# NSg/I/V/J+ NSg/V+/B . NSg NPl/V3/B . # NSg/I/VB/J+ NSg/VB+/B . NSg NPl/V3/B .
> >
# #
> I fart , you're farting , he farts , she farted . > I fart , you're farting , he farts , she farted .
# ISg+ NSg/V/B . + Nᴹ/Vg/J/B . NPr/ISg+ NPl/V3/B . ISg+ VP/J/B . # ISg+ NSg/VB/B . + Nᴹ/Vg/J/B . NPr/ISg+ NPl/V3/B . ISg+ VP/J/B .

File diff suppressed because it is too large Load diff

View file

@ -1,74 +1,74 @@
> " This " and " that " are common and fulfill multiple purposes in everyday English . > " This " and " that " are common and fulfill multiple purposes in everyday English .
# . I/Ddem+ . V/C . NSg/I/C/Ddem+ . V NSg/V/J V/C V/NoAm NSg/J/Dq NPl/V3 NPr/J/P NSg/J+ NPr🅪/V/J+ . # . I/Ddem+ . VB/C . NSg/I/C/Ddem+ . VB NSg/VB/J VB/C VB/NoAm NSg/J/Dq NPl/V3 NPr/J/P NSg/J+ NPr🅪/VB/J+ .
> As such , disambiguating them is necessary . > As such , disambiguating them is necessary .
# NSg/R NSg/I . Nᴹ/Vg/J NSg/IPl+ VL3 NSg/J . # NSg/R NSg/I . Nᴹ/Vg/J NSg/IPl+ VL3 NSg/J .
> >
# #
> This document contains various sentences that use " this " , " that " , " these " , and > This document contains various sentences that use " this " , " that " , " these " , and
# I/Ddem+ NSg/V+ V3 J+ NPl/V3+ NSg/I/C/Ddem+ N🅪Sg/VB . I/Ddem+ . . . NSg/I/C/Ddem+ . . . I/Ddem . . V/C # I/Ddem+ NSg/VB+ V3 J+ NPl/V3+ NSg/I/C/Ddem+ N🅪Sg/VB . I/Ddem+ . . . NSg/I/C/Ddem+ . . . I/Ddem . . VB/C
> " those " in different contexts with a lot of edge cases . > " those " in different contexts with a lot of edge cases .
# . I/Ddem . NPr/J/P NSg/J NPl/V3 P D/P NPr/V P NSg/V+ NPl/V3+ . # . I/Ddem . NPr/J/P NSg/J NPl/V3 P D/P NPr/VB P NSg/VB+ NPl/V3+ .
> >
# #
> Examples > Examples
# NPl/V3+ # NPl/V3+
> >
# #
> This triangle is nice . > This triangle is nice .
# I/Ddem NSg VL3 NPr/V/J . # I/Ddem NSg VL3 NPr/VB/J .
> This is nice . > This is nice .
# I/Ddem+ VL3 NPr/V/J . # I/Ddem+ VL3 NPr/VB/J .
> That triangle is nice . > That triangle is nice .
# NSg/I/C/Ddem+ NSg VL3 NPr/V/J . # NSg/I/C/Ddem+ NSg VL3 NPr/VB/J .
> That is nice . > That is nice .
# NSg/I/C/Ddem+ VL3 NPr/V/J . # NSg/I/C/Ddem+ VL3 NPr/VB/J .
> These triangles are nice . > These triangles are nice .
# I/Ddem NPl V NPr/V/J . # I/Ddem NPl VB NPr/VB/J .
> These are nice . > These are nice .
# I/Ddem+ V NPr/V/J . # I/Ddem+ VB NPr/VB/J .
> Those triangles are nice . > Those triangles are nice .
# I/Ddem NPl V NPr/V/J . # I/Ddem NPl VB NPr/VB/J .
> Those are nice . > Those are nice .
# I/Ddem+ V NPr/V/J . # I/Ddem+ VB NPr/VB/J .
> >
# #
> This massage is nice . > This massage is nice .
# I/Ddem+ NSg/V+ VL3 NPr/V/J . # I/Ddem+ NSg/VB+ VL3 NPr/VB/J .
> That massage is nice . > That massage is nice .
# NSg/I/C/Ddem NSg/V+ VL3 NPr/V/J . # NSg/I/C/Ddem NSg/VB+ VL3 NPr/VB/J .
> These massages are nice . > These massages are nice .
# I/Ddem+ NPl/V3+ V NPr/V/J . # I/Ddem+ NPl/V3+ VB NPr/VB/J .
> Those massages are nice . > Those massages are nice .
# I/Ddem+ NPl/V3+ V NPr/V/J . # I/Ddem+ NPl/V3+ VB NPr/VB/J .
> This massages well . > This massages well .
# I/Ddem+ NPl/V3+ NSg/V/J . # I/Ddem+ NPl/V3+ NSg/VB/J .
> That massages well . > That massages well .
# NSg/I/C/Ddem+ NPl/V3+ NSg/V/J . # NSg/I/C/Ddem+ NPl/V3+ NSg/VB/J .
> These massage well . > These massage well .
# I/Ddem+ NSg/V+ NSg/V/J . # I/Ddem+ NSg/VB+ NSg/VB/J .
> Those massage well . > Those massage well .
# I/Ddem+ NSg/V+ NSg/V/J . # I/Ddem+ NSg/VB+ NSg/VB/J .
> >
# #
> That could be a solution . > That could be a solution .
# NSg/I/C/Ddem+ NSg/VX NSg/VXB D/P+ NSg+ . # NSg/I/C/Ddem+ NSg/VXB NSg/VXB D/P+ NSg+ .
> Find all candidates that could be a solution . > Find all candidates that could be a solution .
# NSg/V NSg/I/J/C/Dq+ NPl/V3+ NSg/I/C/Ddem+ NSg/VX NSg/VXB D/P+ NSg+ . # NSg/VB NSg/I/J/C/Dq+ NPl/V3+ NSg/I/C/Ddem+ NSg/VXB NSg/VXB D/P+ NSg+ .
> >
# #
> This is all that I have . > This is all that I have .
# I/Ddem+ VL3 NSg/I/J/C/Dq NSg/I/C/Ddem ISg+ NSg/VX . # I/Ddem+ VL3 NSg/I/J/C/Dq NSg/I/C/Ddem ISg+ NSg/VXB .
> This is all that solutions can do . > This is all that solutions can do .
# I/Ddem+ VL3 NSg/I/J/C/Dq NSg/I/C/Ddem NPl+ NPr/VX NSg/VX . # I/Ddem+ VL3 NSg/I/J/C/Dq NSg/I/C/Ddem NPl+ NPr/VXB NSg/VXB .
> That solution can do . > That solution can do .
# NSg/I/C/Ddem NSg+ NPr/VX NSg/VX . # NSg/I/C/Ddem NSg+ NPr/VXB NSg/VXB .
> >
# #
> We can do this ! > We can do this !
# IPl+ NPr/VX NSg/VX I/Ddem+ . # IPl+ NPr/VXB NSg/VXB I/Ddem+ .
> I can do this and that . > I can do this and that .
# ISg+ NPr/VX NSg/VX I/Ddem V/C NSg/I/C/Ddem+ . # ISg+ NPr/VXB NSg/VXB I/Ddem VB/C NSg/I/C/Ddem+ .
> >
# #
> We unite to stand united in unity . > We unite to stand united in unity .
# IPl+ NSg/V P NSg/V VP/J NPr/J/P NSg+ . # IPl+ NSg/VB P NSg/VB VP/J NPr/J/P Nᴹ+ .