A suffix based part-of-speech tagger for Turkish

Dinçer, Bekir Taner; Karaoğlan, Bahar; Kısla, Tarık

Access

info:eu-repo/semantics/openAccess

Date

2008

Abstract

In this paper, we present a stochastic part-of-speech tagger for Turkish. The tagger was developed primarily for information retrieval, but it can also serve as a lightweight PoS tagger for other purposes. It uses a well-established hidden Markov model of the language with a closed lexicon that consists of a fixed number of letters taken from the word endings. We evaluated seven different word-ending lengths against 30 training corpus sizes. The best-case accuracy obtained is 90.2%, with endings of 5 characters. The main contribution of this paper is a way of constructing a closed vocabulary for part-of-speech tagging that can be useful for highly inflected languages such as Turkish, Finnish, Hungarian, Estonian, and Czech.
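
The idea summarized in the abstract, an HMM whose observations are word endings rather than full words, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy corpus, the two-tag tagset, the add-one smoothing, and the default ending length are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Toy tagged corpus (hypothetical data and tags, for illustration only).
TOY_CORPUS = [
    [("kitaplar", "Noun"), ("geldi", "Verb")],
    [("evler", "Noun"), ("gitti", "Verb")],
    [("çocuklar", "Noun"), ("koştu", "Verb")],
]

def suffix(word, k=5):
    """Map a word to its last k letters (the whole word if it is shorter).
    Note: str.lower() does not apply Turkish-specific casing rules."""
    return word.lower()[-k:]

def train(tagged_sentences, k=5):
    """Count tag-to-tag transitions and tag-to-suffix emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][suffix(word, k)] += 1
            prev = tag
    return trans, emit

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def tag(words, trans, emit, k=5):
    """Viterbi decoding over suffix observations."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags = len(tags)
    # Size of the closed suffix lexicon, plus one slot for unseen endings.
    n_suffixes = len({s for t in emit for s in emit[t]}) + 1
    obs = suffix(words[0], k)
    # best[i][t] = (best log score of a path ending in t at i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], obs, n_suffixes), None) for t in tags}]
    for i in range(1, len(words)):
        obs = suffix(words[i], k)
        column = {}
        for t in tags:
            e = log_prob(emit[t], obs, n_suffixes)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

trans, emit = train(TOY_CORPUS, k=3)
print(tag(["arabalar", "gitti"], trans, emit, k=3))  # ['Noun', 'Verb']
```

Because every word is reduced to at most k trailing letters, the set of possible observations is bounded by the endings seen in training; an unseen word such as "arabalar" still maps to a known ending ("lar" for k=3) and can be tagged without a full-form dictionary.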

Source

Proceedings of the Fifth International Conference on Information Technology: New Generations

Link

https://doi.org/10.1109/ITNG.2008.103
https://hdl.handle.net/20.500.12809/5020

Collections

Computer Engineering Department Collection [103]
Scopus Indexed Publications Collection [6221]
WoS Indexed Publications Collection [6500]