Statistical Properties of Turkish Words GOKHAN DALKILIC

About the book

For speech recognition, OCR and similar applications, determining the structural properties of a natural language is essential. These properties can be analyzed in two categories: morphological and statistical. Statistical analysis requires a corpus that is a representative sample of the language. The word n-gram frequencies of that corpus can be determined with suitable algorithms, and missing n-grams can be estimated with smoothing techniques. In this study, a corpus named TurCo was created in order to apply smoothing techniques to Turkish and compare them. Different algorithms were tested for calculating word n-grams, and the characteristics of the resulting n-gram word lists were then analyzed. For generalization, Zipf's Law was applied, and to improve its accuracy, the Mandelbrot Law was applied by finding the appropriate Mandelbrot constants. As the corpus cannot be large enough to represent the entire language, smoothing...
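The blurb names three computational steps (counting word n-grams, smoothing unseen n-grams, and fitting the Zipf and Mandelbrot laws) without showing them. The Python sketch below illustrates each step under stated assumptions; it is not the book's implementation. Add-one (Laplace) smoothing stands in for whichever smoothing techniques the book actually compares, since they are not named here, and the Mandelbrot law is assumed to take the common textbook form f(r) = P / (r + ρ)^B, which reduces to Zipf's law f(r) = P / r when ρ = 0 and B = 1. The function names word_ngrams, laplace_bigram_prob and fit_mandelbrot are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Count word n-grams (stored as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def laplace_bigram_prob(w1, w2, bigrams, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(w2 | w1).

    Assumption: add-one smoothing is only the simplest illustration;
    it is not necessarily one of the techniques compared in the book.
    Unseen bigrams receive a small non-zero probability instead of 0.
    """
    # Number of bigram occurrences in the corpus that start with w1.
    history = sum(c for (a, _), c in bigrams.items() if a == w1)
    # Counter returns 0 for a missing key, so unseen bigrams count as 0.
    return (bigrams[(w1, w2)] + 1) / (history + vocab_size)


def fit_mandelbrot(frequencies, rho_grid):
    """Fit f(r) = P / (r + rho) ** B to frequencies ordered by rank.

    Assumed parametrization (common textbook form; the book's may differ).
    For a fixed rho, log f = log P - B * log(r + rho) is linear in
    log(r + rho), so P and B follow from ordinary least squares. The rho
    from rho_grid (each value must be > -1) with the smallest squared
    error in log space is returned. Needs at least two frequencies.
    """
    ranked = sorted(frequencies, reverse=True)
    ys = [math.log(f) for f in ranked]
    my = sum(ys) / len(ys)
    best = None
    for rho in rho_grid:
        xs = [math.log(r + rho) for r in range(1, len(ranked) + 1)]
        mx = sum(xs) / len(xs)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = my - slope * mx
        sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), rho, -slope)
    _, p, rho, b = best
    return p, rho, b


if __name__ == "__main__":
    tokens = "bu bir test bu bir deneme bu".split()
    unigrams = word_ngrams(tokens, 1)
    bigrams = word_ngrams(tokens, 2)
    print(bigrams[("bu", "bir")])  # 2
    print(laplace_bigram_prob("bu", "bir", bigrams, len(unigrams)))  # 0.5
    # Synthetic rank-frequency data that follows Zipf's law exactly,
    # so the fit should pick rho = 0 with B close to 1 and P close to 1200.
    zipf = [1200 / r for r in range(1, 101)]
    print(fit_mandelbrot(zipf, [0, 0.5, 1, 2]))
```

Searching ρ over a small grid and solving for P and B in closed form keeps the sketch dependency-free; a fit over a real corpus such as TurCo would more likely use a finer search or a numerical optimizer, and real text would need proper tokenization before counting.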

Full title: Statistical Properties of Turkish Words
Author: GOKHAN DALKILIC
Keywords: computer literature, fundamentals of computer science (general works)
Categories: Computers and Internet
ISBN: 9783838351582
Year: 2010