Method for Developing Phonetically Rich and Balanced Lexical Corpus
The various embodiments of the present invention provide a method for developing a phonetically rich and balanced lexical corpus. The method comprises accumulating a plurality of sentences in a target language from a plurality of data sources. Sentences collected from web sources are raw, because they contain unstructured data such as duplicated sentences, alien words, special characters and boilerplate text within the extracted sentences. At least one sentence is selected from the accumulated sentences; the selected sentences are phonetically rich and balanced while keeping the database relatively small. A plurality of selected sentences is then evaluated to create a balanced database. The method yields a phonetically rich and balanced lexical corpus that is only a fraction of the size of the initial lexical database.
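The abstract does not say how sentences are cleaned or selected, so the following is only a minimal sketch of one possible reading, under several assumptions not taken from the source. A small regex stands in for boilerplate detection, a target-language word list (`vocab`) flags alien words, and each letter stands in for a phone because no grapheme-to-phoneme converter is described. Selection uses a greedy coverage heuristic (a common technique for building phonetically rich sets, swapped in here for illustration) that breaks ties toward sentences whose units are still rare in the selection, as a rough proxy for "balanced". The names `clean_sentences`, `to_units`, `select_rich_balanced` and the toy Malay data are hypothetical.

```python
import re
from collections import Counter

# Hypothetical boilerplate patterns; a real pipeline would maintain its own list.
BOILERPLATE = re.compile(r"(copyright|all rights reserved|click here)", re.IGNORECASE)


def clean_sentences(raw, vocabulary):
    """Drop boilerplate, special characters, duplicates and sentences with alien words."""
    seen = set()
    cleaned = []
    for sentence in raw:
        if BOILERPLATE.search(sentence):
            continue
        # Replace special characters with spaces, then normalise whitespace.
        text = re.sub(r"[^a-z\s]", " ", sentence.lower())
        text = " ".join(text.split())
        if not text or text in seen:
            continue  # empty after cleaning, or a duplicate
        # Reject sentences containing words outside the target-language word list.
        if any(word not in vocabulary for word in text.split()):
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


def to_units(sentence):
    # Placeholder for a grapheme-to-phoneme step: treats each letter as one phone.
    return [ch for ch in sentence if ch != " "]


def _score(sentence, counts):
    units = set(to_units(sentence))
    new_units = len(units - set(counts))  # units not yet covered by the selection
    # Average inverse frequency: higher when the sentence's units are still rare.
    rarity = sum(1.0 / (1 + counts[u]) for u in units) / max(len(units), 1)
    return (new_units, rarity)


def select_rich_balanced(sentences, max_sentences):
    """Greedily pick up to max_sentences, maximising new coverage, then rarity."""
    selected = []
    counts = Counter()
    pool = list(sentences)
    while pool and len(selected) < max_sentences:
        best = max(pool, key=lambda s: _score(s, counts))
        pool.remove(best)
        selected.append(best)
        counts.update(to_units(best))
    return selected, counts


# Toy example data (hypothetical).
vocab = {"saya", "suka", "makan", "nasi", "ikan", "dia", "minum", "air", "kopi"}
raw = [
    "Saya suka makan nasi.",
    "saya suka makan nasi",      # duplicate after normalisation
    "Dia minum air!!",           # special characters
    "Click here to subscribe",   # boilerplate
    "Saya suka pizza",           # alien word
    "Dia makan ikan",
    "Kopi, kopi, kopi",
]
corpus = clean_sentences(raw, vocab)
selected, counts = select_rich_balanced(corpus, max_sentences=2)
print(selected)
```

In this toy run, seven raw sentences reduce to four clean ones and two are kept, which mirrors the idea of a corpus that is a fraction of the initial database. Replacing `to_units` with a real phoneme or diphone transcriber would be the first change for actual use.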
Contact person for this offer:
Lee Ching Shya
Business Officer, UMCIC, Universiti Malaya
E-mail: leecs@um.edu.my
Tel: +603-7967-7351/7352