
Method for Developing Phonetically Rich and Balanced Lexical Corpus


The various embodiments of the present invention provide a method for developing a phonetically rich and balanced lexical corpus. The method comprises accumulating a plurality of sentences in a target language from a plurality of data sources. Sentences collected from web sources are raw: they contain unstructured data such as duplicate sentences, alien (foreign) words, special characters, and boilerplate text. At least one sentence is selected from the accumulated sentences. The selected sentences are phonetically rich and balanced while keeping the database relatively small. The selected sentences are then evaluated to create a balanced database. The result of the method is a phonetically rich and balanced lexical corpus that is only a fraction of the size of the initial lexical database.
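The listing does not disclose how sentences are cleaned, selected, or evaluated. Purely as an illustration, the sketch below shows one common way such a pipeline can be built. It is not the patented method. It first cleans raw sentences (duplicates, special characters, and tokens outside an allowed alphabet are dropped). It then applies a greedy weighted set-cover heuristic that repeatedly picks the sentence adding the most not-yet-covered phonetic units, weighting rare units more heavily to favour balance. Finally, it reports unit coverage as a simple evaluation. Several details are assumptions and not part of the source. Letters stand in for phones and letter pairs stand in for diphones; a real system would use a grapheme-to-phoneme model or pronunciation lexicon for the target language. The cleaning regex and the inverse-frequency weighting are also assumptions, and the function names `clean`, `units`, `select`, and `coverage` are hypothetical.

```python
import re
from collections import Counter

# Assumed filter: after cleaning, a sentence may contain only lowercase Latin
# letters and spaces. Digits, accented or non-Latin characters are treated as
# "alien" tokens and the whole sentence is dropped.
ALLOWED = re.compile(r"^[a-z ]+$")


def clean(sentences):
    """Normalize raw web sentences; drop duplicates and sentences with disallowed tokens."""
    seen, cleaned = set(), []
    for raw in sentences:
        s = re.sub(r"[^\w\s]", " ", raw.lower())  # special characters/punctuation -> spaces
        s = re.sub(r"\s+", " ", s).strip()        # collapse runs of whitespace
        if not s or s in seen or not ALLOWED.match(s):
            continue
        seen.add(s)
        cleaned.append(s)
    return cleaned


def units(sentence):
    """Assumption: letters approximate phones, so letter pairs (with '#' marking
    word boundaries) approximate diphones."""
    found = set()
    for word in sentence.split():
        padded = f"#{word}#"
        found.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return found


def select(sentences, max_sentences=None):
    """Greedy weighted set cover: take the sentence whose uncovered units have the
    highest summed inverse frequency (rare units count more), until nothing new
    is added or the optional size budget is reached."""
    unit_sets = {s: units(s) for s in sentences}
    freq = Counter(u for us in unit_sets.values() for u in us)
    covered, chosen, remaining = set(), [], list(sentences)

    def gain(s):
        return sum(1.0 / freq[u] for u in unit_sets[s] - covered)

    while remaining and (max_sentences is None or len(chosen) < max_sentences):
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no remaining sentence contributes a new unit
        chosen.append(best)
        covered |= unit_sets[best]
        remaining.remove(best)
    return chosen


def coverage(selected, pool):
    """Evaluation: fraction of the pool's distinct units covered by the selection."""
    pool_units = set().union(*(units(s) for s in pool))
    if not pool_units:
        return 0.0
    picked = set().union(*(units(s) for s in selected))
    return len(picked) / len(pool_units)


if __name__ == "__main__":
    raw = ["Saya makan nasi.", "saya makan nasi", "Dia minum air!",
           "Harga 50 ringgit", "Saya makan"]
    pool = clean(raw)
    corpus = select(pool)
    print(pool)                      # ['saya makan nasi', 'dia minum air', 'saya makan']
    print(corpus)                    # ['dia minum air', 'saya makan nasi']
    print(coverage(corpus, pool))    # 1.0
```

In this toy run, cleaning removes the duplicate and the sentence containing digits. Selection then keeps two of the three remaining sentences because "saya makan" adds no unit that "saya makan nasi" does not already cover. This mirrors, on a tiny scale, how the final corpus can be a fraction of the initial database. The optional `max_sentences` budget is one simple way to bound the corpus size.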

Mega-Trends

Information and Communication Technologies

Technology Readiness Level (TRL)


Patent Number

PI 2018702280


Contact person for this offer:

ChM Dr. Lee Ching Shya, PhD (Dual), RTTP

Technology Transfer Manager


Tel: +603-7967-7351 / 013-2250151


