Please use this identifier to cite or link to this item:
https://publication.npru.ac.th/jspui/handle/123456789/2327
Title: | Development of a Prototype of the Speech-to-Text Web Application for Electronic Meetings for the Research and Development Institute of Nakhon Pathom Rajabhat University |
Other Titles: | การพัฒนาต้นแบบเว็บแอปพลิเคชันแปลงเสียงเป็นข้อความสำหรับการประชุมอิเล็กทรอนิกส์ สถาบันวิจัยและพัฒนา มหาวิทยาลัยราชภัฏนครปฐม |
Authors: | Saliew, Pimonpan; Buhuatchai, Jirundon; Thammasiri, Dech |
Keywords: | Web Application; Speech-to-Text; Electronic Meetings; Large Language Models (LLMs); Whisper |
Issue Date: | 21-Aug-2025 |
Publisher: | The 17th NPRU National Academic Conference, Nakhon Pathom Rajabhat University |
Series/Report no.: | Proceedings of the 17th NPRU National Academic Conference; 426-438 |
Abstract: | This research has two objectives: 1) to develop a prototype of a speech-to-text web application for electronic meetings for the Research and Development Institute of Nakhon Pathom Rajabhat University, and 2) to compare the accuracy of the models considered for that prototype. The work was motivated by two problems: transcribing the audio recordings of meeting summaries took the researchers a long time, and the audio files could not be processed in the cloud because the organization's information had to remain confidential. The researchers therefore developed a prototype speech-to-text web application with PHP and Python, built on the large language model Whisper. The prototype was tested with generated voice data for 3 messages of 10, 42, and 121 characters. The experiment was run on a personal computer and repeated 3 times with 6 sizes of Whisper models: 1) Tiny, 2) Base, 3) Small, 4) Medium, 5) Large, and 6) Turbo. The results show that the Turbo model was the most suitable of the six, with an average accuracy of 67.81% and an average processing time of 38.54 seconds. For the 10-character message, its average accuracy was 100% with an average processing time of 28.72 seconds. For the 42-character message, its average accuracy was 57.14% with an average processing time of 30.41 seconds. For the 121-character message, its average accuracy was 46.28% with an average processing time of 56.50 seconds. The researchers then adopted the Turbo model and improved the post-processing of the Thai speech-to-text output to achieve better accuracy. |
URI: | https://publication.npru.ac.th/jspui/handle/123456789/2327 |
ISBN: | 978-974-7063-48-6 |
Appears in Collections: | Proceedings of the 17th NPRU National Academic Conference |
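
The evaluation procedure described in the abstract (6 Whisper model sizes, 3 test messages, 3 repetitions, with average accuracy and processing time recorded) could be sketched roughly as follows. This is a minimal illustration and not the authors' code. The abstract does not say how accuracy was computed, so `char_accuracy` is an assumed character-level metric based on Python's `difflib`. The helper names `benchmark` and `whisper_transcriber` are hypothetical. The sketch also assumes the open-source `openai-whisper` package, which runs on the local machine and so fits the requirement of not sending audio to the cloud.

```python
import time
from difflib import SequenceMatcher
from statistics import mean

# Model sizes named in the abstract, written as openai-whisper model names.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large", "turbo"]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Percentage of reference characters covered by difflib's matching blocks.

    Assumed metric: the abstract does not define how accuracy was measured.
    """
    if not reference:
        return 0.0
    matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(reference)


def benchmark(transcribe, audio_path: str, reference: str, repeats: int = 3) -> dict:
    """Call `transcribe` on one file `repeats` times; return mean accuracy and time."""
    accuracies, durations = [], []
    for _ in range(repeats):
        start = time.perf_counter()
        text = transcribe(audio_path)
        durations.append(time.perf_counter() - start)
        accuracies.append(char_accuracy(reference, text.strip()))
    return {"accuracy": mean(accuracies), "seconds": mean(durations)}


def whisper_transcriber(size: str, language: str = "th"):
    """Build a local transcriber; requires `pip install openai-whisper` and ffmpeg."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(size)  # weights download once, inference runs locally
    return lambda path: model.transcribe(path, language=language)["text"]
```

Looping `whisper_transcriber(size)` over `MODEL_SIZES` and calling `benchmark` once per test message would give a per-model table of the kind summarized in the abstract. The per-length figures reported for the Turbo model are consistent with its overall averages: (100 + 57.14 + 46.28) / 3 ≈ 67.81% and (28.72 + 30.41 + 56.50) / 3 ≈ 38.54 seconds.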