
Studi Pengaruh Tingkat Interferensi Terhadap Performa Transkripsi Model Wav2Vec2-Large-XLSR-Indonesian (A Study of the Effect of Interference Level on the Transcription Performance of the Wav2Vec2-Large-XLSR-Indonesian Model)

    View/Open
    Abstract and Table of Content (221.6Kb)
    Chapter 1: Introduction (134.6Kb)
    Chapter 2: Literature Review (191.0Kb)
    Chapter 3: Method (323.8Kb)
Chapter 4: Results and Discussion (1.445Mb)
    Chapter 5: Conclusion (100.9Kb)
    References (113.2Kb)
    Cover and Legal (406.0Kb)
    Date
    2025-08-10
    Author
    Ephraim, David
    Abstract
In general, meetings require Minutes of Meeting (MoM) to record the main discussion points. Creating MoM manually is time-consuming and labor-intensive, but Automatic Speech Recognition (ASR) technology can assist by converting recorded conversations into text (speech-to-text, STT). However, such technology often raises confidentiality concerns because it relies on third-party services. A further challenge is interfering speech that is inevitably mixed with the main conversation. This study therefore investigates the extent to which interference affects the performance of ASR and MoM generation performed locally, without third-party services. The ASR model used is Wav2Vec2 XLSR Indonesian, fine-tuned on the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) dataset. Interfering audio was generated under several scenarios (ideal, whisper, equal RMS, and overpower) and mixed into the input audio. Model performance was then evaluated using the Word Error Rate (WER) metric. Simulation results show that transcription performance degrades as the interference level increases. However, summarization results for MoM generation using a Large Language Model (LLM) show that the whisper scenario, at audio levels up to −40 dBFS, performs comparably to the ideal (interference-free) condition, as measured by the BERTScore metric. This demonstrates that LLMs can compensate for less accurate STT transcription outputs.
    URI
    https://library.universitaspertamina.ac.id//xmlui/handle/123456789/14574
    Collections
    • DISSERTATIONS AND THESES (CS)

    DSpace software copyright © 2002-2015  DuraSpace
    Contact Us | Send Feedback
Theme by @mire NV