Malik Talha

Journey into GSoC 2023

GSoC 2023 Journey: Week 13 Report

26 August 2023


My contributions and experiences during the thirteenth week of the coding period of Google Summer of Code (GSoC) 2023.

Introduction

Welcome to my weekly report documenting my journey during Google Summer of Code 2023 with the Linux Foundation! In this project, I am working on enhancing the existing speech-to-text feature of Automotive Grade Linux (AGL) by introducing a Natural Language Intent engine and implementing software daemons/controllers to execute the extracted intent. This endeavor aims to significantly improve the user experience and functionality of the speech-to-text feature in automotive environments. Throughout this report, I will share my progress, challenges faced, and achievements made as I contribute to the development of AGL and pave the way for more intuitive and intelligent voice interactions in automobiles.

Summary of the week

This week I made significant progress on the Voice Agent gRPC Service. I overhauled the previous gRPC server to create the new Voice Agent gRPC Service, which now supports three recording modes: wake word, auto, and manual. I also integrated the Python Kuksa client into the voice agent service.

Tasks completed

  • Overhauled the previous gRPC server to create the new Voice Agent gRPC Service, which adds three modes:
    • Wake Word Mode: The service waits for a designated wake word. When it detects the wake word, it signals the client to start command execution.
    • Auto Mode: A custom Voice Activity Detection (VAD) system lets the service listen continuously while recording and processing commands only while the user is speaking.
    • Manual Mode: The service records commands based on the client cues "Start" and "Stop", giving users direct control over when a voice command is captured.
  • Integrated the Python Kuksa client into the voice agent service. This will allow the service to interface with various vehicle-related functionalities.
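To make the Auto Mode idea more concrete, here is a minimal sketch of energy-based voice activity detection. It is not the VAD used in the Voice Agent service; the function names, the energy threshold, and the `hangover` parameter are all assumptions chosen for illustration. It groups consecutive "loud" frames into utterances and closes an utterance after a short run of silence.

```python
from typing import Iterable, List, Sequence

Frame = Sequence[float]  # one chunk of audio samples, assumed to be in [-1.0, 1.0]


def frame_energy(frame: Frame) -> float:
    """Return the mean squared amplitude of one audio frame."""
    if not frame:
        return 0.0
    return sum(sample * sample for sample in frame) / len(frame)


def segment_speech(
    frames: Iterable[Frame],
    threshold: float = 0.01,  # hypothetical energy threshold
    hangover: int = 2,        # hypothetical number of silent frames that ends an utterance
) -> List[List[Frame]]:
    """Group voiced frames into utterances.

    A frame counts as voiced when its energy exceeds `threshold`.
    An utterance ends after `hangover` consecutive silent frames.
    Silent frames are dropped rather than stored.
    """
    utterances: List[List[Frame]] = []
    current: List[Frame] = []
    silent_run = 0

    for frame in frames:
        if frame_energy(frame) > threshold:
            current.append(frame)
            silent_run = 0
        elif current:
            # Silence while an utterance is open: count it toward the hangover.
            silent_run += 1
            if silent_run >= hangover:
                utterances.append(current)
                current = []
                silent_run = 0

    # Flush an utterance that was still open when the stream ended.
    if current:
        utterances.append(current)
    return utterances
```

In a real service each finished utterance would be handed to the speech-to-text step. A fixed energy threshold is the simplest possible approach and tends to struggle with cabin noise, which is one reason a custom VAD may be preferable in a vehicle.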

You can find the updated Voice Agent gRPC Service here.

Tasks leftover

No tasks were left over this week.

Next steps

  • Use the Python Kuksa client interfaces to execute the following intents by implementing mapping interfaces:
    • Volume Controls: Implement functionality within the voice agent service to control audio volume seamlessly using voice commands. This will contribute to a more user-centric and hands-free driving experience.
    • HVAC Controls: Leverage the Kuksa Python client interfaces to enable users to control the vehicle's HVAC system through voice commands. This feature adds convenience and comfort to the driving experience.
  • Add comprehensive documentation for the new voice agent service: Develop detailed and user-friendly documentation that explains the setup, configuration, and usage of the enhanced voice agent service. Clear documentation will facilitate seamless adoption and integration of the service by both developers and end-users.
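As a rough idea of what such a mapping interface could look like, here is a small sketch. It deliberately does not call the Kuksa client: `set_signal` is a hypothetical callback standing in for whatever client call the service ends up using. The intent names, signal paths, and value limits are also assumptions for illustration only, and the paths would need to be checked against the VSS version deployed in AGL.

```python
from typing import Callable, Dict, Tuple

# Hypothetical intent names mapped to illustrative VSS-style signal paths.
INTENT_TO_SIGNAL: Dict[str, str] = {
    "volume_set": "Vehicle.Cabin.Infotainment.Media.Volume",
    "hvac_temperature_set": "Vehicle.Cabin.HVAC.Station.Row1.Left.Temperature",
}

# Assumed allowed ranges per signal (min, max); real limits come from the vehicle spec.
SIGNAL_LIMITS: Dict[str, Tuple[float, float]] = {
    "Vehicle.Cabin.Infotainment.Media.Volume": (0.0, 100.0),
    "Vehicle.Cabin.HVAC.Station.Row1.Left.Temperature": (16.0, 30.0),
}


def resolve_intent(intent: str, value: float) -> Tuple[str, float]:
    """Translate an intent and its value into a (signal path, clamped value) pair."""
    if intent not in INTENT_TO_SIGNAL:
        raise ValueError(f"Unsupported intent: {intent}")
    path = INTENT_TO_SIGNAL[intent]
    low, high = SIGNAL_LIMITS[path]
    # Clamp the requested value into the allowed range for this signal.
    return path, max(low, min(high, value))


def execute_intent(
    intent: str,
    value: float,
    set_signal: Callable[[str, float], None],
) -> None:
    """Resolve the intent, then hand the result to the injected setter."""
    path, clamped = resolve_intent(intent, value)
    set_signal(path, clamped)
```

Keeping the mapping separate from the client call means the intent logic can be tested without a running Kuksa databroker; in the service, `set_signal` would wrap the actual client.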

Conclusion

Overall, this week was productive, and I am satisfied with the progress made in achieving the goals outlined for the week. I am excited to continue my GSoC journey and further enhance the speech-to-text feature in Automotive Grade Linux.



© 2023 Malik Talha, All rights reserved.