Automating U6 Dialogue Matching

Friday, August 23 at 10:18 PM

The Whisper AI program running in the Pinokio installer, automatically transcribing the Ultima VI FM Towns audio files into subtitle text.

TECHNICAL POST

Automating U6 Dialogue Matching in my #AgeOfSingularity Ultima VI Fan Recreation Project

I've been working on the Age Of Singularity Ultima VI fan recreation project, and one of the key challenges right now is matching 6,500 transcribed audio files to the correct keyword-response pairs in the U6 dialogue SQL database. This is crucial for making sure the game’s characters speak the right lines at the right moments, and I got it done in a single day using ChatGPT-4o, Whisper, and Python.

Here’s how we tackled it:

Transcribing Audio Files with Whisper

I used the Pinokio installer to install Whisper locally. Whisper is an advanced FOSS (free and open-source) speech-to-text model, and I used it to transcribe the dialogue from more than 6,500 U6 FM Towns audio files into text. The transcriptions were saved as .srt subtitle files, which contain both the text and its timestamps.
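The post ran Whisper through Pinokio rather than from a script. For anyone who would rather batch this step in Python, here is a minimal sketch assuming the openai-whisper package (`whisper.load_model` and `model.transcribe`, whose result includes a `segments` list with `start`, `end` and `text`). The `*.wav` file pattern and the `base` model size are placeholder assumptions, not what the project used.

```python
"""Sketch: batch-transcribe audio with openai-whisper and save .srt files.

Assumptions (not from the original post): the openai-whisper package,
*.wav input files, and the "base" model size.
"""
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text: cue number, time range, caption text, blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def transcribe_folder(audio_dir: str, pattern: str = "*.wav",
                      model_name: str = "base") -> list[Path]:
    """Transcribe every matching file and write <name>.srt beside it."""
    import whisper  # lazy import: the helpers above don't need the package

    model = whisper.load_model(model_name)  # load once, reuse for all files
    written = []
    for audio_path in sorted(Path(audio_dir).glob(pattern)):
        result = model.transcribe(str(audio_path))
        srt_path = audio_path.with_suffix(".srt")
        srt_path.write_text(segments_to_srt(result["segments"]), encoding="utf-8")
        written.append(srt_path)
    return written
```

Loading the model once and reusing it matters at this scale: reloading it for each of thousands of files would dominate the run time. openai-whisper also needs ffmpeg available on the PATH to decode audio.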

Extracting Relevant Text

From the .srt files, we extracted just the dialogue text, dropping the sequence numbers and timestamps, so it could be matched against the corresponding entries in the SQL database. We may use the timestamps later to sync audio playback with the on-screen text, but for now they aren't needed.
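A minimal sketch of that extraction step, assuming the standard SRT layout (a cue number, a `start --> end` timing line, then one or more lines of text, with cues separated by blank lines). The function names are hypothetical.

```python
import re
from pathlib import Path

# An SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content: str) -> str:
    """Keep only the spoken text from SRT content, joined into a single line."""
    texts = []
    # Cues are separated by blank lines (this also copes with \r\n endings)
    for cue in re.split(r"\n\s*\n", srt_content.strip()):
        lines = [line.strip() for line in cue.splitlines() if line.strip()]
        if lines and lines[0].isdigit():           # drop the cue number
            lines = lines[1:]
        if lines and TIMING_LINE.match(lines[0]):  # drop the timing line
            lines = lines[1:]
        texts.extend(lines)
    return " ".join(texts)


def srt_file_to_text(path: str) -> str:
    """Read an .srt file (tolerating a UTF-8 BOM) and return its dialogue text."""
    return srt_to_text(Path(path).read_text(encoding="utf-8-sig"))
```

Splitting on cue boundaries, rather than discarding every all-digit line, keeps a spoken line such as a number intact when it appears as caption text.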

Fuzzy Matching with Ratcliff/Obershelp Pattern Recognition

To match the transcriptions with the correct responses in the SQL database, we wrote a Python script that uses a fuzzy matching technique called Ratcliff/Obershelp Pattern Recognition (also known as gestalt pattern matching). The algorithm scores how similar two strings are by recursively finding their longest common substrings, which makes it well suited to spotting a transcribed line inside a longer response, exactly what we needed.

We implemented this with the fuzz.partial_ratio function in Python. partial_ratio scores the shorter string against the best-matching substring of the same length in the longer one, which let us compare each transcription to the dialogue responses and identify matches even when the transcription covered only part of a longer response.

We set a threshold (e.g., 80%) to determine how close the match needed to be before it was considered valid.
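To make the matching step concrete, here is a standard-library sketch. It is not the code of fuzz.partial_ratio itself, but it follows the approach of the original difflib-based fuzzywuzzy version: Python's `difflib.SequenceMatcher` implements a Ratcliff/Obershelp-style algorithm, and the shorter string is scored against same-length windows of the longer one. The `normalize` step and the `best_match` helper are illustrative assumptions; the default threshold of 80 mirrors the example above.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize(text: str) -> str:
    """Lower-case and turn punctuation into spaces so transcription noise matters less."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def partial_ratio(a: str, b: str) -> int:
    """0-100 score of the shorter string against aligned windows of the longer one."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    best = 0.0
    for short_start, long_start, _size in matcher.get_matching_blocks():
        # Slide the window so this matching block lines up with the shorter string
        window_start = max(0, long_start - short_start)
        window = longer[window_start:window_start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window, autojunk=False).ratio())
    return round(best * 100)


def best_match(transcription: str, responses: dict[int, str],
               threshold: int = 80) -> Optional[tuple[int, int]]:
    """Return (response_id, score) of the best response at or above threshold, else None."""
    query = normalize(transcription)
    best_id, best_score = None, -1
    for response_id, response_text in responses.items():
        score = partial_ratio(query, normalize(response_text))
        if score > best_score:
            best_id, best_score = response_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score
```

Because partial matching rewards short strings, a one-word transcription such as "Yes." can score 100 against many responses; those cases are worth checking by hand.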

Updating the Database

Once a match was found, the script automatically updated the AudioFile column in our SQL database with the corresponding file number, linking each response to the appropriate audio file. The audio files themselves are all stored in an audio directory under the Unity project's persistent data path. This also means they can easily be replaced with other recordings if desired, as long as the naming scheme matches.
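The post doesn't say which SQL engine the project uses. Assuming SQLite through Python's built-in `sqlite3` module, and a hypothetical `Dialogue` table with an integer `Id` key alongside the `AudioFile` column, writing the matches back could look like this:

```python
import sqlite3


def write_audio_links(conn: sqlite3.Connection, links: dict[int, int]) -> int:
    """Store each matched audio file number in the AudioFile column.

    `links` maps a response row Id to an audio file number. The table and
    column names (Dialogue, Id, AudioFile) are hypothetical; adjust them to
    your own schema. Returns the number of rows actually updated.
    """
    rows = [(file_number, response_id) for response_id, file_number in links.items()]
    with conn:  # commit on success, roll back if any UPDATE raises
        cursor = conn.executemany(
            "UPDATE Dialogue SET AudioFile = ? WHERE Id = ?", rows
        )
    return cursor.rowcount
```

Using parameter placeholders instead of string formatting avoids quoting problems with dialogue text, and wrapping the batch in one transaction means a failure partway through leaves the table unchanged.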

Keep in mind that these FM Towns audio files will NOT be included in Age Of Singularity by default; setting up a working framework simply makes it easy to add voice lines.

In the Python script we added logging at each step so we could track the process and ensure everything was working as expected.
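One way to set up that kind of logging with Python's standard `logging` module, as a sketch: the logger name `dialogue_matcher` and the helper names are hypothetical. Unmatched files are logged as warnings so they are easy to filter out for manual review.

```python
import logging
from typing import Optional


def make_logger(log_path: str) -> logging.Logger:
    """Return a logger that writes to both the console and a log file."""
    logger = logging.getLogger("dialogue_matcher")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't duplicate messages via the root logger
    for old in list(logger.handlers):  # safe to call more than once
        old.close()
        logger.removeHandler(old)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(),
                    logging.FileHandler(log_path, encoding="utf-8")):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def log_match(logger: logging.Logger, audio_file: int, transcription: str,
              match: Optional[tuple[int, int]]) -> None:
    """Log one matching decision; misses are warnings so they stand out."""
    if match is None:
        logger.warning("No match for file %s: %r", audio_file, transcription)
    else:
        response_id, score = match
        logger.info("File %s -> response %s (score %s)", audio_file,
                    response_id, score)
```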

Scaling Up

After testing with a small batch of files, we scaled up the process to handle all characters, ensuring that the entire dialogue database was updated accurately.
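The trial-then-scale workflow could be driven by a loop like the one below. It assumes, hypothetically, that transcriptions and responses are already grouped per character as `{id: text}` dictionaries with response ids unique across characters, and it takes the matcher as a parameter (for example, a thresholded search that returns `(response_id, score)` or `None`). The `limit` argument caps the number of files per character for a small first batch.

```python
import logging
from typing import Callable, Optional

# A matcher takes (transcription, {response_id: text}) and returns
# (response_id, score) or None.
Matcher = Callable[[str, dict[int, str]], Optional[tuple[int, int]]]


def run_batch(transcripts_by_character: dict[str, dict[int, str]],
              responses_by_character: dict[str, dict[int, str]],
              match_fn: Matcher,
              limit: Optional[int] = None) -> dict[int, int]:
    """Match audio files character by character; return {response_id: audio_file}.

    `limit` caps how many files per character are processed, for a trial batch.
    """
    log = logging.getLogger("dialogue_matcher")
    links: dict[int, int] = {}
    scores: dict[int, int] = {}
    for character, transcripts in transcripts_by_character.items():
        # Only compare against this character's own responses
        responses = responses_by_character.get(character, {})
        items = sorted(transcripts.items())[:limit]  # [:None] keeps everything
        for audio_file, text in items:
            match = match_fn(text, responses)
            if match is None:
                log.warning("%s: no match for audio file %s", character, audio_file)
                continue
            response_id, score = match
            # If two files claim the same response, keep the higher-scoring one
            if response_id in links and score <= scores[response_id]:
                log.warning("%s: file %s also matched response %s; keeping file %s",
                            character, audio_file, response_id, links[response_id])
                continue
            links[response_id] = audio_file
            scores[response_id] = score
            log.info("%s: file %s -> response %s (score %s)",
                     character, audio_file, response_id, score)
    return links
```

Restricting each search to one character's responses keeps the candidate list small and stops a generic line from one character matching another character's dialogue.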

This automated approach has just saved something like three months of tedious manual work linking the audio files to the in-game dialogue. Now all I have to do is update my Dialogue C# script in Unity (which I edit in Visual Studio Code) so it loads and plays audio files as needed. Then we'll have a dialogue system on par with Skyrim's, which can be used in Age Of Singularity and in any other game I'm working on.

Again, all of this was enabled by machine learning tools like Whisper and ChatGPT-4o. It’s another step forward in bringing the world of Ultima VI to life in Age Of Singularity.

A command line Python script automatically updating the SQL database with the correct audio file numbers for each dialogue response.

Asking ChatGPT-4o for help with the dialogue matching process.

ChatGPT-4o suggesting approaches to dialogue matching, such as using the FOSS (free and open-source software) Whisper program.

A Transcribe-And-Match Python script open in Visual Studio Code, showing the matching process in action in the SQL database.

Lord British's SQL dialogue row entries in the database, showing the AudioFile column updated with the correct audio file numbers automatically (mostly).

Here's the original UDIC Facebook post.
