feat: Add SiliconFlow TTS API support with custom base URL and model selection

This commit adds comprehensive support for using SiliconFlow's TTS API as an alternative to OpenAI, including:

Features:
- Configurable API base URL (Settings > API Base URL)
- TTS model selection dropdown (CosyVoice2-0.5B, OpenAI compatible models)
- Dynamic voice options based on selected model
- Editable voice dropdown (Combobox) supporting custom voice IDs
- Automatic voice formatting for SiliconFlow (model:voice format)
- Debug logging for troubleshooting API calls
- Warning for incorrect base URL format

Changes:
- utils/settings_manager.py: Added api_base_url and tts_model settings
- utils/text_to_mic.py:
  - Added get_available_tts_models() for model options
  - Added get_siliconflow_voices() for SiliconFlow voices
  - Added change_api_base_url() method with validation
  - Added TTS model dropdown in GUI
  - Converted voice dropdown to Combobox for typing support
  - Added on_voice_exit() for validation
  - Updated API call to use selected model and formatted voice
- text-to-mic-cli.py: Added OPENAI_API_BASE_URL and OPENAI_TTS_MODEL env var support
- Readme.md: Updated documentation with SiliconFlow usage instructions

Supported Models:
- FunAudioLLM/CosyVoice2-0.5B (SiliconFlow - multi-language, emotional)
- tts-1, tts-1-hd (OpenAI compatible)
- gpt-4o-mini-tts (OpenAI default)

SiliconFlow Voices (CosyVoice2-0.5B):
- Male: alex, benjamin, charles, david
- Female: anna, bella, claire, diana
- Custom voices via voice ID entry

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-27 17:12:34 +08:00
parent 20358adafb
commit 92d20e59e9
4 changed files with 365 additions and 30 deletions
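The "Automatic voice formatting for SiliconFlow (model:voice format)" feature above can be sketched as a small pure function. The name `format_voice` is hypothetical, and treating any value that already contains `:` as ready to send is an assumption based on the README's custom voice example (`speech:your-voice-name:...`), not documented SiliconFlow behavior.

```python
def format_voice(model: str, voice: str, api_base_url: str = "") -> str:
    """Return the voice value to send in the TTS request.

    Hypothetical helper: built-in CosyVoice2 voices on a SiliconFlow base URL
    are sent as "<model>:<voice>". Anything that already contains ":" (a voice
    that is already prefixed, or a custom "speech:..." ID) is passed through.
    """
    voice = voice.strip()
    uses_siliconflow = "siliconflow" in api_base_url.lower()
    if "CosyVoice2" in model and uses_siliconflow and ":" not in voice:
        return f"{model}:{voice}"
    return voice
```

For example, `format_voice("FunAudioLLM/CosyVoice2-0.5B", "alex", "https://api.siliconflow.cn/v1")` yields `FunAudioLLM/CosyVoice2-0.5B:alex`, while a custom `speech:...` ID comes back unchanged, and OpenAI models never get a prefix.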
--- a/Readme.md
+++ b/Readme.md
@@ -65,12 +65,29 @@ https://platform.openai.com/docs/quickstart/account-setup

 6. You can change the API key at any time under the 'Settings' menu.

+7. (Optional) You can also configure a custom API Base URL under 'Settings > API Base URL' to use an OpenAI-compatible endpoint instead of OpenAI's. For example, to use SiliconFlow's API, set the base URL to `https://api.siliconflow.cn/v1` (note: enter only the base URL, not the full endpoint path such as `/audio/speech`). Leave it empty to use OpenAI's default endpoint.
+
+8. (Optional) You can select different TTS models from the "TTS Model" dropdown. When using SiliconFlow, the CosyVoice2-0.5B model will be available with 8 built-in voices (alex, anna, bella, benjamin, charles, claire, david, diana). The voice options will update automatically based on the selected model.
+
+9. (Optional) The Voice dropdown supports both selecting from the list and typing custom voice IDs. Click on the voice field to type a custom voice ID (e.g., for SiliconFlow custom voices like `speech:your-voice-name:xxxx`). This is useful if you've uploaded custom voice samples to SiliconFlow.
+
 This tool was brought to you by Scorchsoft - We build custom apps to your requirements. Please contact us if you have a requirement for a custom app project.

 ## Advanced Tips


-### 1. ChatGPT AI Manipulation
+### 1. Custom Voices with SiliconFlow
+
+When using SiliconFlow's API, you can upload your own voice samples and use them by entering the custom voice ID in the Voice dropdown. To use a custom voice:
+
+1. Upload your voice sample to SiliconFlow (see their documentation)
+2. You'll receive a voice ID like: `speech:your-voice-name:cm04pf7az00061413w7kz5qxs:mjtkgbyuunvtybnsvbxd`
+3. Click on the Voice dropdown and type/paste this custom voice ID
+4. The app will use this custom voice for TTS
+
+For more information on uploading custom voices, see: [SiliconFlow Text-to-Speech Documentation](https://docs.siliconflow.cn/en/userguide/capabilities/text-to-speech)
+
+### 2. ChatGPT AI Manipulation

 If you go to "Settings > ChatGPT Manipulation" then you can turn this on and pick which model to use.

@@ -104,6 +121,22 @@ run the executable or "python text-to-mic.py"
 https://vb-audio.com/Cable/

 ## 2) ensure the OpenAI API key is specified in the .env file
+You can also optionally set `OPENAI_API_BASE_URL` in the .env file to use an OpenAI-compatible endpoint instead of OpenAI, and `OPENAI_TTS_MODEL` to choose the TTS model. For example, to use SiliconFlow's API:
+```
+OPENAI_API_KEY=your_api_key_here
+OPENAI_API_BASE_URL=https://api.siliconflow.cn/v1
+OPENAI_TTS_MODEL=FunAudioLLM/CosyVoice2-0.5B
+```
+**Important:** Use just the base URL (e.g., `https://api.siliconflow.cn/v1`), NOT the full endpoint path (don't add `/audio/speech`).
+
+Leave `OPENAI_API_BASE_URL` empty to use OpenAI's default endpoint.
+
+Available TTS models:
+- `tts-1` (OpenAI standard, default)
+- `tts-1-hd` (OpenAI high quality)
+- `gpt-4o-mini-tts` (OpenAI)
+- `FunAudioLLM/CosyVoice2-0.5B` (SiliconFlow - multi-language, emotional TTS)
+
 This sets up a virtual microphone that we can send text-to-speech audio to. Then, when you join a meeting, such as a Google Meet call, you can select this virtual cable to hear the audio being sent on the channel.

 ## 3) Run the script:
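The README asks for the base URL only, without `/audio/speech`, and the Settings dialog only warns when that suffix is present. A minimal sketch of normalizing the value instead, assuming the hypothetical helper names `normalize_base_url` and `is_siliconflow`. Unlike the dialog's `str.replace`, it removes the suffix only at the end of the URL; also stripping trailing slashes is an extra assumption, not something the README requires.

```python
def normalize_base_url(url: str) -> str:
    """Trim whitespace, a trailing '/audio/speech' path, and trailing slashes.

    Returns "" for an empty value, which the app treats as "use OpenAI's default endpoint".
    """
    suffix = "/audio/speech"
    trimmed = url.strip().rstrip("/")
    if trimmed.endswith(suffix):
        # Drop the endpoint path the user pasted by mistake
        trimmed = trimmed[: -len(suffix)]
    return trimmed.rstrip("/")


def is_siliconflow(url: str) -> bool:
    """Mirror the app's provider check: any base URL containing 'siliconflow'."""
    return "siliconflow" in url.lower()
```

Normalizing on save would let the dialog accept a pasted endpoint URL silently; keeping the warning instead (as the commit does) makes the correction visible to the user.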
--- a/text-to-mic-cli.py
+++ b/text-to-mic-cli.py
@@ -10,7 +10,14 @@ import os
 load_dotenv()

 # Set up your OpenAI API key from the environment variable
-client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
+api_key = os.getenv('OPENAI_API_KEY')
+api_base_url = os.getenv('OPENAI_API_BASE_URL', '').strip()
+
+# Create client with custom base URL if provided
+if api_base_url:
+    client = OpenAI(api_key=api_key, base_url=api_base_url)
+else:
+    client = OpenAI(api_key=api_key)

 def list_audio_devices():
    p = pyaudio.PyAudio()
@@ -81,9 +88,13 @@ def play_audio_multiplexed(file_paths, device_indices):
    
    p.terminate()
    
-def stream_audio_to_virtual_mic(text, voice="fable", device_index=None, device_index_2=None):
+def stream_audio_to_virtual_mic(text, voice="fable", model=None, device_index=None, device_index_2=None):
+    # Get model from environment variable or use default
+    if model is None:
+        model = os.getenv('OPENAI_TTS_MODEL', 'tts-1')
+
    response = client.audio.speech.create(
-        model="tts-1",
+        model=model,
        voice=voice,
        input=text,
        response_format='wav'
@@ -114,10 +125,27 @@ if __name__ == "__main__":

    if arglen < 2:
        print("Usage: python script.py 'text to convert'")
+        print("Environment variables:")
+        print("  OPENAI_API_KEY - Your API key (required)")
+        print("  OPENAI_API_BASE_URL - Custom API base URL (optional)")
+        print("  OPENAI_TTS_MODEL - TTS model to use (default: tts-1)")
+        print("")
+        print("Example models:")
+        print("  - tts-1 (OpenAI standard)")
+        print("  - tts-1-hd (OpenAI high quality)")
+        print("  - gpt-4o-mini-tts (OpenAI)")
+        print("  - FunAudioLLM/CosyVoice2-0.5B (SiliconFlow)")
+        print("")
+        print("For SiliconFlow CosyVoice2, the voice must be given as model:voice, e.g.:")
+        print("  FunAudioLLM/CosyVoice2-0.5B:alex (note: this CLI always requests 'fable')")
        sys.exit(1)
-    
+
    print(f"arg count {arglen}")

+    # Get TTS model from environment
+    tts_model = os.getenv('OPENAI_TTS_MODEL', 'tts-1')
+    print(f"Using TTS model: {tts_model}")
+
    if arglen == 4:
        device_index = int(sys.argv[2])
        device_index_2 = int(sys.argv[3])
@@ -129,5 +157,5 @@ if __name__ == "__main__":
        device_index = int(input("Enter the device index: "))
        device_index_2 = None

-    
-    stream_audio_to_virtual_mic(sys.argv[1], voice="fable", device_index=device_index,device_index_2=device_index_2)
+
+    stream_audio_to_virtual_mic(sys.argv[1], voice="fable", model=tts_model, device_index=device_index,device_index_2=device_index_2)
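The CLI reads `OPENAI_API_KEY` and `OPENAI_API_BASE_URL` when creating the client, and reads `OPENAI_TTS_MODEL` twice (in `stream_audio_to_virtual_mic` and in `__main__`). A sketch of collecting them once into keyword arguments for the client; `resolve_cli_config` is a hypothetical name, and falling back to `tts-1` when `OPENAI_TTS_MODEL` is set but blank is a deliberate difference from `os.getenv('OPENAI_TTS_MODEL', 'tts-1')`, which would return the empty string.

```python
import os
from typing import Any, Dict, Mapping

DEFAULT_TTS_MODEL = "tts-1"  # same fallback the CLI uses


def resolve_cli_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect client keyword arguments and the TTS model from environment variables."""
    client_kwargs: Dict[str, Any] = {"api_key": env.get("OPENAI_API_KEY")}
    base_url = env.get("OPENAI_API_BASE_URL", "").strip()
    if base_url:
        # Only pass base_url when it is set, so the client keeps its default endpoint otherwise
        client_kwargs["base_url"] = base_url
    # Treat an unset or blank OPENAI_TTS_MODEL as "use the default"
    model = env.get("OPENAI_TTS_MODEL", "").strip() or DEFAULT_TTS_MODEL
    return {"client_kwargs": client_kwargs, "model": model}
```

Usage would be `client = OpenAI(**config["client_kwargs"])`, then passing `model=config["model"]` to `client.audio.speech.create`, which avoids the duplicated `if api_base_url: ... else: ...` branches.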
--- a/utils/settings_manager.py
+++ b/utils/settings_manager.py
@@ -37,7 +37,9 @@ class SettingsManager:
                "play_last_audio": ["ctrl", "shift", "8"],
                "cancel_operation": ["ctrl", "shift", "1"]
            },
-            "max_tokens": 750
+            "max_tokens": 750,
+            "api_base_url": "",
+            "tts_model": ""
        }
    
    @classmethod
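The new `tts_model` setting defaults to an empty string, which the GUI treats as "not chosen yet" and replaces with a provider default in two places: when the dropdown is built and in `update_tts_model_options`. A hypothetical helper that keeps that rule in one place, using the same model IDs as the diff (pass `""` as the saved model to get the plain provider default):

```python
SILICONFLOW_DEFAULT_MODEL = "FunAudioLLM/CosyVoice2-0.5B"
OPENAI_DEFAULT_MODEL = "gpt-4o-mini-tts"


def resolve_tts_model(saved_model: str, api_base_url: str) -> str:
    """Return the saved TTS model, or a provider default when none is saved."""
    if saved_model:
        return saved_model
    # Same provider check the GUI uses: any base URL mentioning SiliconFlow
    if api_base_url and "siliconflow" in api_base_url.lower():
        return SILICONFLOW_DEFAULT_MODEL
    return OPENAI_DEFAULT_MODEL
```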
--- a/utils/text_to_mic.py
+++ b/utils/text_to_mic.py
@@ -121,9 +121,20 @@ class TextToMic(tk.Tk):
        # Get API key using APIKeyManager
        self.api_key = APIKeyManager.get_api_key(self)
        self.has_api_key = bool(self.api_key)
-        
+
+        # Load the saved base URL before creating the client. This runs even
+        # without an API key, so a key added later via Settings > API Key
+        # still picks up the saved custom endpoint.
+        settings = self.load_settings()
+        self.api_base_url = (settings.get("api_base_url") or "").strip()
+
        if self.has_api_key:
-            self.client = OpenAI(api_key=self.api_key)
+            # Create OpenAI client with custom base URL if provided
+            if self.api_base_url:
+                self.client = OpenAI(api_key=self.api_key, base_url=self.api_base_url)
+            else:
+                self.client = OpenAI(api_key=self.api_key)
        
        # Initializing device index variables before they are used
        self.device_index = tk.StringVar(self)
@@ -247,8 +258,9 @@ class TextToMic(tk.Tk):
        settings_menu = Menu(self.menubar, tearoff=0)
        self.menubar.add_cascade(label="Settings", menu=settings_menu)
        settings_menu.add_command(label="API Key", command=self.change_api_key)
+        settings_menu.add_command(label="API Base URL", command=self.change_api_base_url)
        settings_menu.add_command(label="AI Copyediting", command=self.show_ai_editor_settings)
-        settings_menu.add_command(label="Keyboard Shortcuts", command=self.show_hotkey_settings)  
+        settings_menu.add_command(label="Keyboard Shortcuts", command=self.show_hotkey_settings)
        settings_menu.add_command(label="Manage Tones", command=self.show_tone_presets_manager)
        settings_menu.add_separator()
        
@@ -287,7 +299,89 @@ class TextToMic(tk.Tk):
        new_key = APIKeyManager.change_api_key(self)
        if new_key:
            self.api_key = new_key
-            self.client = OpenAI(api_key=self.api_key)
+            # Recreate client with base URL if set
+            if self.api_base_url:
+                self.client = OpenAI(api_key=self.api_key, base_url=self.api_base_url)
+            else:
+                self.client = OpenAI(api_key=self.api_key)
+
+    def change_api_base_url(self):
+        """Change the API base URL."""
+        from tkinter import simpledialog
+
+        # Show current URL in the prompt
+        current_url = self.api_base_url if self.api_base_url else "OpenAI Default"
+        prompt = f"Current API Base URL: {current_url}\n\nEnter custom API Base URL (leave empty to use OpenAI default):\n\nNote: For SiliconFlow, use: https://api.siliconflow.cn/v1"
+
+        new_url = simpledialog.askstring("API Base URL", prompt, parent=self)
+        if new_url is not None:  # User didn't cancel
+            new_url = new_url.strip()
+
+            # Warn if user included /audio/speech in the URL and offer to remove it
+            if new_url and "/audio/speech" in new_url:
+                corrected_url = new_url.replace('/audio/speech', '')
+                if messagebox.askyesno("Incorrect Base URL",
+                    "The base URL should not include '/audio/speech'.\n\n"
+                    f"You entered: {new_url}\n\n"
+                    f"Did you mean: {corrected_url}\n\n"
+                    "Click Yes to correct it, or No to use as-is.",
+                    parent=self):
+                    # User said Yes: use the corrected URL (No keeps it as-is)
+                    new_url = corrected_url
+
+            # Update settings
+            SettingsManager.update_settings({"api_base_url": new_url})
+
+            # Update instance variable
+            self.api_base_url = new_url
+
+            # Recreate client with new base URL
+            if self.api_key:
+                if self.api_base_url:
+                    self.client = OpenAI(api_key=self.api_key, base_url=self.api_base_url)
+                else:
+                    self.client = OpenAI(api_key=self.api_key)
+
+            # Update TTS model options based on new base URL
+            self.update_tts_model_options()
+
+            # Show confirmation
+            if new_url:
+                messagebox.showinfo("API Base URL Updated", f"API Base URL has been set to:\n{new_url}\n\nTTS model options have been updated.")
+            else:
+                messagebox.showinfo("API Base URL Reset", "API Base URL has been reset to OpenAI default.\n\nTTS model options have been updated.")
+
+    def update_tts_model_options(self):
+        """Update TTS model dropdown options based on current API base URL."""
+        if hasattr(self, 'tts_menu') and hasattr(self, 'tts_model_var'):
+            available_models = self.get_available_tts_models()
+            model_options = [model[1] for model in available_models]
+            model_ids = [model[0] for model in available_models]
+
+            # Store the new model IDs
+            self.tts_model_ids = model_ids
+
+            # Set default based on API base URL
+            if self.api_base_url and "siliconflow" in self.api_base_url.lower():
+                default_model = "FunAudioLLM/CosyVoice2-0.5B"
+            else:
+                default_model = "gpt-4o-mini-tts"
+
+            # Find the display name for the default model (first option if not listed)
+            default_display = model_options[0] if model_options else None
+            for model_id, display_name in available_models:
+                if model_id == default_model:
+                    default_display = display_name
+                    break
+
+            # Rebuild the dropdown via ttk.OptionMenu.set_menu(), which reuses the
+            # command callback given at construction instead of the private tk._setit
+            self.tts_menu.set_menu(default_display, *model_options)
+
+            # Trigger model change to update voices
+            self.on_tts_model_change()

    def get_audio_file_path(self, filename):
        if platform.system() == 'Darwin':  # Check if the OS is macOS
@@ -403,41 +497,87 @@ class TextToMic(tk.Tk):
        # Set fixed width for all labels
        label_width = 35  # Adjust this value as needed for your UI
        
+        # Initialize TTS model selection
+        available_tts_models = self.get_available_tts_models()
+        tts_model_options = [model[1] for model in available_tts_models]  # Use display names
+        tts_model_ids = [model[0] for model in available_tts_models]  # Store model IDs
+
+        # Get saved TTS model or use default
+        settings = self.load_settings()
+        saved_tts_model = settings.get("tts_model", "")
+        if not saved_tts_model:
+            # Default based on API base URL
+            if self.api_base_url and "siliconflow" in self.api_base_url.lower():
+                saved_tts_model = "FunAudioLLM/CosyVoice2-0.5B"
+            else:
+                saved_tts_model = "gpt-4o-mini-tts"
+
+        # Find the display name for the saved model
+        default_tts_model_display = tts_model_options[0]
+        for i, model_id in enumerate(tts_model_ids):
+            if model_id == saved_tts_model:
+                default_tts_model_display = tts_model_options[i]
+                break
+
+        self.tts_model_var = tk.StringVar(value=default_tts_model_display)
+        self.tts_model_ids = tts_model_ids  # Store for later lookup
+
+        # TTS Model selection dropdown
+        tts_label = ttk.Label(voice_frame, text="TTS Model:", width=label_width)
+        tts_label.grid(column=0, row=0, sticky=tk.W, pady=(0, 5))
+        tts_menu = ttk.OptionMenu(voice_frame, self.tts_model_var, self.tts_model_var.get(), *tts_model_options, command=self.on_tts_model_change)
+        tts_menu.grid(column=1, row=0, sticky="ew", pady=(0, 5))
+        tts_menu.config(width=dropdown_width, style='Compact.TMenubutton')
+        self.tts_menu = tts_menu  # Store reference for later updates
+
        # Initialize voice selection
        self.available_voices = self.get_available_voices()
-        
+
        # Determine default voice based on whether API key is available
        default_voice = "fable" if self.has_api_key else self.available_voices[0] if self.available_voices else "[System] Default"
-        
+
        self.voice_var = tk.StringVar(value=default_voice)
-        
+
        voice_label = ttk.Label(voice_frame, text="Voice:", width=label_width)
        voice_label.grid(column=0, row=1, sticky=tk.W, pady=(0, 5))
-        voice_menu = ttk.OptionMenu(voice_frame, self.voice_var, self.voice_var.get(), *self.available_voices, command=self.on_voice_change)
+
+        # Use Combobox instead of OptionMenu to allow both selection and typing
+        voice_menu = ttk.Combobox(voice_frame, textvariable=self.voice_var, values=self.available_voices, state="readonly", width=30)
        voice_menu.grid(column=1, row=1, sticky="ew", pady=(0, 5))
-        voice_menu.config(width=dropdown_width, style='Compact.TMenubutton')
+        voice_menu.bind('<<ComboboxSelected>>', lambda e: self.on_voice_change())
+        # Allow typing by switching to normal state on focus, readonly on unfocus
+        voice_menu.bind('<FocusIn>', lambda e: voice_menu.config(state="normal"))
+        voice_menu.bind('<FocusOut>', lambda e: self.on_voice_exit(voice_menu))
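+        self.voice_menu = voice_menu  # Store reference for later updates
+        # Sync the voice list with the saved TTS model (e.g. SiliconFlow voices for CosyVoice2)
+        self.update_available_voices()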
+
+        # Add hint label for custom voices
+        voice_hint = ttk.Label(voice_frame,
+                              text="💡 Click to edit or type custom voice ID",
+                              font=("Arial", 7, "italic"),
+                              foreground="gray")
+        voice_hint.grid(column=1, row=2, sticky="w", pady=(0, 5))

        # Tone selection with warning for basic version
        self.tone_var = tk.StringVar(value=self.current_tone_name)
        tone_options = ["None"] + list(self.tone_presets.keys())
        tone_label = ttk.Label(voice_frame, text="Tone Preset:", width=label_width)
-        tone_label.grid(column=0, row=2, sticky=tk.W, pady=(0, 5))
+        tone_label.grid(column=0, row=3, sticky=tk.W, pady=(0, 5))
        self.tone_menu = ttk.OptionMenu(voice_frame, self.tone_var, self.tone_var.get(), *tone_options, command=self.on_tone_change)
-        self.tone_menu.grid(column=1, row=2, sticky="ew", pady=(0, 5))
+        self.tone_menu.grid(column=1, row=3, sticky="ew", pady=(0, 5))
        self.tone_menu.config(width=dropdown_width, style='Compact.TMenubutton')
-        
+
        # Check if we should disable tone menu based on voice type
        if self.voice_var.get().startswith("[System]"):
            self.tone_menu.state(['disabled'])
            self.tone_var.set("None")
-        
+
        # Add warning label for basic version
        if not self.has_api_key:
-            warning_label = ttk.Label(voice_frame, 
-                                    text="⚠️ Basic Version - Add API Key in Settings for full features", 
+            warning_label = ttk.Label(voice_frame,
+                                    text="⚠️ Basic Version - Add API Key in Settings for full features",
                                    foreground="orange",
                                    font=("Arial", 8, "italic"))
-            warning_label.grid(column=0, row=3, columnspan=2, sticky=tk.W, pady=(5, 0))
+            warning_label.grid(column=0, row=4, columnspan=2, sticky=tk.W, pady=(5, 0))

        # Separator between Voice Settings and Device Settings
        separator = ttk.Separator(main_frame, orient='horizontal')
@@ -864,9 +1004,34 @@ class TextToMic(tk.Tk):
                return
            
            try:
+                # Get the selected TTS model
+                selected_tts_model_display = self.tts_model_var.get()
+                selected_tts_model = "gpt-4o-mini-tts"  # Default
+
+                # Find the model ID from display name
+                available_models = self.get_available_tts_models()
+                for model_id, display_name in available_models:
+                    if display_name == selected_tts_model_display:
+                        selected_tts_model = model_id
+                        break
+
+                print(f"[DEBUG] Selected TTS model display: {selected_tts_model_display}")
+                print(f"[DEBUG] Using TTS model ID: {selected_tts_model}")
+                print(f"[DEBUG] Selected voice: {selected_voice}")
+
+                # For SiliconFlow CosyVoice2-0.5B model, format built-in voices as model:voice
+                # Example: FunAudioLLM/CosyVoice2-0.5B:alex. Values that already contain ":"
+                # (pre-formatted voices or custom speech:... IDs) are sent unchanged.
+                voice_to_use = selected_voice
+                if ("CosyVoice2" in selected_tts_model and self.api_base_url
+                        and "siliconflow" in self.api_base_url.lower() and ":" not in selected_voice):
+                    voice_to_use = f"{selected_tts_model}:{selected_voice}"
+                    print(f"[DEBUG] Formatted voice for CosyVoice2: {voice_to_use}")
+
+                print(f"[DEBUG] API call - Model: {selected_tts_model}, Voice: {voice_to_use}")
+                print(f"[DEBUG] API Base URL: {self.api_base_url if self.api_base_url else 'OpenAI Default'}")
+
                response = self.client.audio.speech.create(
-                    model="gpt-4o-mini-tts",
-                    voice=selected_voice,
+                    model=selected_tts_model,
+                    voice=voice_to_use,
                    input=text,
                    instructions=tone_instructions,
                    response_format='wav'
@@ -1573,13 +1738,13 @@ class TextToMic(tk.Tk):
        if self.has_api_key:
            # Add OpenAI voices
            voices.extend(['alloy', 'ash', 'ballad', 'coral', 'echo', 'fable', 'onyx', 'nova', 'sage', 'shimmer'])
-        
+
        # Add system voices with [System] prefix
        try:
            if hasattr(self, 'system_voices') and self.system_voices:
                for voice in self.system_voices:
                    voices.append(f"[System] {voice.name}")
-            
+
            # If no system voices were found, add a default system voice
            if not voices:
                voices.append("[System] Default")
@@ -1588,14 +1753,81 @@ class TextToMic(tk.Tk):
            # Ensure we have at least one voice option
            if not voices:
                voices.append("[System] Default")
-        
+
        return voices

+    def get_available_tts_models(self):
+        """Get list of available TTS models based on the current API base URL."""
+        # Check if using SiliconFlow
+        is_siliconflow = self.api_base_url and "siliconflow" in self.api_base_url.lower()
+
+        if is_siliconflow:
+            # SiliconFlow TTS models
+            return [
+                ("FunAudioLLM/CosyVoice2-0.5B", "CosyVoice2-0.5B (Multi-language, Emotional)"),
+                ("tts-1", "TTS-1 (OpenAI Compatible)"),
+                ("tts-1-hd", "TTS-1 HD (OpenAI Compatible)")
+            ]
+        else:
+            # OpenAI TTS models
+            return [
+                ("gpt-4o-mini-tts", "GPT-4o Mini TTS (Recommended)"),
+                ("tts-1", "TTS-1 (Standard)"),
+                ("tts-1-hd", "TTS-1 HD (High Quality)")
+            ]
+
+    def get_siliconflow_voices(self):
+        """Get SiliconFlow-specific voices for CosyVoice2-0.5B model."""
+        return [
+            'alex', 'anna', 'bella', 'benjamin', 'charles',
+            'claire', 'david', 'diana'
+        ]
+
+    def update_available_voices(self):
+        """Update available voices based on selected TTS model."""
+        tts_model_display = self.tts_model_var.get() if hasattr(self, 'tts_model_var') else ""
+
+        # Get the actual model ID from display name
+        tts_model_id = None
+        available_models = self.get_available_tts_models()
+        for model_id, display_name in available_models:
+            if display_name == tts_model_display:
+                tts_model_id = model_id
+                break
+
+        # If using CosyVoice2-0.5B with SiliconFlow, use SiliconFlow voices
+        if tts_model_id and "CosyVoice2" in tts_model_id and self.api_base_url and "siliconflow" in self.api_base_url.lower():
+            voices = self.get_siliconflow_voices()
+            print(f"[DEBUG] Using SiliconFlow CosyVoice voices: {voices}")
+            # Also add system voices
+            if hasattr(self, 'system_voices') and self.system_voices:
+                for voice in self.system_voices:
+                    voices.append(f"[System] {voice.name}")
+            if not voices:
+                voices.append("[System] Default")
+        else:
+            voices = self.get_available_voices()
+            print(f"[DEBUG] Using standard voices")
+
+        # Update the voice dropdown (now using Combobox)
+        if hasattr(self, 'voice_menu'):
+            current_voice = self.voice_var.get()
+
+            # Update the combobox values
+            self.voice_menu['values'] = voices
+
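+            # Reset to the first voice if the current one isn't offered by this model,
+            # unless it looks like a custom voice ID (not a built-in or [System] voice)
+            builtin_voices = set(self.get_available_voices()) | set(self.get_siliconflow_voices())
+            is_custom_voice = (bool(current_voice) and not current_voice.startswith("[System]")
+                               and current_voice not in builtin_voices)
+            if current_voice not in voices and not is_custom_voice:
+                self.voice_var.set(voices[0] if voices else "")
+                print(f"[DEBUG] Voice changed to: {voices[0] if voices else ''}")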
+            else:
+                print(f"[DEBUG] Voice kept as: {current_voice}")
+
    def on_voice_change(self, *args):
        """Handle voice selection change."""
        selected_voice = self.voice_var.get()
        is_system_voice = selected_voice.startswith("[System]")
-        
+
        # Update tone menu state based on voice type
        if is_system_voice:
            self.tone_menu.state(['disabled'])
@@ -1603,6 +1835,46 @@ class TextToMic(tk.Tk):
        else:
            self.tone_menu.state(['!disabled'])

+    def on_voice_exit(self, combobox):
+        """Handle voice combobox focus out: validate the entry and restore readonly state."""
+        entered_voice = self.voice_var.get().strip()
+
+        if entered_voice:
+            # Store the trimmed value so stray whitespace isn't sent to the API
+            self.voice_var.set(entered_voice)
+        else:
+            # If empty, fall back to the first available voice
+            values = combobox['values']
+            if values:
+                self.voice_var.set(values[0])
+
+        # Switch back to readonly state
+        combobox.config(state="readonly")
+
+        # Update the tone menu for the (possibly changed) voice
+        self.on_voice_change()
+
+    def on_tts_model_change(self, *args):
+        """Handle TTS model selection change."""
+        selected_model_display = self.tts_model_var.get()
+
+        # Find the model ID from display name
+        model_id = None
+        available_models = self.get_available_tts_models()
+        for model_id_val, display_name in available_models:
+            if display_name == selected_model_display:
+                model_id = model_id_val
+                break
+
+        # Save the selected model to settings
+        if model_id:
+            SettingsManager.update_settings({"tts_model": model_id})
+            print(f"[DEBUG] TTS model changed to: {model_id}")  # Debug logging
+        else:
+            print(f"[DEBUG] Warning: Could not find model ID for display name: {selected_model_display}")
+
+        # Update available voices based on the selected model
+        self.update_available_voices()
+
    def update_window_size(self):
        """Update window size based on current banner and presets state."""
        # Calculate a width that preserves the current width if it's larger than default
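Several methods in text_to_mic.py loop over `get_available_tts_models()` to turn the dropdown label back into a model ID, and `update_available_voices` has to decide whether switching models should replace the current voice. A sketch of both as pure, testable functions; the names `model_id_for_display` and `should_reset_voice` are hypothetical, and keeping any value that is neither a built-in nor a `[System]` voice (on the assumption that it is a custom voice ID) is an interpretation of the intended behavior, not something the diff states.

```python
from typing import Iterable, Optional, Sequence, Tuple

# (model_id, display_name) pairs, the shape returned by get_available_tts_models()
TTSModel = Tuple[str, str]


def model_id_for_display(models: Sequence[TTSModel], display_name: str,
                         default: Optional[str] = None) -> Optional[str]:
    """Map a dropdown label back to its model ID, or return `default` if it is unknown."""
    lookup = {name: model_id for model_id, name in models}
    return lookup.get(display_name, default)


def should_reset_voice(current: str, new_voices: Sequence[str],
                       builtin_voices: Iterable[str]) -> bool:
    """Decide whether a model switch should replace the current voice.

    Keeps voices offered by the new model and anything that is not a known
    built-in or [System] voice (assumed to be a custom voice ID).
    """
    if current in new_voices:
        return False
    if not current or current.startswith("[System]"):
        return True
    # A built-in voice from another provider (e.g. "fable" on CosyVoice2) is reset
    return current in set(builtin_voices)
```

Building the lookup dict once per call is cheap for three models; the main gain is that the display-name/ID mapping lives in one place instead of four hand-written loops.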