feat: Add SiliconFlow TTS API support with custom base URL and model selection
This commit adds comprehensive support for using SiliconFlow's TTS API as an alternative to OpenAI, including:

Features:
- Configurable API base URL (Settings > API Base URL)
- TTS model selection dropdown (CosyVoice2-0.5B, OpenAI compatible models)
- Dynamic voice options based on selected model
- Editable voice dropdown (Combobox) supporting custom voice IDs
- Automatic voice formatting for SiliconFlow (model:voice format)
- Debug logging for troubleshooting API calls
- Warning for incorrect base URL format

Changes:
- utils/settings_manager.py: Added api_base_url and tts_model settings
- utils/text_to_mic.py:
  - Added get_available_tts_models() for model options
  - Added get_siliconflow_voices() for SiliconFlow voices
  - Added change_api_base_url() method with validation
  - Added TTS model dropdown in GUI
  - Converted voice dropdown to Combobox for typing support
  - Added on_voice_exit() for validation
  - Updated API call to use selected model and formatted voice
- text-to-mic-cli.py: Added OPENAI_API_BASE_URL and OPENAI_TTS_MODEL env var support
- Readme.md: Updated documentation with SiliconFlow usage instructions

Supported Models:
- FunAudioLLM/CosyVoice2-0.5B (SiliconFlow - multi-language, emotional)
- tts-1, tts-1-hd (OpenAI compatible)
- gpt-4o-mini-tts (OpenAI default)

SiliconFlow Voices (CosyVoice2-0.5B):
- Male: alex, benjamin, charles, david
- Female: anna, bella, claire, diana
- Custom voices via voice ID entry

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
35
Readme.md
@@ -65,12 +65,29 @@ https://platform.openai.com/docs/quickstart/account-setup
 
 6. You can change the API key at any time under the 'Settings' menu.
 
+7. (Optional) You can also configure a custom API Base URL under 'Settings > API Base URL' to use compatible API endpoints other than OpenAI. For example, to use SiliconFlow's API, set the base URL to `https://api.siliconflow.cn/v1` (Note: use just the base URL, NOT the full endpoint path). Leave empty to use OpenAI's default endpoint.
+
+8. (Optional) You can select different TTS models from the "TTS Model" dropdown. When using SiliconFlow, the CosyVoice2-0.5B model will be available with 8 built-in voices (alex, anna, bella, benjamin, charles, claire, david, diana). The voice options will update automatically based on the selected model.
+
+9. (Optional) The Voice dropdown supports both selecting from the list and typing custom voice IDs. Click on the voice field to type a custom voice ID (e.g., for SiliconFlow custom voices like `speech:your-voice-name:xxxx`). This is useful if you've uploaded custom voice samples to SiliconFlow.
+
 This tool was brought to you by Scorchsoft - We build custom apps to your requirements. Please contact us if you have a requirement for a custom app project.
 
 ## Advanced Tips
 
 
-### 1. ChatGPT AI Manipulation
+### 1. Custom Voices with SiliconFlow
+
+When using SiliconFlow's API, you can upload your own voice samples and use them by entering the custom voice ID in the Voice dropdown. To upload a custom voice:
+
+1. Upload your voice sample to SiliconFlow (see their documentation)
+2. You'll receive a voice ID like: `speech:your-voice-name:cm04pf7az00061413w7kz5qxs:mjtkgbyuunvtybnsvbxd`
+3. Click on the Voice dropdown and type/paste this custom voice ID
+4. The app will use this custom voice for TTS
+
+For more information on uploading custom voices, see: [SiliconFlow Text-to-Speech Documentation](https://docs.siliconflow.cn/en/userguide/capabilities/text-to-speech)
+
+### 2. ChatGPT AI Manipulation
 
 If you go to "Settings > ChatGPT Manipulation" then you can turn this on and pick which model to use.
 
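Returning to the voice options documented in steps 8 and 9 and in the custom-voice section of this Readme change: the model-dependent voice list could be sketched roughly as follows. This is only an illustration, not the app's code. It assumes SiliconFlow is detected by looking for "siliconflow" in the base URL (the same check the diff uses), it leaves out the `[System]` voices the GUI also lists, and `voices_for` and `is_custom_voice_id` are hypothetical helper names. Treating a `speech:` prefix as the mark of a custom voice ID is an assumption drawn from the example ID in the Readme text.

```python
# Voice names as listed in this commit (OpenAI voices from the GUI code,
# SiliconFlow voices from get_siliconflow_voices()).
OPENAI_VOICES = ["alloy", "ash", "ballad", "coral", "echo",
                 "fable", "onyx", "nova", "sage", "shimmer"]
SILICONFLOW_VOICES = ["alex", "anna", "bella", "benjamin",
                      "charles", "claire", "david", "diana"]


def voices_for(model_id, base_url):
    """Return the built-in voice names to offer for a model/base-URL pair.

    Hypothetical helper: mirrors the commit's rule that the SiliconFlow
    voices apply only to CosyVoice2 on a SiliconFlow base URL.
    """
    uses_siliconflow = "siliconflow" in (base_url or "").lower()
    if uses_siliconflow and "CosyVoice2" in model_id:
        return list(SILICONFLOW_VOICES)
    return list(OPENAI_VOICES)


def is_custom_voice_id(voice):
    """Assumption: uploaded SiliconFlow voices use IDs starting with 'speech:'."""
    return voice.strip().startswith("speech:")
```

A typed value that is neither in `voices_for(...)` nor recognised by `is_custom_voice_id` could then be flagged to the user before any request is sent.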
@@ -104,6 +121,22 @@ run the executable or "python text-to-mic.py"
 https://vb-audio.com/Cable/
 
 ## 2) ensure the OpenAI API key is specified in the .env file
+You can also optionally set `OPENAI_API_BASE_URL` in the .env file to use a compatible API endpoint other than OpenAI. For example, to use SiliconFlow's API:
+```
+OPENAI_API_KEY=your_api_key_here
+OPENAI_API_BASE_URL=https://api.siliconflow.cn/v1
+OPENAI_TTS_MODEL=FunAudioLLM/CosyVoice2-0.5B
+```
+**Important:** Use just the base URL (e.g., `https://api.siliconflow.cn/v1`), NOT the full endpoint path (don't add `/audio/speech`).
+
+Leave `OPENAI_API_BASE_URL` empty to use OpenAI's default endpoint.
+
+Available TTS models:
+- `tts-1` (OpenAI standard, default)
+- `tts-1-hd` (OpenAI high quality)
+- `gpt-4o-mini-tts` (OpenAI)
+- `FunAudioLLM/CosyVoice2-0.5B` (SiliconFlow - multi-language, emotional TTS)
+
 This sets up a virtual microphone that we can use to sent text to speech audio to. Then, when you join a meeting, such as a google meeting, you can select this virtual cable to hear the audio being sent on the channel.
 
 ## 3) Run the script:
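For the `.env` settings added under step 2 of the Readme above, the way the three variables combine can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the CLI's actual code: `resolve_tts_config` is a hypothetical helper, it reads from a mapping rather than loading a `.env` file, and it only builds the keyword arguments that would be passed to the client constructor. The `tts-1` fallback matches the default named in the Readme list.

```python
import os


def resolve_tts_config(env=None):
    """Return (client_kwargs, model) from environment-style settings.

    Hypothetical helper. An empty or missing OPENAI_API_BASE_URL means the
    default OpenAI endpoint, so no base_url key is set in that case.
    """
    env = os.environ if env is None else env

    client_kwargs = {"api_key": env.get("OPENAI_API_KEY")}

    base_url = (env.get("OPENAI_API_BASE_URL") or "").strip()
    if base_url:
        client_kwargs["base_url"] = base_url

    # Fall back to the documented default model when none is configured.
    model = (env.get("OPENAI_TTS_MODEL") or "").strip() or "tts-1"
    return client_kwargs, model


if __name__ == "__main__":
    kwargs, model = resolve_tts_config({
        "OPENAI_API_KEY": "your_api_key_here",
        "OPENAI_API_BASE_URL": "https://api.siliconflow.cn/v1",
        "OPENAI_TTS_MODEL": "FunAudioLLM/CosyVoice2-0.5B",
    })
    print(kwargs, model)
```

Keeping the resolution in one function means the CLI and the GUI could share the same fallback rules instead of repeating the `if base_url: ... else: ...` branch.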
@@ -10,7 +10,14 @@ import os
 load_dotenv()
 
 # Set up your OpenAI API key from the environment variable
-client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
+api_key = os.getenv('OPENAI_API_KEY')
+api_base_url = os.getenv('OPENAI_API_BASE_URL', '').strip()
+
+# Create client with custom base URL if provided
+if api_base_url:
+    client = OpenAI(api_key=api_key, base_url=api_base_url)
+else:
+    client = OpenAI(api_key=api_key)
 
 def list_audio_devices():
     p = pyaudio.PyAudio()
@@ -81,9 +88,13 @@ def play_audio_multiplexed(file_paths, device_indices):
 
     p.terminate()
 
-def stream_audio_to_virtual_mic(text, voice="fable", device_index=None, device_index_2=None):
+def stream_audio_to_virtual_mic(text, voice="fable", model=None, device_index=None, device_index_2=None):
+    # Get model from environment variable or use default
+    if model is None:
+        model = os.getenv('OPENAI_TTS_MODEL', 'tts-1')
+
     response = client.audio.speech.create(
-        model="tts-1",
+        model=model,
         voice=voice,
         input=text,
         response_format='wav'
@@ -114,10 +125,27 @@ if __name__ == "__main__":
 
     if arglen < 2:
         print("Usage: python script.py 'text to convert'")
+        print("Environment variables:")
+        print(" OPENAI_API_KEY - Your API key (required)")
+        print(" OPENAI_API_BASE_URL - Custom API base URL (optional)")
+        print(" OPENAI_TTS_MODEL - TTS model to use (default: tts-1)")
+        print("")
+        print("Example models:")
+        print(" - tts-1 (OpenAI standard)")
+        print(" - tts-1-hd (OpenAI high quality)")
+        print(" - gpt-4o-mini-tts (OpenAI)")
+        print(" - FunAudioLLM/CosyVoice2-0.5B (SiliconFlow)")
+        print("")
+        print("For SiliconFlow voices with CosyVoice2:")
+        print(" The voice will be auto-formatted as: FunAudioLLM/CosyVoice2-0.5B:alex")
         sys.exit(1)
-
+
     print(f"arg count {arglen}")
 
+    # Get TTS model from environment
+    tts_model = os.getenv('OPENAI_TTS_MODEL', 'tts-1')
+    print(f"Using TTS model: {tts_model}")
+
     if arglen == 4:
         device_index = int(sys.argv[2])
         device_index_2 = int(sys.argv[3])
@@ -129,5 +157,5 @@ if __name__ == "__main__":
         device_index = int(input("Enter the device index: "))
         device_index_2 = None
 
-
-    stream_audio_to_virtual_mic(sys.argv[1], voice="fable", device_index=device_index,device_index_2=device_index_2)
+
+    stream_audio_to_virtual_mic(sys.argv[1], voice="fable", model=tts_model, device_index=device_index,device_index_2=device_index_2)
@@ -37,7 +37,9 @@ class SettingsManager:
             "play_last_audio": ["ctrl", "shift", "8"],
             "cancel_operation": ["ctrl", "shift", "1"]
         },
-        "max_tokens": 750
+        "max_tokens": 750,
+        "api_base_url": "",
+        "tts_model": ""
     }
 
     @classmethod
@@ -121,9 +121,20 @@ class TextToMic(tk.Tk):
         # Get API key using APIKeyManager
         self.api_key = APIKeyManager.get_api_key(self)
         self.has_api_key = bool(self.api_key)
-
+
+        # Initialize settings before loading them
+        self.api_base_url = ""
+
         if self.has_api_key:
-            self.client = OpenAI(api_key=self.api_key)
+            # Load settings to get custom base URL
+            settings = self.load_settings()
+            self.api_base_url = settings.get("api_base_url", "").strip()
+
+            # Create OpenAI client with custom base URL if provided
+            if self.api_base_url:
+                self.client = OpenAI(api_key=self.api_key, base_url=self.api_base_url)
+            else:
+                self.client = OpenAI(api_key=self.api_key)
 
         # Initializing device index variables before they are used
         self.device_index = tk.StringVar(self)
@@ -247,8 +258,9 @@ class TextToMic(tk.Tk):
         settings_menu = Menu(self.menubar, tearoff=0)
         self.menubar.add_cascade(label="Settings", menu=settings_menu)
         settings_menu.add_command(label="API Key", command=self.change_api_key)
+        settings_menu.add_command(label="API Base URL", command=self.change_api_base_url)
         settings_menu.add_command(label="AI Copyediting", command=self.show_ai_editor_settings)
-        settings_menu.add_command(label="Keyboard Shortcuts", command=self.show_hotkey_settings)
+        settings_menu.add_command(label="Keyboard Shortcuts", command=self.show_hotkey_settings)
         settings_menu.add_command(label="Manage Tones", command=self.show_tone_presets_manager)
         settings_menu.add_separator()
 
@@ -287,7 +299,89 @@ class TextToMic(tk.Tk):
         new_key = APIKeyManager.change_api_key(self)
         if new_key:
             self.api_key = new_key
-            self.client = OpenAI(api_key=self.api_key)
+            # Recreate client with base URL if set
+            if self.api_base_url:
+                self.client = OpenAI(api_key=self.api_key, base_url=self.api_base_url)
+            else:
+                self.client = OpenAI(api_key=self.api_key)
+
+    def change_api_base_url(self):
+        """Change the API base URL."""
+        from tkinter import simpledialog
+
+        # Show current URL in the prompt
+        current_url = self.api_base_url if self.api_base_url else "OpenAI Default"
+        prompt = f"Current API Base URL: {current_url}\n\nEnter custom API Base URL (leave empty to use OpenAI default):\n\nNote: For SiliconFlow, use: https://api.siliconflow.cn/v1"
+
+        new_url = simpledialog.askstring("API Base URL", prompt, parent=self)
+        if new_url is not None: # User didn't cancel
+            new_url = new_url.strip()
+
+            # Warn if user included /audio/speech in the URL
+            if new_url and "/audio/speech" in new_url:
+                if not messagebox.askyesno("Incorrect Base URL",
+                                           f"The base URL should not include '/audio/speech'.\n\n"
+                                           f"You entered: {new_url}\n\n"
+                                           f"Did you mean: {new_url.replace('/audio/speech', '')}\n\n"
+                                           f"Click Yes to correct it, or No to use as-is.",
+                                           parent=self):
+                    # User said No, keep as-is
+                    pass
+                else:
+                    # User said Yes, correct it
+                    new_url = new_url.replace('/audio/speech', '')
+
+            # Update settings
+            SettingsManager.update_settings({"api_base_url": new_url})
+
+            # Update instance variable
+            self.api_base_url = new_url
+
+            # Recreate client with new base URL
+            if self.api_key:
+                if self.api_base_url:
+                    self.client = OpenAI(api_key=self.api_key, base_url=self.api_base_url)
+                else:
+                    self.client = OpenAI(api_key=self.api_key)
+
+            # Update TTS model options based on new base URL
+            self.update_tts_model_options()
+
+            # Show confirmation
+            if new_url:
+                messagebox.showinfo("API Base URL Updated", f"API Base URL has been set to:\n{new_url}\n\nTTS model options have been updated.")
+            else:
+                messagebox.showinfo("API Base URL Reset", "API Base URL has been reset to OpenAI default.\n\nTTS model options have been updated.")
+
+    def update_tts_model_options(self):
+        """Update TTS model dropdown options based on current API base URL."""
+        if hasattr(self, 'tts_menu') and hasattr(self, 'tts_model_var'):
+            available_models = self.get_available_tts_models()
+            model_options = [model[1] for model in available_models]
+            model_ids = [model[0] for model in available_models]
+
+            # Store the new model IDs
+            self.tts_model_ids = model_ids
+
+            # Update the dropdown menu
+            self.tts_menu['menu'].delete(0, 'end')
+            for display_name in model_options:
+                self.tts_menu['menu'].add_command(label=display_name, command=tk._setit(self.tts_model_var, display_name, self.on_tts_model_change))
+
+            # Set default based on API base URL
+            if self.api_base_url and "siliconflow" in self.api_base_url.lower():
+                default_model = "FunAudioLLM/CosyVoice2-0.5B"
+            else:
+                default_model = "gpt-4o-mini-tts"
+
+            # Find and set the display name for the default model
+            for i, model_id in enumerate(model_ids):
+                if model_id == default_model:
+                    self.tts_model_var.set(model_options[i])
+                    break
+
+            # Trigger model change to update voices
+            self.on_tts_model_change()
 
     def get_audio_file_path(self, filename):
         if platform.system() == 'Darwin': # Check if the OS is macOS
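The `change_api_base_url()` dialog in the hunk above corrects a pasted endpoint with `str.replace`, which removes every occurrence of `/audio/speech` wherever it appears in the string. A narrower variant that strips only a trailing endpoint path might look like the sketch below. It is an illustration, not part of the commit, and `normalize_base_url` and `ENDPOINT_SUFFIX` are hypothetical names.

```python
ENDPOINT_SUFFIX = "/audio/speech"


def normalize_base_url(raw):
    """Trim whitespace, trailing slashes and a trailing '/audio/speech'.

    Hypothetical helper: only a suffix is removed, so a host or path that
    happens to contain the same text elsewhere is left alone.
    """
    url = raw.strip().rstrip("/")
    if url.endswith(ENDPOINT_SUFFIX):
        url = url[: -len(ENDPOINT_SUFFIX)]
    return url.rstrip("/")


if __name__ == "__main__":
    print(normalize_base_url("https://api.siliconflow.cn/v1/audio/speech"))
```

An empty string passes through unchanged, which keeps the "leave empty to use OpenAI default" behaviour intact.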
@@ -403,41 +497,87 @@ class TextToMic(tk.Tk):
         # Set fixed width for all labels
         label_width = 35 # Adjust this value as needed for your UI
 
+        # Initialize TTS model selection
+        available_tts_models = self.get_available_tts_models()
+        tts_model_options = [model[1] for model in available_tts_models] # Use display names
+        tts_model_ids = [model[0] for model in available_tts_models] # Store model IDs
+
+        # Get saved TTS model or use default
+        settings = self.load_settings()
+        saved_tts_model = settings.get("tts_model", "")
+        if not saved_tts_model:
+            # Default based on API base URL
+            if self.api_base_url and "siliconflow" in self.api_base_url.lower():
+                saved_tts_model = "FunAudioLLM/CosyVoice2-0.5B"
+            else:
+                saved_tts_model = "gpt-4o-mini-tts"
+
+        # Find the display name for the saved model
+        default_tts_model_display = tts_model_options[0]
+        for i, model_id in enumerate(tts_model_ids):
+            if model_id == saved_tts_model:
+                default_tts_model_display = tts_model_options[i]
+                break
+
+        self.tts_model_var = tk.StringVar(value=default_tts_model_display)
+        self.tts_model_ids = tts_model_ids # Store for later lookup
+
+        # TTS Model selection dropdown
+        tts_label = ttk.Label(voice_frame, text="TTS Model:", width=label_width)
+        tts_label.grid(column=0, row=0, sticky=tk.W, pady=(0, 5))
+        tts_menu = ttk.OptionMenu(voice_frame, self.tts_model_var, self.tts_model_var.get(), *tts_model_options, command=self.on_tts_model_change)
+        tts_menu.grid(column=1, row=0, sticky="ew", pady=(0, 5))
+        tts_menu.config(width=dropdown_width, style='Compact.TMenubutton')
+        self.tts_menu = tts_menu # Store reference for later updates
+
         # Initialize voice selection
         self.available_voices = self.get_available_voices()
-
+
         # Determine default voice based on whether API key is available
         default_voice = "fable" if self.has_api_key else self.available_voices[0] if self.available_voices else "[System] Default"
-
+
         self.voice_var = tk.StringVar(value=default_voice)
-
+
         voice_label = ttk.Label(voice_frame, text="Voice:", width=label_width)
         voice_label.grid(column=0, row=1, sticky=tk.W, pady=(0, 5))
-        voice_menu = ttk.OptionMenu(voice_frame, self.voice_var, self.voice_var.get(), *self.available_voices, command=self.on_voice_change)
+
+        # Use Combobox instead of OptionMenu to allow both selection and typing
+        voice_menu = ttk.Combobox(voice_frame, textvariable=self.voice_var, values=self.available_voices, state="readonly", width=30)
         voice_menu.grid(column=1, row=1, sticky="ew", pady=(0, 5))
-        voice_menu.config(width=dropdown_width, style='Compact.TMenubutton')
+        voice_menu.bind('<<ComboboxSelected>>', lambda e: self.on_voice_change())
+        # Allow typing by switching to normal state on focus, readonly on unfocus
+        voice_menu.bind('<FocusIn>', lambda e: voice_menu.config(state="normal"))
+        voice_menu.bind('<FocusOut>', lambda e: self.on_voice_exit(voice_menu))
+        self.voice_menu = voice_menu # Store reference for later updates
+
+        # Add hint label for custom voices
+        voice_hint = ttk.Label(voice_frame,
+                               text="💡 Click to edit or type custom voice ID",
+                               font=("Arial", 7, "italic"),
+                               foreground="gray")
+        voice_hint.grid(column=1, row=2, sticky="w", pady=(0, 5))
 
         # Tone selection with warning for basic version
         self.tone_var = tk.StringVar(value=self.current_tone_name)
         tone_options = ["None"] + list(self.tone_presets.keys())
         tone_label = ttk.Label(voice_frame, text="Tone Preset:", width=label_width)
-        tone_label.grid(column=0, row=2, sticky=tk.W, pady=(0, 5))
+        tone_label.grid(column=0, row=3, sticky=tk.W, pady=(0, 5))
         self.tone_menu = ttk.OptionMenu(voice_frame, self.tone_var, self.tone_var.get(), *tone_options, command=self.on_tone_change)
-        self.tone_menu.grid(column=1, row=2, sticky="ew", pady=(0, 5))
+        self.tone_menu.grid(column=1, row=3, sticky="ew", pady=(0, 5))
         self.tone_menu.config(width=dropdown_width, style='Compact.TMenubutton')
-
+
         # Check if we should disable tone menu based on voice type
         if self.voice_var.get().startswith("[System]"):
             self.tone_menu.state(['disabled'])
             self.tone_var.set("None")
-
+
         # Add warning label for basic version
         if not self.has_api_key:
-            warning_label = ttk.Label(voice_frame,
-                                      text="⚠️ Basic Version - Add API Key in Settings for full features",
+            warning_label = ttk.Label(voice_frame,
+                                      text="⚠️ Basic Version - Add API Key in Settings for full features",
                                       foreground="orange",
                                       font=("Arial", 8, "italic"))
-            warning_label.grid(column=0, row=3, columnspan=2, sticky=tk.W, pady=(5, 0))
+            warning_label.grid(column=0, row=4, columnspan=2, sticky=tk.W, pady=(5, 0))
 
         # Separator between Voice Settings and Device Settings
         separator = ttk.Separator(main_frame, orient='horizontal')
@@ -864,9 +1004,34 @@ class TextToMic(tk.Tk):
             return
 
         try:
+            # Get the selected TTS model
+            selected_tts_model_display = self.tts_model_var.get()
+            selected_tts_model = "gpt-4o-mini-tts" # Default
+
+            # Find the model ID from display name
+            available_models = self.get_available_tts_models()
+            for model_id, display_name in available_models:
+                if display_name == selected_tts_model_display:
+                    selected_tts_model = model_id
+                    break
+
+            print(f"[DEBUG] Selected TTS model display: {selected_tts_model_display}")
+            print(f"[DEBUG] Using TTS model ID: {selected_tts_model}")
+            print(f"[DEBUG] Selected voice: {selected_voice}")
+
+            # For SiliconFlow CosyVoice2-0.5B model, format voice as model:voice
+            # Example: FunAudioLLM/CosyVoice2-0.5B:alex
+            voice_to_use = selected_voice
+            if "CosyVoice2" in selected_tts_model and self.api_base_url and "siliconflow" in self.api_base_url.lower():
+                voice_to_use = f"{selected_tts_model}:{selected_voice}"
+                print(f"[DEBUG] Formatted voice for CosyVoice2: {voice_to_use}")
+
+            print(f"[DEBUG] API call - Model: {selected_tts_model}, Voice: {voice_to_use}")
+            print(f"[DEBUG] API Base URL: {self.api_base_url if self.api_base_url else 'OpenAI Default'}")
+
             response = self.client.audio.speech.create(
-                model="gpt-4o-mini-tts",
-                voice=selected_voice,
+                model=selected_tts_model,
+                voice=voice_to_use,
                 input=text,
                 instructions=tone_instructions,
                 response_format='wav'
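The formatting step in the hunk above prefixes whatever is in the voice field with the model ID, so a typed custom ID such as `speech:your-voice-name:xxxx` would also be sent as `FunAudioLLM/CosyVoice2-0.5B:speech:...`. A variant that leaves such values untouched could be sketched as follows. This rests on an assumption the diff does not confirm, namely that SiliconFlow expects custom voice IDs to be passed unprefixed. `format_voice` is a hypothetical helper, not a function in the commit.

```python
def format_voice(model_id, voice, base_url):
    """Return the voice string to send for a given model and base URL.

    Hypothetical helper. Built-in CosyVoice2 voices on SiliconFlow become
    'model:voice'; everything else is passed through unchanged.
    """
    uses_siliconflow = "siliconflow" in (base_url or "").lower()
    if not (uses_siliconflow and "CosyVoice2" in model_id):
        return voice

    # Assumption: custom IDs ('speech:...') and already-prefixed values
    # should not be prefixed a second time.
    if voice.startswith("speech:") or voice.startswith(f"{model_id}:"):
        return voice

    return f"{model_id}:{voice}"


if __name__ == "__main__":
    base = "https://api.siliconflow.cn/v1"
    print(format_voice("FunAudioLLM/CosyVoice2-0.5B", "alex", base))
    print(format_voice("FunAudioLLM/CosyVoice2-0.5B", "speech:your-voice-name:xxxx", base))
```

Pulling this into a pure function also makes the rule easy to check without a GUI or a network call.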
@@ -1573,13 +1738,13 @@ class TextToMic(tk.Tk):
         if self.has_api_key:
             # Add OpenAI voices
             voices.extend(['alloy', 'ash', 'ballad', 'coral', 'echo', 'fable', 'onyx', 'nova', 'sage', 'shimmer'])
-
+
         # Add system voices with [System] prefix
         try:
             if hasattr(self, 'system_voices') and self.system_voices:
                 for voice in self.system_voices:
                     voices.append(f"[System] {voice.name}")
-
+
             # If no system voices were found, add a default system voice
             if not voices:
                 voices.append("[System] Default")
@@ -1588,14 +1753,81 @@ class TextToMic(tk.Tk):
         # Ensure we have at least one voice option
         if not voices:
             voices.append("[System] Default")
-
+
         return voices
 
+    def get_available_tts_models(self):
+        """Get list of available TTS models based on the current API base URL."""
+        # Check if using SiliconFlow
+        is_siliconflow = self.api_base_url and "siliconflow" in self.api_base_url.lower()
+
+        if is_siliconflow:
+            # SiliconFlow TTS models
+            return [
+                ("FunAudioLLM/CosyVoice2-0.5B", "CosyVoice2-0.5B (Multi-language, Emotional)"),
+                ("tts-1", "TTS-1 (OpenAI Compatible)"),
+                ("tts-1-hd", "TTS-1 HD (OpenAI Compatible)")
+            ]
+        else:
+            # OpenAI TTS models
+            return [
+                ("gpt-4o-mini-tts", "GPT-4o Mini TTS (Recommended)"),
+                ("tts-1", "TTS-1 (Standard)"),
+                ("tts-1-hd", "TTS-1 HD (High Quality)")
+            ]
+
+    def get_siliconflow_voices(self):
+        """Get SiliconFlow-specific voices for CosyVoice2-0.5B model."""
+        return [
+            'alex', 'anna', 'bella', 'benjamin', 'charles',
+            'claire', 'david', 'diana'
+        ]
+
+    def update_available_voices(self):
+        """Update available voices based on selected TTS model."""
+        tts_model_display = self.tts_model_var.get() if hasattr(self, 'tts_model_var') else ""
+
+        # Get the actual model ID from display name
+        tts_model_id = None
+        available_models = self.get_available_tts_models()
+        for model_id, display_name in available_models:
+            if display_name == tts_model_display:
+                tts_model_id = model_id
+                break
+
+        # If using CosyVoice2-0.5B with SiliconFlow, use SiliconFlow voices
+        if tts_model_id and "CosyVoice2" in tts_model_id and self.api_base_url and "siliconflow" in self.api_base_url.lower():
+            voices = self.get_siliconflow_voices()
+            print(f"[DEBUG] Using SiliconFlow CosyVoice voices: {voices}")
+            # Also add system voices
+            if hasattr(self, 'system_voices') and self.system_voices:
+                for voice in self.system_voices:
+                    voices.append(f"[System] {voice.name}")
+            if not voices:
+                voices.append("[System] Default")
+        else:
+            voices = self.get_available_voices()
+            print(f"[DEBUG] Using standard voices")
+
+        # Update the voice dropdown (now using Combobox)
+        if hasattr(self, 'voice_menu'):
+            current_voice = self.voice_var.get()
+
+            # Update the combobox values
+            self.voice_menu['values'] = voices
+
+            # Set default if current voice not in list (unless it's a custom voice)
+            if current_voice not in voices and not (current_voice and not current_voice.startswith("[System]")):
+                self.voice_var.set(voices[0] if voices else "")
+                print(f"[DEBUG] Voice changed to: {voices[0] if voices else ''}")
+            else:
+                print(f"[DEBUG] Voice kept as: {current_voice}")
+
     def on_voice_change(self, *args):
         """Handle voice selection change."""
         selected_voice = self.voice_var.get()
         is_system_voice = selected_voice.startswith("[System]")
-
+
         # Update tone menu state based on voice type
         if is_system_voice:
             self.tone_menu.state(['disabled'])
@@ -1603,6 +1835,46 @@ class TextToMic(tk.Tk):
         else:
             self.tone_menu.state(['!disabled'])
 
+    def on_voice_exit(self, combobox):
+        """Handle voice combobox focus out - validate and update state."""
+        entered_voice = self.voice_var.get().strip()
+
+        # If empty, set to first available voice
+        if not entered_voice:
+            if hasattr(self, 'voice_menu'):
+                values = self.voice_menu['values']
+                if values:
+                    self.voice_var.set(values[0])
+                    self.on_voice_change()
+
+        # Switch back to readonly state
+        combobox.config(state="readonly")
+
+        # Trigger voice change to update tone menu
+        self.on_voice_change()
+
+    def on_tts_model_change(self, *args):
+        """Handle TTS model selection change."""
+        selected_model_display = self.tts_model_var.get()
+
+        # Find the model ID from display name
+        model_id = None
+        available_models = self.get_available_tts_models()
+        for model_id_val, display_name in available_models:
+            if display_name == selected_model_display:
+                model_id = model_id_val
+                break
+
+        # Save the selected model to settings
+        if model_id:
+            SettingsManager.update_settings({"tts_model": model_id})
+            print(f"[DEBUG] TTS model changed to: {model_id}") # Debug logging
+        else:
+            print(f"[DEBUG] Warning: Could not find model ID for display name: {selected_model_display}")
+
+        # Update available voices based on the selected model
+        self.update_available_voices()
+
     def update_window_size(self):
         """Update window size based on current banner and presets state."""
         # Calculate a width that preserves the current width if it's larger than default