- Added support for UserStartedSpeakingFrame so the pipeline can track when the user begins speaking.
- Updated the pipeline to reset the idle prompt count when the user starts speaking, so the count starts over after every user turn.
- Added event handlers for user turn events and for processing upstream frames.
- Added idle prompt settings (timeout, maximum count, and prompt text) to several voice configuration files so the bot can prompt the user during idle periods.
- Changed the greeting mode to 'fastgpt_opener' in the relevant configurations.
- Added a voice configuration file for xfyun TTS with its service settings and parameters.
- Refactored the pipeline to handle idle prompts and user turn events.
- Adjusted the VAD and turn configurations to work with the new idle prompt feature.
- Introduced a camera drawer for capturing images during a conversation.
- Added prompts for each camera state to guide users through taking a photo.
- Updated the HTML to include the camera elements and connect them to the existing chat functionality.
- Updated the JavaScript to track camera state and enable or disable buttons based on connection status.
- Updated the CSS to style the camera drawer and its components, with a responsive layout across devices.
- Updated the README with the new voice demo URL.
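A minimal sketch of the idle prompt behavior described above, assuming a simple polling model: a prompt becomes due after `timeout_s` seconds of silence, fires at most `max_count` times, and the counter resets when the user starts speaking. The class `IdlePromptTracker` and its field names are hypothetical stand-ins for the new config options, not the pipeline's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdlePromptTracker:
    """Hypothetical idle-prompt state; field names are illustrative, not the real config keys."""

    timeout_s: float  # seconds of user silence before a prompt is due
    max_count: int  # stop prompting after this many prompts without user speech
    text: str  # what the bot says when it prompts
    count: int = 0  # prompts sent since the user last started speaking
    last_activity: float = 0.0  # timestamp of the last user speech or prompt

    def on_user_started_speaking(self, now: float) -> None:
        # Stands in for handling UserStartedSpeakingFrame: reset the count and the timer.
        self.count = 0
        self.last_activity = now

    def poll(self, now: float) -> Optional[str]:
        """Return the prompt text if a prompt is due at time `now`, else None."""
        if self.count >= self.max_count:
            return None  # prompt budget used up until the user speaks again
        if now - self.last_activity < self.timeout_s:
            return None  # not silent for long enough yet
        self.count += 1
        self.last_activity = now  # restart the silence timer after prompting
        return self.text


tracker = IdlePromptTracker(timeout_s=5.0, max_count=2, text="Are you still there?")
print(tracker.poll(5.0))  # Are you still there?
```

Passing time in as an explicit `now` argument keeps the logic deterministic and easy to test; a real pipeline would presumably drive `poll` from a timer and call `on_user_started_speaking` whenever a UserStartedSpeakingFrame arrives.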
Parse leading <state> tags from LLM replies, emit them as response.state over the product websocket, and strip the tags from the TTS and text streams. Add FastGPT+Xfyun voice configs (including a state-enabled preset), SuperTTS support, and context sync for interrupted turns. Refresh the voice demo with a state indicator and collapsible log groups for audio delta websocket messages.
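Because replies are streamed, a leading <state> tag can arrive split across several chunks. The sketch below assumes the tag looks like `<state>NAME</state>` at the very start of a reply. It holds text back until the tag is resolved, reports each state through a callback (the point where one might send response.state over the websocket), and forwards only tag-free text to TTS. `LeadingStateParser` is a hypothetical name, not the project's implementation.

```python
import re
from typing import Callable

# Assumed tag format: "<state>NAME</state>" at the very start of a reply.
_OPEN = "<state>"
_STATE_RE = re.compile(r"^\s*<state>(.*?)</state>", re.DOTALL)


class LeadingStateParser:
    """Hypothetical streaming filter: strips leading <state> tags and reports their values."""

    def __init__(self, on_state: Callable[[str], None]) -> None:
        self._on_state = on_state  # e.g. where a response.state message would be sent
        self._buffer = ""  # text held back while a leading tag may still be arriving
        self._done = False  # True once the reply has moved past any leading tags

    def feed(self, chunk: str) -> str:
        """Consume one streamed chunk; return the text safe to forward to TTS/text."""
        if self._done:
            return chunk
        self._buffer += chunk
        while True:
            stripped = self._buffer.lstrip()
            if len(stripped) < len(_OPEN) and _OPEN.startswith(stripped):
                return ""  # could still become "<state>": wait for more text
            if not stripped.startswith(_OPEN):
                self._done = True  # no (further) leading tag: pass everything through
                out, self._buffer = self._buffer, ""
                return out
            match = _STATE_RE.match(self._buffer)
            if match is None:
                return ""  # opening tag seen, closing tag not streamed yet
            self._on_state(match.group(1).strip())
            self._buffer = self._buffer[match.end():]  # drop the tag, check for another

    def flush(self) -> str:
        """Call at end of stream; releases any held-back text unchanged."""
        self._done = True
        out, self._buffer = self._buffer, ""
        return out


states = []
parser = LeadingStateParser(states.append)
spoken = "".join(parser.feed(c) for c in ["<sta", "te>happy</st", "ate>Hel", "lo"])
spoken += parser.flush()
print(spoken, states)  # Hello ['happy']
```

Releasing unmatched held-back text in `flush` is one design choice; dropping it instead would keep a malformed tag from ever being spoken.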
Co-authored-by: Cursor <cursoragent@cursor.com>