DailyParams was missing video_in_enabled=True, which prevented image capture from working. Without this parameter, a UserImageRequestFrame would be sent but no actual image data would be captured from participants. This fix lets the transport capture video frames for vision processing with Moondream, so the "Let me take a look" functionality works as intended.
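As a rough illustration of the change described above, the transport settings can be pictured as a set of keyword arguments. This is only a sketch: `video_in_enabled` and `DailyParams` come from the description, while the other keys and the helper name `build_transport_kwargs` are assumptions made for the example, and the dictionary is kept import-free so it runs without the transport library installed.

```python
def build_transport_kwargs() -> dict:
    """Return keyword arguments intended for DailyParams (hypothetical helper)."""
    return {
        # Assumed settings for a voice bot; not confirmed by the description.
        "audio_in_enabled": True,
        "audio_out_enabled": True,
        # The flag the fix adds: without it, image requests are sent
        # but no video frames are captured from participants.
        "video_in_enabled": True,
    }


kwargs = build_transport_kwargs()
print(kwargs["video_in_enabled"])
```

In the bot itself these values would be passed as `DailyParams(**kwargs)` when constructing the transport, assuming the installed version accepts these field names.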
Moondream Chatbot
This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion. The chatbot also has vision powers thanks to Moondream so you can ask it, for example, "what do you see?".
ℹ️ The first run may take a while to start because the VAD (Voice Activity Detection) and vision models need to be downloaded.
Get started
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
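Since the bot depends on the credentials added to `.env`, a quick check for values left blank can save a failed start. The following is a minimal sketch, assuming `.env` uses plain `KEY=VALUE` lines; the helper name `find_empty_env_keys` is hypothetical and no particular key names are assumed.

```python
from pathlib import Path


def find_empty_env_keys(env_path):
    """Return the keys in a KEY=VALUE file whose value is empty."""
    empty = []
    for raw in Path(env_path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment, then surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        if not value:
            empty.append(key.strip())
    return empty


# Only runs the check when a .env file is present in the current directory.
if Path(".env").exists():
    print("Keys still missing a value:", find_empty_env_keys(".env"))
```

An empty list means every key in the file has some value; it does not verify that the credentials themselves are valid.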
Run the server
python server.py
Then visit http://localhost:7860/ in your browser to start a chatbot session.
Build and test the Docker image
docker build -t moonbot .
docker run --env-file .env -p 7860:7860 moonbot
For Intel GPUs (Arc, Max and Flex series)
docker build -t moonbot -f Dockerfile.intel .
docker run --env-file .env -p 7860:7860 --device /dev/dri moonbot
Once the container is running, visit http://localhost:7860/ again.