Audio and image frames are now split into input and output variants: `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame` and `OutputImageRawFrame` (along with their subclasses). Input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames; they are never sent through an output transport. Output frames can likewise be processed by any frame processor in the pipeline, but unlike input frames they can also be sent by the output transport.
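The rule above can be sketched in plain Python. The classes below are hypothetical stand-ins that only reuse the audio frame names from the text (the real Pipecat frames carry more fields and behavior), and `OutputTransportSketch` and `echo` are invented for illustration. The transport sends output frames and refuses input frames, so a processor that wants to play captured audio back has to re-wrap it as an output frame. Image frames would follow the same split.

```python
from dataclasses import dataclass


# Hypothetical stand-ins that reuse the frame names from the text; the real
# Pipecat classes have more fields and are defined by the library itself.
@dataclass
class AudioRawFrame:
    audio: bytes
    sample_rate: int
    num_channels: int


class InputAudioRawFrame(AudioRawFrame):
    """Produced by an input transport; meant for processing inside the pipeline."""


class OutputAudioRawFrame(AudioRawFrame):
    """Can be processed in the pipeline and sent by an output transport."""


class OutputTransportSketch:
    """Hypothetical output transport that enforces the input/output rule."""

    def __init__(self) -> None:
        self.sent: list[OutputAudioRawFrame] = []

    def send(self, frame: AudioRawFrame) -> bool:
        # Only output frames leave through the transport; input frames are refused.
        if isinstance(frame, OutputAudioRawFrame):
            self.sent.append(frame)
            return True
        return False


def echo(frame: InputAudioRawFrame) -> OutputAudioRawFrame:
    # To play captured audio back, a processor must re-wrap it as an output frame.
    return OutputAudioRawFrame(frame.audio, frame.sample_rate, frame.num_channels)


mic = InputAudioRawFrame(audio=b"\x00\x01", sample_rate=16000, num_channels=1)
transport = OutputTransportSketch()
print(transport.send(mic))        # False: input frames are not sent
print(transport.send(echo(mic)))  # True: the re-wrapped output frame is sent
```

Keeping the two variants as distinct types means the transport can decide what to send with a simple type check instead of inspecting where a frame came from.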
Moondream Chatbot
This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion. The chatbot also has vision capabilities thanks to Moondream, so you can ask it, for example, "what do you see?".
ℹ️ The first run may take a while to start, since the VAD (Voice Activity Detection) and vision models need to be downloaded.
Get started
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
Run the server
python server.py
Then, visit http://localhost:7860/start in your browser to start a chatbot session.
Build and test the Docker image
docker build -t moonbot .
docker run --env-file .env -p 7860:7860 moonbot
For Intel GPUs (Arc, Max and Flex series)
docker build -t moonbot -f Dockerfile.intel .
docker run --env-file .env -p 7860:7860 --device /dev/dri moonbot
Once the container is running, visit http://localhost:7860/start again to start a session.