VuiClient Service Interface
Introduction
The audio and lighting hardware built into the upper body of the G1 gives the robot voice interaction capabilities. It includes:
- RGB light strip - 256 colors
- Speaker - 8Ω 3W (5W peak)
- Four-microphone array with 20 mm microphone spacing (quasi-linear, silicon microphones)

Secondary Development
Software service version requirement: Vui_Service >= 2.0.3.8, Vui Module >= 2.0.0.3. If the built-in service version is lower than this, contact technical support to upgrade.
The audio and lighting interfaces currently provide the following capabilities, which can be combined to build custom voice interaction programs.
- ASR (Automatic Speech Recognition): local and offline, using a non-streaming model by default. Results can include azimuth, emotion, speaker role, and other information.
- TTS (Text-to-Speech): local and offline. The pronunciation role is selectable; currently only Chinese is supported.
- Audio stream control
- Volume control
- Light strip color control
AudioClient Class
The AudioClient class implements text-to-speech, audio control and playback, and lighting control.
Interface List:
| Function Name | TtsMaker |
|---|---|
| Function Prototype | int32_t TtsMaker(const std::string& text, int32_t speaker_id) |
| Function Overview | Text-to-speech conversion |
| Parameters | text: Text to speak; speaker_id: Role ID |
| Return Value | Returns 0 if the call is successful, otherwise returns the relevant error code. |
| Remarks | Use speaker_id 0 for Chinese roles and 1 for English roles. Mixed Chinese and English text is not supported. |
| speaker_id 0 | 你好。我是宇树科技的机器人。例程启动成功 |
| speaker_id 1 | Hello. I’m a robot from Unitree Robotics. The example has started successfully. |
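
Because TtsMaker takes one speaker_id per call and mixed-language text is not supported, a caller may want to choose the ID from the text itself. The following is a minimal sketch under a stated assumption: `PickSpeakerId` is a hypothetical helper, not part of the SDK, and it treats any non-ASCII byte in a UTF-8 string as a sign of Chinese text.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the SDK): choose a TtsMaker speaker_id
// from the bytes of a UTF-8 string. Any byte >= 0x80 belongs to a non-ASCII
// character, which this sketch treats as Chinese (speaker_id 0). Pure-ASCII
// text is treated as English (speaker_id 1).
int32_t PickSpeakerId(const std::string& text) {
  for (unsigned char c : text) {
    if (c >= 0x80) {
      return 0;  // Chinese role
    }
  }
  return 1;  // English role
}
```

A call could then look like `client.TtsMaker(text, PickSpeakerId(text))`. The heuristic is deliberately crude: English text that contains typographic punctuation such as a curly apostrophe is classified as Chinese, so such characters should be normalized to ASCII first.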

| Function Name | GetVolume |
|---|---|
| Function Prototype | int32_t GetVolume(uint8_t &volume) |
| Function Overview | Get the system volume |
| Parameters | volume: Volume level (0-100) |
| Return Value | Returns 0 if the call is successful, otherwise returns the relevant error code. |
| Remarks | |

| Function Name | SetVolume |
|---|---|
| Function Prototype | int32_t SetVolume(uint8_t volume) |
| Function Overview | Set the system volume |
| Parameters | volume: Volume level (0-100) |
| Return Value | Returns 0 if the call is successful, otherwise returns the relevant error code. |
| Remarks | |

| Function Name | LedControl |
|---|---|
| Function Prototype | int32_t LedControl(uint8_t R, uint8_t G, uint8_t B) |
| Function Overview | Light strip control |
| Parameters | R: Red (0-255); G: Green (0-255); B: Blue (0-255) |
| Return Value | Returns 0 if the call is successful, otherwise returns the relevant error code. |
| Remarks | The interval between calls to this interface must be greater than 200ms. |
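
One way to respect the 200 ms spacing requirement is to route every light strip update through a small throttling wrapper. The sketch below is an illustration only: `ThrottledLed` is a hypothetical class, not part of the SDK, and the 210 ms default is an assumed safety margin above the documented limit.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical wrapper (not part of the SDK): delays each call so that
// consecutive calls to a LedControl-like function are at least
// `min_interval` apart.
class ThrottledLed {
 public:
  using LedFn = std::function<int32_t(uint8_t, uint8_t, uint8_t)>;
  using Clock = std::chrono::steady_clock;

  explicit ThrottledLed(
      LedFn fn,
      std::chrono::milliseconds min_interval = std::chrono::milliseconds(210))
      : fn_(std::move(fn)), min_interval_(min_interval) {}

  // Blocks until the minimum interval since the previous call has passed,
  // then forwards the color to the wrapped function.
  int32_t Set(uint8_t r, uint8_t g, uint8_t b) {
    if (has_called_) {
      const auto next_allowed = last_call_ + min_interval_;
      if (Clock::now() < next_allowed) {
        std::this_thread::sleep_until(next_allowed);
      }
    }
    const int32_t ret = fn_(r, g, b);
    last_call_ = Clock::now();
    has_called_ = true;
    return ret;
  }

 private:
  LedFn fn_;
  std::chrono::milliseconds min_interval_;
  Clock::time_point last_call_{};
  bool has_called_ = false;
};
```

With an AudioClient instance named `client`, the wrapper could be built as `ThrottledLed led([&](uint8_t r, uint8_t g, uint8_t b) { return client.LedControl(r, g, b); });` and used through `led.Set(0, 255, 0)`. The wrapper is not thread-safe; calls from several threads would need an external mutex.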

| Function Name | PlayStream |
|---|---|
| Function Prototype | int32_t PlayStream(std::string app_name, std::string stream_id, std::vector<uint8_t> pcm_data) |
| Function Overview | Audio stream playback |
| Parameters | app_name: Application name; stream_id: Stream ID (reusing the same ID continues playback from the cache, while a different ID interrupts the current playback); pcm_data: PCM audio, 16 kHz sampling rate, single-channel, 16-bit |
| Return Value | Returns 0 if the call is successful, otherwise returns the relevant error code. |
| Remarks | Make sure the audio matches the format above (16 kHz, single-channel, 16-bit PCM). |
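
Audio that is held as 16-bit samples has to be turned into the byte vector that PlayStream expects. The sketch below shows one way to do that and to split a long buffer into chunks that could be sent under the same stream_id. The helper names are hypothetical, and little-endian byte order is an assumption based on the WAV convention; the interface description does not state the byte order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack signed 16-bit mono samples into bytes,
// low byte first (little-endian, assumed to match WAV PCM data).
std::vector<uint8_t> PackPcm16(const std::vector<int16_t>& samples) {
  std::vector<uint8_t> bytes;
  bytes.reserve(samples.size() * 2);
  for (int16_t s : samples) {
    const uint16_t u = static_cast<uint16_t>(s);
    bytes.push_back(static_cast<uint8_t>(u & 0xFF));
    bytes.push_back(static_cast<uint8_t>(u >> 8));
  }
  return bytes;
}

// Hypothetical helper: split a 16 kHz, mono, 16-bit PCM buffer into chunks
// of at most `chunk_ms` milliseconds. One millisecond is 16 samples, which
// is 32 bytes, so chunk boundaries always fall between samples.
std::vector<std::vector<uint8_t>> SplitPcm(const std::vector<uint8_t>& pcm,
                                           std::size_t chunk_ms) {
  const std::size_t bytes_per_ms = 16000 * 2 / 1000;
  const std::size_t chunk_bytes = bytes_per_ms * chunk_ms;
  std::vector<std::vector<uint8_t>> chunks;
  if (chunk_bytes == 0) {
    return chunks;
  }
  for (std::size_t off = 0; off < pcm.size(); off += chunk_bytes) {
    const std::size_t end = std::min(off + chunk_bytes, pcm.size());
    chunks.emplace_back(pcm.begin() + static_cast<std::ptrdiff_t>(off),
                        pcm.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return chunks;
}
```

Each chunk could then be passed to PlayStream with an unchanged app_name and stream_id so that, per the parameter description, playback continues from the cache rather than being interrupted.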

| Function Name | PlayStop |
|---|---|
| Function Prototype | int32_t PlayStop(std::string app_name) |
| Function Overview | Stop playback |
| Parameters | app_name: Application name |
| Return Value | Returns 0 if the call is successful, otherwise returns the relevant error code. |
| Remarks | |

ASR Messages / Audio Play State
When the robot's microphone is turned on (switch to wake-up mode in the app or with the remote control), the built-in microphone and ASR module recognize human speech in the environment.
Subscribe to the topic rt/audio_msg (class type: std_msgs::msg::dds_::String_) to obtain the recognition information provided by the built-in offline ASR module.
```json
{ "index": 1, "timestamp": 29319303490, "text": "Hello", "angle": 90, "speaker_id": 0, "sense": "unknown", "confidence": 0.95, "language": "en-US", "is_final": true }
```

| Parameter Name | Parameter Type | Meaning |
|---|---|---|
| index | Integer | Unique message sequence number |
| timestamp | Integer | Timestamp |
| text | String | Speech recognition result |
| angle | Integer | Azimuth angle (0-180) |
| speaker_id | Integer | Speaker recognition result |
| sense | String | Emotion recognition result |
| confidence | Float | Confidence level |
| language | String | Language type |
| is_final | Integer | End flag (used in streaming recognition mode; recognition is non-streaming by default) |
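
The recognition result arrives as a JSON string, so a subscriber usually needs to pull individual fields out of it. The sketch below is a deliberately naive extractor for flat messages like the one shown above; `JsonField` is a hypothetical helper, not part of the SDK, and a full JSON library is the more robust choice for real applications.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: look up a top-level field in a flat JSON object and
// return its value as text (string values are returned without quotes).
// Returns an empty string if the key is not found. This sketch does not
// handle escaped quotes, nested objects, or a key name that also appears
// inside a string value.
std::string JsonField(const std::string& json, const std::string& key) {
  const std::string needle = "\"" + key + "\"";
  std::size_t pos = json.find(needle);
  if (pos == std::string::npos) return "";
  pos = json.find(':', pos + needle.size());
  if (pos == std::string::npos) return "";
  pos = json.find_first_not_of(" \t\r\n", pos + 1);
  if (pos == std::string::npos) return "";
  if (json[pos] == '"') {
    // String value: take everything up to the closing quote.
    const std::size_t end = json.find('"', pos + 1);
    if (end == std::string::npos) return "";
    return json.substr(pos + 1, end - pos - 1);
  }
  // Number or literal: take everything up to the next delimiter.
  const std::size_t end = json.find_first_of(",} \t\r\n", pos);
  return json.substr(pos, end == std::string::npos ? std::string::npos
                                                   : end - pos);
}
```

Inside a subscriber callback, `JsonField(resMsg->data(), "text")` would give the recognized sentence, and numeric fields such as `angle` can be converted with `std::stoi` after checking that the returned string is not empty.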
Play State
```json
{ "play_state": 1 }
```

| Parameter Name | Parameter Type | Meaning |
|---|---|---|
| play_state | Integer | 0: playback stopped; 1: playback started |

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

#include <unitree/common/time/time_tool.hpp>
#include <unitree/idl/ros2/String_.hpp>
#include <unitree/robot/channel/channel_subscriber.hpp>
#include <unitree/robot/g1/audio/g1_audio_client.hpp>

#include "wav.hpp"

#define AUDIO_FILE_PATH "../example/g1/audio/test.wav"
#define AUDIO_SUBSCRIBE_TOPIC "rt/audio_msg"
#define GROUP_IP "239.168.123.161"
#define PORT 5555

#define WAV_SECOND 5  // record seconds
#define WAV_LEN (16000 * 2 * WAV_SECOND)
int sock;

// Print every ASR / play-state message received on rt/audio_msg.
void asr_handler(const void *msg) {
  std_msgs::msg::dds_::String_ *resMsg = (std_msgs::msg::dds_::String_ *)msg;
  std::cout << "Topic:\"rt/audio_msg\" recv: " << resMsg->data() << std::endl;
}

// Return the first local IPv4 address in the 192.168.123.x subnet,
// or an empty string if none is found.
std::string get_local_ip_for_multicast() {
  struct ifaddrs *ifaddr, *ifa;
  char host[NI_MAXHOST];
  std::string result = "";

  if (getifaddrs(&ifaddr) == -1) {
    return result;  // could not enumerate interfaces
  }
  for (ifa = ifaddr; ifa != nullptr; ifa = ifa->ifa_next) {
    if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET) continue;
    getnameinfo(ifa->ifa_addr, sizeof(struct sockaddr_in), host, NI_MAXHOST,
                NULL, 0, NI_NUMERICHOST);
    std::string ip(host);
    if (ip.find("192.168.123.") == 0) {
      result = ip;
      break;
    }
  }
  freeifaddrs(ifaddr);
  return result;
}

// Join the multicast group, record WAV_SECOND seconds of microphone audio,
// and save it to record.wav.
void thread_mic(void) {
  sock = socket(AF_INET, SOCK_DGRAM, 0);
  sockaddr_in local_addr{};
  local_addr.sin_family = AF_INET;
  local_addr.sin_port = htons(PORT);
  local_addr.sin_addr.s_addr = INADDR_ANY;
  bind(sock, (sockaddr *)&local_addr, sizeof(local_addr));

  ip_mreq mreq{};
  inet_pton(AF_INET, GROUP_IP, &mreq.imr_multiaddr);
  std::string local_ip = get_local_ip_for_multicast();
  std::cout << "local ip: " << local_ip << std::endl;
  mreq.imr_interface.s_addr = inet_addr(local_ip.c_str());
  setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

  int total_bytes = 0;
  std::vector<int16_t> pcm_data;
  pcm_data.reserve(WAV_LEN / 2);
  std::cout << "start record!" << std::endl;
  while (total_bytes < WAV_LEN) {
    char buffer[2048];
    ssize_t len = recvfrom(sock, buffer, sizeof(buffer), 0, nullptr, nullptr);
    if (len > 0) {
      size_t sample_count = len / 2;
      const int16_t *samples = reinterpret_cast<const int16_t *>(buffer);
      pcm_data.insert(pcm_data.end(), samples, samples + sample_count);
      total_bytes += len;
    }
  }

  WriteWave("record.wav", 16000, pcm_data.data(), pcm_data.size(), 1);
  std::cout << "record finish! save to record.wav " << std::endl;
}

int main(int argc, char const *argv[]) {
  if (argc < 2) {
    std::cout << "Usage: audio_client_example [NetWorkInterface(eth0)]"
              << std::endl;
    exit(0);
  }
  int32_t ret;
  /*
   * Initialize ChannelFactory
   */
  unitree::robot::ChannelFactory::Instance()->Init(0, argv[1]);
  unitree::robot::g1::AudioClient client;
  client.Init();
  client.SetTimeout(10.0f);

  /*ASR message Example*/
  unitree::robot::ChannelSubscriber<std_msgs::msg::dds_::String_> subscriber(
      AUDIO_SUBSCRIBE_TOPIC);
  subscriber.InitChannel(asr_handler);

  /*Volume Example*/
  uint8_t volume;
  ret = client.GetVolume(volume);
  std::cout << "GetVolume API ret:" << ret
            << " volume = " << std::to_string(volume) << std::endl;
  ret = client.SetVolume(100);
  std::cout << "SetVolume to 100% , API ret:" << ret << std::endl;

  /*TTS Example*/
  ret = client.TtsMaker("你好。我是宇树科技的机器人。例程启动成功",
                        0);  // Auto play
  std::cout << "TtsMaker API ret:" << ret << std::endl;
  unitree::common::Sleep(5);

  ret = client.TtsMaker(
      "Hello. I'm a robot from Unitree Robotics. The example has started "
      "successfully. ",
      1);  // English TTS
  std::cout << "TtsMaker API ret:" << ret << std::endl;
  unitree::common::Sleep(8);

  /*Audio Play Example*/
  int32_t sample_rate = -1;
  int8_t num_channels = 0;
  bool filestate = false;
  std::vector<uint8_t> pcm =
      ReadWave(AUDIO_FILE_PATH, &sample_rate, &num_channels, &filestate);

  std::cout << "wav file sample_rate = " << sample_rate
            << " num_channels = " << std::to_string(num_channels)
            << " filestate =" << filestate << std::endl;

  // PlayStream only accepts 16 kHz, single-channel audio.
  if (filestate && sample_rate == 16000 && num_channels == 1) {
    client.PlayStream(
        "example",
        std::to_string(unitree::common::GetCurrentTimeMillisecond()), pcm);
    std::cout << "start play stream" << std::endl;
    unitree::common::Sleep(3);
    std::cout << "stop play stream" << std::endl;
    ret = client.PlayStop("example");
  } else {
    std::cout << "audio file format error, please check!" << std::endl;
  }

  /*LED Control Example*/
  client.LedControl(0, 255, 0);
  unitree::common::Sleep(1);
  client.LedControl(0, 0, 0);
  unitree::common::Sleep(1);
  client.LedControl(0, 0, 255);

  std::cout << "AudioClient api test finish , asr start..." << std::endl;

  std::thread mic_t(thread_mic);

  while (1) {
    sleep(1);  // wait for asr message
  }
  mic_t.join();
  return 0;
}
```