VuiClient Service Interface

The audio and lighting hardware embedded in the upper body of the G1 gives the robot voice interaction capabilities. It includes:

  • RGB light strip - 256 colors
  • Speaker - 8Ω 3W (5W peak)
  • Four-microphone array - 20 mm microphone spacing (quasi-linear, silicon microphones)

Software service version requirement: Vui_Service >= 2.0.3.8, Vui Module >= 2.0.0.3. If the built-in service version is lower, please contact technical support to upgrade to the correct version.

The audio and lighting interfaces currently provide the following capabilities, which users can combine to develop their own voice interaction programs.

  • ASR (Automatic Speech Recognition): local and offline, using a non-streaming model by default. Results can include azimuth, emotion, speaker role and other information.
  • TTS (Text-to-Speech): local and offline. The pronunciation role is selectable; currently only supports Chinese.
  • Audio stream control
  • Volume control
  • Light strip color control

The AudioClient class provides functions such as text-to-speech, audio control/playback, and lighting control.

Function Name: TtsMaker
Function Prototype: int32_t TtsMaker(const std::string& text, int32_t speaker_id)
Function Overview: Text-to-speech conversion
Parameters:
  • text: Text to synthesize
  • speaker_id: Role ID
Return Value: Returns 0 if the call is successful, otherwise returns the relevant error code.
Remarks: speaker_id 0 selects a Chinese role and 1 selects an English role. Mixed Chinese and English text is not supported.
Example text:
  • speaker_id 0: 你好。我是宇树科技的机器人。例程启动成功 (Hello. I am a robot from Unitree Robotics. The example has started successfully.)
  • speaker_id 1: Hello. I'm a robot from Unitree Robotics. The example has started successfully.
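
Because mixed Chinese and English text is not supported, a caller may want to choose speaker_id from the text before calling TtsMaker. The following is a minimal sketch of one way to do that, not part of the SDK: PickSpeakerId is a hypothetical helper, and it assumes the text is UTF-8 and that "Chinese" can be approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the SDK): guesses a speaker_id for TtsMaker.
// Returns 0 for text with CJK ideographs only, 1 for text with ASCII letters
// only, and -1 when the text mixes both or contains neither.
// Assumption: `text` is valid UTF-8.
inline int32_t PickSpeakerId(const std::string& text) {
  bool has_cjk = false;
  bool has_latin = false;
  for (std::size_t i = 0; i < text.size(); ++i) {
    const unsigned char c0 = static_cast<unsigned char>(text[i]);
    if (c0 < 0x80) {
      if (std::isalpha(c0)) has_latin = true;
    } else if ((c0 & 0xF0) == 0xE0 && i + 2 < text.size()) {
      // Decode a 3-byte UTF-8 sequence; U+4E00..U+9FFF all use this form.
      const unsigned char c1 = static_cast<unsigned char>(text[i + 1]);
      const unsigned char c2 = static_cast<unsigned char>(text[i + 2]);
      const uint32_t cp =
          ((c0 & 0x0Fu) << 12) | ((c1 & 0x3Fu) << 6) | (c2 & 0x3Fu);
      if (cp >= 0x4E00 && cp <= 0x9FFF) has_cjk = true;
      i += 2;  // skip the two continuation bytes
    }
  }
  if (has_cjk && has_latin) return -1;  // mixed text is not supported
  if (has_cjk) return 0;
  if (has_latin) return 1;
  return -1;
}
```

A caller could reject or split the text when the helper returns -1 and otherwise pass the result as the second argument of TtsMaker.
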

Function Name: GetVolume
Function Prototype: int32_t GetVolume(uint8_t &volume)
Function Overview: Get the system volume
Parameters:
  • volume: Output parameter that receives the volume level (0-100)
Return Value: Returns 0 if the call is successful, otherwise returns the relevant error code.
Remarks: None

Function Name: SetVolume
Function Prototype: int32_t SetVolume(uint8_t volume)
Function Overview: Set the system volume
Parameters:
  • volume: Volume level (0-100)
Return Value: Returns 0 if the call is successful, otherwise returns the relevant error code.
Remarks: None

Function Name: LedControl
Function Prototype: int32_t LedControl(uint8_t R, uint8_t G, uint8_t B)
Function Overview: Light strip control
Parameters:
  • R: Red (0-255)
  • G: Green (0-255)
  • B: Blue (0-255)
Return Value: Returns 0 if the call is successful, otherwise returns the relevant error code.
Remarks: The interval between calls to this interface must be greater than 200 ms.
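
The 200 ms minimum interval is easy to violate when colors are updated in a loop. The sketch below shows one possible way to enforce it on the caller's side. CallThrottle is a hypothetical helper, not part of the SDK, and the 250 ms value in the usage note is an assumed safety margin rather than a documented figure.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not part of the SDK): Wait() blocks until at least
// `min_interval` has passed since the previous Wait() returned.
class CallThrottle {
 public:
  explicit CallThrottle(std::chrono::milliseconds min_interval)
      : min_interval_(min_interval) {}

  void Wait() {
    if (has_last_) {
      const auto next_allowed = last_ + min_interval_;
      if (std::chrono::steady_clock::now() < next_allowed) {
        std::this_thread::sleep_until(next_allowed);
      }
    }
    last_ = std::chrono::steady_clock::now();
    has_last_ = true;
  }

 private:
  std::chrono::milliseconds min_interval_;
  std::chrono::steady_clock::time_point last_{};
  bool has_last_ = false;
};
```

A caller would create one instance, for example CallThrottle led_throttle(std::chrono::milliseconds(250)), and call led_throttle.Wait() immediately before each client.LedControl(R, G, B).
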

Function Name: PlayStream
Function Prototype: int32_t PlayStream(std::string app_name, std::string stream_id, std::vector<uint8_t> pcm_data)
Function Overview: Audio stream playback
Parameters:
  • app_name: Application name
  • stream_id: Stream identifier. Reusing the same ID continues playback from the cache; a different ID interrupts the current playback.
  • pcm_data: PCM audio data, 16 kHz sampling rate, single-channel, 16-bit
Return Value: Returns 0 if the call is successful, otherwise returns the relevant error code.
Remarks: The audio must match the format above (16 kHz, single-channel, 16-bit PCM).
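
To make the expected pcm_data layout concrete, the sketch below builds a short sine tone in the format PlayStream expects. MakeTonePcm is a hypothetical helper, not part of the SDK. It assumes signed little-endian samples, as in WAV files; the text above only specifies the sampling rate, channel count and bit depth.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the SDK): returns `duration_ms` of a sine
// tone as 16 kHz, single-channel, 16-bit PCM bytes.
// Assumption: samples are signed and stored little-endian (low byte first).
inline std::vector<uint8_t> MakeTonePcm(double freq_hz, int duration_ms,
                                        double amplitude = 0.3) {
  constexpr int kSampleRate = 16000;
  constexpr double kPi = 3.14159265358979323846;
  std::vector<uint8_t> pcm;
  if (duration_ms <= 0) return pcm;
  amplitude = std::max(0.0, std::min(1.0, amplitude));  // keep within int16 range
  const std::size_t n =
      static_cast<std::size_t>(kSampleRate) * duration_ms / 1000;
  pcm.reserve(n * 2);  // two bytes per sample
  for (std::size_t i = 0; i < n; ++i) {
    const double s = amplitude * std::sin(2.0 * kPi * freq_hz * i / kSampleRate);
    const int16_t v = static_cast<int16_t>(std::lround(s * 32767.0));
    const uint16_t u = static_cast<uint16_t>(v);
    pcm.push_back(static_cast<uint8_t>(u & 0xFF));  // low byte
    pcm.push_back(static_cast<uint8_t>(u >> 8));    // high byte
  }
  return pcm;
}
```

Half a second of audio is 8000 samples, or 16000 bytes. The result can be passed directly as the pcm_data argument, for example client.PlayStream("example", stream_id, MakeTonePcm(440.0, 500)).
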

Function Name: PlayStop
Function Prototype: int32_t PlayStop(std::string app_name)
Function Overview: Stop playback
Parameters:
  • app_name: Application name
Return Value: Returns 0 if the call is successful, otherwise returns the relevant error code.
Remarks: None
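
Since reusing a stream_id continues playback from the cache, a long clip can in principle be sent to PlayStream in several pieces and cut short with PlayStop. The sketch below shows only the splitting step. SplitPcm is a hypothetical helper, not part of the SDK; how fast chunks may be submitted and how much the cache holds are not documented here, so pacing is left to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the SDK): splits 16 kHz, single-channel,
// 16-bit PCM bytes into chunks of at most `chunk_ms` milliseconds each.
// Chunk sizes are multiples of 32 bytes, so no 2-byte sample is cut in half
// (except a trailing odd byte, if the input itself is malformed).
inline std::vector<std::vector<uint8_t>> SplitPcm(
    const std::vector<uint8_t>& pcm, int chunk_ms) {
  constexpr std::size_t kBytesPerMs = 16000 * 2 / 1000;  // 32 bytes per ms
  std::vector<std::vector<uint8_t>> chunks;
  if (chunk_ms <= 0) return chunks;
  const std::size_t chunk_bytes = kBytesPerMs * static_cast<std::size_t>(chunk_ms);
  for (std::size_t off = 0; off < pcm.size(); off += chunk_bytes) {
    const std::size_t len = std::min(chunk_bytes, pcm.size() - off);
    const auto first = pcm.begin() + static_cast<std::ptrdiff_t>(off);
    chunks.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
  }
  return chunks;
}
```

Each chunk would then be passed to client.PlayStream("example", stream_id, chunk) with the same stream_id for the whole clip, and client.PlayStop("example") would end playback early.
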

When the robot's microphone is turned on (by switching to wake-up mode in the app or on the remote control), the built-in microphone and ASR module recognize human speech in the environment.

Subscribe to the topic rt/audio_msg (class type: std_msgs::msg::dds_::String_) to obtain the recognition information provided by the built-in offline ASR module.

{
  "index": 1,
  "timestamp": 29319303490,
  "text": "Hello",
  "angle": 90,
  "speaker_id": 0,
  "sense": "unknown",
  "confidence": 0.95,
  "language": "en-US",
  "is_final": true
}

Parameters (name, type, meaning):
  • index (Integer): Unique message sequence number
  • timestamp (Integer): Timestamp
  • text (String): Speech recognition result
  • angle (Integer): Azimuth angle, 0-180
  • speaker_id (Integer): Speaker recognition result
  • sense (String): Emotion recognition result
  • confidence (Float): Confidence level
  • language (String): Language type
  • is_final (Integer): End flag (used in streaming recognition mode; recognition is non-streaming by default)
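
The payload arrives as a JSON string, so a subscriber has to extract the fields it needs. The sketch below pulls out text, angle and confidence using only the standard library. AsrResult and ParseAsrMessage are hypothetical names, not part of the SDK, and the regular expressions do not handle escaped quotes or nested objects; a real JSON library is a better choice for production code.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the fields this sketch extracts.
struct AsrResult {
  std::string text;
  int angle = -1;           // -1 when the field is missing
  double confidence = 0.0;  // 0.0 when the field is missing
  bool ok = false;          // true when a "text" field was found
};

// Hypothetical helper (not part of the SDK): naive field extraction from the
// rt/audio_msg JSON payload. Assumes flat JSON with no escaped quotes in "text".
inline AsrResult ParseAsrMessage(const std::string& json) {
  static const std::regex text_re(R"re("text"\s*:\s*"([^"]*)")re");
  static const std::regex angle_re(R"re("angle"\s*:\s*(-?\d+))re");
  static const std::regex conf_re(R"re("confidence"\s*:\s*(-?\d+(?:\.\d+)?))re");

  AsrResult r;
  std::smatch m;
  if (!std::regex_search(json, m, text_re)) return r;
  r.text = m[1].str();
  if (std::regex_search(json, m, angle_re)) r.angle = std::stoi(m[1].str());
  if (std::regex_search(json, m, conf_re)) r.confidence = std::stod(m[1].str());
  r.ok = true;
  return r;
}
```

Inside a subscriber callback, the string returned by the message's data() accessor could be passed to ParseAsrMessage, and results with low confidence could be ignored.
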

Play State

{
"play_state": 1
}

Parameters (name, type, meaning):
  • play_state (Integer): 0 = playback stopped, 1 = playback started
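
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

#include <unitree/common/time/time_tool.hpp>
#include <unitree/idl/ros2/String_.hpp>
#include <unitree/robot/channel/channel_subscriber.hpp>
#include <unitree/robot/g1/audio/g1_audio_client.hpp>

#include "wav.hpp"

#define AUDIO_FILE_PATH "../example/g1/audio/test.wav"
#define AUDIO_SUBSCRIBE_TOPIC "rt/audio_msg"
#define GROUP_IP "239.168.123.161"  // multicast group carrying microphone audio
#define PORT 5555
#define WAV_SECOND 5  // seconds to record
#define WAV_LEN (16000 * 2 * WAV_SECOND)  // bytes: 16 kHz * 2 bytes per sample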

int sock;

// Prints every ASR result received on rt/audio_msg.
void asr_handler(const void *msg) {
  std_msgs::msg::dds_::String_ *resMsg = (std_msgs::msg::dds_::String_ *)msg;
  std::cout << "Topic:\"rt/audio_msg\" recv: " << resMsg->data() << std::endl;
}

// Returns the first local IPv4 address in the 192.168.123.x subnet,
// or an empty string if none is found.
std::string get_local_ip_for_multicast() {
  struct ifaddrs *ifaddr, *ifa;
  char host[NI_MAXHOST];
  std::string result = "";
  if (getifaddrs(&ifaddr) != 0) {
    return result;  // could not enumerate interfaces
  }
  for (ifa = ifaddr; ifa != nullptr; ifa = ifa->ifa_next) {
    if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET) continue;
    getnameinfo(ifa->ifa_addr, sizeof(struct sockaddr_in), host, NI_MAXHOST,
                NULL, 0, NI_NUMERICHOST);
    std::string ip(host);
    if (ip.find("192.168.123.") == 0) {
      result = ip;
      break;
    }
  }
  freeifaddrs(ifaddr);
  return result;
}

// Joins the microphone multicast group and records WAV_SECOND seconds of
// audio into record.wav.
void thread_mic(void) {
  sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0) {
    std::cout << "failed to create socket" << std::endl;
    return;
  }
  sockaddr_in local_addr{};
  local_addr.sin_family = AF_INET;
  local_addr.sin_port = htons(PORT);
  local_addr.sin_addr.s_addr = INADDR_ANY;
  bind(sock, (sockaddr *)&local_addr, sizeof(local_addr));

  // Join the multicast group on the interface in the 192.168.123.x subnet.
  ip_mreq mreq{};
  inet_pton(AF_INET, GROUP_IP, &mreq.imr_multiaddr);
  std::string local_ip = get_local_ip_for_multicast();
  std::cout << "local ip: " << local_ip << std::endl;
  if (local_ip.empty()) {
    std::cout << "no 192.168.123.x address found, cannot record" << std::endl;
    close(sock);
    return;
  }
  mreq.imr_interface.s_addr = inet_addr(local_ip.c_str());
  setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

  int total_bytes = 0;
  std::vector<int16_t> pcm_data;
  pcm_data.reserve(WAV_LEN / 2);
  std::cout << "start record!" << std::endl;
  while (total_bytes < WAV_LEN) {
    char buffer[2048];
    ssize_t len = recvfrom(sock, buffer, sizeof(buffer), 0, nullptr, nullptr);
    if (len > 0) {
      // Each datagram carries 16-bit samples; append them to the recording.
      size_t sample_count = len / 2;
      const int16_t *samples = reinterpret_cast<const int16_t *>(buffer);
      pcm_data.insert(pcm_data.end(), samples, samples + sample_count);
      total_bytes += len;
    }
  }
  close(sock);
  WriteWave("record.wav", 16000, pcm_data.data(), pcm_data.size(), 1);
  std::cout << "record finish! save to record.wav " << std::endl;
}

int main(int argc, char const *argv[]) {
  if (argc < 2) {
    std::cout << "Usage: audio_client_example [NetWorkInterface(eth0)]"
              << std::endl;
    return 1;  // network interface argument is required
  }
  int32_t ret;

  // Initialize ChannelFactory on the given network interface.
  unitree::robot::ChannelFactory::Instance()->Init(0, argv[1]);
  unitree::robot::g1::AudioClient client;
  client.Init();
  client.SetTimeout(10.0f);

  // ASR message example: print every message published on rt/audio_msg.
  unitree::robot::ChannelSubscriber<std_msgs::msg::dds_::String_> subscriber(
      AUDIO_SUBSCRIBE_TOPIC);
  subscriber.InitChannel(asr_handler);

  // Volume example
  uint8_t volume;
  ret = client.GetVolume(volume);
  std::cout << "GetVolume API ret:" << ret
            << " volume = " << std::to_string(volume) << std::endl;
  ret = client.SetVolume(100);
  std::cout << "SetVolume to 100% , API ret:" << ret << std::endl;

  // TTS example: speaker_id 0 for Chinese, 1 for English. Playback starts
  // automatically.
  ret = client.TtsMaker("你好。我是宇树科技的机器人。例程启动成功", 0);
  std::cout << "TtsMaker API ret:" << ret << std::endl;
  unitree::common::Sleep(5);
  ret = client.TtsMaker(
      "Hello. I'm a robot from Unitree Robotics. The example has started "
      "successfully. ",
      1);  // English TTS
  std::cout << "TtsMaker API ret:" << ret << std::endl;
  unitree::common::Sleep(8);

  // Audio playback example: the file must be 16 kHz, single-channel.
  int32_t sample_rate = -1;
  int8_t num_channels = 0;
  bool filestate = false;
  std::vector<uint8_t> pcm =
      ReadWave(AUDIO_FILE_PATH, &sample_rate, &num_channels, &filestate);
  std::cout << "wav file sample_rate = " << sample_rate
            << " num_channels = " << std::to_string(num_channels)
            << " filestate =" << filestate << std::endl;
  if (filestate && sample_rate == 16000 && num_channels == 1) {
    // Use the current time as a fresh stream_id so this clip starts a new stream.
    client.PlayStream(
        "example", std::to_string(unitree::common::GetCurrentTimeMillisecond()),
        pcm);
    std::cout << "start play stream" << std::endl;
    unitree::common::Sleep(3);
    std::cout << "stop play stream" << std::endl;
    ret = client.PlayStop("example");
  } else {
    std::cout << "audio file format error, please check!" << std::endl;
  }

  // LED control example: calls are spaced 1 s apart, above the 200 ms minimum.
  client.LedControl(0, 255, 0);
  unitree::common::Sleep(1);
  client.LedControl(0, 0, 0);
  unitree::common::Sleep(1);
  client.LedControl(0, 0, 255);

  std::cout << "AudioClient api test finish , asr start..." << std::endl;
  std::thread mic_t(thread_mic);
  while (1) {
    sleep(1);  // keep the process alive to receive ASR messages
  }
  mic_t.join();  // not reached; the loop above runs until the process is killed
  return 0;
}