Offline Speech: Snowboy Hotword Wake-up + Raspberry Pi Voice Interaction to Switch a Light On and Off
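# Offline speech: Snowboy hotword wake-up
Speech recognition now has a very wide range of applications, such as voice assistants on phones and smart speakers (Xiao AI, DingDong, Tmall Genie, ...). It generally involves three stages: hotword wake-up, voice capture, and recognition plus logic control.
Hotword wake-up means waking the device so that it parses what you say next. The device is usually recording ambient sound all the time, but it does not react to any of it. Once a wake word such as "Hi, Siri" wakes it, the device starts processing the speech that follows. Hotword wake-up is where speech recognition begins.
Snowboy is a fairly popular hotword wake-up framework; KITT.AI, the company behind it, has been acquired by Baidu. Snowboy handles Chinese well and is simpler to configure and use than Pocketsphinx, so it is recommended.
Official Snowboy documentation (in English): http://docs.kitt.ai/snowboy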
Installation
1. Get the source code and compile it
Install dependencies
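The Raspberry Pi's built-in audio hardware does not support audio input (it cannot record), so you need to buy a driver-free USB sound card online; it generally works as soon as it is plugged in. Installing pulseaudio is recommended, since it reduces the amount of audio configuration: `$ sudo apt-get install pulseaudio`
Install sox to test recording and playback: `$ sudo apt-get install sox`
After installation, run `sox -d -d`, speak into the microphone, and confirm that you can hear your own voice.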
Install the other software dependencies:
- Install PyAudio:
$ sudo apt-get install python3-pyaudio
- Install SWIG (>3.0.10):
$ sudo apt-get install swig
- Install ATLAS:
$ sudo apt-get install libatlas-base-dev
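Compile the source code
Get the source code: `$ git clone https://github.com/Kitt-AI/snowboy.git`
Compile the Python3 binding: `$ cd snowboy/swig/Python3 && make`
Test:
If you are using a Raspberry Pi, you also need to change the sound card settings in `~/.asoundrc`: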
```
pcm.!default {
  type asym
  playback.pcm {
    type plug
    slave.pcm "hw:0,0"
  }
  capture.pcm {
    type plug
    slave.pcm "hw:1,0"
  }
}
```
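The `hw:1,0` value in the capture section is a card,device pair, and the right numbers depend on where the USB sound card shows up on your system. As a rough sketch (not taken from the Snowboy docs), the helper below pulls that pair out of `arecord -l` style output. The function name `find_capture_device` and the sample text are illustrative assumptions; the real output on your board may look different.

```python
import re

# Illustrative sample of what `arecord -l` typically prints; the card name
# and numbers here are made up for the example.
SAMPLE_ARECORD_OUTPUT = """\
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

# Matches lines such as "card 1: <name>, device 0: <name>".
_CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+):", re.MULTILINE)


def find_capture_device(arecord_output):
    """Return the first capture device as an ALSA "hw:card,device" string.

    Hypothetical helper: returns None when no capture device is listed.
    """
    match = _CARD_LINE.search(arecord_output)
    if match is None:
        return None
    card, device = match.groups()
    return "hw:{},{}".format(card, device)


if __name__ == "__main__":
    # On a real board you would pass the output of `arecord -l` instead.
    print(find_capture_device(SAMPLE_ARECORD_OUTPUT))
```

The returned string is what would go into the `slave.pcm` line of the capture section.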
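Go to the official example directory `snowboy/examples/Python3` and run: `$ python3 demo.py resources/models/snowboy.umdl`
(the `snowboy.umdl` file in the command is the hotword model).
Then say **"snowboy"** clearly into the microphone. If you hear a "ding" sound, the installation and configuration succeeded.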
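PS: The official source code raises an error when tested with Python3. In my testing, the fix is to edit `snowboydecoder.py` in the `snowboy/examples/Python3` directory: change line 5 from `from . import snowboydetect` to `import snowboydetect`, after which it runs directly.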
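Quick start
GitHub has a fairly detailed demo, and reading it first is strongly recommended. Start by creating a HotwordDetect class that holds the wake-up model, audio gain, sensitivity, and other parameters. Then initialize the Detector object; Snowboy's Detector class is part of the source code you downloaded. The trained model can be a single model or a list of models.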
```python
# Relative import: this assumes the module lives inside a package.
# Use `import snowboydetect` instead when running it as a plain script.
from .. import snowboydetect


class HotwordDetect(object):
    def __init__(self, decoder_model,
                 resource,
                 sensitivity=0.38,
                 audio_gain=1):
        """Create the Snowboy detector and apply gain and sensitivity."""
        # SnowboyDetect takes bytes, hence the encode() calls.
        self.detector = snowboydetect.SnowboyDetect(
            resource_filename=resource.encode(),
            model_str=decoder_model.encode())
        self.detector.SetAudioGain(audio_gain)
        # The sensitivity is passed to the detector as an encoded string.
        self.detector.SetSensitivity(str(sensitivity).encode())
```
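Because the model can be a single file or a list, it helps to see how a list might be flattened before it reaches the detector. The sketch below assumes, as the example decoder appears to do, that several models and their sensitivities are passed as comma-separated strings. The helper name `build_detector_args` and the file names are hypothetical, and it assumes each model file holds exactly one hotword.

```python
def build_detector_args(models, sensitivity=0.5):
    """Normalize model(s) and sensitivity into comma-separated strings.

    Hypothetical helper for illustration: `models` may be one path or a list
    of paths, `sensitivity` one number or a list with one number per model.
    """
    if isinstance(models, str):
        models = [models]
    if isinstance(sensitivity, (int, float)):
        # Reuse a single sensitivity value for every model.
        sensitivity = [sensitivity] * len(models)
    if len(sensitivity) != len(models):
        raise ValueError("need one sensitivity value per model")
    model_str = ",".join(models)
    sensitivity_str = ",".join(str(s) for s in sensitivity)
    return model_str, sensitivity_str


if __name__ == "__main__":
    # Two made-up personal models with different sensitivities.
    print(build_detector_args(["siri.pmdl", "xiaowei.pmdl"], [0.4, 0.5]))
```

The two returned strings would then be encoded and handed to the detector in place of a single model path and a single sensitivity value.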
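After initialization you can write the start method. It usually takes a wake-up callback, which is where something like the "ding" you may hear after "Hi, Siri" is played. It can also take a recording callback, which decides what to do with the audio captured after the device has been woken: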
```python
class HotwordDetect(object):
    ...
    def listen(self, detected_callback,
               interrupt_check=lambda: False,
               audio_recorder_callback=None):
        """begin to listen"""
        ...
        state = "PASSIVE"
        while True:
            # `data` is the audio chunk read in the elided code above.
            status = self.detector.RunDetection(data)
            ...
            if state == "PASSIVE":
                if status > 0:  # a hotword was detected
                    detected_callback()
                    state = "ACTIVE"
                continue
            elif state == "ACTIVE":
                if audio_recorder_callback is not None:
                    audio_recorder_callback()
                state = "PASSIVE"  # go back to waiting for the hotword
                continue
```
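You can define this logic however you like; the core of it is switching between two states. When the device hears a wake word, status gives the index of the wake word that was recognized. For example, if you defined two wake words, "Siri" and "Xiaowei", a status of 1 means Siri was triggered and a status of 2 means Xiaowei was triggered. The state then changes to active, the audio_recorder_callback method runs, and once it finishes the state switches back to passive, waiting for the next wake word.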
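To make the two-state switching concrete, here is a minimal simulation that needs no microphone. It is only a sketch: the list of status codes stands in for successive detector results, the function name `run_state_machine` is hypothetical, and anything that is not a positive number is simply treated as "no hotword".

```python
def run_state_machine(statuses, detected_callback, audio_recorder_callback):
    """Walk a sequence of detector status codes through the two states.

    `statuses` stands in for successive detection results: a positive number
    is the 1-based index of the detected hotword; anything else is treated
    as "no hotword" in this sketch. Returns the final state.
    """
    state = "PASSIVE"
    for status in statuses:
        if state == "PASSIVE":
            if status > 0:
                detected_callback(status)
                state = "ACTIVE"
        elif state == "ACTIVE":
            audio_recorder_callback()
            state = "PASSIVE"
    return state


if __name__ == "__main__":
    hotwords = {1: "Siri", 2: "Xiaowei"}
    events = []
    run_state_machine(
        [0, 0, 2, 0, 1, 0],
        detected_callback=lambda i: events.append("woke: " + hotwords[i]),
        audio_recorder_callback=lambda: events.append("handled recording"),
    )
    print(events)
```

Each wake-up is followed by exactly one recording step before the loop goes back to listening, which is the behavior the paragraph above describes.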
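Online speech recognition
Once the device has been woken, you can take the recorded audio and do whatever you want with it, including calling a speech recognition API such as Baidu's. All of this logic lives in the audio_recorder_callback callback. Note that Snowboy currently supports only a 16000 Hz recording sample rate; audio recorded at any other rate cannot be used. There are two ways to deal with this:
- use a sound card that supports a 16000 Hz sample rate
- convert the sample rate of the recorded audio
Products from the two larger sound-card chip makers, C-Media and RealTek, generally run at 48 kHz or above; chips that support 16 kHz tend to be more expensive, possibly around 60 yuan. UGREEN (绿联) has two products that support it. When buying, check the product specifications against the chip maker's model number to confirm that 16 kHz sampling is supported.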
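For the second option, sample rate conversion, the following is a deliberately crude sketch that turns 48000 Hz audio into 16000 Hz audio using only the standard library. It assumes 16-bit mono PCM in the machine's native byte order, and the function name `downsample_48k_to_16k` is hypothetical. Averaging three samples into one is a rough stand-in for proper filtering; a dedicated resampling library would give better quality.

```python
import array


def downsample_48k_to_16k(pcm_bytes):
    """Convert 16-bit mono PCM from 48000 Hz to 16000 Hz.

    Crude sketch: averages each group of three samples into one. The input
    length must be a whole number of 2-byte samples.
    """
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm_bytes)
    usable = len(samples) - len(samples) % 3  # drop a trailing partial group
    out = array.array("h", (
        (samples[i] + samples[i + 1] + samples[i + 2]) // 3
        for i in range(0, usable, 3)
    ))
    return out.tobytes()


if __name__ == "__main__":
    # One second of silence at 48 kHz, just to show the size change.
    one_second = array.array("h", [0] * 48000).tobytes()
    converted = downsample_48k_to_16k(one_second)
    print(len(converted) // 2, "samples after conversion")
```

Each chunk read from a 48 kHz stream could be passed through a function like this before it is handed to the detector.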
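Training the voice model
Snowboy officially offers two ways to create a personalized voice model:
- website. As long as you have a GitHub, Google, or Facebook account, you can log in, record, and complete the training.
- train-api. Pass the parameters specified in the documentation to complete the training; the API returns the acoustic model data to you.
Both methods produce a personal voice model, delivered as a `.pmdl` file. Universal models are not provided this way; you need to contact the vendor about commercial cooperation. The more people who test a model, the more accurate it becomes, so invite more people to test yours. The type of microphone also affects accuracy: training the model on the same device it will be used on improves accuracy. Speech recognition is a sophisticated technology with many pitfalls; as ChenGuo puts it: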
Speech Recognition is not that easy.
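Using it in your own project
Copy the following files into your own project directory:
- the downloaded `model.pmdl` model file
- the compiled `_snowboydetect.so` library from the `snowboy/swig/Python3` directory
- `demo.py`, `snowboydecoder.py`, `snowboydetect.py`, and the `resources` directory from the `snowboy/examples/Python3` directory
- then, in the project directory, run `$ python3 demo.py model.pmdl` and test with your own wake word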
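The copy step can also be scripted. The sketch below is one possible way to do it, assuming the directory layout listed above and Python 3.8 or newer (for `dirs_exist_ok`). The function name `copy_snowboy_files` and the example paths are hypothetical.

```python
import os
import shutil

# Paths relative to the snowboy checkout, following the list above.
FILES_TO_COPY = [
    os.path.join("swig", "Python3", "_snowboydetect.so"),
    os.path.join("examples", "Python3", "demo.py"),
    os.path.join("examples", "Python3", "snowboydecoder.py"),
    os.path.join("examples", "Python3", "snowboydetect.py"),
]
RESOURCES_DIR = os.path.join("examples", "Python3", "resources")


def copy_snowboy_files(snowboy_root, model_path, project_dir):
    """Copy the Snowboy runtime files and a model into project_dir.

    Hypothetical helper; returns the sorted names now present in project_dir.
    """
    os.makedirs(project_dir, exist_ok=True)
    shutil.copy(model_path, project_dir)
    for rel_path in FILES_TO_COPY:
        shutil.copy(os.path.join(snowboy_root, rel_path), project_dir)
    # copytree follows symlinks by default, so file contents are copied.
    shutil.copytree(os.path.join(snowboy_root, RESOURCES_DIR),
                    os.path.join(project_dir, "resources"),
                    dirs_exist_ok=True)
    return sorted(os.listdir(project_dir))


if __name__ == "__main__":
    # Example paths only; adjust them to your own checkout and project.
    print(copy_snowboy_files("snowboy", "model.pmdl", "my_project"))
```

After it runs, the project directory should contain everything needed for the `demo.py` command above.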
On an Orange Pi, speech recognition is used to switch a light on and off by voice. This part requires an Internet connection.
gpio.py
```python
"""
@version: ??
@author: lvusyy
@license: Apache Licence
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: gpio.py
@time: 2018/3/13 18:45
"""

import wiringpi as wp


class GPIO():
    """Thin wrapper around wiringpi using GPIO pin numbering."""

    def __init__(self):
        self.wp = wp
        wp.wiringPiSetupGpio()

    def setPinMode(self, pin, mode):
        self.wp.pinMode(pin, mode)

    def setV(self, pin, v):
        self.wp.digitalWrite(pin, v)

    def getV(self, pin):
        return self.wp.digitalRead(pin)
```
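Since wiringpi is only usable on the board itself, it can help to exercise the light logic on a desktop machine first. The class below is a stand-in that is not part of the original project: `FakeGPIO` is a hypothetical name, and it only mirrors the three methods of the wrapper above by keeping pin state in dictionaries.

```python
class FakeGPIO:
    """In-memory stand-in exposing the same methods as the GPIO wrapper."""

    def __init__(self):
        self.modes = {}   # pin -> mode
        self.values = {}  # pin -> last written level

    def setPinMode(self, pin, mode):
        self.modes[pin] = mode

    def setV(self, pin, v):
        self.values[pin] = v

    def getV(self, pin):
        # Pins that were never written read as low (0) in this sketch.
        return self.values.get(pin, 0)


if __name__ == "__main__":
    gpio = FakeGPIO()
    gpio.setPinMode(18, 1)  # pin 18 with mode 1, the values control.py uses
    gpio.setV(18, 1)
    print(gpio.getV(18))
```

Swapping this in for the real wrapper lets the on/off logic be checked before any hardware is connected.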
The earlier example, with a few modifications.
control.py
```python
"""
@version: ??
@author: lvusyy
@license: Apache Licence
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: control.py
@time: 2018/3/13 17:30
"""
import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import time
import pyaudio
import wave
import pygame
import snowboydecoder
import signal
from gpio import GPIO
from aip import AipSpeech

# Baidu speech API credentials (placeholders, replace with your own).
APP_ID = '109472xxx'
API_KEY = 'd3zd5wuaMrL21IusNqdQxxxx'
SECRET_KEY = '84e98541331eb1736ad80457b4faxxxx'

APIClient = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

interrupted = False

# Recording parameters: 16-bit mono at 16000 Hz.
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "./myvoice.pcm"


class Light():
    def __init__(self):
        self.pin = 18
        self.mode = 1
        self.mgpio = GPIO()
        self.mgpio.setPinMode(pin=self.pin, mode=1)

    def on(self):
        """Drive the pin high."""
        self.mgpio.setV(self.pin, self.mode)

    def off(self):
        """Drive the pin low (self.mode & 0 is always 0)."""
        self.mgpio.setV(self.pin, self.mode & 0)

    def status(self):
        return self.mgpio.getV(self.pin)


def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()


def wait_for_playback():
    """Block until pygame has finished playing the current sound."""
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)


def word_to_voice(text):
    """Synthesize text with Baidu TTS and play it."""
    result = APIClient.synthesis(text, 'zh', 1, {
        'vol': 5, 'spd': 3, 'per': 3})
    # On failure the SDK returns a dict instead of audio bytes.
    if isinstance(result, dict):
        print('speech synthesis failed:', result)
        return
    with open('./audio.mp3', 'wb') as f:
        f.write(result)
    time.sleep(.2)
    pygame.mixer.music.load('./audio.mp3')
    pygame.mixer.music.play()
    wait_for_playback()


def get_mic_voice_file(p):
    """Prompt the user, then record RECORD_SECONDS of audio to a file."""
    # The prompt means "Please say 'turn on the light' or 'turn off the light'."
    word_to_voice('请说开灯或关灯.')

    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("* recording")

    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("* done recording")
    stream.stop_stream()
    stream.close()
    # Note: the wave module writes a WAV header even though the file
    # is named .pcm.
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    print('recording finished')


def baidu_get_words(client):
    """Send the recording to Baidu ASR and return the recognized text."""
    results = client.asr(get_file_content(WAVE_OUTPUT_FILENAME), 'pcm', 16000, {
        'dev_pid': 1536,
    })
    # A failed request has no 'result' key; return an empty string then.
    words = results.get('result') or ['']
    return words[0]


def signal_handler(signal, frame):
    global interrupted
    interrupted = True


def interrupt_callback():
    global interrupted
    return interrupted


def callbacks():
    """Runs when the hotword is detected."""
    global detector
    pygame.mixer.music.load('./resources/ding.wav')
    pygame.mixer.music.play()
    wait_for_playback()
    # Stop the detector so the microphone is free for recording.
    detector.terminate()
    get_mic_voice_file(p)
    rText = baidu_get_words(client=APIClient)
    if rText.find("开灯") != -1:    # "turn on the light"
        light.on()
    elif rText.find("关灯") != -1:  # "turn off the light"
        light.off()
    # Start listening for the hotword again.
    wake_up()


def wake_up():
    global detector
    model = './resources/models/snowboy.umdl'
    signal.signal(signal.SIGINT, signal_handler)
    detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
    print('Listening... please say wake-up word:SnowBoy')
    detector.start(detected_callback=callbacks,
                   interrupt_check=interrupt_callback,
                   sleep_time=0.03)
    detector.terminate()


if __name__ == '__main__':
    pygame.mixer.init()
    p = pyaudio.PyAudio()
    light = Light()
    wake_up()
```
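The keyword matching in control.py can be pulled out into small pure functions, which makes it testable without a microphone or network. This is only a sketch: it assumes the recognition response is a dict whose `result` key, when present, holds a list of candidate strings (as control.py expects), and both function names are hypothetical.

```python
def extract_transcript(response):
    """Return the first recognized sentence, or "" if recognition failed.

    Assumes `response` is a dict whose 'result' key, when present, holds a
    list of candidate strings.
    """
    candidates = response.get("result") or []
    return candidates[0] if candidates else ""


def parse_light_command(text):
    """Map recognized text to "on", "off", or None if no keyword appears."""
    if "开灯" in text:  # "turn on the light"
        return "on"
    if "关灯" in text:  # "turn off the light"
        return "off"
    return None


if __name__ == "__main__":
    # A made-up response shaped like the one control.py reads from.
    fake_response = {"result": ["请开灯"]}
    print(parse_light_command(extract_transcript(fake_response)))
```

Returning None for unrecognized text leaves the light untouched, matching the if/elif behavior in the original callback.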
Related reference documentation:
http://docs.kitt.ai/snowboy/#api-v1-train