A language researcher at Missouri S&T is studying perceptions of the accented voice settings on virtual assistants like Siri and Alexa. Sarah Hercula found that the assistants do not yet sound human enough.

She said it boils down to intonation patterns, or which word in a sentence is stressed.

“Part of the reason why is due to the intonation patterns,” she explained. “Especially due to the stress, like you were saying. Sometimes those virtual assistants will stress a different word in a sentence than the way a human would stress that sentence. Because of that, it’s quite obvious that these are not human voices.”

Hercula said virtual assistants cannot yet replicate language perfectly, as the programs are unable to "match the musicality" of human speech.

“At least right now, we’re not seeing the same replicating factors that humans show with these virtual assistants,” she added. “The reason why, we think, is because the voices are not yet human enough. That has to do especially with those intonation and stress patterns.”

The English language can be difficult to master, even for virtual assistants like Siri and Alexa. Although both offer settings to change their accents, the results still fall short of human speech. Hercula said their neural networks cannot replicate the intricacies of language.

“The way they’re created is they’re based on a human and then neural processor creates the rest of the language, right, extrapolates to all the other sounds that need to be made,” Hercula said. “There is something in that process that’s not perfect yet. That the neural network can’t quite replicate the intricacies of how a human particularly pronounces and stress times their language.”

Intonation is defined as the way vocal pitch rises and falls during speech. Stress patterns refer to which word or syllable is emphasized within sentences.

Her research focuses on user perceptions of virtual assistants’ accented voice settings.


Copyright © 2024 Missourinet
