SpeechGen

Unlocking the Generative Power of Speech Language Models with Prompts


Haibin Wu*, Kai-Wei Chang*, Yuan-Kuei Wu*, Hung-yi Lee

National Taiwan University

Email: kaiwei.chang.tw@gmail.com

Large language models (LLMs) have gained considerable attention for Artificial Intelligence Generated Content (AIGC), particularly with the emergence of ChatGPT. However, directly adapting continuous speech to LLMs that process discrete tokens remains an unsolved challenge, which hinders the application of LLMs to speech generation. Advanced speech LMs are around the corner, since speech signals encapsulate a wealth of information, including speaker and emotion, beyond textual data alone. Prompt tuning has demonstrated notable gains in parameter efficiency and competitive performance on some speech classification tasks. However, the extent to which prompts can effectively elicit generation tasks from speech LMs remains an open question. In this paper, we present pioneering research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks within a unified framework called SpeechGen, with around 10M trainable parameters. The proposed unified framework holds great promise for efficiency and effectiveness, particularly with the imminent arrival of advanced speech LMs, which will significantly enhance the capabilities of the framework.
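
As a rough illustration of the prompt tuning idea described above, the sketch below prepends a small set of trainable prompt vectors to the embedded unit sequence of a frozen speech LM. It is a toy under stated assumptions, not the SpeechGen implementation: the names `embed_units`, `build_prompted_input`, and `count_parameters`, the embedding dimension, the prompt length, and the vocabulary size are all hypothetical, and plain Python lists stand in for tensors.

```python
import random

# All sizes below are hypothetical and chosen only to keep the toy small.
EMBED_DIM = 4       # embedding dimension of the stand-in speech LM
PROMPT_LENGTH = 3   # number of trainable prompt vectors
VOCAB_SIZE = 10     # number of discrete speech units

rng = random.Random(0)

# Frozen embedding table of the stand-in speech LM: never updated.
frozen_embeddings = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]

# Trainable prompt vectors: the only parameters prompt tuning would update.
prompt_vectors = [[0.0] * EMBED_DIM for _ in range(PROMPT_LENGTH)]


def embed_units(units):
    """Look up the frozen embedding of each discrete speech unit."""
    return [frozen_embeddings[u] for u in units]


def build_prompted_input(units):
    """Prepend the prompt vectors to the embedded unit sequence."""
    return prompt_vectors + embed_units(units)


def count_parameters(matrix):
    """Count the scalar entries in a list-of-lists matrix."""
    return sum(len(row) for row in matrix)


units = [3, 1, 4, 1, 5]
prompted = build_prompted_input(units)
trainable = count_parameters(prompt_vectors)
frozen = count_parameters(frozen_embeddings)
```

In this toy the trainable part has `PROMPT_LENGTH * EMBED_DIM` entries regardless of how large the frozen model is, which is the sense in which prompt tuning is parameter efficient.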

Demo



Speech Translation


Columns: Spanish (audio only), English (Ground Truth), SpeechGen

  • English (Ground Truth): The origin of the name of the county is uncertain.
    SpeechGen: Origin of the name of the county is uncertain.
  • English (Ground Truth): Lastly, the play will devote a reflection to the relationship between art and rebellion.
    SpeechGen: And lastly the work will devote a reflection to the relationship between art and rebellion.
  • English (Ground Truth): It is around thirty kilometers away from the regional capital city.
    SpeechGen: Just one hundred forty kilometers from the regional capital.
  • English (Ground Truth): They were easily recognized by the use of the armor and the "Farina" helmet.
    SpeechGen: They were frequently recognized for the use of armor and the cascade.
  • English (Ground Truth): They played in cover bands but decided to create their own music.
    SpeechGen: They played in mandates but they decided to create their own music.


Speech Inpainting


Columns: Clean, Corrupted, SpeechGen

  • Clean: Would not the loftiest eminences of the city at least be visible?
    Corrupted: Would not the loftiest eminences of the city at lea__ ___ __le?
    SpeechGen: Would not the loftiest eminences of the city at least be visible?
  • Clean: And the whole night the tree stood still and in deep thought.
    Corrupted: And the whole ___ the tree stood still and in deep thought.
    SpeechGen: And the whole night the tree stood still and in deep thought.
  • Clean: If it only were not so dark here and so terribly lonely.
    Corrupted: If it only were not so dark here and so terribly lone__.
    SpeechGen: If it only were not so dark here and so terribly lonely.


Speech Continuation


Columns: Conditional Ratio, Seed Segment, SpeechGen

  • Conditional Ratio: 0.25
    Seed Segment: He assassinated the president
    SpeechGen: He assassinated the president and gave mister johnson the last charge of improvement in his writing possible three point eight nine.
  • Conditional Ratio: 0.25
    Seed Segment: If anybody really wanted
    SpeechGen: If anybody really wanted to interfere once for some time even if he got anything back to the other folk
  • Conditional Ratio: 0.25
    Seed Segment: Under standard procedures
    SpeechGen: Under standard procedures the receivers have inspected trade here by this time have recently developed they fully recognize did their business.
  • Conditional Ratio: 0.5
    Seed Segment: Who had been greatly upset by her experience was able to view a
    SpeechGen: Who had been greatly upset by her experience was able to view a longer time the great farce and receive good care of the mutter.
  • Conditional Ratio: 0.5
    Seed Segment: Childless parents widows and helpless orphans broken and controlled
    SpeechGen: Childless parents widows and helpless orphans broken and controlled by the master and sentence pursuit life apt to paradise.
  • Conditional Ratio: 0.5
    Seed Segment: But these king's witnesses were also put at times into the press yard
    SpeechGen: But these king's witnesses were also put at times into the press yard and charged with the service available on a second charge to them.
  • Conditional Ratio: 0.75
    Seed Segment: And the obvious bulk of the package which he intended to bring to work
    SpeechGen: And the obvious bulk of the package which he intended to bring to work was confirmed
  • Conditional Ratio: 0.75
    Seed Segment: Then they set to building and began by bricking the borders of the moat after which they proceeded
    SpeechGen: Then they set to building and began by bricking the borders of the moat after which they proceeded to our own places
  • Conditional Ratio: 0.75
    Seed Segment: Still watching and waiting for the first chance they ceased when the clerks
    SpeechGen: Still watching and waiting for the first chance they ceased when the clerks left the office
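
The examples above suggest that the seed segment is the leading portion of an utterance, with its share set by the conditional ratio. A minimal sketch of such a split follows. It assumes the ratio is applied to the length of the discrete unit sequence and rounded down; both choices are assumptions for illustration, and `split_by_conditional_ratio` is a hypothetical helper, not a function from the SpeechGen code.

```python
def split_by_conditional_ratio(units, ratio):
    """Split a unit sequence into (seed, remainder) by a leading fraction.

    Assumption: the ratio applies to the sequence length and the cut
    point is rounded down. The actual segmentation may differ.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    cut = int(len(units) * ratio)
    return units[:cut], units[cut:]


# Hypothetical discrete units standing in for an encoded utterance.
utterance_units = list(range(8))

seed_25, rest_25 = split_by_conditional_ratio(utterance_units, 0.25)
seed_50, rest_50 = split_by_conditional_ratio(utterance_units, 0.5)
seed_75, rest_75 = split_by_conditional_ratio(utterance_units, 0.75)
```

In a continuation setup of this kind, the seed would be given to the model as the condition and the remainder would be what the model is expected to continue with; a larger ratio leaves less to generate, which is consistent with the shorter continuations shown for 0.75.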

Citation

@misc{wu2023speechgen,
	title={SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts}, 
	author={Haibin Wu and Kai-Wei Chang and Yuan-Kuei Wu and Hung-yi Lee},
	year={2023},
	eprint={2306.02207},
	archivePrefix={arXiv},
	primaryClass={eess.AS}
}