I'm using SSML to play audio clips/effects in my skill. The files are hosted on S3, but they play back with variable latency, which leaves unwanted gaps (up to about half a second) between them. I'm effectively doing this:
    speak_output = '<speak> Here are some sounds'
    if a == b:
        speak_output += '<audio src="https://s3xxx/file/anEffect.mp3" />'
    speak_output += '<audio src="https://s3xxx/file/anotherEffect.mp3" />'
    speak_output += '<audio src="https://s3xxx/file/aThirdEffect.mp3" />'
    speak_output += '</speak>'
    print(speak_output)
    return (
        handler_input.response_builder
        .speak(speak_output)
        .ask(speak_output)
        .response
    )
The order of the files is random and determined by other factors in the skill, so I need to find a way of combining them seamlessly that doesn't introduce audio silences.
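To make the ordering part concrete, here is a minimal sketch of how the SSML string ends up being assembled. The names `CLIP_URLS` and `build_speak_output` are hypothetical, the URLs are the placeholders from the snippet above, and `shuffle` only stands in for the real ordering logic in the skill:

```python
import random

# Hypothetical clip list; the URLs are the placeholders used in the snippet above.
CLIP_URLS = [
    "https://s3xxx/file/anEffect.mp3",
    "https://s3xxx/file/anotherEffect.mp3",
    "https://s3xxx/file/aThirdEffect.mp3",
]


def build_speak_output(clip_urls, rng=random):
    """Return one SSML string containing every clip in a shuffled order."""
    ordered = list(clip_urls)  # copy so the caller's list is left untouched
    rng.shuffle(ordered)       # stand-in for the skill's real ordering logic
    audio_tags = "".join(f'<audio src="{url}" />' for url in ordered)
    return f"<speak> Here are some sounds{audio_tags}</speak>"
```

Whatever the order, the result is a single `<speak>` element with one `<audio>` tag per file, and the gaps appear between those tags.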
Is there a way to combine the mp3s / preload / cache them before playback?
I don't need SSML for this part of the skill if it's better to use another playback solution.