For game development, we need to be able to play MP3 audio behind the Alexa voice. I suggest adding an attribute to the audio tag that gives the offset, in seconds, at which Alexa starts speaking. For example, if you wanted to play a 20-second background sound of the ocean but wanted Alexa to start speaking 5 seconds into the audio, you would add offset="5" to the tag. That would indicate that 5 seconds after the audio starts playing, Alexa should deliver the remainder of the verbal response over it. (If the MP3 volume were automatically lowered when Alexa began speaking, that would be a nice feature too.) It's critical for gameplay and audiobook development that speech and audio can be intermeshed. The way it works now is clunky.
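To make the suggestion concrete, here is a rough sketch of what a skill might send if the attribute existed. The offset attribute is only the proposal from this post and is not part of the current audio tag; the build_ssml helper, the example URL, and the spoken line are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(audio_url: str, speech: str, offset_seconds: int) -> str:
    """Build an SSML response that uses the PROPOSED offset attribute.

    Hypothetical: `offset` is the attribute suggested in this post,
    not something the audio tag supports today.
    """
    if offset_seconds < 0:
        raise ValueError("offset_seconds must be non-negative")
    # quoteattr() escapes the URL and wraps it in quotes for use as an attribute.
    audio = f'<audio src={quoteattr(audio_url)} offset="{offset_seconds}"/>'
    # escape() protects characters such as & and < in the spoken text.
    return f"<speak>{audio}{escape(speech)}</speak>"


# 20-second ocean clip; Alexa would start speaking 5 seconds in.
ssml = build_ssml("https://example.com/ocean.mp3", "You wake up on a beach.", 5)
print(ssml)
```

With the attribute in place, the speech after the tag would start 5 seconds into the clip instead of waiting for the clip to finish.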