The Alexa Content action speaks a reply to an Alexa Trigger (i.e. an Alexa Intent). If you attach an MP3 file, the MP3 audio is played instead of the spoken reply. At the moment, images are only displayed on devices with large screens (i.e. not on the Echo Spot).
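To show what "speak a reply, or play the MP3 instead" might look like at the Alexa level, here is a minimal Python sketch of a skill response. It assumes the reply is sent as plain text-to-speech and that an attached MP3 is played through an SSML `<audio>` tag. The function `build_alexa_response` and its parameters are hypothetical and are not this action's internals. Only the response envelope (`version`, `outputSpeech`, `shouldEndSession`) follows Amazon's documented JSON format. Images are left out.

```python
import json
from typing import Optional
from xml.sax.saxutils import quoteattr


def build_alexa_response(reply_text: str,
                         mp3_url: Optional[str] = None,
                         end_session: bool = True) -> dict:
    """Hypothetical helper: speak reply_text, or play mp3_url instead if given."""
    if mp3_url:
        # Alexa only fetches audio from HTTPS endpoints.
        if not mp3_url.startswith("https://"):
            raise ValueError("the MP3 must be served over HTTPS")
        # quoteattr wraps the URL in quotes and escapes XML special characters.
        speech = {
            "type": "SSML",
            "ssml": f"<speak><audio src={quoteattr(mp3_url)}/></speak>",
        }
    else:
        # No attachment: let Alexa's text-to-speech read the reply.
        speech = {"type": "PlainText", "text": reply_text}

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": speech,
            "shouldEndSession": end_session,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_alexa_response("The garage door is closed."), indent=2))
    print(json.dumps(
        build_alexa_response("ignored", "https://example.com/reply.mp3"), indent=2))
```

In this sketch the text reply is ignored whenever an MP3 URL is supplied. That matches the behaviour described above, where the audio replaces the spoken reply.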

Connecting Audio

You can attach an MP3 file to play instead of the text-to-speech reply, but you'll need to encode it according to Amazon's requirements.
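As a rough pre-flight check, the Python sketch below assumes the MP3 is played through Alexa's SSML `<audio>` tag. For that tag, Amazon documents a bit rate of 48 kbps, a sample rate of 16000, 22050 or 24000 Hz, and a maximum length of 240 seconds. These limits may change, so confirm them against the current Alexa SSML reference. The names `check_alexa_mp3` and `ffmpeg_args` are hypothetical. The second function builds a command for the free `ffmpeg` tool, offered as an alternative to Adobe Audition. It uses only standard `ffmpeg` options.

```python
import shlex
from typing import List

# Assumed limits, taken from Amazon's documentation for the SSML <audio> tag.
REQUIRED_BITRATE_KBPS = 48
ALLOWED_SAMPLE_RATES_HZ = (16000, 22050, 24000)
MAX_DURATION_S = 240


def check_alexa_mp3(bitrate_kbps: int, sample_rate_hz: int, duration_s: float) -> List[str]:
    """Return a list of problems; an empty list means the settings look acceptable."""
    problems = []
    if bitrate_kbps != REQUIRED_BITRATE_KBPS:
        problems.append(f"bit rate is {bitrate_kbps} kbps, expected {REQUIRED_BITRATE_KBPS} kbps")
    if sample_rate_hz not in ALLOWED_SAMPLE_RATES_HZ:
        problems.append(f"sample rate {sample_rate_hz} Hz is not one of {ALLOWED_SAMPLE_RATES_HZ}")
    if duration_s > MAX_DURATION_S:
        problems.append(f"duration {duration_s} s exceeds {MAX_DURATION_S} s")
    return problems


def ffmpeg_args(src: str, dst: str, sample_rate_hz: int = 24000) -> List[str]:
    """Build an ffmpeg command that re-encodes src as a 48 kbps MP3 at dst."""
    if sample_rate_hz not in ALLOWED_SAMPLE_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {sample_rate_hz}")
    return [
        "ffmpeg", "-y",                        # overwrite dst if it already exists
        "-i", src,                             # input file (any format ffmpeg can read)
        "-ac", "2",                            # two output channels (stereo)
        "-codec:a", "libmp3lame",              # encode with the LAME MP3 encoder
        "-b:a", f"{REQUIRED_BITRATE_KBPS}k",   # 48 kbps bit rate
        "-ar", str(sample_rate_hz),            # output sample rate in Hz
        dst,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(ffmpeg_args("reply.wav", "reply.mp3")))
    print(check_alexa_mp3(bitrate_kbps=128, sample_rate_hz=44100, duration_s=30))
```

A typical 128 kbps, 44.1 kHz music export fails both the bit-rate and the sample-rate checks. This is the usual reason an attached file needs re-encoding.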

The image below shows the settings in Adobe Audition.