This request assumes that audio is (or will be) part of the timeline and can be referenced like any other object.
Challenge (O = object, A = audio file):
O1, O2, O3, A1, O4, O5, A2, ....
It should be possible to link O4 (or any object that comes after A1) to A1, so that it starts only when A1 has finished (similar to a referenced morph object).
This would make it possible to create descriptive audio text, where a following object that references the audio starts only once the audio has finished. For the referencing object: if the audio has already finished, continue; if the audio is still running, wait until it has finished.
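
To make the requested wait-or-continue rule concrete, here is a minimal sketch in Python. It is not taken from any existing implementation: the names `Item`, `wait_for` and `schedule` are hypothetical, and it assumes that objects play one after another while an audio file plays in the background without blocking the objects that follow it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    name: str
    duration: float
    is_audio: bool = False
    wait_for: Optional[str] = None  # hypothetical link to an earlier audio item


def schedule(items):
    """Return {name: start_time} for a timeline played front to back."""
    starts = {}
    audio_end = {}  # end time of each audio item seen so far
    cursor = 0.0    # earliest time at which the next item may start
    for item in items:
        start = cursor
        if item.wait_for is not None:
            # Audio already finished: max() keeps the cursor (continue).
            # Audio still running: max() moves the start to its end (wait).
            start = max(start, audio_end[item.wait_for])
        starts[item.name] = start
        if item.is_audio:
            # Assumption: audio runs in the background and does not block.
            audio_end[item.name] = start + item.duration
        else:
            cursor = start + item.duration
    return starts


timeline = [
    Item("O1", 1), Item("O2", 1), Item("O3", 1),
    Item("A1", 5, is_audio=True),
    Item("O4", 1, wait_for="A1"),  # linked to A1
    Item("O5", 1),
]
print(schedule(timeline))
```

With these example durations A1 starts at 3 and ends at 8, so O4 is held back until 8 and O5 follows at 9. If A1 had already ended by the time O4 is reached, O4 would start without any delay.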