I doubt it will be a "Stand Alone Program" anytime soon, or that it will even do this in "Real Time"...
This is what it takes to do it right now...
"The system is Google's second official generation of the technology, which consists of two deep neural networks. The first network translates the text into a spectrogram, a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet's AI research lab DeepMind, which reads the chart and generates the corresponding audio elements accordingly."
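Just to make the two-stage idea concrete, here's a toy sketch of that pipeline shape in plain Python. Everything in it (the function names, the one-frame-per-character spectrogram, the sine-tone "vocoder") is made up for illustration and bears no resemblance to Google's actual networks; it only shows the data flow the article describes: text in, spectrogram in the middle, audio samples out.

```python
# Hypothetical sketch of the two-stage text-to-speech pipeline:
# stage 1 maps text to a spectrogram-like matrix (frames x frequency bins),
# stage 2 (a stand-in playing WaveNet's role) turns those frames into audio.
# All names, shapes, and numbers here are illustrative, not Google's models.
import math

def text_to_spectrogram(text, n_bins=16):
    # Stand-in for the first network: emit one frame per character,
    # putting all the energy in a bin derived from the character code.
    frames = []
    for ch in text:
        frame = [0.0] * n_bins
        frame[ord(ch) % n_bins] = 1.0
        frames.append(frame)
    return frames

def vocoder(spectrogram, samples_per_frame=80, sample_rate=8000):
    # Stand-in for WaveNet: synthesize a short sine tone per frame whose
    # pitch follows the loudest bin. The real WaveNet instead predicts
    # raw audio samples autoregressively, conditioned on the spectrogram.
    audio = []
    for frame in spectrogram:
        bin_idx = max(range(len(frame)), key=lambda i: frame[i])
        freq = 110.0 * (1 + bin_idx)  # arbitrary bin-to-pitch mapping
        for n in range(samples_per_frame):
            audio.append(math.sin(2 * math.pi * freq * n / sample_rate))
    return audio

spec = text_to_spectrogram("hello")
audio = vocoder(spec)
print(len(spec), len(audio))  # 5 frames -> 400 samples
```

The point is just the hand-off: the spectrogram is the only thing the second stage sees, which is why the two networks can be trained and improved somewhat independently.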