What part of this do you already have in place? You could design a service in which your application is a user whose associated DN is a CTI route point (CTI/RP). This DN sits in its own partition and calling search space. The user dials the number for this DN, and your application accepts the call. You could then use an XML service to prompt the user to press a button and say the email address. Your application does its voice recognition translation (which I assume you already have working), and then pops up a screen on the phone prompting the user to press a button and record the message. Another screen could be shown during recording with FINISH, PAUSE, and RESTART options.
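As a rough sketch of what the recording screen might look like, here is a small Python function that builds an IP Phone services text object with three soft keys. The element names (CiscoIPPhoneText, Title, Prompt, Text, SoftKeyItem, Name, URL, Position) are the IP Phone services XML objects as I remember them, so check them against the current API guide. The base URL, the session parameter, and the function name are made up for illustration, and your application would still have to serve the result over HTTP as text/xml.

```python
import xml.etree.ElementTree as ET


def recording_screen(base_url, session_id):
    """Build the XML for the 'recording in progress' screen.

    base_url and session_id are hypothetical: they stand in for wherever
    your application listens and however it tracks the active call.
    """
    root = ET.Element("CiscoIPPhoneText")
    ET.SubElement(root, "Title").text = "Voice to Email"
    ET.SubElement(root, "Prompt").text = "Recording..."
    ET.SubElement(root, "Text").text = "Speak your message, then press FINISH."

    # One soft key per action; each key requests a URL on your application.
    for position, name in enumerate(["FINISH", "PAUSE", "RESTART"], start=1):
        key = ET.SubElement(root, "SoftKeyItem")
        ET.SubElement(key, "Name").text = name
        ET.SubElement(key, "URL").text = (
            f"{base_url}/{name.lower()}?session={session_id}"
        )
        ET.SubElement(key, "Position").text = str(position)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical host and session id, for illustration only.
    print(recording_screen("http://appserver.example.com/voicemail", "1234"))
```

The prompt screens for the email address and for starting the recording could be built the same way, with different text and soft keys.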
To my knowledge this has not been done to this level, but I believe it is possible using a combination of IP Phone services and your voice recognition application. Look at the Developer Support pages on CCO and download the latest API programming guides.