How Amazon Web Services Brings the World Together
Amazon Web Services provides an innovative way to bridge the communication gaps that separate countries. AWS Transcribe, AWS Translate, and AWS Polly offer state-of-the-art speech-to-text, text translation, and voice narration services.
With a population of over 7.5 billion and more than a hundred lingua francas, the world is a rather complex place to live in. Every nation and every society perceives its own way of creating speech, or rather of expressing its unique thoughts. Language may be the only common medium of thought, as well as the signature of the actions we express as a unified species. By design, every language has its own phonetic expression, shaped by demographics and/or linguistic features rooted in its social history or civilisation. At Amazon Web Services, the team has strived to promote a new and modern way of merging countless forms of speech into a single encapsulated entity, thereby setting aside the differences in mankind caused by disparities in communication.
One of the finest examples of such innovation is Amazon Alexa, an influential consumer product found in millions of households today. Not many people know this, but the Amazon Alexa device is merely a piece of hardware; the life source of Alexa resides largely in the data centers of Amazon Web Services. Around 70 Availability Zones (clusters of data centers) facilitate the prime functionality of Amazon Alexa through services such as AWS Lambda, AWS Transcribe, AWS Translate, and AWS Fargate for Docker containers. In this post, we shall see a small demonstration of how AWS, using its language-enabled services, can help create linguistic unity amongst the diverse and complex society of mankind. Amazon Web Services offers AWS Translate, which is capable of understanding and translating a wide range of languages and dialects. This forms the foundation of all the language-based solutions that facilitate the business operations of Amazon and its subsidiaries.
AWS Transcribe and AWS Translate are at the epicentre of Alexa's functionality. The entire service workflow is illustrated in the diagram below.
AWS Transcribe and AWS Translate act as the auxiliary agents that facilitate automatic speech recognition, natural language understanding, and text-to-speech conversion. The worker processes are managed by AWS Lambda, the serverless computing platform provided by Amazon Web Services. AWS Translate lets you enable multilingual user experiences in your applications:
Translate company-authored content, meeting minutes, technician reports, knowledge-base articles, posts, and more.
Translate social communications, like email, in-game chat, customer service chat, and more, to enable customers and staff to connect in their preferred language.
One such example is when someone orders something online or asks for a specific action to be performed. Even a small greeting can be expressed in remarkably diverse ways, as the sketch below illustrates.
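As a rough illustration of what that looks like in code, here is a minimal sketch using boto3, the AWS SDK for Python; the region, sample greeting, and target languages are placeholder choices, not part of Alexa's actual pipeline:

```python
import boto3

# Translate client; the region is a placeholder choice.
translate = boto3.client("translate", region_name="us-east-1")

# Render one small greeting in a few target languages.
for target in ["es", "de", "hi", "ja"]:
    result = translate.translate_text(
        Text="Hello, how can I help you today?",
        SourceLanguageCode="en",
        TargetLanguageCode=target,
    )
    print(target, "->", result["TranslatedText"])
```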
Speaker Diarization
You can have Amazon Transcribe identify the individual speakers in an audio clip, a process referred to as diarization or speaker identification.
When you enable speaker identification, Amazon Transcribe labels each fragment of speech with the speaker it identified. You can ask Amazon Transcribe to distinguish between two and ten speakers in the audio clip.
You get the best performance when the number of speakers you ask it to identify matches the number of speakers in the input audio. The transcription job returns a JSON file describing the audio; for a short clip its shape looks roughly like the sketch below.
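A minimal sketch, assuming boto3 and hypothetical job, bucket, and file names; the JSON structure in the closing comment is abbreviated to the fields relevant to diarization:

```python
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Start a transcription job with speaker diarization enabled.
# The job name, bucket, and file name are hypothetical placeholders.
transcribe.start_transcription_job(
    TranscriptionJobName="diarization-demo",
    Media={"MediaFileUri": "s3://my-demo-bucket/short-clip.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
    Settings={
        "ShowSpeakerLabels": True,  # turn on diarization
        "MaxSpeakerLabels": 2,      # we expect two speakers in this clip
    },
)

# When the job completes, the transcript JSON (downloaded from the job's
# TranscriptFileUri) has roughly this shape:
#
# {
#   "results": {
#     "transcripts": [{"transcript": "full text of the clip ..."}],
#     "speaker_labels": {
#       "speakers": 2,
#       "segments": [
#         {"speaker_label": "spk_0", "start_time": "0.0", "end_time": "2.5"},
#         {"speaker_label": "spk_1", "start_time": "2.6", "end_time": "4.1"}
#       ]
#     }
#   }
# }
```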
In this way, AWS Transcribe turns the received audio into structured data that can be used to build more practical outcomes. We will now see a fully functioning serverless application that uses AWS Polly, Translate, and Transcribe.
Making Languages “Serverless”
Being a simplified example, this is not a live demonstration of Alexa's functionality. At a basic level, the application built on AWS Translate and Transcribe provides two methods: one for sending information about a new speaker's post, which should be converted into an MP3 file, and one for retrieving information about that post (including a link to the MP3 file stored in an S3 bucket).
Both methods are exposed as RESTful web services through Amazon API Gateway.
Let’s look at how the interaction works in the application. When the application sends information about a new speaker:
The information is received by the RESTful web service exposed by Amazon API Gateway. In another scenario, this web service could be invoked by a static worker process running on AWS Fargate.
Amazon API Gateway triggers a dedicated Lambda function, “New Post,” which is responsible for initializing the process of generating the MP3 file.
The Lambda function inserts information about the post into a DynamoDB table, where information about the transcription is stored.
To run the whole process asynchronously, we can use Amazon SNS to decouple the process of receiving information about new speakers and starting their conversion.
Another Lambda function, “Convert to Speech,” is subscribed to our SNS topic and is invoked whenever a new message appears (which means a new speaker's text should be converted into an audio file).
This is the trigger.
The “Convert to Speech” Lambda function uses Amazon Polly to convert the text into an audio file in the specified language (the same as the language of the text).
The new MP3 file is saved in a dedicated S3 bucket. Information about the speaker is updated in the DynamoDB table, and the reference (URL) to the S3 bucket is saved with the previously stored data (a sketch of both Lambda functions follows these steps).
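Here is a minimal sketch of the two Lambda handlers described in these steps, assuming an API Gateway proxy integration and hypothetical table, topic, and bucket names; it illustrates the flow, not the exact code behind any production service:

```python
import json
import os
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
polly = boto3.client("polly")
s3 = boto3.client("s3")
sns = boto3.client("sns")

# Hypothetical resource names, supplied via the Lambda configuration.
TABLE_NAME = os.environ.get("TABLE_NAME", "posts")
BUCKET = os.environ.get("BUCKET_NAME", "audio-posts")
TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]  # set in the Lambda configuration


def new_post_handler(event, context):
    """'New Post': store the text in DynamoDB and publish its id to SNS."""
    post_id = str(uuid.uuid4())
    body = json.loads(event["body"])  # API Gateway proxy integration
    dynamodb.Table(TABLE_NAME).put_item(Item={
        "id": post_id,
        "text": body["text"],
        "voice": body.get("voice", "Joanna"),  # a Polly voice
        "status": "PROCESSING",
    })
    # Decouple receiving the post from converting it to audio.
    sns.publish(TopicArn=TOPIC_ARN, Message=post_id)
    return {"statusCode": 200, "body": json.dumps({"id": post_id})}


def convert_to_speech_handler(event, context):
    """'Convert to Speech': synthesize the text with Polly, save the MP3."""
    post_id = event["Records"][0]["Sns"]["Message"]
    table = dynamodb.Table(TABLE_NAME)
    post = table.get_item(Key={"id": post_id})["Item"]

    speech = polly.synthesize_speech(
        Text=post["text"],
        VoiceId=post["voice"],
        OutputFormat="mp3",
    )
    key = post_id + ".mp3"
    s3.put_object(Bucket=BUCKET, Key=key, Body=speech["AudioStream"].read())

    # Store the S3 URL next to the previously saved data.
    url = "https://{}.s3.amazonaws.com/{}".format(BUCKET, key)
    table.update_item(
        Key={"id": post_id},
        UpdateExpression="SET #u = :u, #s = :s",
        ExpressionAttributeNames={"#u": "url", "#s": "status"},
        ExpressionAttributeValues={":u": url, ":s": "UPDATED"},
    )
```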
When the application retrieves information about a specific speaker, the RESTful web service is again exposed through Amazon API Gateway:
Amazon API Gateway exposes the method for retrieving information about posts.
The response contains the text of the post and the link to the S3 bucket where the MP3 file is stored.
In our scenario, this web service is invoked by a static webpage hosted on Amazon S3.
Amazon API Gateway invokes the “Get Post” Lambda function, which implements the logic for retrieving the post data.
This Lambda function retrieves information about the post (including the reference to Amazon S3) from the DynamoDB table, as sketched below.
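A minimal sketch of the “Get Post” handler, assuming the same hypothetical posts table as in the previous sketch and a query-string parameter named postId:

```python
import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
TABLE_NAME = os.environ.get("TABLE_NAME", "posts")


def get_post_handler(event, context):
    """Return the post text and the S3 link to its MP3 file."""
    post_id = event["queryStringParameters"]["postId"]
    item = dynamodb.Table(TABLE_NAME).get_item(Key={"id": post_id}).get("Item")
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {
        "statusCode": 200,
        "body": json.dumps({
            "id": item["id"],
            "text": item["text"],
            "url": item.get("url"),  # link to the MP3 in S3
        }),
    }
```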
Conclusion
In this post, we explained and demonstrated an application that can convert text into speech in dozens of languages and speak that text in even more voices. Although this is a conceptual demo of how Alexa converts speech into data and back, the same building blocks can be used for many other purposes, such as reading out text on websites or adding speech functionality to web applications. And we did it completely serverless. This is only the tip of the iceberg for AWS in establishing coexistence amongst our fellow men by bridging the gaps of communication and language. So now what? Use this approach to imagine and build new applications that deliver a far better user experience than previously possible.
Any questions or suggestions are highly appreciated. Thank you.