~~REVEAL~~

======Machine Learning 03 - Entering the Uncanny Valley of Speech======

==== ====

With State Library closed to the public due to COVID-19, for this Machine Learning (ML) workshop we won't be able to use the Digital Media Lab at The Edge. This online workshop recaps our previous workshops, and explores the world of Text To Speech (TTS) voice synthesisers and Speech To Text (STT) voice recognition built with ML. The workshop is not an introduction to coding or math, but we will give a general overview of how ML is defined and where it is commonly used today.

==== ====

We've chosen an approach that demonstrates the power and limitations of ML, and leaves you with an understanding of how to use an online ML environment, along with ideas on how to use State Library resources to explore ML further.

==== ====

The first half of the workshop will cover:

  * a basic explanation of ML
  * a recap of previous ML workshops
  * ML for speech

==== ====

The second half of the workshop explores how to implement ML research using Google's Colab platform.

=====Outcomes=====

  * a general ML background
  * ML for speech
  * using Google Colab
  * Spleeter (audio source separation)
  * TTS (Mozilla TTS)
  * STT (Mozilla DeepSpeech)

=====Requirements=====

All we need to get started for this workshop is a Google account to access Google Colab in the second half of the workshop. If you don't have one you can quickly [[https://accounts.google.com/signup/v2/webcreateaccount?hl=en&flowName=GlifWebSignIn&flowEntry=SignUp|sign up]]. If you don't want to create a Google account, you can always just follow along with the examples.

===== =====

{{page>workshops:public:machine_learning:ideepcolor:start#background}}

{{page>workshops:public:machine_learning:ideepcolor:start#interactive_deep_colorization}}

{{page>workshops:public:machine_learning:paper_to_product#ml_-_from_paper_to_product}}

======Speech Synthesis======

Like many of the 20th century's technological innovations, the first modern speech synthesiser can be traced back to the invention of the [[https://en.wikipedia.org/wiki/Vocoder|vocoder]] at [[https://en.wikipedia.org/wiki/Bell_Labs|Bell Labs]]. Derived from this, the [[https://en.wikipedia.org/wiki/Voder|Voder]] was demonstrated at the 1939 New York World's Fair.

{{ :workshops:public:machine_learning:uncanny_valley:voder_demonstrated_on_1939_new_york_world_fair_-_the_voder_fascinates_the_crowds_-_bell_telephone_quarterly_january_1940_.jpg?direct&600 |}}
((By Internet Archive Book Images - https://www.flickr.com/photos/internetarchivebookimages/14776509983/ Source book page: https://archive.org/stream/belltelephonemag19amerrich/belltelephonemag19amerrich#page/n78/mode/1up Reference [Fig.4] The Voder Fascinates the Crowds from: Williams, Thomas W. (January 1940) "I. At the New York World's Fair. Our Exhibits at Two Fairs". Bell Telephone Quarterly XIX (1): 65. "The Voder Fascinates the Crowds - The manipulative skill of the operator's fingers makes the Voder's voice almost too good to be true", No restrictions, https://commons.wikimedia.org/w/index.php?curid=43343073))

==== ====

{{ :workshops:public:machine_learning:uncanny_valley:homer_dudley_october_1940_._the_carrier_nature_of_speech_._bell_system_technical_journal_xix_4_495-515._--_fig.8_schematic_circuit_of_the_voder.jpg?direct&600 |}}

====Historical Audio Examples====

Here is a playlist of various historical TTS methods.

https://soundcloud.com/user-552764043

======Modern State of the Art TTS======

Now it's time to have some fun with TTS - check out the man holding the frog below...

https://vo.codes/#speak

==== ====

And have a listen to some interesting examples from pop/meme culture.

https://fifteen.ai/examples

==== ====

https://www.youtube.com/watch?v=drirw-XvzzQ

==== ====

=====Wavenet=====

Modern deep learning based synthesis started with the release of [[https://deepmind.com/blog/article/wavenet-generative-model-raw-audio|WaveNet]] in 2016 by Google's [[https://deepmind.com|DeepMind]]. Earlier TTS systems typically worked by stitching together fragments of recorded speech. WaveNet changes this paradigm by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.((https://deepmind.com/blog/article/wavenet-generative-model-raw-audio))

=====Tacotron and Tacotron2=====

WaveNet was followed by Tacotron (also from Google) in 2017.

https://google.github.io/tacotron/publications/tacotron/index.html

Then Tacotron2:

https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html

======Google Colab======

Google's Colaboratory((https://colab.research.google.com/notebooks/intro.ipynb)), or "Colab" for short, allows you to write and execute Python in your browser, with:

  * zero configuration required
  * free access to GPUs
  * easy sharing

====Python====

Python is an open source programming language that was made to be easy-to-read and powerful((https://simple.wikipedia.org/wiki/Python_(programming_language))). Python is:

  * a high-level language (meaning a programmer can focus on what to do instead of how to do it)
  * an interpreted language (interpreted languages do not need to be compiled to run)
  * often described as a "batteries included" language, due to its comprehensive standard library

==== ====

A program called an interpreter runs Python code on almost any kind of computer. In our case Python will be interpreted by Google Colab, which is based on Jupyter notebooks.

====Jupyter Notebooks====

Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text((https://jupyter.org/)). Usually Jupyter notebooks require set-up for a specific purpose, but Colab takes care of all this for us.

======Getting Started with Colab======

The only requirement for using Colab is (unsurprisingly) a Google account. Once you have a Google account, let's jump into our first ML example - [[https://github.com/deezer/spleeter|Spleeter]] - which we mentioned earlier. Go to the Colab notebook here:

https://colab.research.google.com/github/deezer/spleeter/blob/master/spleeter.ipynb

====Making a Colab Copy====

The first step is to make a copy of the notebook to our Google Drive - this means we can save any changes we like.

{{:workshops:public:machine_learning:uncanny_valley:01_colab_spleeter.jpg?direct&400|}}

==== ====

This will trigger a Google sign-in

{{:workshops:public:machine_learning:uncanny_valley:02_colab_spleeter.jpg?direct&400|}}

==== ====

and then your copy will open in a new tab.

{{:workshops:public:machine_learning:uncanny_valley:03_colab_spleeter.jpg?direct&400|}}

====Select a Runtime====

Next we change our runtime (the kind of processor we use)

{{:workshops:public:machine_learning:uncanny_valley:04_colab_spleeter.jpg?direct&400|}}

==== ====

to a GPU, to take advantage of Google's free GPU offer.

{{:workshops:public:machine_learning:uncanny_valley:04.5_colab_spleeter.jpg?direct&400|}}

==== ====

Now let's connect to our hosted runtime

{{:workshops:public:machine_learning:uncanny_valley:05_colab_spleeter.jpg?direct&400|}}

==== ====

and check the specs...

{{:workshops:public:machine_learning:uncanny_valley:06_colab_spleeter.jpg?direct&400|}}

=====Step Through the Notebook=====

Now it's time to actually use the notebook! Before we start, let's go over how notebooks work:

  * The notebook is divided into sections, with each section made up of cells.
  * These cells have code pre-entered into them.
  * A play button to the left of each cell runs (executes) the code in that cell.
  * The output of the cell is printed (or displayed) directly below each cell.
  * The output could be text, pictures, audio or video.

==== ====

Cells usually contain Python code, but can also run commands in bash - the UNIX command line shell. Lines containing bash commands start with an exclamation mark ''!''.
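For example, here is a small scratch cell (ours, not part of the Spleeter notebook) that mixes Python and bash - you can paste it into any fresh notebook to get a feel for how cells behave:

<code python>
# a Python line: ordinary code, with its output shown below the cell
samples_per_second = 44100
print("One minute of CD-quality audio is", samples_per_second * 60, "samples")

# a bash line: the leading ! hands the rest of the line to the Linux shell
!echo "hello from bash"

# bash and Python can be mixed freely in one cell
!ls -lh
</code>

When you run the cell, the printed text and the shell output appear directly below it, just as described above.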
===== =====

Our first section is called "Install Spleeter" and contains the bash command ''apt install ffmpeg''. This installs ffmpeg in our runtime, which is used to process audio. Press the play button...

{{:workshops:public:machine_learning:uncanny_valley:07_colab_spleeter.jpg?direct&400|}}

==== ====

ffmpeg will be downloaded and installed to our runtime.

{{:workshops:public:machine_learning:uncanny_valley:08_colab_spleeter.jpg?direct&600|}}

==== ====

Next we will run ''pip'', the [[https://pypi.org/project/pip/|Python package manager]], to install the spleeter Python package.

{{:workshops:public:machine_learning:uncanny_valley:09_colab_spleeter.jpg?direct&1200|}}

==== ====

This will take a while - and at the end we will get a message saying we need to restart our runtime due to some compatibility issues.((this is not unusual when using a hosted runtime))

{{:workshops:public:machine_learning:uncanny_valley:10_colab_spleeter.jpg?direct&1200|}}

==== ====

Go ahead and restart.

{{:workshops:public:machine_learning:uncanny_valley:11_colab_spleeter.jpg?direct&600|}}

==== ====

Next is another bash command, ''wget'', which we use to (web)get our example audio file.

{{:workshops:public:machine_learning:uncanny_valley:12_colab_spleeter.jpg?direct&800|}}

==== ====

And the next cell uses the Python ''Audio'' command to give us a nice little audio player so we can hear our example.

{{:workshops:public:machine_learning:uncanny_valley:13_colab_spleeter.jpg?direct&600|}}

==== ====

Now it's finally time to use the spleeter tool with the ''separate'' command((confusingly, we need to call it from bash (with the exclamation mark))) as ''!spleeter separate'', and let's pass the ''-h'' flag((a fancy way of saying option)) to show us the built-in help for the command.

{{:workshops:public:machine_learning:uncanny_valley:14_colab_spleeter.jpg?direct&800|}}

==== ====

Now that we know what we are doing, we run the tool for real. We use the ''-i'' flag to define the input as our downloaded example, and the ''-o'' flag to define our output destination as the directory (folder) ''output''. By default spleeter will download and use the [[https://github.com/deezer/spleeter/wiki/2.-Getting-started#using-2stems-model|2stems model]], which separates the track into vocals and accompaniment.

{{:workshops:public:machine_learning:uncanny_valley:15_colab_spleeter.jpg?direct&1200|}}

==== ====

Another bash command ''ls'' (list) shows us the contents of our output directory

{{:workshops:public:machine_learning:uncanny_valley:16_colab_spleeter.jpg?direct&800|}}

==== ====

and finally another couple of ''Audio'' commands let us hear our result!

{{:workshops:public:machine_learning:uncanny_valley:17_colab_spleeter.jpg?direct&800|}}

====Things to try====

Check out the [[https://github.com/deezer/spleeter/wiki/2.-Getting-started#separate-sources|usage instructions]] for the separate tool on the GitHub site and try your own 4stems and 5stems separations - there is a sketch below to get you started. Use your own audio files to test the separation.
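The sketch below assumes the example file is still named ''audio_example.mp3'' (as downloaded by the notebook's ''wget'' cell) - substitute your own file name if you have uploaded your own audio:

<code python>
# 4stems model: separates vocals, drums, bass and other
!spleeter separate -i audio_example.mp3 -p spleeter:4stems -o output4

# 5stems model: adds a piano stem to the above
!spleeter separate -i audio_example.mp3 -p spleeter:5stems -o output5

# each stem is written to its own .wav file, named after the stem
!ls output4/audio_example output5/audio_example
</code>

The ''-p'' flag selects the pre-trained model configuration, and spleeter downloads each model the first time it is used. You can listen to any stem with the same ''Audio'' player as before, e.g. ''Audio('output4/audio_example/drums.wav')''.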
======Speech to Text with Mozilla DeepSpeech======

Our next challenge will be to adapt the latest version of Mozilla's DeepSpeech for use in Google Colab. We will be using the documentation here:

https://deepspeech.readthedocs.io/en/v0.8.0/USING.html#getting-the-pre-trained-model

to adapt this Colab notebook to run the latest version of Mozilla DeepSpeech:

https://colab.research.google.com/github/tugstugi/dl-colab-notebooks/blob/master/notebooks/MozillaDeepSpeech.ipynb#scrollTo=4OAYywPHApuz
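As a starting point, here is a sketch of the key commands from the version 0.8.0 documentation, written as Colab cells. The file ''audio.wav'' is a placeholder - DeepSpeech expects a 16 kHz, 16-bit mono WAV file, so you will need to upload (or convert with ffmpeg) a suitable recording:

<code python>
# install the DeepSpeech package (there is also a deepspeech-gpu package for GPU runtimes)
!pip install deepspeech==0.8.0

# download the pre-trained English model and scorer
# (the version numbers should match the installed package)
!curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.8.0/deepspeech-0.8.0-models.pbmm
!curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.8.0/deepspeech-0.8.0-models.scorer

# transcribe an audio file - 'audio.wav' is a placeholder for your own recording
!deepspeech --model deepspeech-0.8.0-models.pbmm --scorer deepspeech-0.8.0-models.scorer --audio audio.wav
</code>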
======Text to Speech with Mozilla TTS======

Our final example is TTS with Mozilla TTS:

https://colab.research.google.com/drive/1u_16ZzHjKYFn1HNVuA4Qf_i2MMFB9olY?usp=sharing#scrollTo=6LWsNd3_M3MP

You can dive straight into this and use it to generate speech. This example uses the Tacotron2 and MultiBand-MelGAN models, trained on the LJSpeech dataset.

====Run All Cells====

{{:workshops:public:machine_learning:uncanny_valley:01_melgan.png?400|}}

====Generate Speech====

{{:workshops:public:machine_learning:uncanny_valley:02_melgan.png?400|}}

=====Going Further=====

ML is such a big and fast-moving area of research that there are countless other ways to explore and learn. Here are a few two-minute videos to pique your interest:

  * [[https://www.youtube.com/watch?v=EjVzjxihGvU|Video restoration]]
  * [[https://www.youtube.com/watch?v=Lu56xVlZ40M|OpenAI Plays Hide and Seek]]

==== ====

Make sure you check out the resources on Lynda, which you have free access to as a State Library of Queensland member.

====== Links ======

https://machinelearningforkids.co.uk/#!/links#top

https://experiments.withgoogle.com/collection/ai

https://openai.com/blog/