This repository was archived by the owner on Sep 7, 2022. It is now read-only.

Commit

first
tambien committed Nov 10, 2016
0 parents commit 9d44eaa
Showing 113 changed files with 4,498 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
server/data/key.json
node_modules/
.DS_Store
*.pyc
static/build/*
static/images/font/
27 changes: 27 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,27 @@
Want to contribute? Great! First, read this page (including the small print at the end).

### Before you contribute
Before we can use your code, you must sign the
[Google Individual Contributor License Agreement](https://cla.developers.google.com/about/google-individual)
(CLA), which you can do online. The CLA is necessary mainly because you own the
copyright to your changes, even after your contribution becomes part of our
codebase, so we need your permission to use and distribute your code. We also
need to be sure of various other things—for instance that you'll tell us if you
know that your code infringes on other people's patents. You don't have to sign
the CLA until after you've submitted your code for review and a member has
approved it, but you must do it before we can put your code into our codebase.
Before you start working on a larger contribution, you should get in touch with
us first through the issue tracker with your idea so that we can help out and
possibly guide you. Coordinating up front makes it much easier to avoid
frustration later on.

### Code reviews
All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose.

### The small print
Contributions made by corporations are covered by a different agreement than
the one above, the
[Software Grant and Corporate Contributor License Agreement](https://cla.developers.google.com/about/google-corporate).
58 changes: 58 additions & 0 deletions README.md
@@ -0,0 +1,58 @@
Giorgio Cam lets you use your camera to make music. It relies on a combination of the [Google Cloud Vision API](https://cloud.google.com/vision/) and [MaryTTS](https://github.com/marytts/marytts).

This is not an official Google product.

## OVERVIEW

The client-side JavaScript application captures images using WebRTC. When the user hits the shutter button, an image is sent to the server, which uses Cloud Vision to return an array of labels and confidence scores for that image. These labels are dropped into a rhyming template to create the next phrase that the computer will speak. To get the audio of that phrase, the client makes another request to the MaryTTS (text-to-speech) server, which returns a WAV file of the audio. That audio is then synced to the music using Tone.js.
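
Concretely, the client talks to two server endpoints routed in `app.yaml`: `/see` for image labeling and `/speak/` for speech audio. Below is a minimal sketch of those two round trips made from outside the browser; it assumes the third-party `requests` library and a dev server at `localhost:8080` (neither is part of this repo), and the phrase is only illustrative, not the app's actual rhyming template.

```python
# Sketch of the two client/server round trips (assumes the third-party
# `requests` library and a dev server at localhost:8080).
import base64
import requests

BASE_URL = 'http://localhost:8080'  # assumption: local dev server

# 1. POST a base64-encoded frame to /see; the server forwards it to
#    Cloud Vision and returns [{'label': ..., 'score': ...}, ...] as JSON.
with open('frame.jpg', 'rb') as f:
    encoded = base64.b64encode(f.read())
labels = requests.post(BASE_URL + '/see', data={'image': encoded}).json()

# 2. GET /speak/ with a phrase; the server proxies MaryTTS and
#    returns a WAV file of the spoken audio.
phrase = 'I see a {0}, yes I do'.format(labels[0]['label'])  # illustrative only
audio = requests.get(BASE_URL + '/speak/',
                     params={'text': phrase, 'rate': '1', 'pitch': '0'})
with open('phrase.wav', 'wb') as out:
    out.write(audio.content)
```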

## FRONT-END

To build the client-side JavaScript, first install [node](https://nodejs.org) and [webpack](https://webpack.github.io/). Then install all of the project's dependencies by typing the following in the terminal:

```bash
cd static
npm install
```

Then build all of the files:

```bash
webpack -p
```

## BACK-END

The back-end uses [Google App Engine](https://cloud.google.com/appengine/) to serve static content and mediate between the two other back-end services: Google Cloud Vision and MaryTTS.

For the dependencies to be available within Google App Engine, they need to be installed into a local folder (documented [here](https://cloud.google.com/appengine/docs/python/tools/using-libraries-python-27)):

```bash
cd server
mkdir lib
pip install -t lib -r requirements.txt
```
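
The server code then makes this folder importable at runtime using App Engine's `vendor` helper; this is the snippet at the top of `server/vision.py`:

```python
# From server/vision.py: put the locally installed packages on the import path.
from google.appengine.ext import vendor
vendor.add('server/lib')
```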

### Google Cloud Vision API

You will first need to enable the API and [generate credentials](https://cloud.google.com/vision/docs/common/auth). Under "Key type", choose "JSON" and download the key.json file.

Add the path to your JSON key to the `env_variables` section of your `app.yaml` file like so:

```yaml
env_variables:
GOOGLE_APPLICATION_CREDENTIALS: PATH/TO/CLOUD_VISION_KEY.json
```
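
At runtime this path is picked up by the application-default credentials flow, which is how `server/vision.py` authenticates its Cloud Vision client:

```python
# From server/vision.py: GOOGLE_APPLICATION_CREDENTIALS is read by the
# application-default credentials lookup.
from oauth2client.client import GoogleCredentials
credentials = GoogleCredentials.get_application_default()
```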

### MaryTTS

Download and install [MaryTTS](https://github.com/marytts/marytts), then run the MaryTTS server. Add the IP address and port number that MaryTTS is running on to `server/mary.py`. The default location is `http://localhost:59125`.

```python
MARY_TTS_URL = 'http://127.0.0.1'
MARY_TTS_PORT = '59125'
```
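
To check that MaryTTS is reachable before wiring it in, you can issue the same kind of `process` request that `server/mary.py` builds. A minimal sketch, assuming the default install on `localhost:59125` and the `cmu-bdl-hsmm` voice that `mary.py` requests:

```python
# Smoke test against a local MaryTTS server (sketch; assumes the default
# install at localhost:59125).
import urllib2

url = ('http://localhost:59125/process'
       '?INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE'
       '&LOCALE=en_US&VOICE=cmu-bdl-hsmm'
       '&INPUT_TEXT=' + urllib2.quote('hello world'))
with open('test.wav', 'wb') as out:
    out.write(urllib2.urlopen(url).read())
```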

### App Engine

Enable App Engine, create a project, and set your project ID under `application` in `app.yaml`. You can then launch the back-end server, which will communicate with the front-end application, Cloud Vision, and the MaryTTS server.
60 changes: 60 additions & 0 deletions app.yaml
@@ -0,0 +1,60 @@
application: APP_ID_HERE
version: dev
runtime: python27
api_version: 1
threadsafe: true


handlers:

- url: /
script: server.main.app
secure: always

- url: /speak/.*
script: server.mary.app

- url: /see
script: server.vision.app

- url: /(.*\.js)
mime_type: text/javascript
static_files: static/\1
upload: static/(.*\.js)

- url: /(.*\.html)
mime_type: text/html
static_files: static/\1
upload: static/(.*\.html)

- url: /(.*\.(bmp|gif|ico|jpeg|jpg|png|svg|ttf))
static_files: static/\1
upload: static/(.*\.(bmp|gif|ico|jpeg|jpg|png|svg|ttf))

# index files
- url: /(.+)/
static_files: static/\1/index.html
upload: static/(.+)/index.html

# audio files
- url: /(.*\.mp3)
mime_type: audio/mpeg
static_files: static/\1
upload: static/audio/(.*\.mp3)


skip_files:
- ^(.*/)?.*\.pyc
- ^(.*/)?.*\.wav
- ^.git/.*
- ^(.*/)?node_modules/.*
- ^static/src/.*
- ^static/style/.*
- ^package\.json

libraries:
- name: webapp2
version: latest

env_variables:
GOOGLE_APPLICATION_CREDENTIALS: PATH/TO/CLOUD_VISION_KEY.json
Empty file added server/__init__.py
Empty file.
11 changes: 11 additions & 0 deletions server/index.html
@@ -0,0 +1,11 @@
<!-- <!DOCTYPE html> -->
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no">

<script type="text/javascript" src="build/Main.js"></script>
</head>
<body>
</body>
</html>
32 changes: 32 additions & 0 deletions server/main.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python
#
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import os
import webapp2
from google.appengine.ext.webapp import template


class Main(webapp2.RequestHandler):

def get(self):
main = os.path.join(os.path.dirname(__file__), 'index.html')
self.response.out.write(template.render(main, {}))


app = webapp2.WSGIApplication([
('/', Main)
], debug=True)
60 changes: 60 additions & 0 deletions server/mary.py
@@ -0,0 +1,60 @@
#!/usr/bin/env python
#
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import webapp2
import urllib2

from google.appengine.api import memcache

MARY_TTS_URL = 'http://127.0.0.1'
MARY_TTS_PORT = '59125'


class SpeakHandler(webapp2.RequestHandler):
def maryRequestUrl(self, text, rate, pitch):
f0_scale = '0.8' if float(pitch) == 0 else '1.5'
tract_scaler = '0.9'
f0_add = str(float(pitch) * 10)
text = urllib2.quote(text.encode("utf-8"))
urlString = '{0}:{1}/process?INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&INPUT_TEXT={2}'.format(MARY_TTS_URL, MARY_TTS_PORT, text)
urlString += '&VOICE_SELECTIONS=cmu-bdl-hsmm%20en_US%20male%20hmm&AUDIO_OUT=WAVE_FILE&LOCALE=en_US&VOICE=cmu-bdl-hsmm&AUDIO=WAVE_FILE'
# some voice parameters
urlString += '&effect_F0Scale_selected=on&effect_F0Scale_parameters=f0Scale%3A{0}%3B'.format(f0_scale)
urlString += '&effect_TractScaler_selected=on&effect_TractScaler_parameters=amount%3A{0}%3B'.format(tract_scaler)
urlString += '&effect_F0Add_selected=on&effect_F0Add_parameters=f0Add%3A{0}%3B'.format(f0_add)
urlString += '&effect_Rate_selected=on&effect_Rate_parameters=durScale%3A{0}%3B'.format(str(1 / float(rate)))
return urlString
def get(self):
text = self.request.get('text', default_value='this is a test')
rate = self.request.get('rate', default_value='1')
pitch = self.request.get('pitch', default_value='0')

key = 'text={0} rate={1} pitch={2}'.format(text, rate, pitch)
print self.maryRequestUrl(text, rate, pitch)
audio = memcache.get(key)
if audio is None:
url = self.maryRequestUrl(text, rate, pitch)
audio = urllib2.urlopen(url).read()
four_hours = 60 * 60 * 4
memcache.add(key, audio, four_hours)
self.response.headers['Content-Type'] = 'audio/wav'
self.response.out.write(audio)


app = webapp2.WSGIApplication([
('/.*', SpeakHandler)
], debug=True)
1 change: 1 addition & 0 deletions server/requirements.txt
@@ -0,0 +1 @@
google-api-python-client==1.5.1
84 changes: 84 additions & 0 deletions server/vision.py
@@ -0,0 +1,84 @@
#!/usr/bin/env python
#
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import webapp2
import urllib2
import json

import base64

from google.appengine.ext import vendor
vendor.add('server/lib')

import re
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

# Discovery document URL template for the Cloud Vision API
DISCOVERY_URL='https://{api}.googleapis.com/$discovery/rest?version={apiVersion}'

class VisionApi(webapp2.RequestHandler):
def __init__(self, request, response):
# Set self.request, self.response and self.app.
self.initialize(request, response)
self.vision = self._create_client()

def _create_client(self):
credentials = GoogleCredentials.get_application_default()
return discovery.build(
'vision', 'v1', credentials=credentials,
discoveryServiceUrl=DISCOVERY_URL)

def _label(self, image, max_results=10, num_retries=2):
"""
Uses the Vision API to detect labels in the given image.
"""

label_request = {
'image': {
'content': image
},
'features': [{
'type': 'LABEL_DETECTION',
'maxResults': max_results,
}]
}

request = self.vision.images().annotate(body={'requests': [label_request]})
response = request.execute(num_retries=num_retries)['responses'][0]
labels_raw = response.get('labelAnnotations', [])
print labels_raw
error = response.get('error', [])
if len(labels_raw):
filtered = [r for r in labels_raw if float(r['score']) > 0.1]
labels = [{'label' : str(x['description']), 'score' : float(x['score'])} for x in filtered]
return labels
elif len(error):
return {'error' : error}
else:
return []


def post(self):
image = self.request.POST.multi['image']
labels = self._label(image)
self.response.out.write(json.dumps(labels))


app = webapp2.WSGIApplication([
('.*', VisionApi)
], debug=True)
Binary file added static/audio/applause.mp3
Binary file not shown.
Binary file added static/audio/crash.mp3
Binary file not shown.
Binary file added static/audio/end.mp3
Binary file not shown.
Binary file added static/audio/fill0.mp3
Binary file not shown.
Binary file added static/audio/fill1.mp3
Binary file not shown.
Binary file added static/audio/fill2.mp3
Binary file not shown.
Binary file added static/audio/racer.mp3
Binary file not shown.
Binary file added static/audio/reverseCrash.mp3
Binary file not shown.
Binary file added static/audio/shutter.mp3
Binary file not shown.
Binary file added static/audio/siren.mp3
Binary file not shown.
Binary file added static/audio/voice/alright.mp3
Binary file not shown.
Binary file added static/audio/voice/herewego.mp3
Binary file not shown.
Binary file added static/audio/voice/herewegogo.mp3
Binary file not shown.
Binary file added static/audio/voice/readytoexplore.mp3
Binary file not shown.
Binary file added static/audio/voice/takeapicture.mp3
Binary file not shown.
Binary file added static/audio/voice/yeah.mp3
Binary file not shown.
Binary file added static/audio/voice/yeahup.mp3
Binary file not shown.
