- lambda.py: A Python interface for building intent handlers and responses for Alexa. Modified from the Alexa Skills Kit tutorial to say hello and a CS pickup line.
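For reference, here is a minimal sketch of what a handler and response look like in this style. The intent name `HelloIntent` and the response text are hypothetical placeholders, not necessarily what lambda.py uses:

```python
# Minimal sketch of an Alexa Skills Kit Lambda handler in the tutorial's style.
# "HelloIntent" and the response text below are hypothetical placeholders.

def build_response(speech_text):
    # Wrap plain text in the JSON structure Alexa expects back from Lambda.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": True,
        },
    }

def on_intent(intent_request):
    # Dispatch on the intent name from the incoming request.
    intent_name = intent_request["intent"]["name"]
    if intent_name == "HelloIntent":
        return build_response("Hello! Are you a keyboard? Because you are just my type.")
    return build_response("Sorry, I did not understand that.")

def lambda_handler(event, context):
    # AWS Lambda entry point: route intent requests to their handlers.
    if event["request"]["type"] == "IntentRequest":
        return on_intent(event["request"])
```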
- intent_schema.json and sample_utterances: The sample utterances file has the name of the intent in the first column, followed by a possible utterance (the words or sentences a user might say to Alexa) to trigger that intent.
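For example, a hypothetical `HelloIntent` (matching the sketch above, not necessarily the intent name in our files) would look roughly like this in the two files:

```json
{
  "intents": [
    {"intent": "HelloIntent"},
    {"intent": "AMAZON.StopIntent"}
  ]
}
```

and in sample_utterances:

```
HelloIntent say hello
HelloIntent give me a pickup line
```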
- XOR: Training an XOR model in TensorFlow. We ran experiments that modified the activation functions and loss functions in this implementation, xor.py.
| Activation Functions + Cost | Average Elapsed Time | Average Epoch |
| --- | --- | --- |
| 2 Sigmoids + Reduce Mean | 94.1s | 50420 |
| ReLU + Sigmoid + Reduce Mean | 1.53s (when lucky) | 757 |
| 2 Sigmoids + Square Mean | 9.41s | 5758 |

- The recorded elapsed time and epoch are the time and epoch at which the model first made 10 consecutive correct hypotheses.
- The second model only took 1.53s at times, but there were also times when it would not converge, while the first model was much more consistent.
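A minimal TF1-style sketch of the kind of setup we varied; the layer sizes and learning rate here are illustrative, not necessarily what xor.py uses:

```python
import numpy as np
import tensorflow as tf

# The four XOR input/output pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
Y = np.array([[0], [1], [1], [0]], dtype=np.float32)

x = tf.placeholder(tf.float32, [4, 2])
y = tf.placeholder(tf.float32, [4, 1])

# Hidden layer: swap tf.sigmoid for tf.nn.relu to get the ReLU variant.
W1 = tf.Variable(tf.random_uniform([2, 2], -1, 1))
b1 = tf.Variable(tf.zeros([2]))
h = tf.sigmoid(tf.matmul(x, W1) + b1)

# Output layer: always a sigmoid so predictions land in (0, 1).
W2 = tf.Variable(tf.random_uniform([2, 1], -1, 1))
b2 = tf.Variable(tf.zeros([1]))
y_hat = tf.sigmoid(tf.matmul(h, W2) + b2)

# Mean squared error via tf.reduce_mean; this is the cost we varied.
loss = tf.reduce_mean(tf.square(y - y_hat))
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)  # illustrative lr

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    streak, epoch = 0, 0
    # Stop once the rounded predictions are correct 10 epochs in a row.
    while streak < 10:
        _, preds = sess.run([train, y_hat], feed_dict={x: X, y: Y})
        streak = streak + 1 if np.array_equal(np.round(preds), Y) else 0
        epoch += 1
    print("converged at epoch", epoch)
```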
- Seq2Seq: A seq2seq model that translates from English to French. First you need to download the necessary English and French data, tokenize it, and then train the model. This can be done with:
$ python translate.py
Afterwards, try out the interactive decode mode with:
$ python translate.py --decode
- Ran 550 steps, which resulted in a step-time of 3.87 and a perplexity of 411.19. The model parameters can be found in Seq2Seq/checkpoint/ and the outputs at each checkpoint can be found in Seq2Seq/perplexitiy_outputs
We modified the Seq2Seq model from HW2 so that its settings match the Neural Conversational Model:
- Single layer LSTM with 1024 memory cells
- Stochastic gradient descent with gradient clipping
- Vocabulary consists of the most common 20K words
The model configuration and training code can be found in FirstChatbot/converse.py
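For reference, the core of that configuration in TF1-style code looks roughly like this; the variable names and the clipping/learning-rate values are illustrative, and the stand-in loss replaces the real seq2seq cross-entropy, so see converse.py for the actual code:

```python
import tensorflow as tf

VOCAB_SIZE = 20000        # vocabulary of the most common 20K words
LSTM_SIZE = 1024          # memory cells in the single LSTM layer
MAX_GRADIENT_NORM = 5.0   # illustrative clipping threshold
LEARNING_RATE = 0.5       # illustrative SGD learning rate

# Single-layer LSTM: one BasicLSTMCell, no MultiRNNCell wrapper.
cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_SIZE)

# Stand-in loss so this snippet runs on its own; the real loss is the
# sequence cross-entropy produced by the seq2seq graph.
w = tf.get_variable("w", [LSTM_SIZE])
loss = tf.reduce_sum(tf.square(w))

# Plain SGD with global-norm gradient clipping.
params = tf.trainable_variables()
gradients = tf.gradients(loss, params)
clipped_gradients, _ = tf.clip_by_global_norm(gradients, MAX_GRADIENT_NORM)
optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE)
train_op = optimizer.apply_gradients(zip(clipped_gradients, params))
```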
The data came from this repo, which credits another repo by another user who also provided a seq2seq training implementation for the Twitter data format.
You can train it by following the training instructions in that repo.
Then try out the interactive decode mode with:
$ python converse.py --decode
Then we imported the model-setup and decoding functions from converse.py into chatbot.py, a simple Flask app that integrates with Alexa using flask-ask.
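A minimal sketch of that wiring; the intent name, the slot name, and the decode_sentence import here are hypothetical stand-ins for the actual names in converse.py:

```python
from flask import Flask
from flask_ask import Ask, statement

# Hypothetical import: the actual decoding function name in converse.py may differ.
from converse import decode_sentence

app = Flask(__name__)
ask = Ask(app, "/")

@ask.intent("ConverseIntent", mapping={"utterance": "Utterance"})
def converse(utterance):
    # Run the user's words through the seq2seq decoder and speak the reply.
    reply = decode_sentence(utterance)
    return statement(reply)

if __name__ == "__main__":
    app.run(debug=True)
```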
As of now, we have been manually testing our chatbot's performance every 50 steps of training. It still tends to respond with repeated high-scoring words, such as:
- Human: How are you?
- Bot: fitness fitness fitness fitness fitness dumps
After around 4000 steps, the responses seem better. Here are some of our favorite responses:
For further evaluation, we plan to generate the top 5 responses and then design Human Intelligence Tasks to choose the best response or indicate that all responses are wrong. This is a similar approach to the Neural Conversational Model paper, where the authors used human evaluation to compare their model against CleverBot. That experiment used 200 questions and 4 human raters to indicate their preferred bot.
The dataset used for our chatbot is the Cornell Movie-Dialogs Corpus, which contains "220,579 conversational exchanges" and "304,713 utterances" of "fictional conversations extracted from raw movie scripts." "A Survey of Available Corpora for Building Data-Driven Dialogue Systems" describes the corpus as "short conversations from film scripts, annotated...dialogues with character metadata." We appreciated that the "open-domain movie transcript dataset" in "A Neural Conversational Model" led the model to "hold a natural conversation and sometimes perform simple forms of common sense reasoning." The seven conversations (Basic, Simple Q&A, General knowledge Q&A, Philosophical Q&A, Morality, Opinions, and Job and Personality) yielded impressive results, and we sought a scripted dataset that would sound like organic and entertaining spoken conversation. We hoped that a movie conversation corpus other than the OpenSubtitles dataset could yield similar results, though we noted that OpenSubtitles had many more utterances, at 140M.
After using the seq2seq model to generate a response, we also ran the result through Python's grammar_check library, hoping this would yield more cohesive and grammatically correct responses. Adding the library did reduce the use of repeated words in a response; however, no significant improvement was observed.
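A minimal sketch of that post-processing step, assuming grammar_check exposes the same LanguageTool API as the language-check package it derives from:

```python
import grammar_check

# LanguageTool-backed checker for US English.
tool = grammar_check.LanguageTool('en-US')

def clean_response(raw_reply):
    # Find grammar issues in the decoder output and apply the suggested fixes.
    matches = tool.check(raw_reply)
    return grammar_check.correct(raw_reply, matches)

print(clean_response("fitness fitness fitness fitness fitness dumps"))
```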