Code for NLP, Deep Learning, Reinforcement Learning and Artificial Intelligence
Welcome to my GitHub repo.
I am a Data Scientist and I code in R, Python and Wolfram Mathematica. Here you will find some Machine Learning, Deep Learning, Natural Language Processing and Artificial Intelligence models I developed.
Outputs of the models can be seen at my portfolio: http://www.slideshare.net/RubensZimbres/portfolio-79-2017
Autoencoder for Audio is a model where I compressed an audio file and used an autoencoder to reconstruct it, for use in phoneme classification.
Collaborative Filtering is a Recommender System where the algorithm predicts a movie review based on the genre of the movie and the similarity among people who watched the same movie.
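As a rough illustration of the user-similarity part of such a recommender, here is a minimal sketch. It is not the code in the repo: the toy `ratings` data, the function names and the choice of cosine similarity are all assumptions, and the genre information mentioned above is left out.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "ana":   {"Alien": 5, "Heat": 4, "Up": 1},
    "bruno": {"Alien": 4, "Heat": 5, "Up": 2, "Coco": 1},
    "carla": {"Alien": 1, "Heat": 2, "Up": 5, "Coco": 5},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += sim
    # None signals that nobody comparable has rated the movie.
    return weighted_sum / weight_total if weight_total else None

prediction = predict_rating("ana", "Coco", ratings)
```

Here `prediction` leans toward the rating given by the user whose tastes are closest to `ana`.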
Convolutional NN Lasagne is a Convolutional Neural Network model in Lasagne to solve the MNIST task.
Ensembled Machine Learning is a .py file where 7 Machine Learning algorithms are used in a classification task with 3 classes and all possible hyperparameters of each algorithm are adjusted. It uses the Iris dataset from scikit-learn.
Hyperparameter Tuning RL is a model where the hyperparameters of Neural Networks are adjusted via Reinforcement Learning. Based on a reward, the hyperparameter configuration (the environment) is changed through a policy (mechanization of knowledge), using the Boston dataset. The hyperparameters tuned are: learning rate, epochs, decay, momentum, number of hidden layers and nodes, and initial weights.
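The reward-and-policy idea can be sketched with a very simple epsilon-greedy policy over candidate learning rates. This is only an illustration under stated assumptions: the repo's model may use a different algorithm, and `tune_epsilon_greedy` and `toy_reward` are hypothetical names, with `toy_reward` standing in for a real score such as negative validation loss.

```python
import math
import random

def tune_epsilon_greedy(candidates, reward_fn, episodes=50, epsilon=0.1, seed=0):
    """Return the candidate with the best average reward (epsilon-greedy policy)."""
    rng = random.Random(seed)
    totals = {c: 0.0 for c in candidates}
    counts = {c: 0 for c in candidates}

    def average(c):
        # Untried candidates look infinitely good, so each gets tried at least once.
        return totals[c] / counts[c] if counts[c] else float("inf")

    for _ in range(episodes):
        if rng.random() < epsilon:
            choice = rng.choice(candidates)        # explore a random setting
        else:
            choice = max(candidates, key=average)  # exploit the best so far
        totals[choice] += reward_fn(choice)
        counts[choice] += 1
    return max(candidates, key=average)

def toy_reward(learning_rate):
    # Hypothetical reward that peaks at a learning rate of 0.01.
    return -abs(math.log10(learning_rate) + 2)

best = tune_epsilon_greedy([0.1, 0.01, 0.001, 0.0001], toy_reward)
```

In a real setting `reward_fn` would train a network with the chosen hyperparameters and return a validation score, which is far more expensive than this toy function.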
Keras Regularization L2 is a Neural Network model for regression made with Keras where L2 regularization was applied to prevent overfitting.
Lasagne Neural Nets Regression is a Neural Network model based on Theano and Lasagne that performs a linear regression with a continuous target variable and reaches 99.4% accuracy. It uses the DadosTeseLogit.csv sample file.
Lasagne Neural Nets + Weights is a Neural Network model based on Theano and Lasagne, where it is possible to visualize the weights between X1 and X2 and the hidden layer. It can also be adapted to visualize the weights between the hidden layer and the output. It uses the DadosTeseLogit.csv sample file.
Multinomial Regression is a regression model where the target variable has 3 classes.
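To show what the L2 penalty does without depending on Keras, here is a minimal pure-Python sketch that fits a single weight by gradient descent on mean squared error plus `l2 * w**2`. The data and the function name `fit_ridge_1d` are assumptions made for illustration. In Keras the same idea is usually expressed by passing a regularizer such as `keras.regularizers.l2(...)` as a layer's `kernel_regularizer`.

```python
def fit_ridge_1d(xs, ys, l2=0.0, lr=0.01, epochs=2000):
    """Fit y ~ w * x by gradient descent on MSE + l2 * w**2."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error term.
        grad_mse = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Gradient of the L2 penalty l2 * w**2.
        grad_penalty = 2 * l2 * w
        w -= lr * (grad_mse + grad_penalty)
    return w

# Hypothetical toy data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_plain = fit_ridge_1d(xs, ys, l2=0.0)
w_reg = fit_ridge_1d(xs, ys, l2=1.0)
```

The regularized weight `w_reg` ends up smaller in magnitude than `w_plain`, which is the shrinkage effect that helps limit overfitting.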
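A multinomial (softmax) regression turns one linear score per class into class probabilities. The sketch below shows only that prediction step. The weights are made-up values, not parameters from the repo, and `predict_proba` is a hypothetical helper name.

```python
from math import exp

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, weights, biases):
    """Class probabilities for one sample x, with one weight vector per class."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)

# Hypothetical parameters for 3 classes and 2 features.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
probs = predict_proba([1.0, 2.0], weights, biases)
predicted_class = probs.index(max(probs))
```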
Neural Networks for Regression shows multiple solutions for a regression problem, solved with sklearn, Keras, Theano and Lasagne. It uses the Boston dataset sample file from sklearn and reaches more than 98% accuracy.
NLP + Naive Bayes Classifier is a model where movie reviews were labeled as positive and negative and the algorithm then classifies a totally new set of reviews using Logistic Regression, Decision Trees and Naive Bayes, reaching an accuracy of 92%.
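For readers unfamiliar with the Naive Bayes part, here is a minimal word-count version with add-one smoothing. It is a sketch, not the repo's implementation: the four training sentences are invented, tokenization is a plain whitespace split, and the function names are assumptions.

```python
from collections import Counter
from math import log

def train_naive_bayes(docs):
    """docs: list of (text, label). Returns the counts needed for prediction."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log posterior for `text`."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled reviews.
train = [
    ("great fun movie", "pos"),
    ("loved the acting", "pos"),
    ("boring and slow", "neg"),
    ("terrible waste of time", "neg"),
]
model = train_naive_bayes(train)
```

Calling `classify("great acting", model)` then scores each label and returns the more likely one.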
NLP Probabilistic ANN is a Natural Language Processing model where sentences are vectorized by Gensim and a probabilistic Neural Network model is developed using Gensim, for sentiment analysis.
NLP Semantic Doc2Vec + Neural Network is a model where positive and negative movie reviews were extracted and semantically classified with NLTK and BeautifulSoup, then labeled as positive or negative. Text was then used as an input for the Neural Network model training. After training, new sentences are entered in the Keras Neural Network model and then classified. It uses the zip file.
NLP Sentiment Positive is a model that identifies website content as positive, neutral or negative using the BeautifulSoup and NLTK libraries, plotting the results.
ROC Curve Multiclass is a .py file where Naive Bayes was used to solve the Iris dataset task and the ROC curves of the different classes are plotted.
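A common way to get one ROC curve per class is the one-vs-rest scheme: each class in turn is treated as positive and all others as negative. The sketch below computes the curve points by hand under that assumption. The function names are hypothetical, tied scores are not merged, and every class is assumed to have at least one positive and one negative sample. Plotting is left out.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) for binary labels, sweeping from the highest score down."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def one_vs_rest_roc(probabilities, y_true, n_classes):
    """One ROC curve per class: that class is positive, all others negative."""
    curves = {}
    for c in range(n_classes):
        scores = [p[c] for p in probabilities]
        labels = [1 if y == c else 0 for y in y_true]
        curves[c] = roc_points(scores, labels)
    return curves

# Hypothetical predicted class probabilities for six samples.
probabilities = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1], [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7], [0.2, 0.2, 0.6],
]
y_true = [0, 0, 1, 1, 2, 2]
curves = one_vs_rest_roc(probabilities, y_true, n_classes=3)
```

Each entry of `curves` is a list of (false positive rate, true positive rate) pairs that can be passed to any plotting library.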
Text-to-Speech is a .py file where Python speaks any given text and saves it as an audio .wav file.
Time Series Prediction with Neural Networks - Keras is a Neural Network model to forecast time series, using Keras with an adaptive learning rate that depends on the derivative of the loss.
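One simple way to make a learning rate react to the change in loss is the "bold driver" heuristic: grow the rate slightly while the loss keeps falling and cut it when the loss rises. The sketch below uses that heuristic as a stand-in. It is an assumption, not necessarily the rule used in the repo's Keras model, and the function names and constants are hypothetical.

```python
def bold_driver(lr, prev_loss, loss, grow=1.05, shrink=0.5):
    """Adjust lr from the sign of the epoch-to-epoch change in loss."""
    delta = loss - prev_loss  # finite-difference estimate of the loss derivative
    return lr * grow if delta < 0 else lr * shrink

def minimize(grad, loss_fn, w=5.0, lr=0.1, epochs=30):
    """Plain gradient descent on one parameter with an adaptive learning rate."""
    prev_loss = loss_fn(w)
    for _ in range(epochs):
        w -= lr * grad(w)
        loss = loss_fn(w)
        lr = bold_driver(lr, prev_loss, loss)
        prev_loss = loss
    return w, lr

# Toy problem: minimize w**2, whose gradient is 2 * w.
w_final, lr_final = minimize(lambda w: 2 * w, lambda w: w ** 2)
```

In Keras, a rule like this would typically be wired in through a callback that updates the optimizer's learning rate at the end of each epoch.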
Variational Autoencoder is a VAE made with Keras.
t-SNE Dimensionality Reduction is a t-SNE model for dimensionality reduction which is compared to Principal Components Analysis regarding its discriminatory power.
t-SNE PCA LDA embeddings is a model where t-SNE, Principal Components Analysis, Linear Discriminant Analysis and Random Forest embeddings are compared in a task to classify clusters of similar digits.