Note/disclaimer:

This doc is not logically part of this repo but is hosted here for now. It is currently a set of miscellaneous links and notes with no specific structure (well, almost), purpose, or maintenance plan. Most resources here are simply 'note to self' items that I am keeping public.
Use it if you find it useful.
Last updated: October 24, 2022

ML starter and great Resources

Online (free) Jupyter Notebooks: Kaggle - Colab - SageMaker
Guide to explainable AI https://christophm.github.io/interpretable-ml-book/index.html
Explain Paper
Full Stack Deep Learning lectures
Machine Learning study list with resources: learn Python | OOP Python | practical Python

New models @ HF and others:

NLLB-200 (translation): NLLB-200. There are larger models as well.
Google Flan T5 with some reasoning
Stable Diffusion - generate image from text. Try at https://beta.dreamstudio.ai/dream
DALL·E Mini - image generation from a prompt
Whisper - speech to text
Low-resource languages (Swahili, Bengali, Hausa, Kanuri, ...): Clear Global
HF Models tutorial videos

https://huggingface.co/course/chapter1/1 HuggingFace Transformers course
https://github.com/dair-ai/Transformers-Recipe/blob/main/README.md transformers recipe
https://github.com/josephmisiti/awesome-machine-learning Awesome machine learning
https://ethics-of-ai.mooc.fi/ AI Ethics MOOC
https://explained.ai/matrix-calculus/ Matrix calculus for deep learning
https://stanford-cs329s.github.io/syllabus.html ML systems design

https://fullstackdeeplearning.com/spring2021/ Full Stack Deep Learning
https://github.com/open-mmlab/mmocr OCR library for text detection and recognition
https://towardsdatascience.com/5-open-source-tools-you-can-use-to-train-and-deploy-an-ocr-project-8f204dec862b OCR comparison
https://huggingface.co/spaces/Kforcode/Doctr_plus_TrOCR OCR on Huggingface
https://github.com/pdfminer/pdfminer.six

https://datasetsearch.research.google.com/ Google Dataset Search
https://christophm.github.io/interpretable-ml-book/ Interpretable Machine Learning book (free)
https://d2l.ai/chapter_preface/index.html Dive into Deep Learning (PyTorch, TensorFlow and MXNet)
https://deeplearning.neuromatch.io/tutorials/intro.html Deep Learning with PyTorch
https://github.com/mauhai/awesome-jupyterlab Awesome Jupyter Lab
https://github.com/microsoft/ML-For-Beginners Machine Learning for Beginners - Microsoft
https://github.com/microsoft/Data-Science-For-Beginners Data Science for Beginners - Microsoft
https://github.com/microsoft/recommenders Recommenders with examples - Microsoft
https://mldurga.github.io/easydl/ deep learning - vision
https://cds.nyu.edu/deep-learning/ NYU Deep learning (2021) - latest techniques in deep learning and representation learning (not for beginners)
https://colab.research.google.com/drive/1OGp9Wgm-oBM5BBhTLx6Qow4dNRSJZ-F5?usp=sharing Arabic NER
https://github.com/pandas-profiling/pandas-profiling
https://github.com/cdpierse/transformers-interpret Transformers Interpret
https://github.com/nyu-mll/jiant multitask and transfer learning toolkit for natural language processing research
https://stanford-cs329s.github.io/syllabus.html CS 329S: Machine Learning Systems Design from Stanford
https://opendatapolicylab.org/academy/data-reuse-strategy/syllabus-2021/
https://github.com/opendp https://opendp.org/ Differential Privacy (Microsoft) - see https://news.microsoft.com/wp-content/uploads/prod/sites/560/2021/04/Microsoft-Open-Data-Campaign-Report_PDF_FINAL.pdf and https://blogs.microsoft.com/on-the-issues/2021/04/29/open-data-campaign-anniversary-review/
https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/ sklearn mooc
https://github.com/speechbrain/speechbrain/ + https://colab.research.google.com/drive/1UwisnAjr8nQF3UnrkIJ4abBMAWzVwBMh?usp=sharing SpeechBrain
https://aiethicscourse.org/modules.html AI ethics course
https://github.com/kornia/kornia + https://kornia-tutorials.readthedocs.io/en/latest/ for computer vision based on Pytorch and inspired by OpenCV
https://maelfabien.github.io/machinelearning/wav2vec/#b-the-model wav2vec
https://www.youtube.com/playlist?list=PL8PYTP1V4I8AkaHEJ7lOOrlex-pcxS-XV Neural Nets for NLP Graham Neubig CMU
https://www.statisticshowto.com/ + https://www.youtube.com/channel/UCs3IhN8VOA_5WxpAgbSmFkg stats in plain English
https://sergey-tihon.github.io/Stanford.NLP.NET/ Stanford NLP.net
Python APIs: https://realpython.com/python-api/ (consume), https://realpython.com/flask-connexion-rest-api/ (build)
https://cwkx.github.io/teaching.html Deep and Reinforcement learning courses (also cyber security). Videos: https://www.youtube.com/user/cwkx/playlists
https://www.kdnuggets.com/2020/11/top-python-libraries-data-science-data-visualization-machine-learning.html Top Python Libraries for Data Science, Data Visualization & Machine Learning
http://learnersdigest.radekosmulski.com/issues/ml-learner-s-digest-demystifying-regex-once-and-for-all-368901 Demystify RegEx
https://stanford-cs329s.github.io/syllabus.html Machine Learning Systems Design (Chip Huyen)
https://elvissaravia.substack.com/p/learn-about-transformers-a-recipe learn about Transformers
https://www.paperswithcode.com/datasets + https://huggingface.co/datasets Datasets
https://www.youtube.com/watch?v=oR670Txwh88 The Art of Learning Data Science
https://github.com/linkedin/detext LinkedIn DeText
https://github.com/coursat-ai/NLP + https://github.com/coursat-ai/CV
https://sebastianraschka.com/blog/2021/ml-dl-datasets.html Datasets for Machine Learning and Deep Learning
http://en.arabicspeechcorpus.com/ ASR Arabic Speech Corpus (see also: https://arxiv.org/pdf/2007.03001.pdf https://www.researchgate.net/publication/344799361_End-to-End_Arabic_Speech_Recognition_A_Review https://lionbridge.ai/datasets/best-speech-recognition-datasets-for-machine-learning/ and https://www.openslr.org/12 (LibriSpeech En))
https://aiethicscourse.org/ AI Ethics course
https://arxiv.org/pdf/2002.12327.pdf A Primer in BERTology
https://arnicas.github.io/text-gen-arxiv-papers/ https://github.com/arnicas/text-gen-arxiv-papers/
https://github.com/argosopentech/argos-translate/ Open source offline translation app
https://ruder.io/research-highlights-2020/ Seb Ruder NLP summary 2020
https://gltr.io/ A tool to detect automatically generated text
https://twitter.com/akhooli/status/1349438507337584641 (earlier: https://twitter.com/akhooli/status/1347609838617042948) Arabic poetry generation
https://eccox.io/ + https://github.com/jalammar/ecco Look Inside Language Models
https://incidentdatabase.ai/ and https://github.com/PartnershipOnAI/aiid AI incident database
https://twitter.com/amitness/status/1347884720751710210 store ML models HF Arabic: ARBERT: https://huggingface.co/UBC-NLP/ARBERT MARBERT: https://huggingface.co/UBC-NLP/MARBERT https://huggingface.co/aubmindlab https://huggingface.co/lanwuwei https://huggingface.co/asafaya + https://huggingface.co/akhooli
https://github.com/EpistasisLab/pmlb Penn Machine Learning Benchmarks + http://timeseriesregression.org + https://twitter.com/rasbt/status/1345407451865243648 + next line
https://datasetlist.com by domain, https://datasetsearch.research.google.com, https://github.com/awesomedata/awesome-public-datasets, https://www.reddit.com/r/datasets/
https://dssoc.github.io/schedule/ Data Science & Society
https://pile.eleuther.ai/ The Pile is an 825 GiB diverse, open source language modelling data set that consists of 22 smaller, high-quality datasets combined together.
https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes Spark NLP BioBERT etc (https://github.com/JohnSnowLabs/spark-nlp) -- supports Arabic.
see next line https://parthplc.medium.com/how-to-finetune-mt5-to-create-a-question-generator-for-100-languages-4a3878e63118 MT5 Q gen (pytorch lightning)
QA datasets (incl. Ar): https://github.com/facebookresearch/MLQA + https://github.com/deepmind/xquad
https://atcold.github.io/pytorch-Deep-Learning/ + https://www.youtube.com/playlist?list=PLLHTzKZzVU9eaEyErdV26ikyolxOsz6mq Yann LeCun
http://zna.do/epsilon
https://github.com/GRAAL-Research/poutyne simplified framework for PyTorch
https://algotransparency.org/ + https://github.com/pair-code/lit language interpretability
https://github.com/stared/livelossplot/ Plot losses while training (TF and PT)
https://web.stanford.edu/~jurafsky/slp3/ Speech and Language Processing
https://github.com/kaushaltrivedi/fast-bert fast-bert fastai + BERT + LAMB
https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/bert-pretraining.md#reproducing-bert-training-results-with-deepspeed DeepSpeed

ToDo

https://arxiv.org/abs/2004.04902 An In-depth Walkthrough on Evolution of Neural Machine Translation
https://dair.ai/newsletter/ NLP Newsletter
https://github.com/eriklindernoren/ML-From-Scratch https://www.kdnuggets.com/2019/12/build-intelligent-chatbot.html
https://github.com/goru001/inltk indic nlp
https://practicalai.me/
https://realpython.com/flask-connexion-rest-api-part-4/
https://80000hours.org/podcast/episodes/nick-beckstead-giving-billions/
https://github.com/fastai/fastai_dev/blob/master/dev/course/lesson1-pets.ipynb
https://github.com/fastai/course-nlp fast.ai NLP
https://github.com/microsoft/nlp-recipes
https://www.youtube.com/watch?v=XRG0bBLRKmc
https://realpython.com/python-statistics/
https://colab.research.google.com/github/practicalAI/practicalAI/blob/master/notebooks/03_Pandas.ipynb#scrollTo=6nwDfMoNT-Qa Pandas
https://github.com/practicalAI/practicalAI
https://twitter.com/amaarora/status/1211043973063536640 seq2seq fast.ai
http://www.rctatman.com/teaching/kaggle_workshops
https://github.com/Kaixhin/grokking-pytorch
https://github.com/minimaxir/gpt-2-simple + https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce
http://roundup.fishtownanalytics.com/issues/data-science-roundup-top-20-posts-of-2019-dsr-211-217188 top 20 ds posts 2019
https://course.spacy.io/ advanced NLP with spaCy
https://www.kdnuggets.com/2019/12/5-features-scikit-learn-release-highlights.html
https://machinelearningmastery.com/standard-machine-learning-datasets-for-imbalanced-classification/
https://twitter.com/ssshanest/status/1214988495422283776 NLProc
https://twitter.com/ogrisel/status/1215045595037097984 BERT
https://sites.google.com/cs.washington.edu/csed514-2020wi DS data management
https://github.com/jqueguiner/polyglot GP2
https://github.com/hammedb197/Recommender-surprise scikit-surprise recommender
http://www.dataatworkbook.com/ data at work, book using Excel for viz (see also: https://excelcharts.com/)
https://github.com/alan-turing-institute/CleverCSV
https://arxiv.org/pdf/1909.04761.pdf + http://nlp.fast.ai/classification/2019/09/10/multifit.html (datasets https://webis.de/data/webis-cls-10.html + https://github.com/facebookresearch/MLDoc No Arabic) + https://github.com/n-waves/multifit + https://github.com/n-waves/multifit/tree/ulmfit-multilingual-original-scripts MultiFiT language models
https://www.kaggle.com/c/nlp-getting-started/overview/prizes ????
https://lamyiowce.github.io/word2viz/
https://github.com/fastai/course-nlp/blob/master/nn-vietnamese.ipynb + https://github.com/n-waves/multifit/blob/master/notebooks/CLS-JA.ipynb /////
https://www.fast.ai/2019/07/08/fastai-nlp/ (nlp vid 2 done) + https://github.com/fastai/course-nlp
http://jimypbr.github.io/2019/09/fast-ai-lesson-6-notes by physicist
https://github.com/amaloraini/cross-lingual-ZP
https://github.com/deepset-ai/FARM
https://towardsdatascience.com/bert-to-the-rescue-17671379687f BERT classifier PyTorch
https://huggingface.co/blog/how-to-train <=== train model from scratch using HF transformers and tokenizers (Esperanto)
https://github.com/PyTorchLightning/pytorch-lightning + https://github.com/AntixK/PyTorch-VAE PyTorch-Lightning ||||

TF2

https://twitter.com/random_forests/status/1178687761177554946
https://practicalai.me/ (https://github.com/practicalAI/practicalAI) A practical approach to machine learning *****
https://colab.research.google.com/drive/1UCJt8EYjlzCs1H1d1X0iDGYJsHKwu-NO TF2+Keras basic tutorial by Francois Chollet

Fast.AI new (2.0)

https://forums.fast.ai/t/a-guided-walk-through-of-2-0-like-practical-deep-learning-for-coders/57161 notebooks
https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0 ***
https://medium.com/@aman.arora0210/fastai-v2-datablocks-api-code-overview-a-gentle-introduction-60338a6c9aa data blocks
https://github.com/fastai/course-nlp code-first intro to nlp (different frameworks, fastai 1, July 2019)
https://www.kaggle.com/jhoward/some-dicom-gotchas-to-be-aware-of-fastai
https://github.com/n-waves/ulmfit-multilingual/tree/multifit ULMFIT Multilingual <====
https://github.com/orendar/imdb_text_generation fast.ai text generation https://towardsdatascience.com/can-we-generate-high-quality-movie-reviews-using-language-models-5158f494aea7

ETL

https://dev.socrata.com/connectors/pentaho-kettle.html

Text augmentation

https://github.com/cerlymarco/MEDIUM_NoteBook/tree/master/Text_Augmentation

GPT-2

https://transformer.huggingface.co/doc/gpt2-xl
https://huggingface.co/openai-detector human or machine?

On using git:

git for poets (https://www.youtube.com/watch?v=BCQHnlnPusY) , also https://www.youtube.com/watch?v=SWYqp7iY_Tc and https://www.youtube.com/watch?v=HVsySz-h9r4
https://github.com/jstray/lede-algorithms for journalism!
https://www.youtube.com/watch?v=EPVwnG-n4B0 git for data scientists Andreas Muller
See also: https://www.linkedin.com/posts/brohrer_if-a-beginner-wanted-to-learn-how-to-use-activity-6581471551160414208-tx0K (Brandon Rohrer's post) ===> https://brohrer.github.io/git_resources.html final list
https://twitter.com/yaser_najjar_ar/status/1177549419119616000 git in 10 Arabic tweets
https://lab.github.com/ and https://www.youtube.com/watch?v=9S0p8YMQzsM
https://realpython.com/advanced-git-for-pythonistas/ Git for Python users

No category:

https://www.linkedin.com/posts/nabihbawazir_heres-the-list-of-tutorials-1-data-science-activity-6574190147376644096-qNeA DataNest
https://hbr.org/2019/02/how-to-choose-your-first-ai-project Andrew Ng
https://end-to-end-machine-learning.teachable.com/p/data-science-concepts/ <=====
https://github.com/rasbt/python-machine-learning-book-2nd-edition Sebastian Raschka book examples
https://web.stanford.edu/~jurafsky/slp3/ Speech and Language Processing (3rd ed. draft)
https://www.kaggle.com/rtatman/getting-started-with-kaggle-workshop-in-a-box http://data8.org/fa18/
https://mml-book.github.io/ Mathematics for Machine Learning <===== http://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064/5
http://www.scipy-lectures.org/
https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e NLP is Fun part-1
https://medium.com/@ageitgey/text-classification-is-your-new-secret-weapon-7ca4fad15788 NLP is fun part-2
https://allennlp.org/tutorials Allen NLP Tutorials
https://www.youtube.com/user/sentdex/playlists Sentdex
https://towardsdatascience.com/@actsusanli Susan Li
http://scikit-learn.org/stable/related_projects.html#interoperability-and-framework-enhancements
https://gengo.ai/articles/20-best-youtube-channels-for-ai-and-machine-learning/
https://www.kaggle.com/ldfreeman3/a-data-science-framework-to-achieve-99-accuracy **** Great notebook ****
https://christophm.github.io/interpretable-ml-book/ Interpretable Machine Learning
https://www.cs.toronto.edu/~hinton/coursera_lectures.html (NN Essentials) Geoffrey Hinton course 2012 <=======
https://arxiv.org/abs/1811.12808 Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning Sebastian Raschka
https://towardsdatascience.com/demystifying-confusion-matrix-confusion-9e82201592fd <======
https://www.datahelpers.org/ ask for help on datascience via Twitter

https://medium.com/ibm-watson-data-lab/the-visual-python-debugger-for-jupyter-notebooks-youve-always-wanted-761713babc62 pixie jupyter notebook debugger https://github.com/pixiedust/pixiedust
https://towardsdatascience.com/python-for-data-science-8-concepts-you-may-have-forgotten-i-did-825966908393
https://explained.ai/matrix-calculus/index.html Matrix Calculus You Need For Deep Learning
https://github.com/THUNLP-MT/MT-Reading-List Machine Translation reading list
https://github.com/radekosmulski/whale/blob/master/fluke_detection.ipynb fastai object detection

https://www.analyticsvidhya.com/blog/2018/12/best-data-science-machine-learning-projects-github/

Machine Learning

https://vas3k.com/blog/machine_learning/ Machine Learning for Everyone
https://github.com/ujjwalkarn/Machine-Learning-Tutorials/
https://mlbook.explained.ai/catvars.html <===== ML Book

DataScience

https://www.linkedin.com/feed/update/urn:li:activity:6523057989522423808 Data Science learning path for beginners <=====
https://medium.mybridge.co/30-amazing-machine-learning-projects-for-the-past-year-v-2018-b853b8621ac7
https://twitter.com/zaidalyafeai/status/1066664451984773121 ML courses
https://github.com/bulutyazilim/awesome-datascience
https://www.kaggle.com/shivamb/data-science-glossary-on-kaggle-updated <=========
https://github.com/amueller/ml-workshop-4-of-4
https://github.com/josephmisiti/awesome-machine-learning/blob/master/README.md
https://github.com/jhfjhfj1/autokeras <=== Auto Keras
https://realpython.com/python-keras-text-classification/ ****
https://blog.udacity.com/2018/09/machine-learning-ai-experts-on-twitter.html who to follow in AI
http://explained.ai/decision-tree-viz/index.html Decision trees viz (see also http://explained.ai/)
http://tfmeter.icsi.berkeley.edu/#activation=relu&batchSize=10&dataset=gauss&regDataset=reg-plane&learningRate=1&trueLearningRate=0&regularizationRate=0&noise=35&networkShape=1&seed=0.95465&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=true&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false TensorFlow viz for NN
https://www.kaggle.com/learn/python *** see also:
https://medium.com/@STXNext/the-most-popular-python-scientific-libraries-40f5cdcb370a
https://towardsdatascience.com/python-basics-for-data-science-6a6c987f2755
https://gist.github.com/versipellis/eb15f8612be76d49922e3e2490e50612 ds podcasts etc (https://twitter.com/bennyjtang/status/1056650349749170176)
https://medium.com/ai%C2%B3-theory-practice-business/top-6-cheat-sheets-novice-machine-engineers-need-5ea43d1be3de
http://www.dataschool.io/start/ great collection
http://www.themenyouwanttobe.com/data-science-resources/ DS resources list
http://www.r2d3.us/visual-intro-to-machine-learning-part-2/ Bias-Variance Trade-off
https://github.com/rasbt/mlxtend/blob/master/docs/sources/user_guide/evaluate/bias_variance_decomp.ipynb mlxtend
https://machinelearningmastery.com/all-of-statistics-for-machine-learning/
https://github.com/scikit-learn-contrib/imbalanced-learn (https://beckernick.github.io/oversampling-modeling/)
https://medium.com/@hiromi_suenaga/deep-learning-2-part-1-lesson-1-602f73869197 fast.ai notes
http://crawles.com/lr-scratch/ crossing features in Titanic
https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects Randy Olson
https://www.kaggle.com/rtatman/download-a-csv-file-from-a-kernel/
http://nlpprogress.com/ tracking progress in nlp <=======================
https://medium.com/huggingface/the-best-and-most-current-of-modern-natural-language-processing-5055f409a1d1
http://www.winlp.org/big-directory/

https://github.com/lmcinnes/umap Uniform Manifold Approximation and Projection
http://tomaugspurger.github.io/modern-7-timeseries.html
https://www.kdnuggets.com/2018/03/catboost-vs-light-gbm-vs-xgboost.html
https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/
https://www.kdnuggets.com/2018/08/mxnet-tensor-basics-simple-derivatives.html MXNet (Apache)
https://www.kdnuggets.com/2018/08/introduction-fraud-detection-systems.html fraud detection with lightGBM MSFT
https://medium.com/cracking-the-data-science-interview/14-golden-nuggets-to-demystify-data-science-for-aspiring-data-scientists-d18e8f30de35
https://medium.com/cracking-the-data-science-interview/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-7228aa8ef541
https://towardsdatascience.com/the-10-mining-techniques-data-scientists-need-for-their-toolbox-ae15a5733b02 ==///===>
https://jhudatascience.org/chromebookdatascience/ Chromebook ds

https://towardsdatascience.com/how-to-out-compete-on-a-data-science-competition-insights-techniques-and-tactics-95a0545041d5
https://github.com/neptune-ml/steppy https://github.com/iskandr/fancyimpute ==========///======>

https://twitter.com/afshinea/status/1031393337998069760 Stanford cs ML
https://github.com/afshinea/stanford-cs-230-deep-learning cheatsheet Stanford NLP
https://medium.com/analytics-vidhya/python-implementation-of-andrew-ngs-machine-learning-course-part-1-6b8dd1c73d80

https://twitter.com/GaelVaroquaux/status/1035958024949706757 Sci-py notebook
https://blog.ycombinator.com/learning-math-for-machine-learning/
https://realpython.com/python-pandas-tricks/ ====//==> https://github.com/eriklindernoren/ML-From-Scratch
https://github.com/owainlewis/awesome-artificial-intelligence
https://github.com/shivasj/Integrating-a-Machine-Learning-Model-into-a-Web-app
http://pages.stat.wisc.edu/~sraschka/teaching/stat479-fs2018/#schedule =====//====> Sebastian Raschka
https://github.com/rasbt/stat479-machine-learning-fs18 http://www.cs.columbia.edu/~amueller/comsw4995s18/schedule/ =======//====> Andreas Mueller
https://mlcourse.ai/ <====//==== http://videolectures.net/Top/Computer_Science/Data_Mining/#l=en
https://twitter.com/_brohrer_/status/1054298773541670912 Math4ML
https://www.youtube.com/watch?v=UiF0FyMFO-8 Model selection BRohrer
https://github.com/dustinvtran/ml-videos ML Videos

Explained AI

https://becominghuman.ai/interpretable-machine-learning-an-overview-10684eaa1fd7
https://towardsdatascience.com/understanding-how-lime-explains-predictions-d404e5d1829c
https://christophm.github.io/interpretable-ml-book/ <====
https://lexfridman.com/ai/ AI Podcast
https://github.com/Microsoft/interpret MS Interpret

NLP

https://omarito.me/data-augmentation-in-nlp-sequence-tagging/ NLP data augmentation
https://www.topbots.com/most-important-ai-nlp-research/
https://docs.google.com/presentation/d/e/2PACX-1vSNssslqRy6zjr6OyhGNUJGYnwyrIP5jTc72usf0Bhi_iW_31j_g4wl52Tu-aqxKxOsvjfwrais4I38/pub?start=false&loop=false&delayms=3000#slide=id.g4e4dd52cc4_0_0 NLP Today
https://github.com/tesseract-ocr/tesseract Open OCR
https://twitter.com/supernlpblog/status/1067341618880098309 <== sentence representations
https://github.com/sebastianruder/NLP-progress
https://realpython.com/python-keras-text-classification/ <==================================
https://www.kdnuggets.com/2018/10/github-python-data-science-spotlight.html

https://pmbaumgartner.github.io/blog/notes-on-nlp-projects/
https://arxiv.org/abs/1808.00158 Speaker Recognition from raw waveform with SincNet
http://www.cs.cmu.edu/~rsalakhu/jsm2018.html The deep learning rev
http://jkk.name/neural-tagger-tutorial/ Neural nlp tagger
https://blog.feedly.com/transfer-learning-in-nlp/
https://medium.com/redbus-in/how-to-deploy-scikit-learn-ml-models-d390b4b8ce7a NLP scikit-deploy (see also https://www.opendatagroup.com/ and http://scoring.one/, draw.io to make flow plans)
https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e ====///====>
https://www.kdnuggets.com/2017/11/building-wikipedia-text-corpus-nlp.html https://twitter.com/HoseinFooladi7/status/1035582635454672897 nlp resources https://www.kdnuggets.com/2018/08/multi-class-text-classification-scikit-learn.html https://twitter.com/stanfordnlp/status/1035990390480887808
https://itnext.io/list-of-free-resources-to-learn-natural-language-processing-ce27231e79a2 ==//==>
https://blog.sicara.com/train-ner-model-with-nltk-stanford-tagger-english-french-german-6d90573a9486
https://nlp.stanford.edu/IR-book/information-retrieval-book.html Info retrieval Chris Manning Stanford NLP
http://ruder.io/word-embeddings-1/index.html Sebastian Ruder
https://www.kdnuggets.com/2018/10/building-question-answering-system-from-scratch.html
https://drive.google.com/file/d/1kmNAwrSlFYo0cN_DcURMOArBwe9FxWxR/view Transfer Learning by Seb Ruder 2018 Belgium
http://u.cs.biu.ac.il/~yogo/blackbox2018.pdf RNN for NLP
https://github.com/huggingface/pytorch-pretrained-BERT Google's BERT in PyTorch
https://github.com/fastai/course-v3/blob/master/nbs/dl1/00_notebook_tutorial.ipynb V3
https://github.com/huggingface/hmtl/tree/master/demo <=== HMTL NLP
https://towardsdatascience.com/neural-networks-and-philosophy-of-language-31c34c0796da w2vec, glove, fasttext
https://www.lyrn.ai/2019/01/16/transformer-xl-sota-language-model/ (https://jalammar.github.io/illustrated-transformer/)
http://nacloweb.org/
https://github.com/Deffro/text-preprocessing-techniques and https://www.kaggle.com/deffro/text-pre-processing-techniques
https://twitter.com/sleepinyourhat/status/1109507075166691329 Recommendation for parsers

General

https://www.lfd.uci.edu/~gohlke/pythonlibs/ <---- Python wheels (esp. for Windows installation issues)
https://www.analyticsvidhya.com/blog/2017/11/flashtext-a-library-faster-than-regular-expressions
https://machinelearningmastery.com/datasets-natural-language-processing/ nlp datasets
https://www.seas.upenn.edu/~romap/nlp-resources.html great collection by Univ. Penn.
https://t.co/BcBOaeBhUw Stanford NLP book (https://web.stanford.edu/~jurafsky/slp3/) <=====
https://www.youtube.com/watch?v=mhHfnhh-pB4 NLU (1:25 about dialogs) **************
http://web.stanford.edu/class/cs224n/reports.html Stanford NLP student projects
https://docs.google.com/document/d/1mkB6KA7KuzNeoc9jW3mfOthv_6Uberxs8l2H7BmJdzg/edit# NLUCS NYDC Sam Bowman
https://medium.com/nanonets/topic-modeling-with-lsa-psla-lda-and-lda2vec-555ff65b0b05
https://www.kdnuggets.com/2017/07/5-free-resources-getting-started-deep-learning-nlp.html
https://news.ycombinator.com/item?id=14639295 by YC.
https://www.kdnuggets.com/2018/03/text-data-preprocessing-walkthrough-python.html text data processing by https://www.kdnuggets.com/author/matt-mayo (https://github.com/mmmayo13) <----------
https://nlp.stanford.edu/projects/glove/ pretrained glove vecs ( see https://machinelearningmastery.com/develop-word-embeddings-python-gensim/ for working with w2v and glove at end of article and https://github.com/stanfordnlp/GloVe)
https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md FB pretrained fastText
https://dumps.wikimedia.org/arwiki/ Arabic wiki dumps
https://github.com/allenai/arc-solvers ARC solvers (see https://arxiv.org/pdf/1803.05457.pdf)
https://github.com/awesomedata/awesome-public-datasets <-----
https://www.kdnuggets.com/2018/03/text-data-preprocessing-walkthrough-python.html <=== https://github.com/Stephen-Rimac/Python-for-Data-Scientists <=== short notebook (good 4 fast.ai) https://nbviewer.jupyter.org/github/groverpr/learn_python_libraries/blob/master/pandas/pandas_cheatsheet.ipynb Pandas cheatsheet
https://www.rankred.com/javascript-cheat-sheets JS cheat sheets
http://overapi.com/ <========== All cheat sheets in one place
https://github.com/niderhoff/nlp-datasets/blob/master/README.md <===== NLP datasets ====
https://github.com/keon/awesome-nlp <======================= Awesome NLP ===============
https://machinelearningmastery.com/develop-word-embeddings-python-gensim/
https://github.com/niderhoff/nlp-datasets <=== NLP datasets in alphabetical order ====
https://catalog.ldc.upenn.edu/ldc2013t19 is the Ontonotes dataset (includes Arabic) <=======
https://github.com/slanglab/phrasemachine (En only)
https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec
https://www.analyticsvidhya.com/blog/2018/03/essentials-of-deep-learning-sequence-to-sequence-modelling-with-attention-part-i/
https://www.kaggle.com/hoonkeng/deep-eda-word-embeddings-sentiment-analysis/notebook word embeddings and sentiment analysis
http://newsletter.ruder.io/issues/nlp-pytorch-libraries-gan-tutorial-jupyter-tricks-tensorflow-things-representation-learning-making-nlp-more-accessible-michael-jordan-essay-reproducing-deep-rl-rakuten-data-challenge-naacl-outstanding-papers-106347
http://newsletter.ruder.io/issues/bert-transfer-learning-for-dialogue-deep-learning-sota-2019-gaussian-processes-vi-nlp-lesson-curricula-fast-ai-lessons-alphastar-how-to-manage-research-teams-154831 <==== (see other issues)
http://web.stanford.edu/class/cs224n/lectures/lecture10.pdf seq2seq translation
https://github.com/bentrevett/pytorch-seq2seq
https://github.com/ArthurSpirling/Text-as-Data-Class-Spring-2018- lectures in R
https://blogs.technet.microsoft.com/machinelearning/2018/04/24/deep-learning-for-emojis-with-vs-code-tools-for-ai/
https://www.kaggle.com/arthurtok/spooky-nlp-and-topic-modelling-tutorial
https://datascienceplus.com/scikit-learn-for-text-analysis-of-amazon-fine-food-reviews/
https://medium.com/@alyafey22/sentiment-classification-from-keras-to-the-browser-7eda0d87cdc6
https://www.reddit.com/r/textdatamining/ text mining reddit
https://medium.com/@datancoffee/predicting-user-engagement-with-news-on-reddit-using-kaggle-or-colab-d5ef0dcaff6a Kaggle GPU
https://www.kaggle.com/dansbecker/running-kaggle-kernels-with-a-gpu <==== Kaggle GPU example <======
https://summari.es/ dataset for many articles for text summarization
https://github.com/Mybridge/machine-learning-articles Top 10 ML articles monthly
https://medium.mybridge.co/machine-learning-top-10-articles-for-the-past-month-v-may-2018-681489a05135 Top 10 May
https://machinelearningmastery.com/best-practices-document-classification-deep-learning/
https://github.com/eisenjulian/nlp_estimator_tutorial (https://opendatascience.com/text-classification-with-tensorflow-estimators/) NLP Tutorial *********************
https://github.com/google/sentencepiece / https://www.kaggle.com/lefant/example-usage-pre-trained-bpe-embeddings/code
Berkeley Neural Parser: https://github.com/nikitakit/self-attentive-parser
http://www.arabicnlp.pro/alp/ Arabic ALP - An online Arabic linguistic tool <========== https://github.com/Barqawiz/Shakkala Shakkala project (see also https://www.linkedin.com/feed/update/urn:li:activity:6509847630170984448)
https://github.com/sebastianruder/NLP-progress <==== https://explained.ai/rf-importance/index.html Random forest feature importance <=============

Torch

https://github.com/WillKoehrsen/pytorch_challenge/blob/master/Transfer%20Learning%20in%20PyTorch.ipynb
https://github.com/perone/medicaltorch/ Medical Torch
https://github.com/bentrevett/pytorch-sentiment-analysis <=============== https://code.fb.com/ai-research/pythia/ <===============

TensorFlow / Keras

https://medium.com/tensorflow/introducing-tensorflow-federated-a4147aa20041 TensorFlow Federated (TFF)
https://medium.com/tensorflow/text-classification-using-tensorflow-js-an-example-of-detecting-offensive-language-in-browser-e2b94e3565ce?linkId=64850815 Text classification in browser
https://www.blog.google/technology/ai/creative-coder-adding-color-machine-learning image coloring
https://github.com/oswaldoludwig/Seq2seq-Chatbot-for-Keras chatbot in Keras
https://github.com/zaidalyafeai/Notebooks/blob/master/README.md TF 2.0 examples (Zaid Alyafeai)
https://twitter.com/fchollet/status/1105139360226140160 TF 2.0 crash course (Francois Chollet)

TorchText

https://www.analyticsvidhya.com/blog/2018/02/pytorch-tutorial/ <==== PyTorch
https://github.com/mjc92/TorchTextTutorial
https://towardsdatascience.com/use-torchtext-to-load-nlp-datasets-part-i-5da6f1c89d84 (see also part II and FB's StarSpace) https://towardsdatascience.com/use-torchtext-to-load-nlp-datasets-part-ii-f146c8b9a496
https://towardsdatascience.com/learning-note-starspace-for-multi-label-text-classification-81de0e8fca53 StarSpace
http://anie.me/On-Torchtext/ (see also repos at https://github.com/windweller)
https://medium.com/@hiromi_suenaga/deep-learning-2-part-1-lesson-4-2048a26d58aa series from Fast.ai learner. See all at https://medium.com/@hiromi_suenaga
https://github.com/PetrochukM/PyTorch-NLP Torch NLP (new) and https://github.com/outcastofmusic/quick-nlp
https://www.cpuheater.com/deep-learning/introduction-to-recurrent-neural-networks-in-pytorch/
https://rockt.github.io/2018/04/30/einsum <=== EinSum https://github.com/rockt
https://medium.com/@hiromi_suenaga/deep-learning-2-part-2-lesson-8-5ae195c49493 Fast.ai notes
https://towardsdatascience.com/deep-learning-book-notes-chapter-3-part-1-introduction-to-probability-49d13c997f2a
https://github.com/vi3k6i5/flashtext replace keywords in sentences or extract keywords from sentences
https://github.com/rtqichen/torchdiffeq ODE solvers
https://github.com/facebookresearch/pytext PyText

TextaCy

(based on spaCy, GH https://github.com/chartbeat-labs/textacy)
https://towardsdatascience.com/summarizing-tweets-in-a-disaster-e6b355a41732 tweets corpora. Also https://medium.com/data-for-democracy/learning-to-track-refugees-a90aa334a0a2
https://towardsdatascience.com/twitter-api-and-nlp-7a386758eb31 Tweets by different users (code, with tweets miner class at https://github.com/elaiken3/twitter_api-nlp-project1)

spaCy

https://realpython.com/natural-language-processing-spacy-python/ Real Python
https://github.com/cristianasp/spacy spaCy course in notebooks
https://www.datacamp.com/courses/advanced-nlp-with-spacy based on 2.1 version by Ines
https://github.com/explosion/spacy-stanfordnlp https://stanfordnlp.github.io/stanfordnlp/
https://github.com/kororo/excelcy <====================== train and annotate from Excel =====================
https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e
https://dataflume.wordpress.com/2017/03/17/intro-nlp-python-spacy/
https://nbviewer.jupyter.org/github/repmax/topic-model/blob/master/topic-modelling.ipynb *****
https://github.com/oudalab/Arabic-NER Spacy Ar NER http://mlreference.com/spacy
https://twitter.com/julien_c/status/977189681283633153 Neuralcoref v2.0 Stanford NLP, Coref link
https://twitter.com/spacy_io/status/977286812510060545 visualize spaCy with TensorBoard
https://github.com/explosion/talks/blob/master/2018-04-12__Embed-Encode-Attend-Predict.pdf **************
https://github.com/explosion/talks/blob/master/2018-04_12__Rapid-NLP-Annotation.pdf **************
See https://github.com/explosion/talks (latest in SF: https://www.youtube.com/watch?v=jB1-NukGZm0)
https://spacy.io/usage/linguistic-features edit and run in browser
https://spacy.io/universe/ stuff built for or with spaCy
http://www.joshblog.co.uk/posts/20180210_python_text_visualization_coffee_analysis_part_III
http://wp.arenji.com/2018/05/09/beyond-intents-and-entities/
https://www.reddit.com/r/textdatamining/comments/8k3tpk/getting_started_with_spacy_for_natural_language/?st=jhai93us&sh=d8ab3dc3
explosion/spaCy#2931
https://allenai.github.io/scispacy/ ScispaCy! - A full spacy pipeline for #biomedical text
https://github.com/textpipe/textpipe clean and extract metadata from text <===============
https://blog.quanteda.org/2019/03/28/using-spacy-v2.1-with-spacyr/
https://github.com/explosion/spacy-stanfordnlp use Stanford NLP in spaCy

Gensim

https://github.com/RaRe-Technologies/talks/tree/master/2018-03-23_MLPrague-workshop Notebooks on W2Vec, FT, etc.
https://nbviewer.jupyter.org/github/repmax/topic-model/blob/master/topic-modelling.ipynb Gensim, Spacy and textacy

CoreNLP

https://stanfordnlp.github.io/stanfordnlp/ new, Pytorch tokenization, POS and Dep. <===== https://levelup.gitconnected.com/first-look-at-stanfordnlp-2b7d43190957 <=================== https://www.analyticsvidhya.com/blog/2019/02/stanfordnlp-nlp-library-python/ <=====
https://stanfordnlp.github.io/CoreNLP/ v 3.9.1
https://stanfordnlp.github.io/CoreNLP/human-languages.html Level of support for Arabic (limited). For an Arabic lemmatizer, check http://oujda-nlp-team.net/en/programms/lemmatizer/ (not tested), this abstract (https://arxiv.org/pdf/1710.06700.pdf) and their related Farasa tool (Java): http://alt.qcri.org/farasa/segmenter.html (http://qatsdemo.cloudapp.net/farasa/). See also earlier work called SALMA (https://www.researchgate.net/publication/261313488_SALMA_Standard_Arabic_Language_Morphological_Analysis) and Madamira: https://camel.abudhabi.nyu.edu/madamira/ or http://innovation.columbia.edu/technologies/cu14012_morphological-analysis-and-disambiguation-for-dialectal-arabic-madamira
http://kaldi-asr.org the Kaldi toolkit for Arabic speech recognition (https://github.com/kaldi-asr/kaldi)
https://www.khalidalnajjar.com/setup-use-stanford-corenlp-server-python/
https://blog.sicara.com/train-ner-model-with-nltk-stanford-tagger-english-french-german-6d90573a9486 <====
http://web.stanford.edu/class/cs224n/reports.html SQuAD projects
https://gitlab.com/tcool/nlpviz-batch NLP Viz

Rasa / Intents

https://github.com/RasaHQ/rasa_nlu figure out intents See also: https://www.youtube.com/watch?v=0hZay4KSLFw <======================
and https://www.youtube.com/watch?v=2kaILO_ERgY (recast.ai now taken by SAP)
See also: https://github.com/mlehman/nlp-intent-toolkit (Java)
Techniques: https://www.quora.com/What-techniques-are-generally-used-for-intent-recognition-in-NLP
From ChatBotsLife, see Text Classification
https://www.analyticsvidhya.com/blog/2018/01/faq-chatbots-the-future-of-information-searching
https://medium.com/rasa-blog/supervised-word-vectors-from-scratch-in-rasa-nlu-6daf794efcd8

http://www.realworldnlpbook.com/blog/training-sentiment-analyzer-using-allennlp.html ***** http://www.realworldnlpbook.com/blog/improving-sentiment-analyzer-using-elmo.html
https://github.com/mhagiwara/realworldnlp/blob/master/examples/sentiment/sst_classifier.py
https://nlp.stanford.edu/sentiment/

http://wp.arenji.com/2018/05/09/beyond-intents-and-entities/
https://github.com/jsalt18-sentence-repl/jiant transfer learning (has Arabic, see also https://arxiv.org/pdf/1809.05053v1.pdf)

Twitter tools

https://github.com/bear/python-twitter Python Twitter (see also Tweepy)

RegEx

Brandon Rohrer's collection: https://www.oreilly.com/ideas/an-introduction-to-regular-expressions, https://regexr.com/ and https://rubular.com/ (Ruby)
https://regex101.com/ online tool
https://www.kdnuggets.com/2018/04/python-regular-expressions-cheat-sheet.html <=========
https://www.youtube.com/watch?v=VU60rEXaOXk jupyter notebook (many episodes) https://github.com/CoreyMSchafer/code_snippets/blob/master/Python-Regular-Expressions/snippets.txt regex codes
https://www.youtube.com/watch?v=sa-TUpSx1JA in general (see below, same person)
https://www.youtube.com/watch?v=K8L6KVGG-7o in Python using re (code at https://github.com/CoreyMSchafer/code_snippets/tree/master/Python-Regular-Expressions)
https://www.youtube.com/watch?v=zN8rwVXwRUE python from edureka
https://www.youtube.com/watch?v=ZdDOauFIDkw oldish, Python
https://www.youtube.com/watch?v=sZyAn2TW7GY oldish, python
https://www.youtube.com/watch?v=r6I-Ahc0HB4 (very basic series 4 or more: https://github.com/iamshaunjp/regex-playlist)
https://www.dataquest.io/blog/large_files/python-regular-expressions-cheat-sheet.pdf <=== cheatsheet
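
As a quick illustration of the `re` module covered by the resources above, here is a minimal sketch that pulls URLs out of a line of notes like the ones in this file. The pattern is a deliberately simple assumption (it stops at whitespace, parentheses and angle brackets) and is not a full URL grammar; `extract_urls` is a hypothetical helper name.

```python
import re

# Simplified pattern (an assumption, not a complete URL grammar):
# match http/https, then run until whitespace, a parenthesis or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s()<>]+")


def extract_urls(line):
    """Return every URL-like substring found in a line of text, in order."""
    return URL_PATTERN.findall(line)


sample = "https://regex101.com/ online tool (see also https://regexr.com/)"
urls = extract_urls(sample)
print(urls)
```

Compiling the pattern once and reusing it keeps the helper cheap when it is applied to many lines.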

Misc

http://www.cs.columbia.edu/~amueller/comsw4995s18/schedule/ Applied Machine Learning Spring 2018
http://www.datasciencefree.com/cheatsheets.html ****
https://www.kaggle.com/learn/overview Hands-On data science edu from Kaggle
https://github.com/shik3519/machine-learning/blob/master/tutorials/003-python-basics-numpy-regex.ipynb Py/NP/RegEx ***
https://www.analyticsvidhya.com/blog/2018/02/top-5-github-repositories-january-2018 (Top 5 Data Science & Machine Learning Repositories)
https://testdriven.io/building-a-concurrent-web-scraper-with-python-and-selenium#.WorbWQNWXbw.twitter
https://arxiv.org/ftp/arxiv/papers/1702/1702.07835.pdf survey of free Arabic corpora (great list) <=====
UN corpora: http://www.uncorpora.org/
https://blog.jupyter.org/jupyterlab-is-ready-for-users-5a6f039b8906 Announcing Jupyter lab beta
https://machinelearningmastery.com/resources-for-linear-algebra-in-machine-learning/ linear algebra resources https://fossbytes.com/10-cheat-sheets-programming-languages/
https://github.com/zhezhaoa/ngram2vec https://github.com/maciejkula/glove-python ***
https://towardsdatascience.com/word-embeddings-exploration-explanation-and-exploitation-with-code-in-python-5dac99d5d795
https://www.slideshare.net/TessFerrandez/notes-from-coursera-deep-learning-courses-by-andrew-ng ****
https://github.com/ilkarman/DeepLearningFrameworks/tree/master/notebooks deep learning frameworks comparison notebooks
https://github.com/lukas/ml-class Keras and Scikit class notes
https://github.com/janniec/GinsBot use Gensim W2vec to predict sentence completion (legal opinions) https://realpython.com/python-speech-recognition/ speech recognition in Python
http://serialmentor.com/dataviz/aesthetic-mapping.html Data Viz e-book
https://github.com/geopandas/geopandas GeoPandas mapping
https://www.kdnuggets.com/2018/03/catboost-vs-light-gbm-vs-xgboost.html CatBoost vs. LightGBM vs. XGBoost
https://github.com/anati89/wazen (see https://www.linkedin.com/pulse/wazen-arabic-nlp-project-find-word-variation-pattern-root-al-anati/)
https://www.kdnuggets.com/2017/09/essential-data-science-machine-learning-deep-learning-cheat-sheets.html
https://realpython.com/python-matplotlib-guide
https://github.com/fastai/fastai
https://medium.com/@jamesdell/if-i-can-you-can-and-you-should-a470d7aea89d bird classification in fastai
https://aischool.microsoft.com/
https://www.kdnuggets.com/2018/03/5-things-sentiment-analysis-classification.html
https://www.kaggle.com/rtatman/data-cleaning-challenge-handling-missing-values <=====
https://www.kaggle.com/athoul01/predicting-interview-attendence uses fast.ai
https://github.com/collections/machine-learning <==== ML collection
https://www.kdnuggets.com/2018/04/right-metric-evaluating-machine-learning-models-1.html <== eval metrics
https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf Pandas cheat sheet http://www.dataschool.io/learn/
http://gael-varoquaux.info/interpreting_ml_tuto/index.html
https://github.com/NavinManaswi/Book-Deep-Learning-Applications-with-Applications-Using-Python
https://github.com/awesomedata/awesome-public-datasets
https://www.reddit.com/r/datasets/
https://www.pyimagesearch.com/2018/05/07/multi-label-classification-with-keras/ Image classification (clothes) <===== https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68 *********
http://scikit-learn.org/stable/auto_examples/applications/plot_topics_extraction_with_nmf_lda.html
http://scikit-learn.org/stable/modules/clustering.html Clustering in scikit-learn <==========
https://towardsdatascience.com/improving-the-interpretation-of-topic-models-87fd2ee3847d
https://stackoverflow.com/questions/45145368/python-scikit-learn-get-documents-per-topic-in-lda assign topics
https://elitedatascience.com/category/tutorials
https://docs.google.com/document/d/1dr4GvVtnOf60x1uj4PbYeoFDZMumzic11S_drqjlT08/edit PyCon2018 resources <======== https://www.youtube.com/watch?v=q42hCs2E4So <=== API design
https://www.youtube.com/watch?v=PXJtFc8DjsE Siraj Raval - AI Education channel
https://www.reddit.com/r/MachineLearning/comments/8j9rx8/d_what_are_the_best_libraries_frameworks_out/ <++++++++++
https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/
https://towardsdatascience.com/how-to-train-neural-network-faster-with-optimizers-d297730b3713
https://www.kdnuggets.com/2018/10/confusion-matrices-quantify-cost-being-wrong.html
https://github.com/keiffster/program-y/wiki/Tutorial-Pattern-Matching

To watch on Ch9

https://channel9.msdn.com/Events/Connect/2017/AI1 intro to MS AI https://channel9.msdn.com/Events/Connect/2017/AI2 intro to bots https://channel9.msdn.com/Events/Connect/2017/T100 use Azure Bot Service

Chatbots

https://medium.com/tensorflow/a-transformer-chatbot-tutorial-with-tensorflow-2-0-88bf59e66fe2
https://medium.com/analytics-vidhya/building-a-simple-chatbot-in-python-using-nltk-7c8c8215ac6e
https://github.com/deepmipt <==== see DeepPavlov and Intent-Classifier. See also: https://github.com/snipsco/snips-nlu and other repos
https://ankitbko.github.io/2017/03/human-handover-bot/
https://www.gobeyond.ai/
https://github.com/vishwanathsrikanth/mycode/tree/master/SimpleProactiveBot https://github.com/jamesemann/botframeworkresources
https://blog.botframework.com/2017/12/13/conversational-bots-deep-dive-whats-new-general-availability-azure-bot-service-language-understanding/
https://github.com/jamesemann/dotnetyork
https://building.lang.ai/sorry-i-didnt-get-that-how-to-understand-what-your-users-want-a90c7ca18a8f

https://gitter.im/Microsoft/BotBuilder <== gitter blog.botframework.com
https://code.msdn.microsoft.com/ look for bot stuff
https://twitter.com/jssuthahar (https://www.c-sharpcorner.com/article/getting-started-with-receipt-card-design-using-microsoft-bot-framework/)
https://twitter.com/rajeeshmenoth https://www.c-sharpcorner.com/article/getting-started-chatbot-using-azure-bot-service/ (https://www.c-sharpcorner.com/search/bot)
https://github.com/SherifElMahdi/botsfromzerotohero Sherif elMahdi ??
https://github.com/DanielEgan/BotWorkshop <------------------- https://github.com/mithun-prasad/Bot/ <-------------- MS, slides and code, testing bots etc.
https://ankitbko.github.io/2017/03/human-handover-bot/ hand-over
https://github.com/tompaana/bot-message-routing intermediator see above .. https://github.com/alyssaong1/botfwk-scenarios
https://github.com/jamescarpinter/bot-service
https://github.com/RobStand/IgniteDemoThr (see other repos)
https://youtu.be/cumYtCVjl6Q?t=1013 (cards)
https://github.com/JoeMayo/MSBotFrameworkBook and https://github.com/JoeMayo/BotDemos MS Press book chapters code (see http://blog.botcontext.com/) and https://github.com/DanielEgan/BotWorkshop (see pdf in isamara folder 70 p)
https://github.com/alyssaong1?tab=repositories <--- node.js examples
https://docs.microsoft.com/en-us/bot-framework/dotnet/bot-builder-dotnet-quickstart <=====
https://github.com/MicrosoftBotFrameworkDiplomado/TanukiBOT <-- Cosmosdb
https://mva.microsoft.com/en-US/training-courses/17590?l=ALwJe9kqD_4000115881 intro
https://channel9.msdn.com/Series/Explain/Bots-101-Scenarios-for-bots <=== no tech intro
https://channel9.msdn.com/Shows/AI-Show/Announcing-General-Availability-of-Azure-Bot-Service-and-Language-Understanding-service?term=bot Announcing GA of Bot Framework
(see also https://mva.microsoft.com/search/SearchResults.aspx#!q=ASP.NET%20Core&lang=1033)
https://docs.microsoft.com/en-us/bot-framework/dotnet/bot-builder-dotnet-quickstart **** Bot Framework (VS2017)
https://docs.microsoft.com/en-us/bot-framework/dotnet/bot-builder-dotnet-overview Bot SDK docs
https://github.com/Microsoft/BotBuilder-Samples/tree/master/CSharp Samples <==============
https://github.com/Microsoft/BotBuilder/tree/master/CSharp/Samples other samples
https://channel9.msdn.com/events/DEVintersection/DEVintersection-2017-Las-Vegas/GB009?term=bot *************
https://channel9.msdn.com/events/Connect/2017/AI2?term=bot James Carpinter (https://github.com/jamescarpinter)
https://channel9.msdn.com/events/Ignite/Microsoft-Ignite-Orlando-2017/BRK3301?term=bot
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/linux-dsvm-intro
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/linux-dsvm-walkthrough
http://nlp.town/blog/sentence-similarity/ comparing sentence similarity
https://www.kdnuggets.com/2018/05/50-useful-machine-learning-prediction-apis-2018-edition.html 50 prediction APIs
https://github.com/neerjad/DataVisualization dataviz in different packages

VS and AI

https://marketplace.visualstudio.com/items?itemName=ms-toolsai.vstoolsai-vs2017 Tools for AI in VS
https://github.com/Microsoft/vs-tools-for-ai/
https://blogs.msdn.microsoft.com/visualstudio/2018/05/07/introducing-visual-studio-intellicode/
http://onnx.ai/
https://www.microsoft.com/en-us/aiforearth

Extra

These were shared on LinkedIn:

  1. Stanford’s Natural Language Processing with Deep Learning:
  2. Oxford’s Natural Language Processing with Deep Learning https://lnkd.in/eraYdz5
  3. Scikit-learn tutorial https://lnkd.in/eVHdni8 (see also https://zablo.net/blog/post/pandas-dataframe-in-scikit-learn-feature-union ) ****
  4. edX – Microsoft https://lnkd.in/eWH6Vr2
  5. Deep Learning for Natural Language Processing: Tutorials with Jupyter Notebooks https://lnkd.in/e3HRKiv
  6. Natural Language Processing with Python http://www.nltk.org/book/
  7. How to solve 90% of NLP problems: a step-by-step guide https://lnkd.in/e-vU2az
  8. NLP with Keras https://lnkd.in/eFirb_6
  9. Twitter Sentiment Analysis Using Combined LSTM-CNN Models https://lnkd.in/er8fzRC

https://www.reddit.com/r/MachineLearning/comments/8j9rx8/d_what_are_the_best_libraries_frameworks_out/ https://github.com/ioam/holoviews (https://t.co/Ke7QTFQpIY) <=== dataviz
https://github.com/jupyter/nbdime diff/merge for notebooks

Transfer Learning

http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html + http://nlp.fast.ai/category/model_zoo.html +
http://forums.fast.ai/t/language-model-zoo-gorilla/14623
https://github.com/binga/fastai_notes/blob/master/experiments/notebooks/lang_models/WikiExtractor.py
https://github.com/sgugger/Deep-Learning/blob/master/Building%20a%20French%20LM.ipynb
https://github.com/nafizh/Neural_language_model_bangla
https://github.com/binga/fastai_notes/blob/master/experiments/notebooks/lang_models/Telugu_Language_Model.ipynb
https://www.kaggle.com/anshulrai/preprocessing-train-and-test-data/code sentencepiece
https://github.com/Separius/awesome-sentence-embedding (may be https://arxiv.org/abs/1806.06259)
https://code.fb.com/ai-research/laser-multilingual-sentence-embeddings/ <===== LASER

https://github.com/jupyter-widgets/ipyleaflet
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2
https://machinelearningmastery.com/develop-neural-machine-translation-system-keras/
https://www.kdnuggets.com/2018/05/data-labeling-machine-learning.html/2 data labeling @@@@@@@@
https://www.kdnuggets.com/2018/07/ultimate-list-web-scraping-tools-software.html
https://www.kdnuggets.com/2018/07/receiver-operating-characteristic-curves-demystified-python.html
https://www.datasciencecentral.com/profiles/blogs/comparison-of-top-6-python-nlp-libraries

https://t.co/CdPyuEE4eb https://www.kdnuggets.com/2018/07/5-quick-easy-data-visualizations-python-code.html
https://medium.com/@shub777_56374/learn-deep-learning-with-gpu-enabled-kaggle-kernels-and-fastai-mooc-72fee41bb4b5
https://towardsdatascience.com/semantic-code-search-3cd6d244a39c

https://towardsdatascience.com/multi-label-text-classification-with-scikit-learn-30714b7819c5

https://medium.com/apache-mxnet/gluonnlp-deep-learning-toolkit-for-natural-language-processing-98e684131c8a
https://github.com/dmlc/gluon-nlp

https://github.com/akshaybahadur21/DigiEncoder OpenCV, Keras, TF
https://akshaydominator.wixsite.com/akshaybahadur21 (author)

https://www.kdnuggets.com/2017/03/email-spam-filtering-an-implementation-with-python-and-scikit-learn.html e-mail spam

https://github.com/miguelgfierro/sciblog_support/blob/master/Intro_to_Fraud_Detection/fraud_detection.ipynb Fraud detection

https://github.com/WillKoehrsen/taxi-fare/blob/master/Start%20Simple.ipynb NY taxi
https://www.kaggle.com/willkoehrsen/a-walkthrough-and-a-challenge NY taxi (Will Koehrsen)
https://github.com/bakrianoo/aravec ========//=====> AraVec
https://www.codementor.io/oluwagbengajoloko/how-to-scrape-data-from-a-website-using-python-n3fmtc63q
https://www.kdnuggets.com/2018/08/make-machine-learning-models-robust-outliers.html <===//=== https://machinelearningmastery.com/use-pre-trained-vgg-model-classify-objects-photographs/

On using git: git for poets (https://www.youtube.com/watch?v=BCQHnlnPusY) , also https://www.youtube.com/watch?v=SWYqp7iY_Tc and https://www.youtube.com/watch?v=HVsySz-h9r4 - mostly note to self
https://guides.github.com/features/mastering-markdown/
https://humanitiesdata.com/ humanities data sets
https://medium.freecodecamp.org/the-power-of-a-neuron-9b5526c2ed46 deep image learn
https://github.com/diux-dev/imagenet18 ImageNet in 18 minutes

https://cds.nyu.edu/newsletter/
http://cs231n.github.io/ Andrej Karpathy Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition
https://www.pyimagesearch.com/2018/10/22/object-tracking-with-dlib/ object tracking with dlib

math 4 ML

https://brohrer.github.io/calculus_resources.html Calculus resources for ML
https://brohrer.github.io/blog.html
https://www.youtube.com/user/profrobbob/playlists (https://www.profrobbob.com/home) ******
https://www.youtube.com/c/ProfGhristMath ******
https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw ******
http://explained.ai/matrix-calculus/index.html The Matrix Calculus You Need For Deep Learning
https://betterexplained.com/articles/a-gentle-introduction-to-learning-calculus/ ?
https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw 3blue1brown
https://www.youtube.com/channel/UC9SPN6qaM0DB455-DrWAdpA?pbjreload=10 Tarrou Chalk Talk
https://github.com/pim-book/programmers-introduction-to-mathematics
http://vmls-book.stanford.edu/ Algebra https://medium.com/@rathi.ankit/multivariate-calculus-for-data-science-abccf47fce0f

Fast.Ai

"Val loss lower than train loss means you are under-fitting. Val loss should always be higher than train loss when you are finished fitting." JH
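
The rule of thumb quoted above can be written as a tiny check on recorded losses. This is only a sketch of that heuristic, assuming you have one training loss and one validation loss per epoch; `diagnose_fit` and its return labels are hypothetical names, not part of the fastai API.

```python
def diagnose_fit(train_losses, val_losses):
    """Apply the quoted heuristic to the last epoch's losses.

    Returns "under-fitting" when the final validation loss is below the
    final training loss, otherwise "fitted" (meaning only that the
    heuristic raises no under-fitting flag).
    """
    if not train_losses or len(train_losses) != len(val_losses):
        raise ValueError("need one train and one validation loss per epoch")
    final_train, final_val = train_losses[-1], val_losses[-1]
    if final_val < final_train:
        return "under-fitting"
    return "fitted"


# Illustrative numbers only, not results from a real run.
print(diagnose_fit([1.00, 0.80], [0.90, 0.60]))
print(diagnose_fit([1.00, 0.40], [0.90, 0.50]))
```

A "fitted" result says nothing about over-fitting; the quote only addresses the under-fitting side.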
https://github.com/fastai/fastai
https://github.com/radekosmulski/quickdraw
https://medium.com/@wgilliam/finding-data-block-nirvana-a-journey-through-the-fastai-data-block-api-c38210537fe4 <===
https://medium.com/@keremturgutlu/understanding-building-blocks-of-ulmfit-818d3775325b (https://www.kaggle.com/keremt, https://github.com/KeremTurgutlu) <=======
https://hackernoon.com/rtx-2080ti-vs-gtx-1080ti-fastai-mixed-precision-training-comparisons-on-cifar-100-761d8f615d7f
https://github.com/PPPW/deep-learning-random-explore/blob/master/CNN_archs/cnn_archs.ipynb
https://cristianduguet.com/an-only-text-description-of-text-generation-using-neural-networks/ <================
Examples:
https://github.com/jantic/deoldify == https://www.colorize.ml/
https://www.kaggle.com/tamlyn/titanic-fastai Titanic
https://github.com/etown/dl1/blob/master/UrbanSoundClassification.ipynb
https://github.com/poppingtonic/dl-studies/blob/master/imageCLEF2013_plant_types.ipynb
https://github.com/kheyer/ML-DL-Projects/blob/master/Pets%20TSNE/pets_tsne.ipynb
https://github.com/MicPie/fastai_course_v3/blob/master/L1-stonefly.ipynb
https://medium.com/@aayushmnit/transfer-learning-using-the-fastai-library-d686b238213e
https://blog.usejournal.com/if-i-can-you-can-and-you-should-a470d7aea89d + http://nbviewer.jupyter.org/github/jamesdellinger/fastai_practical_deep_learning_course_v3/blob/master/lesson1_mini_sideproject.ipynb?flush_cache=true
https://medium.com/@alenaharley/the-mystery-of-the-origin-cancer-type-classification-using-fast-ai-libray-212eaf8d3f4e
https://forums.fast.ai/t/share-your-work-here/27676/197 audio
https://colab.research.google.com/drive/16BiLegPGx5G911B15wAUBSKjFZRcyhxR sound
https://github.com/Insiyaa/Music-Genre-Classification
https://medium.com/@zachcaceres/deep-learning-can-we-use-computer-vision-to-predict-the-composer-of-classical-music-464dd5516996 (https://github.com/zcaceres/deep-learning-composer-prediction) sound
https://medium.com/@johnhartquist/audio-classification-using-fastai-and-on-the-fly-frequency-transforms-4dbe1b540f89 audio + github: https://github.com/sevenfx/fastai_audio (https://twitter.com/jeremyphoward/status/1092913576459661313)
https://colab.research.google.com/drive/1AwlscWIaaygthL7p0ZQfbREmiM19NFUb Fashion
https://github.com/albertnaur/fastaiNotebooks/blob/master/mammo/mammo_tiles.ipynb
https://github.com/ademyanchuk/course-v3/blob/master/nbs/dl1/lesson-1-pneumonia-dai.ipynb
https://github.com/dzlab/deepprojects/blob/master/classification/EyeEm_Image_Dataset_Download.ipynb
https://redditech.blog/2018/11/04/hosting-fastai-app-in-azure-websites-for-containers/ web app azure
https://github.com/nikhilno1/healthy-or-not + https://github.com/nikhilno1/healthy-or-not/blob/master/heroku-deploy.md
https://gist.github.com/hkristen/971af4233952c506b8cfbcfc007c52c1 plants https://gist.github.com/oguiza/c9c373aec07b96047d1ba484f23b7b47 (https://gist.github.com/oguiza/26020067f499d48dc52e5bcb8f5f1c57) time series
https://medium.com/@lankinen/fastai-model-to-production-this-is-how-you-make-web-app-that-use-your-model-57d8999450cf production how-to
https://github.com/sparalic/Poisonous-Plants-Image-Classifier/tree/master poisonous plants
https://github.com/etown/dl1/blob/master/face/README.md face emotions
https://github.com/oasis789/Arabic-Handwritten-Characters-Dataset Arabic handwritten chars (https://www.primaresearch.org/RASM2018/)
https://github.com/tchambon/deepfrench/blob/master/ULMFit%20Classifier%20example.ipynb Deep French classification
https://medium.com/@tmckenzie.nz/using-the-fastai-data-block-api-b4818e72155b data block API
https://medium.com/@iliazaitsev/how-to-implement-a-recommendation-system-with-deep-learning-and-pytorch-2d40476590f9 collab
https://medium.com/@surhar88/fast-ai-journey-part-1-lesson-3-theory-review-learning-rates-and-activation-functions-ab967a5f0eec
https://gist.github.com/joshfp/b62b76eae95e6863cb511997b5a63118 Tabular/ulmfit (https://forums.fast.ai/t/share-your-work-here/27676/526)
https://medium.com/@meghana97g/classification-of-tumor-tissue-using-deep-learning-fastai-77252ae16045 Tumor tissue
https://github.com/btahir/age-detector
https://github.com/dzlab/deepprojects/blob/master/collabfiltering/Collaborative_Filtering_Book_Recommendation.ipynb collab books
https://radekosmulski.github.io/answers/html/What%20are%20pretrained%20models%3F.html pretrained models?
https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000
https://github.com/LaurenSpiegel/course-v3/blob/master/nbs/dl1/age_regression.ipynb age regression
https://gist.github.com/andrewreece/9112dc5f893f79d53dc4d219905dd92a language model on Gutenberg
https://vinpetersen.github.io/2018-11-23-a-guided-tour-through-a-convolutional-neural-network-part-1/ CNNs (4 parts)
https://github.com/anurags25/FastAI-LIME/blob/master/LIME-Pets.ipynb Lime adapted
https://medium.com/@diazagasatya/will-dropout-regularization-prevents-your-model-to-overfit-11afa10cd4e0 dropout
https://medium.com/@lunchwithalens/deploying-my-fastai-predictor-to-microsoft-azure-c7e635d464a1 deploy to Azure
https://forums.fast.ai/uploads/default/original/2X/a/aba4b114ffabc2cfe69e570d39090f88103e3ca4.pdf thermal face auth.
https://towardsdatascience.com/how-to-build-an-image-duplicate-finder-f8714ddca9d2 duplicate image detector
https://github.com/ricknta/fake-news fake news detection
https://github.com/martin-merener/deep_learning/tree/master/more_transfer_learning Misc
https://github.com/ttdoucet/mnist/blob/master/mnist.ipynb Mnist
https://github.com/renato145/fastai_scans medical imaging
https://nbviewer.jupyter.org/github/shubhajitml/crop-disease-detector/blob/master/notebook/plant_village.ipynb crop disease
https://medium.com/@pierre_guillou/data-augmentation-by-fastai-v1-84ca04bea302 augmentation
https://twitter.com/icecold_spinbot/status/1090638453493116929 train/test leakage
https://github.com/DaveSmith227/deep-elon-tweet-generator (https://deepelon.com/) Deep Elon Musk Tweets
https://github.com/tchambon/DeepGuru Deep Guru tweets
https://towardsdatascience.com/cutting-edge-face-recognition-is-complicated-these-spreadsheets-make-it-easier-e7864dbf0e1a https://github.com/JoshVarty/ImageClassification/blob/master/3_CountingAgain.ipynb counting objects (vision)
https://raimanu-ds.github.io/tutorial/can-ai-guess-which-the-simpsons-character/ recognize simpsons (vision) https://medium.com/@JamesDietle/beginning-deep-learning-classifications-in-the-gastrointestinal-tract-with-fast-ai-7a97e3924b96 https://twitter.com/JamesDietle vision medical
https://github.com/kheyer/ML-DL-Projects/tree/master/Experiments/Residual%20UNets residual unets (vision) https://towardsdatascience.com/the-keys-of-deep-learning-in-100-lines-of-code-907398c76504 cancer diag (vision) <===
https://www.kaggle.com/deepbilal/using-fastai-tabular-on-petfinder/output Pet Finder competition
https://forums.fast.ai/t/lesson-4-advanced-discussion/30319/85 text gen = beam search (https://forums.fast.ai/t/improving-text-generation/31467) https://www.youtube.com/watch?v=RLWuzLLSIgw Andrew Ng <======
https://github.com/bfarzin/wiki103_from_scratch WikiText training and classification <=====
https://github.com/ohmeow/seq2seq-pytorch-fastai seq2seq RNNs with fastai
https://hackernoon.com/@init_27 Interviews on deep learning
https://medium.com/@pierre_guillou/data-augmentation-by-fastai-v1-84ca04bea302 images via fastai
https://forums.fast.ai/t/lesson-4-advanced-discussion/30319/126 collab wiki post
https://forums.fast.ai/t/time-series-sequential-data-study-group/29686/19 time series study group
https://medium.com/@pierre_guillou/deep-learning-in-practice-dl-series-69d39e22132b Series (DL in practice)

Rapids: https://medium.com/rapids-ai/using-rapids-with-pytorch-e602da018285 (tabular)

Python Hosting (free)

http://kyso.io/ ex. Bokeh
https://www.pythonanywhere.com/
https://www.heroku.com/
https://course.fast.ai/deployment_render.html
https://course.fast.ai/deployment_zeit.html
https://realpython.com/flask-connexion-rest-api/ API in Flask

Tools collection

https://github.com/hardikvasa/google-images-download Google image downloader
https://github.com/cwerner/fastclass + https://www.christianwerner.net/tech/Build-your-image-dataset-faster (download images)
https://github.com/wfleshman/DatasetScraper
https://github.com/aakashns/servefastai + https://www.youtube.com/watch?v=xwN7arEgvBg "takes a FastAI Learner and creates a web-based UI where you can upload one or more images and check your model’s predictions"
https://github.com/gurvindersingh/mlapp
https://towardsdatascience.com/building-web-app-for-computer-vision-model-deploying-to-production-in-10-minutes-a-detailed-ec6ac52ec7e4?gi=2f093af38b0c (https://github.com/pankymathur/fastai-vision-app)
https://github.com/tchambon/LabelMyTextWidget text labeling widget
https://github.com/unit8co/vegans Train your own GANs

visual explanations

http://setosa.io/ https://platform.ai/ http://matrixmultiplication.xyz/
http://neuralnetworksanddeeplearning.com/ https://nbviewer.jupyter.org/gist/joshfp/85d96f07aaa5f4d2c9eb47956ccdcc88/lesson2-sgd-in-action.ipynb
https://seeing-theory.brown.edu/basic-probability/index.html Stats
http://playground.tensorflow.org Neural nets
https://ezyang.github.io/convolution-visualizer/index.html conv networks visualizer https://twitter.com/jeremyphoward/status/1098758987271426048 SGD 3D
https://github.com/google-research/google-research/tree/master/interpretability_benchmark
https://www.youtube.com/watch?v=JB8T_zN7ZC0 How cnns work

LSTM, BERT, GANs, GPT-2

https://medium.com/huggingface/introducing-fastbert-a-simple-deep-learning-library-for-bert-models-89ff763ad384
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://medium.com/dissecting-bert
http://juditacs.github.io/2019/02/19/bert-tokenization-stats.html
https://github.com/huggingface/pytorch-pretrained-BERT (https://twitter.com/Thom_Wolf/status/1094976790001520640)
https://medium.com/huggingface/multi-label-text-classification-using-bert-the-mighty-transformer-69714fa3fb3d
https://twitter.com/Thom_Wolf/status/1093455492972724230 Bert Kaggle classification <=========
https://twitter.com/jesse_vig/status/1095394504478154753 Bert viz
http://gandissect.res.ibm.com/ganpaint.html?project=churchoutdoor&layer=layer4
https://github.com/AyushExel/GANs https://www.youtube.com/watch?v=aZpsxMZbG14&feature=youtu.be
https://medium.com/syncedreview/microsofts-new-mt-dnn-outperforms-google-bert-b5fa15b1a03e
https://blog.insightdatascience.com/using-bert-for-state-of-the-art-pre-training-for-natural-language-processing-1d87142c29e7 <==== legal text walk-through
https://blog.openai.com/better-language-models/
https://towardsdatascience.com/openai-gpt-2-understanding-language-generation-through-visualization-8252f683b2f8 (https://twitter.com/jesse_vig/status/1102987963451891712)
https://gpt2.apps.allenai.org/ GPT-2 Explorer 117M parameter OpenAI GPT-2 language model demo
http://gltr.io/dist/index.html Giant Language model Test Room
http://newsletter.ruder.io/issues/gpt-2-sequence-generation-in-arbitrary-order-160799 Newsletter covering GPT-2
https://gist.github.com/thomwolf/ca135416a30ea387aa20edaa9b21f0ed A very small and self-contained gist to train a GPT-2 transformer model on wikitext-103

Data Lit (Siraj Raval)

https://www.youtube.com/watch?v=3Pzni2yfGUQ https://github.com/llSourcell/Sentiment_Analysis/

Vision

https://cloud.google.com/vision/
https://towardsdatascience.com/how-to-build-an-image-duplicate-finder-f8714ddca9d2
https://github.com/bourdakos1/tfjs-object-detection-training tf js object detection
https://www.rsipvision.com/ComputerVisionNews-2019February/18/
https://iconary.allenai.org/ draw and guess
https://www.pyimagesearch.com/free-opencv-computer-vision-deep-learning-crash-course/

Transfer learning

https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a
https://github.com/facebookresearch/LASER

Stats for ML

https://www.datasciencecentral.com/profiles/blogs/29-statistical-concepts-explained-in-simple-english-part-1
https://www.datasciencecentral.com/profiles/blogs/25-statistical-concepts-explained-in-simple-english-part-2
https://simplystatistics.org/2017/03/16/evo-ds-class/
https://drive.google.com/file/d/1VmkAAGOYCTORq1wxSQqy255qLJjTNvBI/view Intro to probability pdf 2019
https://brohrer.github.io/stats_resources.html Brandon Rohrer stats resources http://www.bcfoltz.com/blog/stats-101/ + https://statquest.org/ <=====

Datasets

https://archive.ics.uci.edu/ml/datasets
https://registry.opendata.aws/ Amazon datasets
https://www.linkedin.com/feed/update/urn:li:activity:6459106684878020608
https://github.com/brohrer/academic_advisory
https://toolbox.google.com/datasetsearch
https://msropendata.com/ Microsoft open research data
https://www.kaggle.com/datasets
https://github.com/awesomedata/awesome-public-datasets
https://www.menadata.net
https://www.interviewqs.com/blog/free_online_data_sets
https://digitalimpact.io/
https://www.data.gov/
https://www.wikidata.org/wiki/Wikidata:Main_Page
https://databank.worldbank.org/data/home https://data-miner.io/
https://medium.com/datadriveninvestor/the-50-best-public-datasets-for-machine-learning-d80e9f030279
https://einstein.ai/research/blog/the-wikitext-long-term-dependency-language-modeling-dataset
https://storage.googleapis.com/openimages/web/index.html Google Open Images V4
https://github.com/Tencent/tencent-ml-images Tencent ML Images
https://www.modelzoo.co/ pretrained models
https://github.com/bakrianoo/Datasets/tree/master/Arabic%20Tweets (see also https://github.com/bakrianoo/aravec (AraVec))
https://twitter.com/jeremyphoward/status/1071093745645428736 NER and POS from FastAi and Spacy
https://twitter.com/stanfordnlp/status/1080136432075722752 Stanford NLP group tweet
http://wilmabainbridge.com/facememorability2.html 10k US faces
http://groups.inf.ed.ac.uk/ami/corpus/ AMI corpus (dialogs, summaries, ...) <====
https://github.com/CU-ITSS/Web-Data-Scraping-S2019 <== scrape data
http://xviewdataset.org/ annotated satellite images (bounding boxes)
https://www.europeandataportal.eu/data/en/group/transport EU Portal / Transport
https://universaldependencies.org/treebanks/ar_padt/index.html Arabic treebank
https://arabicspeech.org/resources data and resources for Arabic speech processing
https://github.com/chiphuyen/lazynlp LazyNLP scrape massive text datasets (40GB+)
https://twitter.com/rctatman/status/1101177780954255360v Human languages datasets
https://github.com/fastai/imagenette subset of ImageNet (www.image-net.org)
https://nlp.stanford.edu/links/statnlp.html#Corpora
http://kevinchai.net/datasets dataset directories (general and specialized)
https://github.com/UniversalDependencies/UD_Arabic-PUD Universal dependencies Arabic
https://github.com/UniversalDependencies/UD_Arabic-PADT
https://github.com/UniversalDependencies/UD_Arabic-NYUAD
https://www.nuscenes.org/overview Autonomous driving dataset
https://www.opendatasoft.com/ and https://dataportals.org/ (mapping)
https://bioacousticsdatasets.weebly.com/ bio acoustic datasets
https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/ Flickr8k image captioning dataset
http://kitab-project.org/2019/06/08/first-open-access-release-of-our-arabic-corpus/ Arabic Kitab corpus
https://www.kaggle.com/kenshoresearch/kensho-derived-wikimedia-data

Dataviz

https://twitter.com/albertocairo/status/1057974210960723968 Alberto Cairo's invitees
https://charticulator.com/
https://twitter.com/jayvanbavel/status/1060543214665564161 graphic design and data viz cheat sheets
https://blog.visme.co/best-data-visualizations/ Best Data Viz
https://www.allendowney.com/blog/2019/01/18/the-library-of-data-visualization/ books, blogs, videos and tools <======
https://twitter.com/kimay/status/1097793415222362113 data viz design processes
https://socviz.co/ Data Visualization A practical introduction (read book online) https://www.kaggle.com/learn/data-visualization-from-non-coder-to-coder Data viz in Seaborn <=====
https://python-graph-gallery.com/

AI Ethics

https://medium.com/@miad/100-brilliant-women-in-ai-ethics-to-follow-in-2019-and-beyond-92f467aa6232
https://www.kdnuggets.com/datasets/index.html
https://course.fast.ai/videos/?lesson=6 @ 1:50:00 - end (2:17:00)
https://www.fast.ai/2018/09/24/ai-ethics-resources/
https://twitter.com/DSEthics
https://www.fast.ai/2019/01/29/five-scary-things/ https://www.youtube.com/watch?v=LqjP7O9SxOM
https://www.youtube.com/watch?v=ZVN8aDVyg1I Information manipulation: How the media ecosystem is being gamed and exploited
https://www.technologyreview.com/s/612876/this-is-how-ai-bias-really-happensand-why-its-so-hard-to-fix/
https://github.com/EthicalML/XAI (Hanna Wallach: https://twitter.com/hannawallach/status/1093141481907462144)
https://github.com/jphall663/awesome-machine-learning-interpretability
https://twitter.com/DanitGal/status/1099517258190082049 ethics with east asia focus
https://twitter.com/iamtrask/status/1101894655631912960 fairness, privacy, security, ... http://ai.stanford.edu/blog/ethical_best_practices/ In Favor of Developing Ethical Best Practices in AI Research
https://twitter.com/Klonick/status/1102970732890316801 Information privacy
https://imagenet-roulette.paglen.com/
https://www.scu.edu/ethics-in-technology-practice/overview-of-ethics-in-tech-practice/
https://twitter.com/math_rachel/status/1110609213665865729 lecture on data ethics and other resources
https://towardsdatascience.com/ethics-of-ai-a-comprehensive-primer-1bfd039124b0 (part 1/3)
https://twitter.com/halhod/status/1143798334135472128 journalists covering AI/ML thread

Learn Python

https://coreyms.com/
https://kite.com/ AI-Powered Python auto-completion
https://twitter.com/math_rachel/status/1058089708155166720 Rachel
https://twitter.com/_brohrer_/status/1039805324423823361 Brandon
https://brohrer.github.io/python_resources.html
https://www.youtube.com/playlist?list=PLQVvvaa0QuDeAams7fkdcwOGBpGdHpXln
https://www.youtube.com/watch?v=ZDa-Z5JzLYM&list=PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc
https://py4e.com/
http://cs231n.github.io/python-numpy-tutorial/
https://learnpythonthehardway.org/book/intro.html
https://www.youtube.com/playlist?list=PL6gx4Cwl9DGAcbMi1sH6oAMk4JHw91mC_
https://www.python-course.eu/
https://learnxinyminutes.com/docs/python3/
https://www.kaggle.com/learn/python
https://realpython.com/
http://www.dabeaz.com/tutorials.html and https://speakerdeck.com/dabeaz (David Beazley) https://www.crowdcast.io/treyhunner
https://www.reddit.com/r/learnpython
https://docs.python.org/3.6/ (https://docs.python.org/3.6/tutorial/index.html)
https://github.com/yasoob/intermediatePython <====
https://towardsdatascience.com/python-for-data-science-from-scratch-part-i-390f01d91748
https://towardsdatascience.com/python-for-data-science-from-scratch-part-ii-e4dd4b943aba
https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428 Pandas <=====
https://github.com/gto76/python-cheatsheet <===== full Python cheat sheet <======
https://tomaugspurger.github.io/modern-1-intro Pandas and more
http://www.conquerprogramming.com/blog/3-Exceptions.html Python exceptions
https://youtu.be/C-gEQdGVXbk 10 Python tips
https://twitter.com/_inesmontani/status/1144173215293591555 Productivity tips by Ines and Goel
https://twitter.com/_inesmontani/status/1144173242082574341 design and css by ines
https://jalammar.github.io/visual-numpy/ A Visual Intro to NumPy and Data Representation

https://www.grnewsletters.com/archive/dataschool/Best-resources-for-going-deeper-with-Python-702398502.html
https://www.youtube.com/channel/UCW6TXMZ5Pq6yL6_k5NZ2e0Q socratica
https://towardsdatascience.com/master-python-through-building-real-world-applications-part-1-b040b2b7faad Part-1/10/DS
Above code: https://github.com/Dhrumilcse/Interactive-Dictionary-in-Python

Pandas speed up

https://github.com/modin-project/modin
https://github.com/ray-project/ray/ https://pbpython.com/pandas-crosstab.html

Deep Learning

https://github.com/rasbt/deeplearning-models Sebastian Raschka - deep learning models
https://github.com/rasbt/DeepLearning-Gdansk2019-tutorial Sebastian Raschka - deep learning summer school
https://github.com/iamtrask/Grokking-Deep-Learning
https://stallion.ai/en/home <--- Ar word embedding
https://www.youtube.com/playlist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs
https://github.com/Machine-Learning-Tokyo/DL-workshop-series colab deep nets
https://lilianweng.github.io/lil-log/ meta learning, word embedding, NLP developments ... etc <=====///========
https://github.com/GokuMohandas/practicalAI <=================================
http://josh-tobin.com/troubleshooting-deep-neural-networks <===========
https://arxiv.org/abs/1901.07931 <=== text generation: https://twitter.com/verena_rieser, https://twitter.com/tuetschek
https://www.kdnuggets.com/2018/02/8-neural-network-architectures-machine-learning-researchers-need-learn.html
https://people.csail.mit.edu/madry/6.883/ science of deep learning class notes
https://towardsdatascience.com/gate-recurrent-units-explained-using-matrices-part-1-3c781469fc18 GRUs explained
https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg Arxiv insights
https://github.com/dpressel/dliss-tutorial International Summer School on Deep Learning, 2019

Bias, Variance and Overfitting

https://twitter.com/rctatman/status/1060325912434995201 bias-variance tradeoff

Free/Draft and other AI or ML Books

http://www.aimlbooks.com/ top mentioned AI/ML books from Stack Exchange and Stack Overflow
http://themlbook.com/wiki/doku.php
https://medium.com/datadreamsdragons
https://nlp.stanford.edu/software/dependencies_manual.pdf from Stanford NLP group

100 days of ML coding

https://www.kdnuggets.com/2018/09/journey-machine-learning-100-days.html

Feature Engineering

https://becominghuman.ai/good-feature-building-techniques-tricks-for-kaggle-my-kaggle-code-repository-c953b934f1e6

Model evaluation/selection

https://sebastianraschka.com/pdf/manuscripts/model-eval.pdf

other resources

Data Elixir: https://dataelixir.com/
Python Weekly: https://www.pythonweekly.com/
Data Science Weekly: https://www.datascienceweekly.org/
Real Python: https://realpython.com/
Practical Business Python: http://pbpython.com/
https://twitter.com/justmarkham/lists/data-science/members DS list
https://www.youtube.com/channel/UCnVzApLJE2ljPZSeQylSEyg Kevin Markham YouTube channel

dictionaries/translation

https://glosbe.com/
http://context.reverso.net/translation/english-arabic/

NER

https://www.kdnuggets.com/2018/12/introduction-named-entity-recognition.html
http://nlpprogress.com/english/named_entity_recognition.html
https://research.zalando.com/welcome/mission/research-projects/
https://towardsdatascience.com/solving-nlp-task-using-sequence2sequence-model-from-zero-to-hero-c193c1bd03d1

Data Annotation

Amazon Mechanical Turk
prodi.gy https://www.figure-eight.com/ Figure Eight (previously CrowdFlower)
Snorkel: https://ai.stanford.edu/blog/weak-supervision/
https://appen.com/ Appen (acquired Figure Eight)

Kaggle

https://www.kaggle.com/shivamb/data-science-glossary-on-kaggle/notebook Notebooks by topic

https://ai.googleblog.com/2019/01/looking-back-at-googles-research.html
https://medium.com/@rtatman/evaluating-text-output-in-nlp-bleu-at-your-own-risk-e8609665a213 NLP BLEU
https://hotpotqa.github.io/wiki-readme.html
https://course.fast.ai/lessons/lessons.html <===================================
https://www.technologyreview.com/s/612726/this-algorithm-browses-wikipedia-to-auto-generate-textbooks/
https://pytorch.org/tutorials/beginner/nn_tutorial.html
https://www.drivendata.org/competitions/
https://www.topbots.com/most-important-ai-nlp-research/ <========================

https://ai.googleblog.com/2019/01/natural-questions-new-corpus-and.html

Cloud

https://www.basvankaam.com/wp-content/uploads/2019/02/Final-Cloud-Services-Cheat-Sheet-v2.0.pdf

https://www.fast.ai/2019/01/24/course-v3/ (https://course.fast.ai/videos/?lesson=1) FastAI Course using fastai V1
https://cedrickchee.gitbook.io/knowledge/courses/fast.ai/deep-learning-part-1-practical-deep-learning-for-coders/2019-edition
https://www.youtube.com/watch?v=p9tpTt6ZsLI&index=2&list=PLZSO_6-bSqHQHBCoGaObUljoXAyyqhpFW http://courses.d2l.ai http://www.d2l.ai

https://colab.research.google.com/drive/1jUpGwTaY9vJsUVw1tgwwXqKz6UOsvV1a

Recent Courses

https://github.com/laurenfklein/emory-qtm340 Practical Approaches to Data Science with Text
http://onlinehub.stanford.edu/cs224 Chris Manning 2019 deep learning NLP
https://mlcourse.ai/ Pytorch: http://phontron.com/class/nn4nlp2019/schedule.html => https://github.com/neubig/nn4nlp-code https://deeplearning.mit.edu/ https://www.youtube.com/watch?v=O5xeyoRL95U&list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf https://github.com/lexfridman/mit-deep-learning https://medium.com/tensorflow/mit-deep-learning-basics-introduction-and-overview-with-tensorflow-355bcd26baf0 MIT <====
https://www.cs.columbia.edu/~amueller/comsw4995s19/schedule/ <==========================
http://phontron.com/class/nn4nlp2019/schedule.html https://github.com/neubig/nn4nlp-code <===-====-====-===-=====
http://phontron.com/class/nn4nlp2019/assets/slides/nn4nlp-01-intro.pdf Graham Neubig CMU slides
https://www.youtube.com/watch?v=N17ovGpUz3M&list=PLZSO_6-bSqHQHBCoGaObUljoXAyyqhpFW&index=6 Alex Smola
https://www.youtube.com/watch?v=fdY7dt3ijgY https://github.com/openai/spinningup-workshop/blob/master/rl_intro/rl_intro.pdf OpenAI
https://github.com/glouppe/info8010-deep-learning Gilles Louppe https://glouppe.github.io/info8010-deep-learning/?p=outline.md#1
https://university.dremio.com/ (https://www.dremio.com/ https://github.com/dremio/dremio-oss data as a service platform) courses: basic, data consumers, data engineers
http://web.stanford.edu/class/cs224n/ (pytorch)
http://people.ischool.berkeley.edu/~dbamman/info256.html (https://github.com/dbamman/anlp19)
https://github.com/jacobeisenstein/gt-nlp-class/tree/master/notes Georgia Tech (includes book)
https://www.cs.toronto.edu/~hinton/coursera_lectures.html Geoff Hinton <=====
https://sites.google.com/view/berkeley-cs294-158-sp19/home UC Berkeley Spring 2019
https://github.com/rasbt/stat479-deep-learning-ss19 Sebastian Raschka deep learning
https://github.com/rasbt/stat479-machine-learning-fs19 Sebastian Raschka Machine Learning (FS 2019)

https://paperswithcode.com/sota <================
https://lilianweng.github.io/lil-log/2019/01/31/generalized-language-models.html <=======

Study groups and hackathons

https://kaggledays.com/
https://twimlai.com/twiml-x-fast-ai/ (https://www.youtube.com/watch?v=eisxYm19C5I)
https://nurture.ai/ai-saturdays (see projects)
https://www.theschool.ai/
https://medium.com/ai2-blog/how-to-get-up-to-speed-on-machine-learning-and-ai-a0fd923d4169 AI2 resources

https://ai.google.com/research/NaturalQuestions Google Natural Questions dataset
https://github.motakasoft.com/trending/?d=2019-02-01&l=python github trends
https://www.analyticsvidhya.com/blog/2019/02/stanfordnlp-nlp-library-python/
https://www.colorize.ml/
https://www.elementsofai.com/ + deeplearning.ai (https://www.youtube.com/channel/UCcIXc5mJsHVYTZR1maL5l9w)
https://www.microsoft.com/en-us/research/project/academic/articles/aaai-conference-analytics/ <==================
https://www.pyimagesearch.com/2019/01/21/regression-with-keras/ 1/3 parts
https://www.pyimagesearch.com/2019/01/28/keras-regression-and-cnns/ 2/3
https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/ 3/3

https://www.pyimagesearch.com/2019/01/14/machine-learning-in-python/ <=== scikit/keras

https://youtu.be/tMAU3gLbKII Ines from spaCy/Prodigy (tips)
https://twitter.com/harmonslide/status/1092178779382972417 Pizza model overfitting
https://github.com/dperezrada/evidence-tools/tree/master/nlp/keywords2vec
https://towardsdatascience.com/do-the-keywords-in-your-resume-aptly-represent-what-type-of-data-scientist-you-are-59134105ba0d resume screening <====
https://www.youtube.com/watch?v=g-Hb26agBFg PCA explained well <==============

https://medium.com/@prtdomingo/editing-files-in-your-linux-virtual-machine-made-a-lot-easier-with-remote-vscode-6bb98d0639a4 Remote VSCode

Speech

https://web.stanford.edu/~jurafsky/slp3/ speech and language processing book

Semantic Similarity

https://github.com/NTMC-Community/awesome-neural-models-for-semantic-match
https://gist.github.com/GhibliField/c3c97b742d346baa5f14b3a796c12a4a
https://tfhub.dev/ universal sentence encoder
https://medium.com/@liangzhang6677/nlp-notes-5e964f5d740e
http://ixa2.si.ehu.es/stswiki/index.php/Main_Page

Arabic Specific

https://github.com/zaidalyafeai/ARBML (Models and datasets, https://twitter.com/arabicml2)
https://github.com/niderhoff/nlp-datasets
https://buhuth.org/ar/home (https://www.linkedin.com/posts/hassan-sarhan-471566151_index-of-activity-6576768412834496512-fmCb)
http://mazajak.inf.ed.ac.uk:8000 embeddings
https://www.aclweb.org/anthology/W19-4608/

Transkribus / crowd

https://transkribus.eu/Transkribus/ https://www.zooniverse.org/

fastai v2 walk throughs

https://www.youtube.com/watch?v=44pe47sB4BI&t=649s

organize research

https://twitter.com/chrisalbon/status/1175539804114583552 Zotero etc

Arabic Corpora

datasets:
https://dl.acm.org/doi/abs/10.1145/2911451.2914677 (https://sites.google.com/view/arabicweb16)
http://opus.nlpl.eu/
https://www.kaggle.com/linuxscout/tashkeela
https://traces1.inria.fr/oscar/ shuffled by line
https://github.com/zaidalyafeai/ARBML#datasets
http://opus.nlpl.eu/OpenSubtitles-v2016.php
https://archive.alsharekh.org/
https://github.com/soskek/bookcorpus (https://www.smashwords.com) [no Ar]
http://www.alwaraq.net/Core/index.jsp?option=1 http://dlib.nyu.edu/aco/ (scans)
https://www.blindarab.net/index.php?action=view_subcat&catid=4&id=24&page=5
https://www.al-mostafa.com/disp.php?page=list&n=0 (mixed formats)
https://github.com/abdelrahmaan/Hadith-Data-Sets
https://www.hindawi.org/books/ epubs, some are not MSA
https://archive.org/details/texts?and%5B%5D=languageSorter%3A%22Arabic%22&sort=-downloads&page=4
http://www.al-eman.com/index.htm
https://catalog.ldc.upenn.edu/topten
http://waqfeya.com/
https://www.noor-book.com
https://www.wdl.org/ar/