Amsterdam University College -- Text Mining and Collective Intelligence -- Fall 2019
- Hello World: a first notebook to check that everything is working.
- Lab 1: Fundamentals: variables, built-in data types and structures, syntax, flow control.
- Lab 2: Fundamentals: functions, exceptions, classes, I/O.
- Lab 3: More fundamentals: modules, packages, standard library.
- Lab 4_1: Scientific programming: NumPy, matplotlib.
- Lab 4_2: Regular expressions (only for reference).
- Lab 5: NLP pipelines: sentence splitting, tokenizing, stemming and lemmatizing, part-of-speech tagging.
- Lab 6: Web scraping and APIs.
- Lab 7_1: Distributions in texts.
- Lab 7_2: WordNet (only for reference).
- Lab 8: Vector Semantics.
- Lab 9: Intro to ML: linear regression, logistic regression, SGD, Sklearn.
- Lab 10: Word Embeddings: Word2Vec using Gensim.
- Lab 11: Sentiment Analysis.
- Clone the repository locally:
git clone https://github.com/Giovanni1085/AUC_TMCI_2019.git
- Get updates (from time to time):
git pull
- Create a conda environment (where myenv is the environment name):
conda create -n myenv python=3.7 anaconda
- Activate it:
conda activate myenv
- Install packages (see the requirements.txt file), e.g.:
conda install pandas
- Launch a Jupyter notebook:
jupyter notebook
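Once the environment is activated, a quick check along the following lines can confirm that the main libraries are importable before opening the labs. This is only a sketch: the module names are an assumption based on the libraries named in the lab list and in the install example (NumPy, matplotlib, pandas, Sklearn, Gensim), and `missing_modules` is a hypothetical helper, not part of the repository. The requirements.txt file remains the authoritative list.

```python
import importlib.util
import sys

# Assumed module names, taken from the libraries mentioned in the lab list;
# the repository's requirements.txt is the authoritative source.
COURSE_MODULES = ["numpy", "matplotlib", "pandas", "sklearn", "gensim"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report the interpreter version of the active environment.
    print("Python", sys.version.split()[0])
    missing = missing_modules(COURSE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are available.")
```

Any module reported as missing can then be installed with conda install, as in the step above. Note that the import name can differ from the package name (for example, the sklearn module is provided by the scikit-learn package).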
- More on conda environments
- Conda cheatsheet
- Getting started with Jupyter notebooks
- On using git and GitHub for version control
Alternatively, use Binder (link above).
A more detailed guide to setting up your environment, with multiple options.
See the projects folder.
- Michael Repplinger, who ran the 2018/19 edition, and Gianluca Lebani, who ran the 2017/18 edition.
- Giovanni Colavizza and Matteo Romanello, Applied Data Analysis course for the Oxford Digital Humanities Summer School
- James Hetherington and Giovanni Colavizza, Research Software Engineering with Python