# H6751 Web and Text Mining
#### <center>NTU, WKW / Spring 2020</center>
## <font color='Red'>Course Description </font>
With the popularity of the Internet, a massive amount of text content is now available on the Web, and it has become an important resource for mining useful knowledge. From a business and government point of view, there is an increasing need to interpret and act upon large volumes of text. Text mining (or text analytics) is therefore receiving more attention as a way to analyze text content on the Web. For instance, opinion mining and sentiment analysis are text mining techniques used to analyze user-generated content on social media platforms.
This course is an introduction to text and web mining. It covers how to analyze unstructured data (i.e. text content) on the Web using text mining techniques. Students will learn various text mining techniques and tools through both lectures and hands-on lab exercises. The course will also explore applications of text mining techniques to real-world problems. This course focuses on Web content mining, not on Web structure or usage mining.
Students will learn the following topics in the course:
* Principles and concepts of text and web mining.
* Various text mining techniques: Pre-processing for Text Mining, Text Categorization, Document Clustering, Information Extraction, and Opinion Mining & Sentiment Analysis.
* Practical use of text mining in real-world applications, such as Text Message Spam Detection and Sentiment Analysis Systems that analyze public opinion on various subjects (electronic gadgets, movies, stocks, etc.) using social media content.
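
To give a feel for the text categorization topic listed above, here is a minimal sketch of a text message spam detector. It is not taken from the course material: the four toy messages, the function names, and the choice of a multinomial Naive Bayes classifier with Laplace smoothing are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-processing: lowercase the text and keep alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(examples):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior: fraction of training messages carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, for illustration only.
TOY_MESSAGES = [
    ("Win a free prize now", "spam"),
    ("Free cash, claim your prize", "spam"),
    ("Are we meeting for lunch today", "ham"),
    ("See you at the lab this afternoon", "ham"),
]

model = train_naive_bayes(TOY_MESSAGES)
print(predict("claim your free prize", model))
print(predict("lunch at the lab", model))
```

A real system would train on a labeled corpus and would typically rely on an established library rather than hand-written counting; the sketch only shows how pre-processing and categorization fit together.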
### Contact Information:
- Lecturers:
* [Zhao Rui](https://rzntu.github.io), [[email protected]](mailto:[email protected])
* [Chen Zhenghua](https://zhenghuantu.github.io), [[email protected]](mailto:[email protected])
### Course Objectives:
At the end of this course, students should be able to:
- Appreciate the basics of text and web mining.
- Understand the advantages and disadvantages of different text mining techniques.
- Work on practical problems that can be solved using text mining techniques.
### Prerequisites:
- Students should have some aptitude for low-level logical thinking, since lectures and labs will focus on the technical aspects of text and web mining.
- Basic knowledge of **Python programming**.
### Reference Books
The following books are helpful, but not required. You can easily find these books on the Internet.
- Foundations of Statistical Natural Language Processing *Christopher D. Manning and Hinrich Schütze*
- Neural Network Methods for Natural Language Processing *Yoav Goldberg*
- Introduction to Computation and Programming Using Python : With Application to Understanding Data *John V. Guttag*
- Applied Text Analysis with Python *Benjamin Bengfort*
If you are not proficient in Python, you may find [some tutorials](material/coding.md) helpful.
### Course material and links
- [Timetable](#schedule)
- [Final project](project/project.md)
- [Useful Tips](material/dspractice.md)
- [Syllabus](material/H67512020.pdf)
- [Honor Code](honorcode.md)
## <font color='Red'>Announcement</font>
> - *2020-01-18*: Welcome to H6751.
> - *2020-01-16*: ~~[Group Project Team Table](https://docs.google.com/spreadsheets/d/1V93TaLzOjVksmbdAsAfB20KkY9aeNJKpH3Vm-ZH3G2Y/edit?usp=sharing)~~
> - *2020-01-04*: ~~this site has been public.~~
## <font color='Red'>Assessment</font>
### Class Participation (5%)
We appreciate everyone being actively involved in the class! There are several ways of earning participation credit, which is capped at 5%:
1. **Attending guest speakers' lectures**: This semester we have two invited speakers, who are making a great effort to come and lecture for us. We do not want them speaking to an empty room, so your attendance at the guest lectures is expected! It is also a great chance for networking. You will get 1% per speaker (2% in total) for attending.
2. Instructors are going to pick students for questions during class. One point will be deducted for absence. Each student has a total of 2 points.
3. **Karma Point**: Any other act that improves the class, which the instructors notice and deem worthy: 1%.
Based on the saved Zoom chat files, the list of active students is provided [here](material/karma_points.csv) with two columns: Zoom ID and all active comments (extracted via regular expressions and some hand-crafted rules). If you find your Zoom ID in the provided CSV, please email the lecturer, Zhao Rui, with your Zoom ID as it appears in the list and your NTU student ID.
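
As a rough illustration of how a two-column list like this could be built from a saved chat file with a regular expression, consider the sketch below. It is not the instructors' actual script: the chat line format (`HH:MM:SS From <zoom id> : <message>`), the sample names, and the function names are assumptions, and no hand-crafted filtering rules are included.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed (hypothetical) line format: "HH:MM:SS From <zoom id> : <message>"
CHAT_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+From\s+(.+?)\s*:\s*(.*)$")


def collect_comments(chat_text):
    """Group chat messages by the Zoom ID of the sender."""
    comments = defaultdict(list)
    for line in chat_text.splitlines():
        match = CHAT_LINE.match(line.strip())
        if match:
            _, zoom_id, message = match.groups()
            if message:
                comments[zoom_id].append(message)
    return dict(comments)


def to_csv(comments):
    """Render two columns: Zoom ID and all of that participant's comments."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["zoom_id", "comments"])
    for zoom_id in sorted(comments):
        writer.writerow([zoom_id, " | ".join(comments[zoom_id])])
    return buffer.getvalue()


# Made-up sample chat, for illustration only.
SAMPLE_CHAT = """\
09:31:02 From Alice Tan : Is TF-IDF covered today?
09:31:40 From Bob Lim : Yes, in the second half.
09:45:10 From Alice Tan : Thanks!
"""

comments = collect_comments(SAMPLE_CHAT)
print(to_csv(comments))
```

The lazy `(.+?)` group stops at the first colon after the sender name, so colons inside the message itself are kept as part of the comment.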
### Individual In-class Assignment (25%)
We are going to have a 90-minute in-class programming assignment. You can refer to the [template](project/template.zip). This online assignment will test material covered up to Week 9 (Introduction to Deep Learning).
### Group Project (40%)
You are required to form a project group of 3-4 members. In this text mining project, you collect your own sample text dataset (or use an existing one) and, using text mining techniques and tools, build an interesting model or application that mines knowledge or information from it. The project scope is entirely up to you, but I suggest that you build a useful and interesting application. You then write a project report explaining your methodology and presenting the results, and present your work in class. The detailed instructions and guidelines for this course project can be found [here](project/h6751_guidlines_grading.pdf). Some project ideas are provided [here](project/project.md).
- **Credit**:
* Project proposal (5%)
* Project report (20%)
* Project presentation (15%)
### In-class Kaggle Competition (30%)
See the [page](project/kaggle.md) for more details, and check the Kaggle [summary](project/kaggle_summary.md).
## <font color='Red'>Schedule</font>
Class Venue: Tan Tong Meng (TTM) PC Lab CS02-35a WKWSCI Bldg
**Date** | **Topic** | **Material** | **Assignment Due**
:----: | ------- | :----: | ---------------
Sat a.m 01/18 | Introduction to Text Mining | [LINK](note/blogs01.md) | N.A.
Sat a.m 02/01 | Pre-processing for Text Mining I | [LINK](note/blogs02.md) | N.A.
Sat p.m 02/01 | Pre-processing for Text Mining II | [LINK](note/blogs03.md) | <font color='SeaGreen'>Form a Group</font>
Sat a.m 02/15 | Text Categorization I | [LINK](note/blogs04.md) | [E-learning](note/blogsie.md)
Sat p.m 02/15 | Text Categorization II | [LINK](note/blogs05.md) | <font color='SeaGreen'>Project Proposal Submission</font>
Sat a.m 02/29 | Text Categorization III | [LINK](note/blogs06.md) | N.A.
Sat p.m 02/29 | Document Clustering | [LINK](note/blogs07.md) | N.A.
Sat a.m 03/21 | Sentiment Analysis | [LINK](note/blogs08.md) | N.A.
Sat p.m 03/21 | Introduction to Deep Learning | [LINK](note/blogs09.md) | <font color='SeaGreen'>Kaggle Starts</font>
Sat a.m 04/04 | Word Embeddings | [LINK](note/blogs10.md) | Guest Speaker: [Li Pengfei](https://www.linkedin.com/in/li-pengfei-44454080/?originalSubdomain=sg)
Sat p.m 04/04 | Recurrent Neural Network | [LINK](note/blogs11.md) | <font color='SeaGreen'>Kaggle Ends</font>
Sat a.m 04/18 | Convolutional Neural Network | [LINK](note/blogs12.md) | Guest Speaker: [Weng Quanchi](https://www.linkedin.com/in/quanchi-weng-10822711a/?originalSubdomain=sg) [slides](slides/qc_ner.pdf)
Sat p.m 04/18 | [Course Summary](slides/w13.pdf) | N.A. | <font color='SeaGreen'>In-class Assignment (online)</font>
Sat p.m 05/02 | N.A. | N.A. | <font color='SeaGreen'>Project Paper & Recorded Video Submission</font>