"This snippet was written by [Chris R. Albon](http://www.chrisralbon.com/) and is part of his collection of [well-documented Python snippets](https://github.com/chrisalbon/code_py). All code is written in Python 3 in an IPython notebook and offered under the [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).\n",
"\n",
"- Based on [http://nbviewer.ipython.org/gist/rjweiss/7577004](http://nbviewer.ipython.org/gist/rjweiss/7577004)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create some raw text"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Create a list of three strings.\n",
"incoming_reports = [\"We are attacking on their left flank but are losing many men.\", \n",
"\"We cannot see the enemy army. Nothing else to report.\", \n",
"\"We are ready to attack but are waiting for your orders.\"]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Separate by word"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Import the word tokenizer\n",
"from nltk.tokenize import word_tokenize\n",
"# Note: word_tokenize relies on NLTK's Punkt tokenizer data; if it is missing, fetch it once with the NLTK downloader (nltk.download())\n",
"\n",
"# Apply word_tokenize to each element of the list called incoming_reports\n",
"tokenized_reports = [word_tokenize(report) for report in incoming_reports]\n",
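The tokenization cell above calls NLTK's `word_tokenize`, which requires NLTK and its tokenizer data to be installed. As a dependency-free sketch of the same idea (turning each report into a list of word and punctuation tokens), the block below swaps in a simple regular expression. This is an approximation, not NLTK's algorithm: `simple_word_tokenize` is a hypothetical helper name, and NLTK's tokenizer additionally treats contractions and quotation marks specially, so its output can differ on such input.

```python
import re

# Same three example reports as in the notebook's "Create some raw text" cell.
incoming_reports = [
    "We are attacking on their left flank but are losing many men.",
    "We cannot see the enemy army. Nothing else to report.",
    "We are ready to attack but are waiting for your orders.",
]


def simple_word_tokenize(text):
    """Hypothetical helper: a rough regex stand-in for nltk's word_tokenize.

    Each token is either a run of word characters (letters, digits, underscore)
    or a single character that is neither a word character nor whitespace
    (i.e. punctuation), so "men." becomes ["men", "."].
    """
    return re.findall(r"\w+|[^\w\s]", text)


# Apply the tokenizer to each report, mirroring the notebook's list comprehension.
tokenized_reports = [simple_word_tokenize(report) for report in incoming_reports]

for tokens in tokenized_reports:
    print(tokens)
```

For the second report this yields `['We', 'cannot', 'see', 'the', 'enemy', 'army', '.', 'Nothing', 'else', 'to', 'report', '.']`: punctuation is split off as separate tokens, which is the behavior the notebook relies on `word_tokenize` for.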