How to do it...
- Import the sentence tokenizer:
from nltk.tokenize import sent_tokenize
- Tokenize the input text into sentences and print the result:
tokenize_list_sent = sent_tokenize(text)
print("\nSentence tokenizer:")
print(tokenize_list_sent)
- Import the word tokenizer and tokenize the text into words:
from nltk.tokenize import word_tokenize
print("\nWord tokenizer:")
print(word_tokenize(text))
- Create a WordPunctTokenizer instance and tokenize the text with it:
from nltk.tokenize import WordPunctTokenizer
word_punct_tokenizer = WordPunctTokenizer()
print("\nWord punct tokenizer:")
print(word_punct_tokenizer.tokenize(text))
The output produced by the tokenizers is shown here. Each tokenizer divides the text into word groups in a slightly different way; a consolidated, runnable version of the preceding steps follows the output:
(Figure: console output showing the results of the sentence tokenizer, word tokenizer, and WordPunct tokenizer)
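For reference, here is a minimal, self-contained sketch that combines the three steps in a single script. The sample string assigned to `text` is an assumption (the recipe defines `text` in an earlier step), the snippet assumes the NLTK `punkt` tokenizer data has been downloaded, and the `print` calls are written in function form so they run on both Python 2 and Python 3:

```python
# Minimal sketch of the recipe above.
# Requires the 'punkt' tokenizer data: nltk.download('punkt')
from nltk.tokenize import sent_tokenize, word_tokenize, WordPunctTokenizer

# Sample input; the recipe's actual `text` is defined in an earlier step.
text = "Are you curious about tokenization? Let's see how it works!"

# Sentence tokenizer: splits the text into individual sentences.
print("\nSentence tokenizer:")
print(sent_tokenize(text))

# Word tokenizer (Treebank conventions): contractions are split,
# so "Let's" becomes ["Let", "'s"].
print("\nWord tokenizer:")
print(word_tokenize(text))

# WordPunct tokenizer: punctuation becomes separate tokens,
# so "Let's" becomes ["Let", "'", "s"].
print("\nWord punct tokenizer:")
word_punct_tokenizer = WordPunctTokenizer()
print(word_punct_tokenizer.tokenize(text))
```

Running this prints three token lists. The difference to notice is how punctuation is handled: the word tokenizer keeps the apostrophe attached to the contraction (`'s`), while the WordPunct tokenizer emits the apostrophe as a token of its own.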