Data&Chart&Task
Text Analysis & Mining:
Preprocessing
Text wrangling.
Tokenization
Remove stop words: a, the, that, etc.
Normalize plural forms to singular: men -> man, truths -> truth
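The preprocessing steps above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the stop-word set, the irregular-plural table, and the function names (`tokenize`, `singularize`, `preprocess`) are assumptions made for the example, and the "strip a trailing s" rule is deliberately naive (a real system would use a proper lemmatizer).

```python
import re

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "that", "of", "and", "is", "are"}

# Hypothetical lookup table for a few irregular plurals.
IRREGULAR_PLURALS = {"men": "man", "women": "woman", "children": "child"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def singularize(token):
    """Naive plural-to-singular rule: table lookup first, then strip a trailing 's'."""
    if token in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[token]
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then normalize plurals."""
    return [singularize(t) for t in tokenize(text) if t not in STOP_WORDS]


tokens = preprocess("The men seek the truths.")
```

The order matters slightly: stop words are removed before normalization so that the stop-word list only needs to contain surface forms.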
Vector-space Model
Bag-of-words: count term frequencies and vectorize
TF-IDF
Word2Vec, Doc2Vec, GloVe, fastText (character-based)
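A small sketch of the first two vector-space models, using only the standard library. It assumes documents are already tokenized, and it uses one common TF-IDF variant (tf = count / document length, idf = log(N / df)); other weightings and smoothing schemes exist. The function names `bag_of_words` and `tf_idf` are chosen for the example.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each count by tf = count / len(doc) and idf = log(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weighted = []
    for doc, row in zip(docs, counts):
        length = len(doc) or 1  # guard against empty documents
        weighted.append(
            [(c / length) * math.log(n_docs / df[j]) for j, c in enumerate(row)]
        )
    return vocab, weighted


docs = [["text", "mining", "text"], ["text", "visualization"]]
vocab, counts = bag_of_words(docs)
_, weights = tf_idf(docs)
```

In this toy corpus, "text" appears in every document, so its idf is log(1) = 0 and its TF-IDF weight vanishes, while "mining" and "visualization" keep a positive weight.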
Feature Extraction
Keyword/Word frequency/Topic
Topic Retrieving
SVD/pLSI/LDA
Measurement
Visualization:
Visual Design
Layout
Visualizing Document Content
Tag Cloud/Text Cloud/Wordle(Word Cloud)
Spiral layout: check for word overlap, using quadtree-based detection
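The spiral layout idea can be sketched as follows: each word is moved outward along a spiral from the center until its bounding box overlaps no previously placed word. This is a simplified illustration under stated assumptions: words are reduced to axis-aligned boxes of given (width, height), the spiral is Archimedean, and overlap is tested by brute force against every placed box. A quadtree, as mentioned above, would replace that brute-force loop to prune the candidates. The names `overlaps` and `spiral_layout` and all parameter values are hypothetical.

```python
import math


def overlaps(a, b):
    """Axis-aligned rectangles (x, y, w, h), with (x, y) the top-left corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def spiral_layout(sizes, step=0.1, spacing=1.0, max_steps=100000):
    """Place each (w, h) box at the first non-overlapping spot along a spiral."""
    placed = []
    for w, h in sizes:
        for i in range(max_steps):
            theta = i * step
            radius = spacing * theta  # Archimedean spiral: r grows linearly with angle
            cx, cy = radius * math.cos(theta), radius * math.sin(theta)
            box = (cx - w / 2, cy - h / 2, w, h)
            # Brute-force check against all placed boxes (a quadtree would prune this).
            if not any(overlaps(box, other) for other in placed):
                placed.append(box)
                break
        else:
            raise RuntimeError("no free position found along the spiral")
    return placed


# Boxes are usually fed in decreasing size so that frequent words sit near the center.
boxes = spiral_layout([(40, 20), (30, 15), (30, 15), (20, 10)])
```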
Context-preserving Word Cloud
ManiWordle: a customizable Word Cloud
Consistency-preserving Word Cloud
Better customization
Document Card
Turns a document into images plus text
ThemeRiver
Visualizes time-series information
TextFlow
TIARA
An improvement on ThemeRiver
History Flow
Wiki revision history
Topic Competition on Social Media
SentenTree
Sequence mining algorithm: extracts frequent sequences from many sentences
TextArc
Literature Fingerprinting
Compares differences between texts
WordNet
Tree diagram
PhraseNet
Finds relations between words and presents them as a diagram
Newsmap
Uses a Treemap
Jigsaw
Parallel Tag Cloud