Data&Chart&Task
Text Analysis & Mining:
Preprocessing (text wrangling):
Tokenization
Remove stop words: a, the, that, etc.
Convert plural forms to singular: men -> man, truths -> truth
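
The three preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the stop-word list, the irregular-plural table, and the function names (`tokenize`, `singularize`, `preprocess`) are assumptions made for this example, and the suffix rule is deliberately crude compared with a real lemmatizer.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "that", "and", "of", "are", "is"}

# Irregular plurals need a lookup table; only a few are listed here.
IRREGULAR_PLURALS = {"men": "man", "women": "woman", "children": "child"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def singularize(word):
    """Very rough plural-to-singular rule; not a full lemmatizer."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then reduce plurals to singular."""
    return [singularize(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("The men seek the truths."))
```

The order matters here: stop words are removed before singularization so that short function words never reach the suffix rule, which would otherwise mangle some of them.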
Vector-space Model
Bag-of-words: count word frequencies and vectorize
TF-IDF
Word2Vec, Doc2Vec, GloVe, fastText (character-based)
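
To make the bag-of-words and TF-IDF entries concrete, here is a small sketch that works on already-tokenized documents. It assumes one common TF-IDF variant, tf = count / document length and idf = log(N / df); other weightings exist, and the function names are chosen for this example only.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    vocab = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors


def tf_idf(docs):
    """Weight counts by tf * log(N / df), one common TF-IDF variant."""
    vocab, vectors = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in vectors if vec[i] > 0) for i in range(len(vocab))]
    weighted = []
    for doc, vec in zip(docs, vectors):
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        weighted.append([
            (count / length) * math.log(n_docs / df[i])
            for i, count in enumerate(vec)
        ])
    return vocab, weighted


docs = [["cat", "sat"], ["cat", "ran", "ran"]]
vocab, counts = bag_of_words(docs)
_, weights = tf_idf(docs)
print(vocab, counts, weights)
```

A term that appears in every document (here "cat") gets weight zero under this variant, which is the intended effect: it carries no information for telling documents apart.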
Feature Extraction: keyword / word frequency / topic
Topic Retrieving
SVD/pLSI/LDA
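
As a rough illustration of the SVD route to topics, the sketch below approximates only the leading singular triplet of a document-term matrix by power iteration, using plain Python lists. This is a stand-in for a full SVD, not a replacement for it; the function name, the fixed start vector, and the iteration count are assumptions for this example. pLSI and LDA are probabilistic models and are not covered by this sketch.

```python
import math


def top_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its vectors by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # deterministic start vector
    for _ in range(iterations):
        # u = A v
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        # w = A^T u, so v is repeatedly multiplied by A^T A
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


# Rows are documents, columns are terms (toy counts, made up for illustration).
doc_term = [[2, 1, 0], [1, 2, 0], [0, 0, 1]]
sigma, doc_weights, term_weights = top_singular_triplet(doc_term)
print(sigma, doc_weights, term_weights)
```

Here `term_weights` plays the role of the dominant "topic" (which terms co-occur most strongly) and `doc_weights` says how strongly each document expresses it.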
Measurement
Visualization:
Visual Design
Layout
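Visualizing Document Content: Tag Cloud / Text Cloud / Wordle (Word Cloud)
Spiral Layout: check for word overlap, using quadtree-based detection
Context-preserving Word Cloud
ManiWordle: a customizable Word Cloud
Consistency-preserving Word Cloud: better customization
Document Card: turns an article into images plus text
ThemeRiver: visualization of time-series information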
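
The spiral layout with overlap checking noted above can be sketched as follows. Each word is reduced to a bounding box and moved outward along an Archimedean spiral until it no longer collides with any box already placed. Note the simplification: this example tests overlap by brute force against every placed box rather than with a quadtree, which is what one would switch to for large vocabularies. The function names and the `step`, `growth`, and `max_steps` parameters are assumptions for this illustration.

```python
import math


def boxes_overlap(a, b):
    """Axis-aligned boxes as (x, y, width, height), with (x, y) the minimum corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def spiral_layout(sizes, step=0.1, growth=1.0, max_steps=100000):
    """Place each (width, height) box on the first free spot along an Archimedean spiral."""
    placed = []
    for width, height in sizes:
        for k in range(max_steps):
            theta = k * step
            radius = growth * theta
            # Center the box on the current spiral point.
            x = radius * math.cos(theta) - width / 2
            y = radius * math.sin(theta) - height / 2
            candidate = (x, y, width, height)
            if not any(boxes_overlap(candidate, other) for other in placed):
                placed.append(candidate)
                break
        else:
            raise RuntimeError("no free position found within max_steps")
    return placed


# Box sizes would normally come from font size and word length; these are made up.
layout = spiral_layout([(40, 20), (30, 15), (30, 15), (20, 10)])
for box in layout:
    print(box)
```

Passing the sizes in descending order of word weight puts the most important words nearest the center, since every word starts its search from the spiral origin.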
TextFlow
TIARA: an improvement on ThemeRiver
History Flow: wiki revision history
Topic Competition on Social Media
SentenTree: a sequence-mining algorithm that extracts frequent sequences from many sentences
TextArc
Literature Fingerprinting: compares differences between texts
WordNet: tree diagram
PhraseNet: finds associations between words and presents them as a diagram
Newsmap: uses a Treemap
Jigsaw
Parallel Tag Cloud
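
The idea behind the SentenTree entry, mining frequent word sequences from many sentences, can be illustrated with a generic pattern-growth sketch. This is not SentenTree's own algorithm; it is a simple frequent-subsequence miner that extends patterns one word at a time and keeps those found in at least `min_support` sentences. The function names and the `min_support` and `max_length` parameters are assumptions for this example.

```python
def is_subsequence(pattern, sentence):
    """True if the pattern's words occur in the sentence in order, gaps allowed."""
    it = iter(sentence)
    return all(word in it for word in pattern)


def frequent_sequences(sentences, min_support=2, max_length=3):
    """Grow ordered word patterns by appending one word, keeping the frequent ones."""
    vocab = sorted({w for s in sentences for w in s})
    frequent = {}
    frontier = [()]
    for _ in range(max_length):
        next_frontier = []
        for pattern in frontier:
            for word in vocab:
                candidate = pattern + (word,)
                # Support: number of sentences that contain the candidate in order.
                support = sum(is_subsequence(candidate, s) for s in sentences)
                if support >= min_support:
                    frequent[candidate] = support
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


sentences = [
    ["i", "love", "text", "mining"],
    ["we", "love", "data", "mining"],
    ["i", "love", "mining"],
]
for pattern, support in frequent_sequences(sentences).items():
    print(pattern, support)
```

Only frequent patterns are extended, because any sequence containing an infrequent prefix cannot itself be frequent; that pruning is what keeps the search manageable.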