Data&Chart&Task

Text Analysis & Mining:

Preprocessing (text wrangling):

Tokenization

Remove stop words: a, the, that, etc.

Convert plural forms to singular forms: men -> man, truths -> truth
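
The three preprocessing steps above can be chained as in the following minimal sketch. The stop-word list and the plural rules are small illustrative assumptions, and the function names are hypothetical, not taken from any library.

```python
import re

# Tiny illustrative stop-word list (an assumption; real pipelines use larger lists)
STOP_WORDS = {"a", "an", "the", "that", "of", "and", "is", "are", "to", "in"}

# Irregular plurals need a lookup table; regular ones are handled by suffix rules
IRREGULAR = {"men": "man", "women": "woman", "children": "child", "mice": "mouse"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def singularize(word):
    """Very rough plural -> singular rules (not a real lemmatizer)."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"  # stories -> story
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # truths -> truth
    return word


def preprocess(text):
    """Tokenize, drop stop words, then map plurals to singular forms."""
    return [singularize(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("The men said that the truths are simple."))
# -> ['man', 'said', 'truth', 'simple']
```

Suffix rules like these over-strip some words (for example "this" would become "thi"); a dictionary-based lemmatizer such as NLTK's WordNetLemmatizer avoids that.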

Vector-space Model

Bag-of-words: count word frequencies, then turn each document into a vector
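
A minimal bag-of-words sketch, assuming documents are already tokenized: count each word per document, then lay the counts out in a fixed vocabulary order so every document becomes a vector of the same length. The helper names are illustrative.

```python
from collections import Counter


def build_vocabulary(docs):
    """Collect every distinct token in the corpus, sorted for a stable order."""
    return sorted({tok for doc in docs for tok in doc})


def bag_of_words(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    vocab = build_vocabulary(docs)
    matrix = []
    for doc in docs:
        counts = Counter(doc)  # word-frequency statistics for one document
        matrix.append([counts[word] for word in vocab])  # vectorize in vocab order
    return vocab, matrix


docs = [["cat", "sat", "cat"], ["dog", "sat"]]
vocab, matrix = bag_of_words(docs)
print(vocab)   # ['cat', 'dog', 'sat']
print(matrix)  # [[2, 0, 1], [0, 1, 1]]
```

Word order is discarded: only how often each vocabulary word occurs survives in the vector.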

TF-IDF
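
One common TF-IDF variant is sketched below, assuming tf = count / document length and idf = log(N / df); libraries often add smoothing or normalization, so exact values differ between implementations.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term of each (non-empty) tokenized document with tf * idf.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), N = number of documents, df = documents containing the term
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [["cat", "sat", "cat"], ["dog", "sat"]]
scores = tf_idf(docs)
for row in scores:
    print(row)
```

A term found in every document, like "sat" here, gets idf = log(1) = 0 and therefore a score of 0. Sorting one document's scores in descending order gives a simple keyword list.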

Word2Vec, Doc2Vec, GloVe, fastText (character-level n-grams)
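
The character-level aspect of fastText can be illustrated by its subword split: the word is wrapped in `<` and `>` boundary markers, all character n-grams for n from 3 to 6 are extracted, and the whole wrapped word is kept as an extra unit. A word vector is then the sum of the vectors of these units. The sketch below shows only the split; training the vectors is not shown, and the function name is hypothetical.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """fastText-style subword units for one word."""
    wrapped = f"<{word}>"  # '<' and '>' mark the word boundaries
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    if wrapped not in grams:
        grams.append(wrapped)  # the whole word is kept as its own unit
    return grams


print(char_ngrams("where", 3, 3))
# -> ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

Because rare or unseen words still share n-grams with known words, they can receive a vector, which is the main advantage over whole-word models such as Word2Vec.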

Feature Extraction: keywords / word frequency / topics

Topic Retrieval

SVD/pLSI/LDA
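
Of the three, SVD-based topic extraction (latent semantic analysis) is the easiest to show; pLSI and LDA are probabilistic models and are not sketched here. The term-document matrix below is made-up toy data with two obvious themes.

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are documents
terms = ["cat", "dog", "pet", "stock", "market"]
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

# Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # number of latent "topics" to keep
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one document in topic space
term_topics = U[:, :k] * s[:k]            # each row: one term in topic space

# Best rank-k approximation of X in the Frobenius norm (Eckart-Young)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

print(np.round(doc_topics, 2))
```

Documents 0 and 1 (pets) end up along one latent axis and documents 2 and 3 (finance) along the other; the signs of the axes are arbitrary.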

Measurement

Visualization:

Visual Design

Layout

Visualizing Document Content: Tag Cloud / Text Cloud / Wordle (Word Cloud)

Spiral Layout: check whether words overlap, using a quadtree for collision detection
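
A minimal sketch of Wordle-style spiral placement, assuming each word's bounding box (width, height) has already been derived from its font size. For brevity the overlap test is a linear scan over placed boxes; a quadtree, as mentioned above, would replace that scan to avoid checking every box. Names and parameters are illustrative.

```python
import math


def overlaps(a, b):
    """Axis-aligned boxes (x, y, w, h) overlap if they intersect on both axes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def spiral_layout(words, step=0.1, spacing=1.0, max_steps=100_000):
    """Place (word, width, height) boxes, largest first, on an Archimedean spiral.

    Each word starts at the centre and moves outward along r = spacing * theta
    until its box overlaps no placed box. Words that find no free spot within
    max_steps are dropped.
    """
    placed = []  # list of (word, (x, y, w, h))
    for word, w, h in sorted(words, key=lambda t: t[1] * t[2], reverse=True):
        for i in range(max_steps):
            theta = i * step
            r = spacing * theta
            # centre the box on the current spiral point
            box = (r * math.cos(theta) - w / 2, r * math.sin(theta) - h / 2, w, h)
            if not any(overlaps(box, other) for _, other in placed):
                placed.append((word, box))
                break
    return placed


for word, box in spiral_layout([("text", 8, 3), ("mining", 10, 4), ("cloud", 6, 2)]):
    print(word, box)
```

Placing large words first keeps the most frequent terms near the centre, where the spiral starts.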

Word Cloud

Context-preserving Word Cloud

ManiWordle: a customizable word cloud

Consistency-preserving Word Cloud: better customization

Document Card: turns a document into a compact card of images + text

ThemeRiver: visualizing temporal information
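
ThemeRiver draws each theme as a layer whose thickness at a time step is that theme's strength, with the whole stack centred on a horizontal axis. The sketch below computes only that symmetric stacking, assuming a baseline of -total / 2 at each step; the smooth curve interpolation between time steps is omitted, and the function name is hypothetical.

```python
def themeriver_layers(series):
    """Stack layers symmetrically around y = 0.

    series[i][t] is the non-negative strength of theme i at time step t.
    Returns, for each theme, a list of (bottom, top) pairs, one per time step.
    """
    n_steps = len(series[0])
    layers = [[] for _ in series]
    for t in range(n_steps):
        total = sum(s[t] for s in series)
        bottom = -total / 2  # symmetric baseline: half the stack below the axis
        for i, s in enumerate(series):
            top = bottom + s[t]
            layers[i].append((bottom, top))
            bottom = top
    return layers


print(themeriver_layers([[1, 2], [3, 2]]))
# -> [[(-2.0, -1.0), (-2.0, 0.0)], [(-1.0, 2.0), (0.0, 2.0)]]
```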

TextFlow

TIARA: an improvement on ThemeRiver

History Flow: visualizing the edit history of wiki pages

Topic Competition on Social Media

SentenTree: a sequential pattern mining algorithm that extracts frequent word sequences from a large set of sentences
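
The core idea of frequent sequence mining can be shown with ordered word pairs: a pair (a, b) is supported by a sentence if a occurs somewhere before b. This is a deliberately simplified sketch, not SentenTree's own pattern-growing algorithm; the function name and the min_support threshold are illustrative.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sentences, min_support=2):
    """Return ordered word pairs (a before b, gaps allowed) with their support.

    Support = number of sentences containing the pair at least once, so a
    pair repeated inside one sentence is counted only once.
    """
    support = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # combinations() keeps positional order, so (a, b) means a occurs before b
        support.update(set(combinations(words, 2)))
    return {pair: n for pair, n in support.items() if n >= min_support}


tweets = ["the game was great", "great game tonight", "the game was close"]
print(sorted(frequent_pairs(tweets).items()))
# -> [(('game', 'was'), 2), (('the', 'game'), 2), (('the', 'was'), 2)]
```

Longer frequent sequences can be grown from frequent pairs, since every subsequence of a frequent sequence must itself be frequent.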

TextArc

Literature Fingerprinting: comparing how texts differ from one another
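
Literature Fingerprinting splits a text into blocks, computes one text measure per block, and draws each value as a coloured pixel, so fingerprints of different texts can be compared side by side. The sketch below computes only the per-block values, assuming average sentence length as the measure (one of several that could be plugged in) and a naive punctuation-based sentence split.

```python
import re


def fingerprint(text, block_size=3):
    """Average sentence length (in words) for each block of block_size sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    values = []
    for start in range(0, len(sentences), block_size):
        block = sentences[start:start + block_size]
        values.append(sum(len(s.split()) for s in block) / len(block))
    return values


print(fingerprint("One two. One two three. Four. A b c d e f.", block_size=2))
# -> [2.5, 3.5]
```

Mapping each value to a colour scale and drawing the values as a grid of pixels produces the fingerprint image itself.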

WordNet: tree diagram

PhraseNet: finds relationships between words and displays them as a node-link graph
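
Phrase Net builds its graph from a user-chosen pattern such as "X and Y": every match adds an edge from word X to word Y, weighted by how often the phrase occurs. A minimal sketch of that edge extraction using a regular expression follows; the function name is hypothetical, and the lookahead keeps Y unconsumed so chains like "salt and pepper and salt" yield both edges.

```python
import re
from collections import Counter


def phrase_net_edges(text, connector="and"):
    """Weighted edges (X, Y) from occurrences of 'X <connector> Y'."""
    pattern = re.compile(
        rf"\b(\w+)\s+{re.escape(connector)}\s+(?=(\w+)\b)", re.IGNORECASE
    )
    edges = Counter()
    for x, y in pattern.findall(text):
        edges[(x.lower(), y.lower())] += 1  # edge weight = phrase frequency
    return edges


print(phrase_net_edges("Pride and Prejudice. Salt and pepper and salt."))
```

Changing the connector (for example to "of") yields a different graph from the same text.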

Newsmap: uses a treemap
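
A treemap nests rectangles whose areas are proportional to item sizes. The sketch below uses slice-and-dice, the simplest treemap algorithm, which is not necessarily what Newsmap itself uses; the tree format (label, size, children) and the function name are assumptions, and child sizes are assumed positive.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out node = (label, size, children) inside the rectangle (x, y, w, h).

    Children split their parent's rectangle in proportion to their sizes,
    cutting along x at even depths and along y at odd depths.
    Returns a list of (label, x, y, w, h) for every node.
    """
    if out is None:
        out = []
    label, _size, children = node
    out.append((label, x, y, w, h))
    total = sum(child[1] for child in children)
    offset = 0.0  # fraction of the parent already used by earlier children
    for child in children:
        frac = child[1] / total
        if depth % 2 == 0:  # vertical strips
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:  # horizontal strips
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


tree = ("news", 10, [("world", 6, [("a", 3, []), ("b", 3, [])]), ("sports", 4, [])])
for rect in slice_and_dice(tree, 0, 0, 100, 50):
    print(rect)
```

Slice-and-dice preserves item order but can produce long, thin rectangles; squarified layouts trade order for more readable aspect ratios.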

Jigsaw

Parallel Tag Cloud

