By Jordan Goetze
Computer Science
North Dakota State University
Fargo, North Dakota 58103
jordan.goetze@ndsu.edu
Attacking a speaker's character rather than their argument.
@RealBenCarson take your common sense BS and stick it. What this country needs is a laxative. Your a doctor and can't see that? #Trump2016
Yoon Kim's Convolutional Neural Networks for Sentence Classification
word2vec
Total Tweets | 5808 |
Negative Examples | 4155 |
Positive Examples | 1653 |
Original Tweet | @HillaryClinton why always so #smug? https://t.co/eOU1rOaOlR |
Preprocessed Tweet | <AT_NAME/> why always so #smug <URL/> |
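A minimal sketch of this preprocessing step, assuming simple regular expressions (the patterns and function name are illustrative, not taken from the poster; further tokenization, such as dropping the trailing "?", is omitted):

```python
import re

# Hypothetical helper: replace Twitter usernames and URLs with
# placeholder tokens, as in the "Preprocessed Tweet" example above.
AT_NAME_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")

def preprocess_tweet(text: str) -> str:
    text = AT_NAME_RE.sub("<AT_NAME/>", text)
    text = URL_RE.sub("<URL/>", text)
    return text.strip()

print(preprocess_tweet("@HillaryClinton why always so #smug? https://t.co/eOU1rOaOlR"))
# -> "<AT_NAME/> why always so #smug? <URL/>"
```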
Filter Windows | 3, 4, 5 |
Dropout Rate | 0.5 |
L2 constraint | 3 |
Mini-batch size | 64 |
Filters per size | 50 |
Word embedding dimensions* | 20 |
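A minimal sketch of a Kim-style CNN wired up with these hyperparameters, using tf.keras; the maximum tweet length and the exact layer choices are assumptions, not taken from the poster:

```python
import tensorflow as tf

# Kim (2014)-style CNN with the hyperparameters listed above.
# VOCAB_SIZE matches the 8313-word vocabulary; MAX_LEN is an assumed
# maximum tweet length, not a value from the poster.
VOCAB_SIZE, MAX_LEN, EMBED_DIM = 8313, 50, 20
FILTER_WINDOWS, FILTERS_PER_SIZE, DROPOUT = (3, 4, 5), 50, 0.5

inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
embedded = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)

# One convolution + max-over-time pooling branch per filter window size.
pooled = []
for window in FILTER_WINDOWS:
    conv = tf.keras.layers.Conv1D(FILTERS_PER_SIZE, window, activation="relu")(embedded)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))

features = tf.keras.layers.Concatenate()(pooled)
features = tf.keras.layers.Dropout(DROPOUT)(features)

# The L2 (max-norm) constraint of 3 is applied to the output layer weights.
outputs = tf.keras.layers.Dense(
    1, activation="sigmoid",
    kernel_constraint=tf.keras.constraints.MaxNorm(3))(features)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=64, epochs=500)  # mini-batch size from the table above
```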
Model | Description |
---|---|
N-T-Names | Baseline model. Twitter usernames are replaced with a single non-unique token (e.g., <AT_NAME/>). Word embeddings are randomly initialized and trained with the model. |
U-T-Names | Based on the baseline model (N-T-Names). Twitter usernames are replaced with a unique token (e.g., <AT_NAME_123/>). |
G-N-Vecs | Based on the baseline model (N-T-Names). Word embeddings are initialized with 300-dimension pre-trained word embeddings; if a word is not included in the pre-trained embeddings, it is initialized randomly. |
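The unique-token replacement used by U-T-Names could be sketched as follows (a hypothetical implementation; the id-assignment scheme is an assumption):

```python
import re

# Sketch of the U-T-Names preprocessing: each distinct username is
# mapped to its own numbered token instead of a shared <AT_NAME/> token.
def make_unique_name_replacer():
    ids = {}

    def replace(match: re.Match) -> str:
        name = match.group(0).lower()
        if name not in ids:
            ids[name] = len(ids) + 1          # assign the next unused id
        return f"<AT_NAME_{ids[name]}/>"

    return replace

replacer = make_unique_name_replacer()
text = re.sub(r"@\w+", replacer, "@RealBenCarson take your common sense BS and stick it.")
# -> "<AT_NAME_1/> take your common sense BS and stick it."
```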
Total Words | 8313 | 100% of words |
Pre-trained Embeddings | 5841 | 70.3% of words |
Randomly Initialized embeddings | 2472 | 29.7% of words |
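A sketch of the G-N-Vecs embedding initialization behind these coverage numbers, assuming the pre-trained vectors are the 300-dimension Google News word2vec vectors loaded with gensim (the file name, toy vocabulary, and ±0.25 uniform range are assumptions):

```python
import numpy as np
from gensim.models import KeyedVectors

# Words covered by the pre-trained 300-d vectors get those vectors;
# the remaining words are initialized randomly.
kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

vocab = {"why": 0, "always": 1, "#smug": 2}    # toy word -> row-index mapping
embedding_matrix = np.random.uniform(-0.25, 0.25, size=(len(vocab), 300))

covered = 0
for word, idx in vocab.items():
    if word in kv:                              # pre-trained vector available
        embedding_matrix[idx] = kv[word]
        covered += 1

print(f"{covered}/{len(vocab)} words covered by pre-trained embeddings")
```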
Run time: 500 epochs (35.4K training steps)
Model | Accuracy | Specificity | Sensitivity |
---|---|---|---|
N-T-Names | 87.4% | 95.6% | 35.5% |
U-T-Names | 87.3% | 95.1% | 38.2% |
G-N-Vecs | 87.5% | 95.8% | 34.2% |
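For reference, the reported metrics follow the standard confusion-matrix definitions; a small sketch (the example counts are illustrative, not the poster's):

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, specificity, and sensitivity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),   # true-negative rate
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
    }

# Toy example (not the poster's actual counts):
print(metrics(tp=59, tn=497, fp=23, fn=107))
```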
Model | Accuracy | Specificity | Sensitivity |
---|---|---|---|
N-T-Names | 87.4% | 95.6% | 35.5% |
U-T-Names | 87.3% | 95.1% | 38.2% |
Model | Accuracy | Specificity | Sensitivity |
---|---|---|---|
N-T-Names | 87.4% | 95.6% | 35.5% |
G-N-Vecs | 87.5% | 95.8% | 34.2% |
Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems (NIPS), October 2013.
Britz, D. Implementing a CNN for Text Classification in TensorFlow. WildML, December 2015.
Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP 2014, September 2014.
Abadi et al. Vector Representations of Words. TensorFlow tutorials, 2015.
Goldberg, Y. and Levy, O. word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method. arXiv, February 2014.
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.