The graph below visualizes the connections between presidents of the United States based on the similarity of their speeches.
The data covers State of the Union Addresses and Inaugural Addresses going back to George Washington, plus press releases for Hoover and every president after him.
A connection is drawn between two presidents whose cosine similarity score is >= 0.75. Cosine similarity was calculated on the document matrix obtained by performing latent semantic indexing (LSI) on the TF-IDF matrix of presidential speeches. In this particular analysis, all of a president's speeches were treated as a single document in the model.
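To make the steps above concrete, here is a minimal Python sketch of the same idea: build a TF-IDF matrix with one row per president, reduce it with LSI (a truncated SVD), and connect any pair whose cosine similarity is at least 0.75. The toy corpus, the function names, the smoothed IDF formula, and the choice of `k=2` latent dimensions are all assumptions made for illustration; this is not the code behind the graph.

```python
import numpy as np

# Invented toy corpus: one concatenated "document" per president (not real speeches).
docs = {
    "President A": "peace treaty commerce nations peace union",
    "President B": "commerce treaty nations peace trade union",
    "President C": "economy jobs taxes budget economy growth",
    "President D": "jobs taxes budget growth economy deficit spending",
}

def tfidf_matrix(texts):
    """Return (matrix, vocab): rows are documents, columns are TF-IDF weighted terms."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(texts), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            counts[i, index[w]] += 1
    df = (counts > 0).sum(axis=0)                   # document frequency per term
    idf = np.log((1 + len(texts)) / (1 + df)) + 1   # smoothed IDF (an assumed variant)
    return counts * idf, vocab

def lsi_embed(matrix, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine_similarity(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

def similarity_edges(names, sims, threshold=0.75):
    """List (name_i, name_j, score) for every pair at or above the threshold."""
    return [
        (names[i], names[j], float(sims[i, j]))
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if sims[i, j] >= threshold
    ]

names = list(docs)
X, vocab = tfidf_matrix(list(docs.values()))
Z = lsi_embed(X, k=2)          # document matrix in the latent space
S = cosine_similarity(Z)
edges = similarity_edges(names, S)

for a, b, score in edges:
    print(f"{a} <-> {b}: {score:.2f}")
```

In this toy corpus, presidents A and B share one vocabulary and C and D share another, so only those two pairs end up connected. On real speeches you would likely want proper tokenization, stop-word removal, and many more latent dimensions.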
Each of the buttons below can be toggled on or off. The nodes are force-directed but the labels are not, so I made the labels toggleable. For party or period, click twice to see things change (just the way I set it up :)).
To get more specific about what each president talked about, we can group the presidents by the period in which they served.
In this section, topics are displayed as circles. After a button is selected, the data is filtered to show only the prominent topics for that period. Latent Dirichlet allocation (LDA) was used to extract the topics from the presidents' speeches after a term matrix was constructed. This term matrix had over 100k rows (documents) because each paragraph within each speech was treated as a separate document.
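For a feel of how paragraph-level topic extraction works, here is a small Python sketch that splits each speech into paragraphs, treats every paragraph as its own document, and fits LDA with a toy collapsed Gibbs sampler. The speeches, the function names, and the hyperparameters (`alpha`, `beta`, `n_iter`, two topics) are invented for illustration, and the hand-rolled sampler is a stand-in: the topics shown on this page were presumably fit with a library implementation, which this does not reproduce.

```python
import random
from collections import Counter

# Invented speeches; a blank line separates paragraphs.
speeches = {
    "speech_1": "war army navy troops war\n\ntaxes budget revenue debt taxes",
    "speech_2": "army troops navy battle army\n\nbudget debt revenue taxes budget",
    "speech_3": "navy war battle troops army\n\nrevenue taxes debt budget revenue",
}

def paragraphs_as_documents(speeches):
    """Split every speech on blank lines so each paragraph becomes its own document."""
    docs = []
    for text in speeches.values():
        for para in text.split("\n\n"):
            tokens = para.lower().split()
            if tokens:
                docs.append(tokens)
    return docs

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns per-document and per-topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # tokens of doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token, then resample its topic given all other tokens.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

docs = paragraphs_as_documents(speeches)
doc_topic, topic_word = fit_lda(docs, n_topics=2)

# Assign each paragraph to its dominant topic and count paragraphs per topic.
dominant = [max(range(len(row)), key=lambda k: row[k]) for row in doc_topic]
paragraphs_per_topic = Counter(dominant)

for k, words in enumerate(topic_word):
    top_words = [w for w, c in words.most_common(3) if c > 0]
    print(f"topic {k}: {top_words}, paragraphs: {paragraphs_per_topic[k]}")
```

The per-topic top words and paragraph counts printed at the end correspond roughly to what the hover text and pop-up show for each circle.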
Click a time period to see the topics that were prevalent during it! Hover over a circle to see that topic's words. Keep hovering a bit longer and a pop-up should appear with the number of paragraphs that fall under that topic. Click a circle to see sample paragraphs from that topic (you can also scroll to see a few more)!