Twitter and Network Data Visualizations: A Python Tutorial


Note to the readers: this tutorial walks through multiple code blocks with explanations. I will not be writing the code in a terse, "Pythonic", or more advanced style; in my opinion, longer, more spelled-out blocks are easier to understand. If you disagree, please tell me why! Also, where you see code repeated, the repetition is intentional.

Social media has become one of the most prominent aspects of our culture, even a zeitgeist of the early 21st century. It allows anyone to deliver content, true or false, to hundreds or thousands of people. With 96% of the US owning a smartphone and 68 million active US Twitter accounts, information from influencers can shape how people think and feel. The current US president, Donald Trump, is notable for his use of Twitter to deliver content to his political base. It can be argued that social media directly impacts how political systems operate and is a prominent factor in the behavior of elected officials and voters in our country. Regardless of your political standing, it cannot be ignored that social media has been a major factor in the past two US presidential elections.

To this end, I found it important to highlight Twitter data focusing on the 2020 US election. The data pool in this exercise is limited: Twitter generates thousands of tweets on a topic per day, so the picture can change drastically in a short period of time, and the free Twitter APIs impose their own restrictions. For these reasons, the data has been serialized so that each notebook run makes as few API calls as possible.

This exercise will elucidate a visualization technique that allows you to ask questions about the behavior and potential impact of social media through data networks. The libraries and techniques chosen for this assignment are aimed at short, frequent monitoring of social media content.

Hermeneutics Visualization Technique

The visualization of network data can help data scientists reveal hidden patterns and meaning in the structures of textual sources. "Hermeneutics" is a theory of interpretation: when we interpret patterns, they help us give context to our life story. Hermeneutics focuses on the interpretation of written or verbal communication and establishes rules for that interpretation. It is the influence of our language, content, speech, and textual documents that describes our culture and society as a whole. The point of this visualization technique is to define origins and descriptors that lead us to understand how content is distributed.

In this context, hermeneutics will be used to map the distribution of information through a network based on key values. These values demonstrate an understanding or common thought; the visualization then shows how that concept is delivered through the network from its origin. The technique is best used for continuous monitoring of online content and sentiment, applied frequently and in small doses, to understand how information is being distributed. It pairs well with Twitter "topics", which are defined by algorithms that monitor phrases, words, hashtags, user mentions, and more to create a communal concept that we share. The technical detail of how Twitter develops a topic is less important than how topics relate to hermeneutics and how they are applied here.

This method is not designed for large-scale interpretation of a single concept or topic. While large models can be built, it would be misleading to infer how information is distributed from them. The tool lets you explore complex relations, but it is not meant to provide a total interpretation of a data network, and it should drive more questions than answers.

Another technique to consider for analyzing social network data focuses on homophily, the extent to which actors form ties with actors similar to themselves. While visually similar, this is a different technique from hermeneutics: in homophily analysis, similarity is weighted and used to associate connections.

2-Dimensional and 3-Dimensional Data

Network data is plotted from a source to a target; it defines "edges", each of which is a connection between two major points. Here, each edge represents a connection event between two people who tweeted within the data sample period. Edges can represent the various kinds of relationships that can be created through Twitter. NodeXL, for example, constructs four different types of Twitter edges from the data it collects: follows, replies, mentions, and tweets. A "follows" edge is created if one author follows another who also tweeted in the sample data set (the timestamp for a follows edge is the date of the query rather than the time when one user followed another, which is information that is not available from Twitter). A "mentions" edge is created when one user creates a tweet that contains the name of another user (indicated with a preceding "@" character, e.g. "just spoke about social media with @marc_smith"); we will be focusing on this edge. A "reply" is a special form of mention that occurs when the user's name is at the very start of a tweet (e.g. "@itaih just spoke about social media"). A "tweet" is a message that contains no reply or mention.
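
To make the edge types concrete, here is a rough text-only sketch of the distinction (my own illustration, not how GraphiPy or NodeXL actually classify edges; they read structured API fields rather than scanning text):

import re

def classify_edge(tweet_text):
    # a reply starts with a user name; a mention contains one anywhere else
    if re.match(r'@\w+', tweet_text):
        return "reply"
    if re.search(r'@\w+', tweet_text):
        return "mention"
    return "tweet"

classify_edge("@itaih just spoke about social media")            # 'reply'
classify_edge("just spoke about social media with @marc_smith")  # 'mention'
classify_edge("just spoke about social media")                   # 'tweet'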

If network data is "flat", branching from origin points with no overlap, we would consider it "2-dimensional". If, however, nodes overlap, the network takes on a 3-dimensional shape. Both will be displayed below.
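
To make the distinction concrete, here is a minimal networkx sketch with made-up node names: a graph branching from a single origin with no overlap, and the same graph after adding an edge that makes two branches overlap.

import networkx as nx

# "flat" / 2-dimensional: every edge branches from the origin, no overlap
flat = nx.Graph([("origin", "a"), ("origin", "b"), ("origin", "c")])

# adding an edge between branches creates overlap, a 3-dimensional shape
overlapping = flat.copy()
overlapping.add_edge("a", "b")

print(nx.cycle_basis(flat))         # [] - no overlap anywhere
print(nx.cycle_basis(overlapping))  # one cycle appears where the branches overlap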

GraphiPy, Networkx, and Pyvis

The libraries selected for this exercise were chosen based on their simplicity in the extraction and graphing of data. While the Twitter APIs have more robust features, for trend analysis in real time a data scientist must be agile and have the ability to generate insight in frequent intervals. To this end, the following libraries were selected:

  • GraphiPy is a universal social data extractor that simplifies pulling data from various social media platforms. It is key here for structuring data in the optimal form for network visualization. It has built-in methods to create visuals, but its strength lies in its graph construction and its API handling. GraphiPy was authored by Shobeir Fakhraei, is supported on GitHub, and has an open-source license.
  • Pyvis was chosen for the ease with which it visualizes network data. It significantly decreases the amount of code required to graph complex data networks. The graphs it produces are visually appealing and can be manipulated with "physics" driven by the graph data structures; it also supports buttons and sliders that adjust how graphs appear with a single line of code. Visualizations can be displayed within Jupyter notebooks or generated as standalone HTML pages viewable in any web browser. Pyvis was authored by WestHealth, is supported on GitHub, and has an open-source license.
  • NetworkX is used to create and manipulate structure around complex network data. It is the foundation for the visualizations and will be used to transform the data gathered by GraphiPy into a form pyvis can consume. NetworkX is currently a collaborative project but was originally authored by Aric Hagberg, Dan Schult, and Pieter Swart; it is supported on GitHub and has an open-source license.

All three libraries are declarative and greatly reduce the number of commands needed to extract meaningful data from social networks. The major drawback is reduced control over the incoming data: you can only extract current tweets and connections based on pre-defined algorithms, or "black boxes". For full control over content, retweets, tweet time frames, mentions, and so on, you would need to employ more sophisticated libraries.

Installation

This exercise assumes you have installed python/anaconda and are using jupyter notebooks. I will be posting more context on this in the future if you are unfamiliar.

This exercise uses a variety of libraries to support the tutorial and data analysis. This section reviews every library used and its installation steps. I found the best method for installation was the Python package manager "pip", followed by a restart of the notebook kernel.

1) GraphiPy is primarily used for pulling and initially structuring Twitter data.

Install with pip: pip install GraphiPy

2) IPython is used to show various outputs in this file.

Install with pip: pip install ipython

Install with Anaconda: conda update conda, then conda update ipython

3) pyvis is used to construct the visual network.

Install with pip: pip install pyvis

4) NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

Install with pip: pip install networkx

5) Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool.

Install with pip: pip install pandas

Install with conda: conda install pandas

6) Pickle is a serialization tool that converts Python objects into a byte stream.

pickle ships with the Python standard library, so no installation is required.

7) re provides regular expression matching operations similar to those found in Perl.

re also ships with the Python standard library, so no installation is required.
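
If you prefer, all of the pip-installable libraries above can be grabbed in one command:

pip install GraphiPy ipython pyvis networkx pandas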

Tutorial

To begin, we need to import the libraries that will be used in this analysis.

from graphipy.graphipy import GraphiPy
from IPython.display import HTML
from pyvis.network import Network
import IPython.display # for jupyter notebook users
import networkx as nx
import pandas as pd
import pickle
import re

Library versions used:

graphipy 0.0.1
IPython 7.18.1
pyvis 0.1.8.2
networkx 2.0
pandas 1.1.3
pickle unknown
re 2.2.1

First, I am going to set up several functions for cleaning and preparing the data. I will also serialize results for fast recall (my pickled function), since pulling fresh data from Twitter is restricted to a specific number of calls on the free API.

def listToString(s):
    # join a list of strings into a single space-separated string
    return " ".join(s)

def extract_string(dataframe, column, regex, new_column=None):
    # pull every regex match out of a text column, join the matches into one
    # string, and strip leading whitespace; write back in place or to a new column
    target = column if new_column is None else new_column
    dataframe[target] = dataframe[column].apply(lambda x: re.findall(regex, x))
    dataframe[target] = dataframe[target].apply(lambda x: listToString(x))
    dataframe[target] = dataframe[target].apply(lambda x: x.lstrip())
    return dataframe[target]

def pickled(filename, data):
    # serialize any object to assets/<filename> for fast recall later
    with open('assets/{}'.format(filename), 'wb') as outfile:
        pickle.dump(data, outfile)

# instantiate pyvis and GraphiPy, and load the masked API credentials
net = Network()
graphipy = GraphiPy()
token = pd.read_csv('assets/twitterapi.csv')

Twitter API account and connections

The first step to retrieving data from Twitter is to set up an account and apply for developer access. Twitter supports free and paid APIs and has a variety of tools. Setup instructions are clearly laid out on Twitter's webpage.

Credentials for the account and developer application have been masked in a CSV file; the code below uses a dictionary to pass them to GraphiPy so we can use its extraction methods. I strongly suggest you mask your credentials the same way in anything you publish.
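
For reference, the code below assumes the CSV has two columns named APIkey and APIseckey, so a masked assets/twitterapi.csv would look something like this (placeholder values):

APIkey,APIseckey
XXXXXXXXXXXXXXXXXXXXXXXXX,YYYYYYYYYYYYYYYYYYYYYYYYY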

# The twitter API needs these credentials
CONSUMER_KEY = str(token['APIkey'].values[0])
CONSUMER_SECRET = str(token['APIseckey'].values[0])
ACCESS_TOKEN = ""
TOKEN_SECRET = ""
twitter_api_credentials = {
    "consumer_key": CONSUMER_KEY,
    "consumer_secret": CONSUMER_SECRET,
    "access_token": ACCESS_TOKEN,
    "token_secret": TOKEN_SECRET
}
# create the twitter object
twitter = graphipy.get_twitter(twitter_api_credentials)

Getting Data from Twitter

Here we will use GraphiPy to query the topic "2020 US Presidential Election", which returns the 10 most recent user tweets. The data for this example was pulled on 11/11/20.

keyword = "2020 US presidential election"
limit = 10

# Every function call modifies the graph that is sent as input, this reduces the need to clean and manipulate raw twitter data:
# tweets_graph = graphipy.create_graph()

# This function call returns the graph modified so you can assign it to other variables like so:
tweets_graph = twitter.fetch_tweets_by_topic(graphipy.create_graph(), keyword, limit)

Here I am serializing the data for reuse.

outfile = open('assets/2020_US_pres_elect.data', 'wb')
pickle.dump(tweets_graph, outfile)
outfile.close()

Pandas has a truly expansive set of tools: even though we are not pulling the .data file into a dataframe (yet), we can use read_pickle to retrieve the exact object we saved with very few additional lines of code. It returns the data in the exact form and structure in which it was serialized.

twitter_save = pd.read_pickle('assets/2020_US_pres_elect.data')

Now we will use both pyvis and networkx to set up an interactive html file that displays the network connection.

exporter = graphipy.get_nx_exporter() # this exporter creates the edges and nodes from our data automatically; it is part of the GraphiPy library
nx_graph = exporter.create_from_pd(twitter_save) # the exporter now builds our graph object from the twitter data; this is the network object that pyvis will translate

g = Network(height = 800, width = 1000, notebook = True, heading="") # constructing the graph; to render inside a jupyter notebook the parameter "notebook" must be True. Network comes from the pyvis library.

# pyvis methods for the graph construction
g.toggle_hide_edges_on_drag(False) # this parameter affects how edges behave when nodes are dragged in the html file, play around with this!
g.from_nx(nx_graph)
g.show('assets/tweets_graph.html') # render the interactive graph (the filename here is illustrative)

And the result:

Interesting… what am I looking at?

This image shows the tweets of 10 users who were tweeting about our topic and how that information reached 13 additional users (23 nodes total). The labels are tweet IDs, which you can use to track the tweets themselves. Let's examine how this works and the time frame we are visualizing.

tweets_df = twitter_save.get_df("tweet") #turning the graphical data into a dataframe for further examination (GraphiPy)

Let’s clean up the labels and isolate the tweet ID

tweets_df['Tlabel'] = extract_string(tweets_df, 'Label', '[0-9]+', 'Tlabel') # extract_string already writes the new column; the assignment just makes that explicit

tweets_df['Tlabel'] #here we see the tweet ids as displayed in the visualization above.

Let’s examine the time frame in which this networking of data occurred

tweets_df['created_at'].min() #the earliest tweet in the sample
tweets_df['created_at'].max() #the most recent tweet in the sample

The Twitter API stores timestamps in GMT, so without having to analyze the time zone of the data repository we can easily see that these 23 connections were made over the course of approximately 50 minutes.
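
If you would rather compute that window than eyeball it, pandas can do the subtraction directly (a small sketch reusing tweets_df from above, assuming created_at parses cleanly with to_datetime):

timestamps = pd.to_datetime(tweets_df['created_at'])
print(timestamps.max() - timestamps.min()) # a Timedelta of roughly 50 minutes for this sample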

Let’s analyze a small group of nodes and describe how the interactions take place.

Here the middle node shows two connections; what we are seeing is how Twitter relates users and tweets into a complex network. Tweet ID 2191562701 isn't in our data set; let's see why. We will pull the other two tweet IDs.

tweets_df[  (tweets_df['Label']=='1326696291657768967') | (tweets_df['Label']=='1326330142814253057')  ]

As stated above, the edges we will be focusing on are the user mentions and tweet topic content.

connection_example = tweets_df[  (tweets_df['Label']=='1326696291657768967') | (tweets_df['Label']=='1326330142814253057')  ]
connection_example['user_mentions']

We can see that the middle node (1326696291657768967) mentions two users, and thus two connections are made. User [nytimes] is also tweeting about the same topic, so we capture that data directly. The other tweet (2191562701) is not directly tweeting about the content, but it is still exposed to the information and part of the overall map, so we capture that node as well.

Tracking a tweet

Now that we understand how the general relationships work, let’s take a tweet from the example above and see how a tweet from a user looks when networked in a graph:

tweet1326696291657768967 = twitter.fetch_tweet_by_id(graphipy.create_graph(),1326696291657768967)

nx_graph2 = exporter.create_from_pd(tweet1326696291657768967)

g = Network(height =250, width = 250, notebook = True, heading = "")

g.toggle_hide_edges_on_drag(False)
g.from_nx(nx_graph2)

g.show('assets/tweet1326696291657768967.html')

Awesome! While this example isn't impressive in magnitude, we can see how all these elements relate.

Trump Words vs Biden Words

For this next section we are going to look at the top words associated with US President-elect Joe Biden and President Donald Trump. To standardize the data sets, the same number of users tweeting about each topic will be pulled. As we saw from the examples above, the networks expand through user mentions; in this exercise we will look at trends in how the topics spread.

How will we choose the words? We will look at this article. The top 2 positive words and top 1 negative word for each candidate will be used; any word that appears in both candidates' sets will be discarded and the next will be chosen (for example, "Son" is skipped because it is notable for both presidential candidates).

           Positive             Negative
Biden      Healthcare, Person   Basement
Trump      Economy, America     Spread

Once again I will use pickle files to encapsulate the tweets since they are such a moving target. Let’s take a look at an example of these graphs, and then plot a nice visual. The following two blocks of code were used to create the base HTML and data files.

def make_my_graph(data, filename):
    # export the GraphiPy data to networkx, wrap it in a pyvis Network, and render to html
    nx_graph = exporter.create_from_pd(data)
    g = Network(height = 800, width = 1000, bgcolor="#222222", font_color="white", heading = "", notebook = True)
    g.toggle_hide_edges_on_drag(False)
    g.from_nx(nx_graph)
    return g.show('assets/{}.html'.format(filename))



bwords = ['Healthcare', 'Person', 'Basement']
twords = ['Economy', 'America', 'Spread']

for word in bwords:
    obj = twitter.fetch_tweets_by_topic(graphipy.create_graph(), word, 50)
    pickled('b_{}.data'.format(word), obj)
    make_my_graph(obj, 'b_{}'.format(word))
    
for word in twords:
    obj = twitter.fetch_tweets_by_topic(graphipy.create_graph(), word, 50)
    pickled('t_{}.data'.format(word), obj)
    make_my_graph(obj, 't_{}'.format(word))    

b_example = pd.read_pickle('assets/b_Healthcare.data')
make_my_graph(b_example, 'b_example')

That cluster at the bottom is AOC, which makes this type of visualization very interesting in context.

Let's compile the results:

From the above, we can see tighter, larger network clusters surrounding the Trump-related words, both positive and negative. The Biden-related words have looser network structures, except for "Healthcare", which we see being tweeted to many individuals from a single user in one large distribution pattern. This could be an example of an influencer and may demonstrate some homophily; however, this is just conjecture and should be analyzed further. Used this way, the graphs surface topics that may be worth focusing on and watching over time.

It is also interesting that the Trump-related words have greater distribution, both positive and negative. Trump is certainly popular on Twitter, with 89 million followers, and having followers from the opposite political base who review content in order to criticize it may explain why the positive and negative words spread similarly.

One may also notice from this graph that the Biden topics were distributed less and may not be as popular within the political base. It would be interesting to track this over time and see if it changes.

Connecting tweets and mentions

To model more complex network connections, let's reuse the tweet data from above (tweets_graph).

We will remodel the data to look at how tweets connect to users by mention and the impact this has on the visualization. The first step is to isolate the tweet label and the targets it relates to; the targets here are the user mentions.

tweets_df2 = tweets_df[['Label', 'user_mentions']]
tweets_df2 = tweets_df2.explode('user_mentions')
tweets_df2.dropna(inplace = True) # tweets with no mentions explode to NaN targets; drop them (the tweet_network function below does the same)
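
If explode is new to you, here is what it does on a toy frame with made-up IDs: each element of a list-valued column becomes its own row, with the Label repeated.

demo = pd.DataFrame({'Label': ['111', '222'],
                     'user_mentions': [['alice', 'bob'], ['carol']]})
print(demo.explode('user_mentions'))
#   Label user_mentions
# 0   111         alice
# 0   111           bob
# 1   222         carol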

Utilizing the simplicity of pyvis, we can now map this data into a more complex network graph. We will use the source and target to set up the edges for each node, and also adjust the "physics" of the graph with the barnes_hut() method, which helps make the layout more readable. Here we use more of the pyvis library, where we have more control over the edges.

tweet_net = Network(height = 800, width = 1000, bgcolor="#222222", font_color="white", heading = "Tweet Network", notebook = True)
tweet_net.barnes_hut(gravity=-90000, #this lets us make use of the physics built into the visualization; these parameters control how the graph is organized
        central_gravity=9,
        spring_length=5,
        spring_strength=0.01,
        damping=0.09,
        overlap=0)

sources = tweets_df2['Label']
targets = tweets_df2['user_mentions']

edge_data = zip(sources,targets)

for e in edge_data: #here we create the edge based on the source and target, we will add each node individually.
    src = e[0]
    dst = e[1]
    
    tweet_net.add_node(src, src, title = src)
    tweet_net.add_node(dst, dst, title = dst)
    tweet_net.add_edge(src, dst)
    tweet_net.inherit_edge_colors(True)
    
neighbor_map = tweet_net.get_adj_list()

for node in tweet_net.nodes:
    node["title"] += " Neighbors:<br>" + "<br>".join(neighbor_map[node["id"]])
    node["value"] = len(neighbor_map[node["id"]])

tweet_net.show('assets/tweet_network_v1.html') # render the graph (the filename here is illustrative)

WOW! Now we are seeing how complex networks form in social media. Let's see if the physics parameters in the barnes_hut method can help us see trends differently.

tweet_net = Network(height = 1000, width = 1400, bgcolor="#222222", font_color="white", heading = "Tweet Network", notebook = True)

#we are going to mess with these numbers!
tweet_net.barnes_hut(gravity=-50000,
        central_gravity=8,
        spring_length=5,
        spring_strength=0.00001,
        damping=0.09,
        overlap=1)
############################################

sources = tweets_df2['Label']
targets = tweets_df2['user_mentions']

edge_data = zip(sources,targets)

for e in edge_data:
    src = e[0]
    dst = e[1]
    
    tweet_net.add_node(src, src, title = src)
    tweet_net.add_node(dst, dst, title = dst)
    tweet_net.add_edge(src, dst)
    tweet_net.inherit_edge_colors(True)
    
neighbor_map = tweet_net.get_adj_list()

for node in tweet_net.nodes:
    node["title"] += " Neighbors:<br>" + "<br>".join(neighbor_map[node["id"]])
    node["value"] = len(neighbor_map[node["id"]])

tweet_net.show('assets/tweet_network_v2.html') # render the re-tuned graph (the filename here is illustrative)

If using jupyter notebooks, we can add a widget here to control the physics directly:

tweet_net = Network(height = 1000, width = 1400, bgcolor="#222222", font_color="white", heading = "Tweet Network", notebook = True)

tweet_net.barnes_hut(gravity=-50000,
        central_gravity=8,
        spring_length=5,
        spring_strength=0.00001,
        damping=0.09,
        overlap=1)

sources = tweets_df2['Label']
targets = tweets_df2['user_mentions']

edge_data = zip(sources,targets)

for e in edge_data:
    src = e[0]
    dst = e[1]
    
    tweet_net.add_node(src, src, title = src)
    tweet_net.add_node(dst, dst, title = dst)
    tweet_net.add_edge(src, dst)
    tweet_net.inherit_edge_colors(True)
    
neighbor_map = tweet_net.get_adj_list()

for node in tweet_net.nodes:
    node["title"] += " Neighbors:<br>" + "<br>".join(neighbor_map[node["id"]])
    node["value"] = len(neighbor_map[node["id"]])

tweet_net.show_buttons(filter_=['physics'])
tweet_net.show('assets/tweet_network_widgets.html') # render with the physics control widget (the filename here is illustrative)

By using these physics parameters you can drastically change the layout of the graph, which may lead to different interpretations of the data.

Let’s remap the Trump and Biden words using this technique and see if there is any difference.

def tweet_network(tweet_data, filename):
    tweets_df = tweet_data.get_df("tweet")    
    tweet_data2 = tweets_df[['Label', 'user_mentions']]
    tweet_data2 = tweet_data2.explode('user_mentions')
    tweet_data2.dropna(inplace = True)

    tweet_net = Network(height = 800, width = 1000, bgcolor="#222222", font_color="white", heading = "Tweet Network", notebook = True)
    tweet_net.barnes_hut(gravity=-50000,
        central_gravity=9,
        spring_length=5,
        spring_strength=0.0001,
        damping=0.09,
        overlap=1)
    sources = tweet_data2['Label']
    targets = tweet_data2['user_mentions']

    edge_data = zip(sources,targets)

    for e in edge_data: 
        src = e[0]
        dst = e[1]
        
        tweet_net.add_node(src, src, title = src)
        tweet_net.add_node(dst, dst, title = dst)
        tweet_net.add_edge(src, dst)
        tweet_net.inherit_edge_colors(True)
        
    neighbor_map = tweet_net.get_adj_list()

    for node in tweet_net.nodes:
        node["title"] += " Neighbors:<br>" + "<br>".join(neighbor_map[node["id"]])
        node["value"] = len(neighbor_map[node["id"]])

    return tweet_net.show("assets/tweet_network_{}.html".format(filename))

bwords = ['Healthcare', 'Person', 'Basement']
twords = ['Economy', 'America', 'Spread']

for word in bwords:
    data = pd.read_pickle('assets/b_{}.data'.format(word))
    tweet_network(data, 'b_{}'.format(word))
    
for word in twords:
    data = pd.read_pickle('assets/t_{}.data'.format(word))
    tweet_network(data, 't_{}'.format(word)) 

Finally, we will look at followers and how to visualize this information

For this set we will use the example data only and limit how much data is called from Twitter. Due to rate limits on the free API, this process is taxing on the service and can trigger call suspension. With a paid account, or smaller and more frequent calls, one could produce more substantial results.
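
One simple way to stay under the free-tier limits is to pace your own calls. Here is a minimal sketch reusing the twitter, graphipy, and pickled objects from above; the fetch_gently name and the 60-second pause are my own choices, not anything official from Twitter:

import time

def fetch_gently(words, seconds_between_calls=60):
    # pull one topic at a time, serializing each result and pausing between calls
    results = {}
    for word in words:
        results[word] = twitter.fetch_tweets_by_topic(graphipy.create_graph(), word, 50)
        pickled('{}.data'.format(word), results[word])
        time.sleep(seconds_between_calls)
    return results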

Now I would like to see the number of followers that are connected to user “cliffordlevy” from the original example.

Let's see how this user's reach expands in that network, now that we know how primary edges and connections are made. We will set the max follower limit to 200 in order to demonstrate volume.

follower_limit = 200
follower_graph = twitter.fecth_followers_by_screenname(graphipy.create_graph(), 'cliffordlevy', follower_limit)

nx_graph3 = exporter.create_from_pd(follower_graph)
g3 = Network(height = 800, width = 1000, bgcolor="#222222", font_color="white", heading = "",notebook = True)
g3.toggle_hide_edges_on_drag(False)
g3.from_nx(nx_graph3)

g3.show('assets/cliffordlevy.html') 

Not really helpful…

This is really just a flat, 2-dimensional list.

We are going to take the followers of user cliffordlevy from our follower_graph: first a batch of his followers, then a smaller batch of each of their followers, and then an even smaller batch at the next level (limits of 10, 5, and 3 in the code below, each batch including the searched user, who is discarded).

def create_follower_df(user, limit=6): #the limit includes the searched user, whose row is returned first and discarded below
    followers = twitter.fecth_followers_by_screenname(graphipy.create_graph(), user, limit)
    df = followers.get_df("user")
    df.drop(0, axis=0, inplace=True) #drop the searched user's own row
    df['source'] = user
    df = df[['source', 'screen_name']] #keep just the edge: source user -> follower
    return df

Now let’s build a follower’s network:

cliffordlevy = create_follower_df('cliffordlevy', 10)

follower_results = cliffordlevy

for ind, row in cliffordlevy.iterrows():
    data = create_follower_df(row['screen_name'], 5)
    follower_results = pd.concat([follower_results, data])

for ind, row in follower_results[11:].iterrows():
    data = create_follower_df(row['screen_name'], 3)
    follower_results = pd.concat([follower_results, data])

pickled('follower_results.data',follower_results)

f_results = pd.read_pickle('assets/follower_results.data')

follower_net = Network(height = 900, width = 1100, bgcolor="#222222", font_color="white", heading = "Follower Network", notebook = True)
sources = f_results['source']
targets = f_results['screen_name']
edge_data = zip(sources,targets)
for e in edge_data:
    src = e[0]
    dst = e[1]
    follower_net.add_node(src, src, title = src)
    follower_net.add_node(dst, dst, title = dst)
    follower_net.add_edge(src, dst)
neighbor_map = follower_net.get_adj_list()
for node in follower_net.nodes:
    node["title"] += " Neighbors:<br>" + "<br>".join(neighbor_map[node["id"]])
    node["value"] = len(neighbor_map[node["id"]])

follower_net.show("assets/follower_net.html")

Much better! This visualization lets us analyze follower and friend networks across a variety of retweeted topics. With an established data set, this could be compared against tweet influence on other graphs to see how distribution occurs on both large and small follower networks.

Conclusion

GraphiPy, pyvis, and NetworkX are lightweight but powerful tools. With minimal code it is easy to extract data from Twitter and create visuals, and these visuals allow us, as data scientists, to focus on the right content to track and analyze. These tools have limitations: ease of use is traded for control over the data. We cannot easily extract geographical data, specific time frames of tweets, or more complicated user-follower-friend data frames. Despite these drawbacks, the libraries can provide meaningful content.
