Satisfaction Observer

How positive (or negative) are tweets that people write about


This application performs sentiment analysis on tweets posted about items in different categories (such as heads of government). The resulting sentiment scores range from 0 to 100. One data point is generated per day for each category item and plotted in the chart above. If you hover over or tap one of the data points, one positive and one negative example tweet are shown below the chart.

How Does It Work Behind the Scenes?

The Twitter data used for sentiment analysis is queried from the Twitter Search API once a day. Up to 1000 tweets per item are obtained in one cycle.

Subsequently, the tweets are cleaned: certain text fragments such as links and Twitter handles are removed. Furthermore, the names of the items are replaced with generic tokens (such as "president" for a specific government leader) in order to eliminate systematic bias that might originate from the items' names.
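The cleaning step described above could look roughly like the following sketch. The regular expressions, the function name clean_tweet, and its parameters are assumptions made for illustration; the application's actual cleaning rules are not documented here and may differ.

```python
import re

# Assumed patterns: the exact rules used by the application are not documented.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")


def clean_tweet(text, item_names, generic_token):
    """Remove links and handles, then replace item names with a generic token."""
    text = URL_RE.sub("", text)
    text = HANDLE_RE.sub("", text)
    # Replace longer names first so that a full name wins over a surname alone.
    for name in sorted(item_names, key=len, reverse=True):
        text = re.sub(re.escape(name), generic_token, text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removed fragments.
    return " ".join(text.split())


print(clean_tweet(
    "@bob Jane Doe did great today https://t.co/abc",
    ["Jane Doe"],      # hypothetical item name
    "president",
))
```

Replacing the name with a neutral token keeps the sentence readable for the model while hiding which item the tweet is about.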

In the next step, the cleaned tweets are passed through a text classification pipeline. Concretely, we use the TextClassificationPipeline from the awesome Hugging Face Transformers framework. Sentiment analysis relies on BERT, a deep neural network model for natural language processing. More specifically, we use the nlptown/bert-base-multilingual-uncased-sentiment model.
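A minimal sketch of such a classification step is shown below. It assumes that the model returns labels of the form "1 star" to "5 stars" and that the transformers package and the model weights are available; the helper names parse_star_label and classify_tweets are hypothetical and not taken from the application.

```python
MODEL_NAME = "nlptown/bert-base-multilingual-uncased-sentiment"


def parse_star_label(label):
    """Turn a label such as '4 stars' into the integer 4 (assumed label format)."""
    return int(label.split()[0])


def classify_tweets(texts, model_name=MODEL_NAME):
    """Return a (label, confidence) pair for each text."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    # The "text-classification" task builds a TextClassificationPipeline.
    classifier = pipeline("text-classification", model=model_name)
    results = classifier(list(texts))
    return [(parse_star_label(r["label"]), r["score"]) for r in results]


if __name__ == "__main__":
    # Downloads the model on first use.
    print(classify_tweets(["president did great today"]))
```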

The pipeline outputs a sentiment label between 1 and 5, together with a confidence score, for each tweet. We normalize the labels to a scale from 0 to 100 and calculate the average over all tweets for an item, weighted by the confidence scores. Only labels with a confidence score of at least 0.65 enter this calculation.
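The aggregation can be sketched as follows. The linear mapping of labels 1 to 5 onto 0 to 100 is an assumption (the text does not state the exact normalization), and the function name sentiment_score is hypothetical.

```python
def sentiment_score(predictions, min_confidence=0.65):
    """Confidence-weighted mean of normalized labels, or None if nothing qualifies.

    predictions: iterable of (label, confidence) with label in 1..5.
    """
    kept = [(label, conf) for label, conf in predictions if conf >= min_confidence]
    if not kept:
        return None
    total_weight = sum(conf for _, conf in kept)
    # Assumed normalization: label 1 maps to 0, label 5 maps to 100.
    weighted_sum = sum((label - 1) / 4 * 100 * conf for label, conf in kept)
    return weighted_sum / total_weight


# The third prediction falls below the 0.65 threshold and is ignored.
print(sentiment_score([(5, 0.9), (1, 0.7), (3, 0.5)]))
```

In this example the score is (100 × 0.9 + 0 × 0.7) / (0.9 + 0.7) = 56.25, so the more confident five-star prediction pulls the average above the midpoint.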

What Is the Difference Between 'Mixed' and 'Recent'?

The Twitter Search API lets one specify whether to receive real-time ("recent") tweets as of the time of the search request or the most popular results (as determined by a Twitter algorithm). To obtain a sufficient number of tweets, the "mixed" option above uses results from a query with the "mixed" keyword, that is, both real-time and popular tweets. More information here.

What Are the Limitations?

The charts should be taken with a big grain of salt! For now, the results are only for educational purposes and do not claim to represent the sentiment in a realistic way. Sometimes tweets that are clearly negative are classified as positive and vice versa. This might be due to the following non-exhaustive set of limitations:

  • The model was fine-tuned on product reviews and not on tweets. Thus, there might be linguistic discrepancies between the data the model was fine-tuned on and the data it now classifies. I am currently working on a labeled dataset of tweets to further fine-tune the model and to investigate whether this improves the predictive quality.
  • The model is particularly bad at classifying sarcasm, a common characteristic of tweets. The identification of sarcasm has been researched (for example here), but this work is still in its early stages.
  • A very common observation is that a tweet conveying positive sentiment does not necessarily express a positive attitude towards the item mentioned, and vice versa. For example, an author might compose a happy tweet because of a potential failure of a government leader. In this case, the sentiment would be positive, but the attitude towards the leader would be negative.

When Are the Charts Updated?

The charts are updated once a day with tweets from the previous day (the exact point in time depends on how long the server takes to predict the sentiment labels).


If you have any questions or suggestions, don't hesitate to drop a line at