Explorations in Next Generation Internet
As part of the NGI Forward project, DELab UW is supporting the European Commission’s Next Generation Internet initiative in identifying emerging technologies and social issues related to the Internet. Our team has been experimenting with various text mining methods to discover trends and hidden patterns in different types of online media.
NGI Forward's three key pillars:

Developing a cutting-edge data-driven methodology for identifying early signals of new trends & technologies.

Mapping the ecosystems & networks surrounding these key topics, evaluating their social, legal, technological, ethical & economic contexts.

Creating a value-driven vision for what the future internet could and should look like, involving a wide variety of voices across Europe.


Unique terms: 0+

Media articles:

Analysis period: five years and four months, plus a separate analysis of the first half of 2020 (COVID-19)

Trend analysis

  • Analysis based on the frequency of terms (unigrams and bigrams) in the texts
  • The average monthly change in each analysed term's frequency is estimated with OLS regressions
  • Qualitative analysis of the top 1000 trending terms
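
The trend estimate described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the toy corpus, the function names and the choice to normalise a term's count by the number of tokens in the month are all assumptions made for the example.

```python
from collections import Counter

def monthly_frequency(tokens_by_month, term):
    """Share of each month's tokens that equal `term` (unigram case).

    Normalising by token count is an assumption of this sketch.
    """
    freqs = []
    for tokens in tokens_by_month:
        counts = Counter(tokens)
        freqs.append(counts[term] / len(tokens) if tokens else 0.0)
    return freqs

def ols_slope(y):
    """Slope of the least-squares line y = a + b*t, with t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    cov = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var

# Hypothetical toy corpus: one token list per month.
months = [
    ["privacy", "data", "cloud", "data"],
    ["privacy", "privacy", "cloud", "data"],
    ["privacy", "privacy", "privacy", "data"],
]
freqs = monthly_frequency(months, "privacy")
slope = ols_slope(freqs)  # average change in frequency per month
```

A positive slope marks a term whose relative frequency grows over the period; ranking terms by this slope would give a list of trending candidates for the qualitative review.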

Co-occurrence analysis

  • Exploring the relationship between topics
  • E.g. which trending terms are mentioned together with the term “fake news”
  • The number of articles containing both terms is divided by the number of articles including the main term of analysis (e.g. "fake news")
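
The ratio in the last bullet can be written down directly. The sketch below assumes lower-cased article texts and simple substring matching; the example articles and the function name are made up for illustration.

```python
def cooccurrence_share(articles, main_term, other_term):
    """Articles containing both terms divided by articles containing main_term."""
    with_main = [a for a in articles if main_term in a]
    if not with_main:
        return 0.0
    with_both = [a for a in with_main if other_term in a]
    return len(with_both) / len(with_main)

# Hypothetical, already lower-cased article texts.
articles = [
    "fake news spread on facebook before the election",
    "facebook updates its privacy settings",
    "researchers study fake news and deepfake videos",
    "fake news and facebook: a timeline",
]
share = cooccurrence_share(articles, "fake news", "facebook")
```

Because the denominator counts only articles that mention the main term, the measure is not symmetric in general: swapping the two terms can give a different value.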

Sentiment analysis

  • VADER, an open-source lexicon and rule-based sentiment analysis tool
  • Sentiment score: between -1 (extremely negative) and 1 (extremely positive)
  • Calculated for paragraphs containing analysed terms
  • Track changes in sentiment over time
  • Identify most positive and negative co-occurring terms
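
A rough sketch of the paragraph-level aggregation is shown below. To keep it self-contained it uses a tiny made-up lexicon scorer (`toy_score`) as a stand-in for VADER; the lexicon, the example paragraphs and the function names are assumptions, and the toy scorer does not reproduce VADER's rules.

```python
from collections import defaultdict

# Toy stand-in for a sentiment scorer, for illustration only (not VADER).
TOY_LEXICON = {"great": 0.6, "safe": 0.4, "scam": -0.6, "abuse": -0.5}

def toy_score(text):
    """Mean lexicon value of matched words, clipped to [-1, 1]; 0.0 if none match."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not hits:
        return 0.0
    return max(-1.0, min(1.0, sum(hits) / len(hits)))

def mean_sentiment(paragraphs, terms, score_fn=toy_score):
    """Average score of the paragraphs that mention each term."""
    scores = defaultdict(list)
    for paragraph in paragraphs:
        score = score_fn(paragraph)
        for term in terms:
            if term in paragraph.lower():
                scores[term].append(score)
    return {term: sum(vals) / len(vals) for term, vals in scores.items()}

paragraphs = [
    "Remote work tools are great and safe",
    "A cryptocurrency scam targeted users",
    "Remote work exposed workers to abuse",
]
result = mean_sentiment(paragraphs, ["remote work", "cryptocurrency scam"])
```

With the `vaderSentiment` package installed, `score_fn` could instead return the `compound` value from `SentimentIntensityAnalyzer().polarity_scores(text)`, which lies in the same range of -1 to 1.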

Topic modelling

  • Each article is a mixture of topics, and each topic is a collection of characteristic terms
  • Latent Dirichlet Allocation: a popular method to discover the topics and terms
  • Unsupervised machine learning: it is enough to specify some parameters (e.g. the number of topics) to receive results
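
The first bullet states the modelling assumption behind Latent Dirichlet Allocation. The sketch below illustrates only that assumption: given topic weights for an article and word probabilities for each topic, the article's word distribution is their weighted mixture. It does not perform the inference step that LDA uses to learn these quantities from data, and all topics, words and numbers are invented.

```python
def article_word_probs(topic_weights, topic_word_probs):
    """p(word | article) = sum over topics of p(topic | article) * p(word | topic)."""
    probs = {}
    for topic, weight in topic_weights.items():
        for word, p in topic_word_probs[topic].items():
            probs[word] = probs.get(word, 0.0) + weight * p
    return probs

# Hypothetical topics: each is a distribution over characteristic terms.
topic_word_probs = {
    "privacy": {"gdpr": 0.5, "data": 0.5},
    "blockchain": {"bitcoin": 0.5, "data": 0.5},
}
# Hypothetical article: 75% about privacy, 25% about blockchain.
topic_weights = {"privacy": 0.75, "blockchain": 0.25}

probs = article_word_probs(topic_weights, topic_word_probs)
```

A word shared by several topics ("data" here) receives probability from each of them, which is why one term can be characteristic of more than one topic.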

Issue mapping

  • Articles are categorised across two dimensions: geography (EU vs US) and covered topic (social vs technological)
  • Words are ranked based on their frequency in articles classified as social and non-social (technological)

Main Programming Tools

Topic identification

The most trending NGI-related keywords are identified

Grouped into wider areas
The size of the bubble is based on the regression coefficient
Bigger bubble: more robust trend

100 most trending terms over the full analysed period: are they trending also over shorter timespans?

Umbrella topics

Based on the trending terms, we identified 6 key NGI topics


Access, Inclusion & Justice

This umbrella topic deals with such issues as access to the Internet, control over information and ICT infrastructure, principles of social justice in the tech industry and the Internet’s ethical challenges. It is motivated by the belief that the Internet should be culturally inclusive, representative and accessible to all. Moreover, technologies should be designed and used following principles of ethics.

Coronavirus Pandemic & Consequences

The COVID-19 pandemic has brought about years of change in the way companies and institutions in all sectors and regions use digital technologies. On the one hand, the pandemic has accelerated the digitization of our societies. We have largely adopted and adapted to new remote work and online communication tools. On the other hand, the pandemic brought to light new contentious tech issues such as contact tracing or digital immunity passports.

Decentralising Power & Building Alternatives

Decentralisation has long been regarded as a transformative process with large disruptive potential for market competition. Blockchain technologies may play a central role in the future of social media, financial services and other intermediation services. As of today, the most widespread implementation of blockchain is related to cryptocurrencies. As an emerging technology, blockchain raises pressing regulatory issues.

Environment, Sustainability & Resilience

Climate change remains humanity’s top challenge, with great impact on technological and social development. Besides already available consumer products, emerging technologies such as AI and quantum computing may play a significant role in reducing the harmful effects of global warming. However, the content crisis on social media divides society by popularising fake news. Internet services therefore play a role in the fight against climate change that goes beyond the carbon footprint of using them. Reducing the spread of fake news and propaganda will be key to building a global consensus on the need to take more significant steps.

Privacy, Identity & Data Governance

This umbrella topic deals with the conundrum of online privacy faced by the tech sector and regulators. It covers legislative efforts in the EU and US for improving privacy protection, various digital identity verification methods, security of online personal data and tensions between the comfort of online services provided by tech giants and user privacy.

Trustworthy Information Flows

The state of public debate is heavily influenced by social media, the spread of fake news and conspiracy theories. Since the Russian interference scandal in the US elections, and the Cambridge Analytica campaign during the Brexit referendum, there are increasing warning signs that elections can be manipulated through social media.



The goal is to explore the relationship between trending terms
The figures reveal which terms were mentioned frequently in the same article

Figures, one per umbrella topic: Access, Inclusion & Justice; Coronavirus Pandemic & Consequences; Decentralising Power & Building Alternatives; Environment, Sustainability & Resilience; Privacy, Identity & Data Governance; Trustworthy Information Flows


Aim: to track the public perception of issues and identify the positive and negative news stories related to the analysed terms
Sentiment score: between -1 (extremely negative) and 1 (extremely positive) calculated for paragraphs containing analysed terms
hate speech

Most positive           Most negative
openai (0.15)           8chan (-0.11)
dmwf europe (0.14)      antidefamation league (-0.09)
roblox (0.11)           netzdg (-0.07)
gpt3 (0.1)              section 230 (-0.03)
factcheck (0.04)        parler (-0.03)

social distancing

Most positive               Most negative
video conferencing (0.2)    qanon (-0.04)
remote work (0.2)           conspiracy (0.03)
twitch (0.19)               protest (0.03)
zoom (0.16)                 misinformation (0.05)
stay home (0.08)            amazon warehouse (0.05)

contact trace

Most positive                   Most negative
test-and-trace (0.2)            disinformation (0.03)
qr (0.16)                       world health (0.03)
exposure notification (0.16)    herd immunity (0.04)
covidsafe app (0.15)            conspiracy (0.05)
privacy-preserving (0.14)       lockdown (0.1)

digital sovereignty

Most positive              Most negative
quantum circuit (0.29)     puigneró (-0.1)
dapp (0.27)                conspiracy (-0.07)
nonfungible (0.22)         sim swap (-0.07)
cryptokitties (0.21)       cryptocurrency scam (-0.06)
digital identity (0.19)    disinformation (0.03)

digital sovereignty

Most positive           Most negative
collabora (0.4)         pensacola (-0.2)
e2e (0.29)              huawei equipment (0.03)
decentralised (0.29)    facial recognition (0.04)
libra (0.29)            palantir (0.06)
resilience (0.28)       chinese government (0.07)

climate change

Most positive            Most negative
electrification (0.25)   conspiracy (-0.05)
netzero (0.21)           wildfires (-0.01)
ev (0.21)                disinformation (0.05)
supply chains (0.19)     misinformation (0.07)
carbon capture (0.19)    deforestation (0.09)

digital identity

Most positive            Most negative
pictfor (0.45)           content moderation (-0.34)
mail-in ballot (0.41)    sim swap (-0.14)
autodelete (0.35)        puigneró (-0.1)
self-sovereign (0.32)    facial recognition (0.16)
tokenisation (0.29)      chinese government (0.17)

content moderation

Most positive            Most negative
roblox (0.26)            hate content (-0.09)
pretrain (0.23)          farright (-0.05)
openai (0.18)            incite violence (-0.04)
ai-based (0.08)          netzdg (-0.03)
media literacy (0.06)    qanon (-0.01)


Topic modelling assumes that each article is a mixture of topics, and each topic is a collection of characteristic terms
You can explore the most characteristic words for the topics
The size of the bubbles corresponds to the size of the topic, while the location suggests how similar the various topics are to each other

Figures, one per umbrella topic: Access, Inclusion & Justice; Coronavirus Pandemic & Consequences; Decentralising Power & Building Alternatives; Environment, Sustainability & Resilience; Privacy, Identity & Data Governance; Trustworthy Information Flows

Issue mapping

Articles are classified in two dimensions: EU/US, social issue/technology

EU axis: articles from European sources or concerning Europe
Social issues axis: articles containing a sufficient number of words from a pre-defined list of social topics
Mapping trending words with article type based on number of occurrences
Top right corner: EU articles on social issues
Bottom left corner: US articles on technology
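
The two-dimensional classification could look roughly like the sketch below. The social-word list, the threshold of two words, the `region` field and the example articles are all hypothetical; the source only states that an article counts as social when it contains a sufficient number of words from a pre-defined list.

```python
from collections import Counter

# Hypothetical list of social-topic words and threshold (assumptions of this sketch).
SOCIAL_WORDS = {"rights", "inequality", "democracy", "ethics"}
MIN_SOCIAL_WORDS = 2

def classify(article):
    """Return (region, kind) for an article dict with 'region' and 'text' keys."""
    tokens = article["text"].lower().split()
    n_social = sum(1 for token in tokens if token in SOCIAL_WORDS)
    kind = "social" if n_social >= MIN_SOCIAL_WORDS else "technology"
    return article["region"], kind

def term_counts_by_quadrant(articles, term):
    """Count occurrences of `term` in each (region, kind) quadrant."""
    counts = Counter()
    for article in articles:
        counts[classify(article)] += article["text"].lower().split().count(term)
    return counts

articles = [
    {"region": "EU", "text": "gdpr protects rights and democracy"},
    {"region": "US", "text": "gdpr compliance tooling for cloud platforms"},
    {"region": "EU", "text": "ethics and inequality debate around gdpr gdpr"},
]
counts = term_counts_by_quadrant(articles, "gdpr")
```

Comparing a term's counts across the four quadrants indicates where it would sit on the map, for example closer to the EU and social-issues corner when most occurrences fall in that quadrant.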


Interactively explore relevant trending keywords



NGI Forward has received funding from the European Union's Horizon 2020 research and innovation programme under the Grant Agreement no 825652. The content of this website does not represent the opinion of the European Union, and the European Union is not responsible for any use that might be made of such content.

Data: Zenodo. Code: GitLab.