Chapter 8: Information Cascades and Virality
About This Chapter
On the evening of 9 April 2017, a passenger was forcibly dragged from a United Airlines flight at O’Hare Airport. A fellow passenger filmed the incident on a smartphone. By midnight that video had been retweeted hundreds of thousands of times. By morning it had been viewed more than 100 million times on Chinese social media platform Weibo, and United’s stock had shed approximately $1.4 billion in market capitalization before the trading day was half over. Three days later the CEO’s initial defensive statement itself became the subject of a second cascade. The incident generated roughly 900,000 individual retweets across Twitter and Weibo before it faded from the trending charts, and it became a canonical case study in brand crisis management, platform virality, and the economics of social outrage.
What made that particular video travel so far and so fast? The content was undeniably provocative. But every day, thousands of comparably provocative videos are uploaded and die at a view count in the dozens. The difference is almost never content quality alone. It is the structural shape of how information propagates — which accounts retweet first, how many of their followers are themselves highly connected, how quickly the cascade jumps from a local cluster into a mass-media broadcast node. A video retweeted by three mid-size accounts with overlapping audiences will stall. The same video retweeted by one mid-size account that happens to sit at the junction between several distinct communities will explode.
This chapter formalizes that intuition. We define a cascade as a mathematical object — a rooted tree on a retweet graph — and introduce two complementary measures that capture its shape: size (how many people it reached) and structural virality (whether it reached them via broad broadcast or deep person-to-person chains). We then develop the two workhorse models for simulating how cascades arise from network structure: the Independent Cascade model and the Linear Threshold model. We show why empirical cascade sizes follow a heavy-tailed power law, why follower count is a surprisingly poor predictor of cascade reach, and how the time dynamics of viral content follow a characteristic explosive-then-decaying pattern.
Why this chapter belongs in a text-analytics course
Chapters 1 through 3 of this book focused on extracting signals from the content of text: what topics a corpus discusses, how positive or negative the tone is, and how a large language model can reason about both simultaneously. This chapter asks a different question: once a piece of text has been published, how does it move through the network? The two questions are deeply linked. Topic models (Chapter 1) and sentiment classifiers (Chapter 2) tell you what an individual message says; cascade analysis tells you how far and fast that message travels. A media analyst who only measures sentiment misses the amplification mechanism entirely. A cascade analyst who ignores content misses the triggers. The complete picture requires both.
This chapter also connects directly to the companion NetworkBook, which develops the underlying graph theory. NetworkBook Chapter 1 covers network construction and centrality; Chapter 2 introduces the Independent Cascade and Linear Threshold models as formal diffusion processes; Chapter 4 treats random-graph models that explain why real social networks are simultaneously sparse and short-diameter (the small-world property that enables rapid cascade propagation); and Chapter 5 covers Bayesian and social learning — the micro-foundations of why individuals adopt information from their neighbors. Readers who want the full theoretical derivations should consult those chapters. This chapter takes the applied perspective: given these models, what do they predict about virality that a practitioner can measure and act upon?
What you will build in this chapter
By the end, you will be able to:
- Represent a retweet cascade as a directed tree and compute its depth, size, and breadth.
- Calculate the Wiener-index structural virality of a cascade tree and interpret what it reveals about broadcast versus organic spread.
- Implement the Independent Cascade and Linear Threshold models on small social graphs and observe phase transitions in cascade size.
- Characterize the heavy-tailed power-law distribution of empirical cascade sizes and explain why most cascades die immediately and a tiny fraction dominate.
- Rank candidate seed nodes by their expected cascade reach — and understand why follower count is a poor proxy for that quantity.
- Describe the temporal dynamics of viral content and compute simple half-life and peak-time statistics.
All code runs live in your browser using NetworkX, NumPy, Pandas, and Matplotlib. No installations, no API keys, no files to download.
Table of Contents
- The Retweet Tree as a Cascade
- Structural Virality: The Wiener Index
- The Independent Cascade Model
- The Linear Threshold Model
- Empirical Power Laws in Cascade Size
- Influencer Detection Beyond Follower Count
- Time Dynamics of Virality
- Mini Case Study: A Tweet’s Life
The Retweet Tree as a Cascade
From timeline events to a mathematical object
When you publish a tweet, the platform records a timestamp and assigns the message a unique ID. If another user retweets your tweet, the platform logs the retweet ID, the retweeter’s user ID, and — crucially — the ID of the tweet being retweeted. These logs form a directed edge: from you (the source of the content) to the retweeter. If that retweeter is herself retweeted by three further users, three more edges appear in the log. The full collection of edges, traced back to the original tweet, forms a retweet cascade.
Formally, a cascade is a rooted directed tree \(T = (V, E)\) where:
- \(V\) is the set of users who saw and acted on the content (the original poster plus all retweeters).
- \(E\) is the set of retweet edges: \((u, v) \in E\) means user \(v\) retweeted the content from user \(u\).
- The root \(r \in V\) is the original poster.
- The tree is directed away from the root: every node except \(r\) has exactly one parent (the user they retweeted from), and the path from the root to any node traces the propagation chain.
Three scalar summaries characterize a cascade tree:
\[\text{size}(T) = |V| \quad \text{(total number of users in the cascade, including the original poster)}\]
\[\text{depth}(T) = \max_{v \in V} d(r, v) \quad \text{(maximum number of hops from the root)}\]
\[\text{breadth}(T) = \max_{k \geq 0} |\{v \in V : d(r, v) = k\}| \quad \text{(maximum width across any level)}\]
A cascade with high depth and low breadth propagates person-to-person down a long chain — the pattern associated with organic virality. A cascade with high breadth and low depth looks like a star: one influential account broadcasts directly to a massive follower base — the pattern associated with media broadcast. These two shapes, as we will see in the next section, have very different implications for content reach, speed, and durability.
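The three summaries can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the cascade is stored as a `{node: parent}` dictionary; the two 11-node trees and the helper name `cascade_summary` are made up for the example and are not taken from any real dataset.

```python
from collections import Counter

def cascade_summary(parent):
    """Return (size, depth, breadth) for a cascade given as {node: parent}.

    The root is the single node whose parent is None.
    """
    def hops_to_root(v):
        hops = 0
        while parent[v] is not None:
            v = parent[v]
            hops += 1
        return hops

    levels = Counter(hops_to_root(v) for v in parent)  # number of nodes per level
    size = len(parent)               # |V|, root included
    depth = max(levels)              # deepest level reached
    breadth = max(levels.values())   # widest level
    return size, depth, breadth

# Hypothetical cascade A: near-star, 8 direct retweets plus 2 second-hop retweets
cascade_a = {0: None, **{v: 0 for v in range(1, 9)}, 9: 1, 10: 2}

# Hypothetical cascade B: same size, but branches reach four hops from the root
cascade_b = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 3, 7: 3, 8: 5, 9: 6, 10: 8}

print(cascade_summary(cascade_a))  # (11, 2, 8)
print(cascade_summary(cascade_b))  # (11, 4, 3)
```

Both trees have the same size, so size alone cannot tell them apart; depth and breadth are what separate the broadcast shape from the chain-like shape.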
Live cell: building and visualizing a cascade tree
The cell below constructs two small retweet trees as networkx.DiGraph objects, visualizes them side by side, and computes their depth, size, and breadth.
Reading the output. Both cascades have the same size — 11 nodes, including the root. But Cascade A is nearly flat: almost everyone received the content directly from the original poster. Cascade B has depth 4: some users are four hops removed from the source, having retweeted someone who retweeted someone who retweeted someone who retweeted the original post. In a real diffusion process, Cascade B’s structure tends to produce slower initial spread but can sustain momentum longer, because each new retweeter has her own follower network that has not yet seen the content.
The cascade tree in practice: what the data actually looks like
Cheng et al. (2014) analyzed roughly 150,000 reshare cascades on Facebook and found that the majority (over 95%) reach fewer than 10 nodes and have depth 1. They are small broadcast cascades: the original poster’s followers see the post, a handful reshare it, and the process stops there. The rare cascade that achieves depth 5 or more is almost always associated with a piece of content that crosses community boundaries: it escapes the local echo chamber of the original poster’s follower network and is picked up by accounts in structurally distant communities.
The Independent Cascade Model
Review from NetworkBook Chapter 2
The Independent Cascade (IC) model, formalized for influence maximization by Kempe, Kleinberg, and Tardos (2003), is a standard model for simulating information diffusion on social networks. This chapter assumes familiarity with the basic formulation from NetworkBook Chapter 2; we briefly restate it and then extend it to heterogeneous edge probabilities.
In the IC model, the network is a directed graph \(G = (V, E)\). Each directed edge \((u, v)\) carries an activation probability \(p_{uv} \in (0, 1)\). The process unfolds in discrete rounds:
Round 0: A seed set \(S \subseteq V\) is activated. All other nodes are inactive.
Round \(t\): For each node \(u\) activated in round \(t-1\), and for each inactive neighbor \(v\) with edge \((u, v) \in E\), \(u\) attempts to activate \(v\) with probability \(p_{uv}\). Each attempt is an independent Bernoulli trial. If \(v\) receives at least one successful activation attempt from any of its newly-active neighbors, \(v\) becomes active in round \(t\). Activated nodes remain active; they do not deactivate.
The process terminates when no new activations occur. The cascade size is \(|V_{\text{active}}|\) at termination.
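The round structure above can be sketched as a short simulation. This is a minimal plain-Python version, not the chapter's live cell: the function name `simulate_ic`, the adjacency format, and the toy graph with its probabilities are all assumptions made for illustration.

```python
import random

def simulate_ic(out_edges, seeds, rng):
    """Run one Independent Cascade realization.

    out_edges: {u: [(v, p_uv), ...]} directed edges with activation probabilities.
    seeds: iterable of initially active nodes (round 0).
    Returns the set of nodes active at termination.
    """
    active = set(seeds)
    frontier = list(active)          # nodes activated in the previous round
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p_uv in out_edges.get(u, []):
                # a newly active u gets exactly one attempt on each inactive v
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active      # only these nodes attempt in the next round
    return active

# Hypothetical five-node follower graph with made-up edge probabilities
toy_graph = {
    "a": [("b", 0.5), ("c", 0.2)],
    "b": [("d", 0.4)],
    "c": [("d", 0.3), ("e", 0.1)],
    "d": [("e", 0.6)],
}

rng = random.Random(42)
sizes = [len(simulate_ic(toy_graph, ["a"], rng)) for _ in range(2000)]
print("mean cascade size:", sum(sizes) / len(sizes))
print("share of runs that stop at the seed:", sizes.count(1) / len(sizes))
```

Because every edge is tried at most once, a single run is cheap; the cost of Monte Carlo estimation comes from repeating the run enough times to average out the randomness.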
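Starting from a single seed \(u\), the probability that node \(v\) is eventually activated depends on which of its in-neighbors are activated. Conditional on the set \(A_v\) of in-neighbors of \(v\) that are eventually activated, it can be expressed as:
\[P(v \text{ activated} \mid A_v) = 1 - \prod_{w \in A_v} (1 - p_{wv})\]
Because \(A_v\) is itself random and depends on how the cascade reaches each in-neighbor, the unconditional probability is recursive and generally intractable in closed form, which is why simulation is the standard tool.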
Heterogeneous edge probabilities
A homogeneous IC model (all edges share the same probability \(p\)) is a useful theoretical baseline but a poor description of real Twitter networks. The activation probability on edge \((u, v)\) depends on:
- Topical alignment: how closely \(u\)’s content matches \(v\)’s interests.
- Social tie strength: how often \(u\) and \(v\) interact (direct messages, mentions, mutual follows).
- Temporal recency: how recently \(u\) posted content that \(v\) engaged with.
In practice, edge probabilities are estimated from observed cascade logs: if user \(u\) has posted 100 tweets and \(v\) has retweeted 12 of them, a reasonable estimate is \(\hat{p}_{uv} = 0.12\). This data-driven IC model is used by platforms and researchers for influence maximization (selecting seed users for a campaign) and cascade size forecasting.
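The ratio estimate can be sketched as follows. The log format (one `(u, v)` pair per tweet of \(u\) that \(v\) retweeted) and the function name `estimate_edge_probabilities` are assumptions for illustration; real platform logs will need their own parsing.

```python
from collections import Counter

def estimate_edge_probabilities(tweet_counts, retweet_log):
    """Estimate p_uv as (# of u's tweets retweeted by v) / (# of tweets by u).

    tweet_counts: {u: number of tweets u posted}
    retweet_log: iterable of (u, v) pairs, one per tweet of u that v retweeted
    """
    retweets = Counter(retweet_log)
    return {(u, v): k / tweet_counts[u] for (u, v), k in retweets.items()}

# The example from the text: u posted 100 tweets and v retweeted 12 of them.
# The second follower, w, is a made-up addition.
tweet_counts = {"u": 100}
retweet_log = [("u", "v")] * 12 + [("u", "w")] * 3

p_hat = estimate_edge_probabilities(tweet_counts, retweet_log)
print(p_hat[("u", "v")])  # 0.12
print(p_hat[("u", "w")])  # 0.03
```

Pairs that never appear in the log get no entry at all, and for accounts with only a handful of tweets the raw ratio is noisy, so in practice some form of smoothing or pooling is usually worth considering.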
Live cell: IC model on a Twitter-style follower graph
The cell below builds a directed follower graph with 50 nodes, assigns heterogeneous edge probabilities drawn from a Beta distribution, and runs 200 Monte Carlo simulations of the IC process from a fixed seed. It then plots the distribution of cascade sizes and shows that it is heavy-tailed.
Reading the output. The left panel shows the cascade-size distribution: a majority of cascades die at a single node (the seed failed to activate anyone), and a small fraction grow to 10, 20, or more activations. The mean substantially exceeds the median — a hallmark of a heavy-tailed distribution. The right panel demonstrates the phase transition: below a critical probability \(p^*\), average cascade size is small and stable; above it, cascades suddenly reach a large fraction of the network. This threshold behavior — studied formally by Watts (2002) in the context of global cascades on random graphs — is a key structural result. For Twitter-like networks with heterogeneous degree distributions, the threshold is lower than a random-graph baseline would predict, because high-degree hubs function as super-spreaders that can ignite cascades even when average edge probability is low.
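The threshold behavior can be reproduced with a small sweep. The sketch below is an assumption-laden simplification rather than the chapter's live cell: it uses a homogeneous edge probability \(p\), a random directed graph with mean out-degree \(\bar{k}\) (not a Twitter-like degree distribution), and reports \(R_0 = p \cdot \bar{k}\), the quantity used in the chapter summary. On such a graph the transition is expected near \(R_0 = 1\).

```python
import random

def random_digraph(n, mean_out_degree, rng):
    """Directed graph in which each ordered pair (u, v) is an edge with equal probability."""
    q = mean_out_degree / (n - 1)
    return {u: [v for v in range(n) if v != u and rng.random() < q] for u in range(n)}

def ic_size(out_nbrs, seed, p, rng):
    """Cascade size for one homogeneous-IC run from a single seed."""
    active, frontier = {seed}, [seed]
    while frontier:
        nxt = []
        for u in frontier:
            for v in out_nbrs[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

rng = random.Random(7)
n, k_bar = 300, 4.0                      # hypothetical graph size and mean out-degree
graph = random_digraph(n, k_bar, rng)

mean_size = {}
for p in (0.05, 0.15, 0.25, 0.35, 0.50):
    # 300 runs per p, each from a uniformly random seed node
    runs = [ic_size(graph, rng.randrange(n), p, rng) for _ in range(300)]
    mean_size[p] = sum(runs) / len(runs)
    print(f"p={p:.2f}  R0={p * k_bar:.2f}  mean cascade size={mean_size[p]:.1f}")
```

Below \(R_0 = 1\) the mean stays close to a handful of nodes; above it, a sizeable share of runs reach a large fraction of the graph, which is what pulls the mean up so sharply.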
Empirical Power Laws in Cascade Size
The 99-1 rule and why it matters
A robust empirical finding, replicated across Twitter, Facebook, Weibo, Reddit, and YouTube, is that cascade sizes follow a power-law distribution. If \(P(s)\) denotes the probability that a randomly chosen cascade reaches exactly \(s\) nodes, then:
\[P(s) \propto s^{-\alpha}, \quad \alpha \approx 1.5\text{–}2.5\]
The equivalent statement for the complementary cumulative distribution — the probability that a cascade reaches at least \(s\) nodes — is:
\[P(S \geq s) \propto s^{-(\alpha-1)}\]
On a log-log plot, both are straight lines with slope \(-\alpha\) and \(-(\alpha-1)\) respectively. Goel et al. (2016) reported \(\alpha \approx 2.0\) for Twitter cascades, with slight variation by content category. Cheng et al. (2014) reported similar values. On Weibo, Wu et al. (2011) found \(\alpha\) between 1.8 and 2.2 depending on content type.
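The relationship between the exponent and the CCDF can be checked numerically. The sketch below assumes a continuous power law with minimum size \(s_{\min} = 1\) as a stand-in for integer cascade sizes, draws samples by inverse-transform sampling, and recovers \(\alpha\) with the standard maximum-likelihood estimator \(\hat{\alpha} = 1 + n / \sum_i \ln(s_i / s_{\min})\). The function names are illustrative.

```python
import math
import random

def sample_power_law(alpha, s_min, n, rng):
    """Draw n values from a continuous power law with density proportional to s^(-alpha), s >= s_min."""
    # Inverse-transform sampling: the CCDF is (s / s_min)^(-(alpha - 1)).
    return [s_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def fit_alpha(sizes, s_min):
    """Maximum-likelihood estimate of alpha for the continuous power law."""
    tail = [s for s in sizes if s >= s_min]
    return 1.0 + len(tail) / sum(math.log(s / s_min) for s in tail)

def ccdf(sizes, s):
    """Empirical P(S >= s)."""
    return sum(1 for x in sizes if x >= s) / len(sizes)

rng = random.Random(0)
sizes = sample_power_law(alpha=2.0, s_min=1.0, n=20000, rng=rng)
alpha_hat = fit_alpha(sizes, s_min=1.0)

print(f"alpha_hat   = {alpha_hat:.2f}")
print(f"P(S >= 10)  = {ccdf(sizes, 10):.3f}   (theory for alpha = 2: 0.100)")
print(f"P(S >= 100) = {ccdf(sizes, 100):.4f}  (theory for alpha = 2: 0.0100)")

ordered = sorted(sizes)
print("median:", round(ordered[len(ordered) // 2], 2), " mean:", round(sum(sizes) / len(sizes), 2))
```

With \(\alpha = 2\) the CCDF falls by a factor of ten for every tenfold increase in \(s\), which is the slope \(-(\alpha - 1) = -1\) on a log-log plot. The sample mean, by contrast, depends heavily on the largest draw and changes noticeably from one random seed to the next.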
The practical meaning is severe: most cascades are tiny, and the distribution has no characteristic scale. The expected cascade size is dominated by the rare events at the right tail. A media team that measures its “typical” campaign by the average cascade size is measuring a quantity that a few outlier viral moments shift dramatically. The median is more representative, but even the median understates the skewness of the distribution — most cascades die at size 1, and the median is often 1 or 2.
Power-law exponent and the Bass diffusion connection
The power-law tail of cascade sizes has a deep connection to the Bass (1969) diffusion model, which is the workhorse model for new product adoption in marketing. Bass modeled the cumulative number of adopters \(N(t)\) of a product by time \(t\) as the solution of:
\[\frac{dN(t)}{dt} = \left(p + q \frac{N(t)}{M}\right)(M - N(t))\]
where \(M\) is the total market, \(p\) is the coefficient of innovation (adoption driven by external media; not to be confused with the IC edge probability), and \(q\) is the coefficient of imitation (adoption driven by word-of-mouth). The IC model with heterogeneous probabilities can be seen as a stochastic, network-based generalization of this deterministic mean-field equation. The coefficient \(q\) plays the role of the average activation probability in the IC model. A mean-field heuristic relates the two descriptions through \(\alpha \approx 1 + 1/q\), but the fit is loose: with \(q \approx 0.3\)–\(0.5\), the range usually quoted for viral social media content, it gives \(\alpha \approx 3\)–\(4.3\), steeper than the empirical \(\alpha \approx 2\), which would require \(q\) close to 1. The mapping is best read as a qualitative analogy rather than a quantitative prediction.
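The Bass equation is simple enough to integrate directly. The sketch below uses forward Euler and compares it with the standard closed-form solution for \(N(0) = 0\); the parameter values \(p = 0.03\), \(q = 0.4\), \(M = 1000\) are hypothetical and chosen only to produce a visible S-curve.

```python
import math

def bass_euler(p, q, M, t_end, dt=0.001):
    """Integrate dN/dt = (p + q*N/M) * (M - N) with forward Euler, starting from N(0) = 0."""
    N = 0.0
    for _ in range(int(round(t_end / dt))):
        N += dt * (p + q * N / M) * (M - N)
    return N

def bass_closed_form(p, q, M, t):
    """Closed-form solution of the Bass equation with N(0) = 0."""
    e = math.exp(-(p + q) * t)
    return M * (1.0 - e) / (1.0 + (q / p) * e)

p, q, M = 0.03, 0.4, 1000.0   # hypothetical innovation, imitation, market size

for t in (5, 10, 20):
    print(f"t={t:>2}  Euler={bass_euler(p, q, M, t):7.1f}  exact={bass_closed_form(p, q, M, t):7.1f}")

# Adoption rate dN/dt peaks at t* = ln(q/p) / (p + q) when q > p
t_peak = math.log(q / p) / (p + q)
print(f"peak adoption rate at t* = {t_peak:.2f}")
```

The curve starts slowly while only the external-media term \(p\) is at work, accelerates as imitation takes over, and saturates at \(M\); the same rise-and-saturate shape is what a mean-field reading of the IC model produces.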
Live cell: simulating power-law cascade sizes and fitting the distribution
Reading the output. The linear-scale histogram confirms the extreme skewness: a large majority of cascades die at size 1 or 2, while a tiny fraction grow large. The CCDF on a log-log plot reveals the power-law tail: the points follow a straight line with slope close to \(-(\alpha-1)\), consistent with the empirical literature value of \(\alpha \approx 2.0\).
The gap between the mean and the median is diagnostically important. Whenever you see a social media report claiming an “average reach” of 10,000 users, ask whether that is the mean or median. For power-law distributions with \(\alpha \leq 2\), the mean is infinite in the population limit, so the sample mean never stabilizes and is a poor summary statistic in finite samples. The median is robust but low; the 95th or 99th percentile tells you more about the campaign’s upside potential.
Weibo trending topics and heavy-tailed cascades. Research by Zhang, Zhu, and colleagues (2013, 2014) analyzed several million cascades on Weibo, China’s dominant microblogging platform, and found power-law exponents between 1.8 and 2.2, depending on content category. Political content and celebrity gossip had the heaviest tails (lowest \(\alpha\), meaning larger cascades were more common); commercial advertising content had lighter tails (higher \(\alpha\), cascades rarely escaped the seed’s immediate neighborhood). This explains why Weibo’s trending algorithm weights organic engagement signals heavily: a brand that buys promoted posts generates star-shaped cascades whose sizes have a light tail (\(\alpha > 3\): most cascades tiny, with the few large ones driven by the promotion itself), whereas celebrity moments generate cascades whose sizes follow a heavier-tailed power law with \(\alpha \approx 1.9\). Weibo’s commercial team uses cascade shape, not just total views, to price content categories for advertisers.
Influencer Detection Beyond Follower Count
The follower-count fallacy
The conventional definition of an “influencer” in marketing practice is almost always operationalized as follower count. An account with 5 million followers is called a mega-influencer; one with 100,000 is a macro-influencer; 10,000–100,000 is a micro-influencer. This taxonomy is convenient, widely used, and empirically misleading.
Bakshy, Hofman, Mason, and Watts (2011) analyzed 1.6 million Twitter users and their retweet cascades over a two-month period in 2009. Their central finding: follower count explains less than 10% of the variance in cascade size for a given user. The correlation between log-follower-count and log-cascade-size, while statistically significant, has an \(R^2\) of roughly 0.09. Even accounting for a rich set of structural features (follower count, following count, age of account, past retweet rate, degree centrality, betweenness centrality, PageRank), the \(R^2\) only rises to about 0.25. Most of the variance in cascade size is simply not predictable from ex ante network features.
Why is follower count such a poor predictor? Three reasons:
Passive followers. A large fraction of followers on any account are passive: they scroll past content without engaging. The effective audience — the set of followers who will actually see and consider retweeting a given tweet — is a small fraction of the nominal follower count, and it varies by content type, timing, and phrasing.
Structural position, not size. What matters is not how many people follow an account, but whether those followers sit at the boundaries between distinct communities. An account with 50,000 highly clustered followers in a single topic community will generate smaller cascades than an account with 10,000 followers strategically positioned at the intersection of several communities — because the second account’s retweets can bridge into multiple distinct audiences simultaneously.
Content alignment. Cascade size depends critically on the match between the content’s topic and the seed node’s follower base. A fashion influencer retweeting a political opinion generates a small cascade; the same political opinion retweeted by a news journalist generates a large one. No structural measure captures this content-audience alignment.
Measuring influence from observed cascades
The correct approach, emphasized by Bakshy et al. (2011) and subsequent work, is to estimate triggered centrality from observed cascade data: for each candidate seed node, compute its empirical mean cascade size across a large sample of past content. This is a data-driven, outcome-based measure rather than a structural proxy.
When cascade logs are not available — for a new account or a new platform — simulation provides the alternative. Run the IC model many times from each candidate seed, record the mean cascade size, and rank candidates by this metric. The live cell below compares this simulation-based ranking to the degree-based ranking on a small graph.
Before running the next cell, predict: will the node with the highest degree (follower count) in the graph also achieve the largest mean cascade size? If not, which structural property do you expect the top cascade node to possess?
Reading the output. The scatter plot reveals the degree-cascade rank discrepancy: many nodes that rank highly by out-degree do not rank highly by cascade reach, and vice versa. The Spearman correlation between degree rank and cascade rank is typically in the range 0.3–0.6 for scale-free graphs — significantly above zero (degree is not useless), but far below the 1.0 that the “follower count = influence” heuristic implicitly assumes.
The nodes that outperform their degree ranking tend to occupy bridging positions in the network: they have followers in multiple communities, so their retweets can cross community boundaries and ignite secondary cascades. This is the network analog of Granovetter’s (1973) famous finding that weak ties — connections between structurally distant nodes — are the conduits for novel information. PageRank, which weights in-links by the importance of the linking node, captures this better than raw degree, but even PageRank underperforms simulation-based cascade estimation because it does not account for the stochastic nature of activation.
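The degree-versus-reach discrepancy can be reproduced on a toy graph. Everything in the sketch below is hypothetical: a "hub" with six low-engagement followers, a "bridge" with three followers who each lead into their own small community, and made-up edge probabilities. Candidates are ranked by their Monte Carlo mean cascade size under the IC model.

```python
import random

def ic_size(out_edges, seed, rng):
    """Cascade size for one IC run; out_edges is {u: [(v, p_uv), ...]}."""
    active, frontier = {seed}, [seed]
    while frontier:
        nxt = []
        for u in frontier:
            for v, p_uv in out_edges.get(u, []):
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def rank_seeds(out_edges, candidates, n_runs, rng):
    """Rank candidate seeds by mean simulated cascade size, largest first."""
    mean_reach = {}
    for s in candidates:
        mean_reach[s] = sum(ic_size(out_edges, s, rng) for _ in range(n_runs)) / n_runs
    return sorted(mean_reach.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical graph: many passive followers versus few well-placed ones
graph = {
    "hub": [(f"h{i}", 0.1) for i in range(6)],               # out-degree 6, weak ties
    "bridge": [("c1", 0.5), ("c2", 0.5), ("c3", 0.5)],       # out-degree 3, one tie per community
}
for c in ("c1", "c2", "c3"):
    graph[c] = [(f"{c}_f{i}", 0.5) for i in range(4)]        # each community has 4 members

rng = random.Random(1)
ranking = rank_seeds(graph, ["hub", "bridge", "c1"], n_runs=2000, rng=rng)
for name, reach in ranking:
    print(f"{name:<7} out-degree={len(graph[name])}  mean cascade size={reach:.2f}")
```

In expectation the bridge reaches about 5.5 nodes, the community member `c1` about 3, and the hub about 1.6, so the ranking by simulated reach is the reverse of the ranking by out-degree. The example is deliberately extreme; on realistic graphs the two rankings are correlated but far from identical.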
The BTS Army and community-seeded cascades. The global fanbase of K-pop group BTS — self-organized as the “ARMY” across Twitter, Weibo, and TikTok — provides one of the most studied examples of coordinated cascade seeding. Yoo, Kim, and Lee (2021) analyzed BTS-related Twitter cascades over a 12-month period and found that the top 0.1% of cascade-generating accounts were not mega-influencers by follower count: they were mid-size accounts (50,000–500,000 followers) that occupied bridging positions between the K-pop fan community, English-speaking general music audiences, and Asian-language social media spheres. When these bridge accounts retweeted BTS content, cascades consistently crossed language and community boundaries, reaching audiences that BTS’s own official accounts could not directly reach. Big Hit Entertainment (now HYBE) reportedly identified these bridge accounts through cascade-log analysis and provided them with early content access — a practical implementation of simulation-based influencer selection over follower-count selection.
Mini Case Study: A Tweet’s Life
Synthetic cascade dataset
We close this chapter with a mini case study that integrates the concepts from all previous sections. We generate a synthetic dataset of 20 cascades, each representing a distinct piece of content published on a hypothetical social platform. The cascades span a range of sizes, shapes, and structural viralities, and we compute the summary statistics that would be the inputs to a real-world virality-prediction model.
The analysis produces two diagnostic plots. The first plots size against depth for each cascade, allowing us to classify cascades into the broadcast quadrant (large size, low depth), the viral quadrant (moderate size, high depth), the organic micro-cascade quadrant (small size, moderate depth), and the failed cascade quadrant (small size, low depth). The second plots structural virality against final cascade size, showing the relationship between cascade geometry and ultimate reach.
Reading the output. The left panel — size versus depth — reveals the cascade taxonomy cleanly. Broadcast cascades (orange) cluster in the bottom-right quadrant: large size, low depth. Viral cascades (blue) occupy the top half: high depth regardless of size. Organic cascades (green) scatter across the interior, reflecting their mixed shapes.
The right panel — structural virality versus size — reveals a finding that surprises practitioners: some small cascades have high structural virality. A chain of 12 nodes has higher structural virality than a star of 50 nodes. From a content-spread perspective, this small high-SV cascade represents 12 people who each personally shared the content with someone new — the kind of person-to-person advocacy that brand managers prize. The large broadcast cascade (50 nodes, low SV) represents one account broadcasting to 49 passive followers. Both might look equivalent on a simple “reach” dashboard, but their implications for sustained engagement, brand trust, and future cascade potential are radically different.
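The chain-versus-star comparison can be checked directly from the definition of structural virality as the mean shortest-path distance over all pairs of distinct nodes, \(v(T) = \frac{1}{n(n-1)}\sum_{i \neq j} d(i,j)\). The sketch below is a plain-Python illustration using breadth-first search from every node; the function name and the edge-list format are assumptions, and the all-pairs approach is only suitable for small trees.

```python
from collections import deque

def structural_virality(edges):
    """Mean shortest-path distance over all ordered pairs of distinct nodes in a tree.

    edges: list of (u, v) retweet edges; direction is ignored when measuring distance.
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    n = len(nbrs)
    total = 0
    for source in nbrs:
        # breadth-first search gives hop distances from source to every node
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in nbrs[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))

chain_12 = [(i, i + 1) for i in range(11)]   # 12 nodes in a single chain
star_50 = [(0, i) for i in range(1, 50)]     # root plus 49 direct retweeters

print(round(structural_virality(chain_12), 2))  # 4.33
print(round(structural_virality(star_50), 2))   # 1.96
```

The values match the closed forms \((n+1)/3\) for a chain and \(2(n-1)/n\) for a star, which is why a star's structural virality stays just below 2 no matter how large it grows.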
Virality prediction as a machine learning problem. The summary statistics computed in this case study (size, depth, breadth, structural virality, cascade type classification) are the kind of features used by platform ML teams to predict whether a cascade will continue to grow or has peaked. Cheng et al. (2014) at Facebook showed that early cascade features (measured at size 50) predict whether a cascade will double in size with approximately 80% accuracy, a substantial improvement over chance (50%), and one that relies mainly on temporal and structural features rather than content features. The SEISMIC model of Zhao et al. (2015) takes a complementary approach: it uses a self-exciting point process, closely related to the Hawkes process, to forecast future retweet counts from the retweet times and retweeter follower counts observed early in a cascade, for example in its first hour.
Chapter Summary
This chapter developed a complete framework for understanding how information propagates through social networks, from the formal definition of a cascade tree to practical virality-prediction workflows.
The retweet tree (Section 1) is the fundamental object: a rooted directed tree where edges represent retweet relationships. Its three scalar summaries — size, depth, and breadth — characterize its shape but do not fully distinguish cascade types.
Structural virality (Section 2), defined via the Wiener index \(v(T) = \frac{1}{n(n-1)}\sum_{i \neq j} d(i,j)\), is the measure that distinguishes broadcast cascades (low \(v\), star-shaped) from organic viral cascades (high \(v\), chain-like). Two cascades of identical size \(n\) can have structural virality differing by a factor of roughly \(n/6\), since a chain has \(v = (n+1)/3\) while a star has \(v = 2(n-1)/n \approx 2\). That ratio is the difference between a celebrity broadcast and grassroots person-to-person advocacy.
The Independent Cascade model (Section 3) simulates diffusion by independent Bernoulli trials on each edge. The cascade size distribution is heavy-tailed, and the mean cascade size exhibits a sharp phase transition at a critical edge probability \(p^*\) related to the basic reproduction number \(R_0 = p \cdot \bar{k}\).
The Linear Threshold model (Section 4) models cumulative social pressure: a node adopts when the weighted fraction of active neighbors crosses its threshold \(\theta_v\). This model captures the social-proof dynamics that platforms actively engineer through feed ranking and notification design.
Power-law cascade sizes (Section 5) follow \(P(s) \propto s^{-\alpha}\) with \(\alpha \approx 2\), implying that most cascades are trivially small and a tiny fraction dominate total reach. The mean is a poor summary statistic; the 99th percentile tells you more about a platform’s viral potential.
Influencer detection (Section 6) requires simulation-based cascade estimation, not follower count. The Spearman correlation between degree rank and cascade rank is typically 0.3–0.6 for scale-free networks — degree is informative but far from sufficient. Bridging position and content-audience alignment explain the residual variance that structural measures miss.
Time dynamics (Section 7) follow a bell-shaped pulse consistent with the SIR epidemic model. The half-life of a typical viral tweet is 2–36 hours; content older than this window can still benefit from paid amplification but cannot be restored to its organic peak.
Chapter 3 (LLMs) showed you how to extract meaning from a single message. This chapter showed you how that message travels once published. NetworkBook Chapter 5 (social learning) provides the micro-foundations: why each individual at each node of the cascade tree makes the adoption decision it does. Together, the three chapters form a complete theory of social media information flow: from content (what is said) to cascade (how far it travels) to learning (why each node decides to forward it).
- A cascade has size 200 and depth 1. What does this tell you about the structural virality of the cascade? What type of account most likely posted the original content?
- You observe two cascades of size 100. Cascade A has structural virality 2.1; Cascade B has structural virality 8.7. Which one is more likely to still be growing 12 hours after the original post? Why?
- In the IC model, a node \(v\) has 20 in-neighbors, each with activation probability \(p = 0.1\). What is the probability that \(v\) is activated in a single round, given all 20 neighbors are active simultaneously?
- The LT model predicts that a hashtag campaign will fail to achieve global adoption despite starting from a well-connected seed set. What property of the threshold distribution is the most likely cause?
- You work for a brand agency and your client asks you to identify the 10 influencers who will maximize cascade size for a product launch. Describe the simulation-based methodology you would use, and explain why you would not rely solely on follower count.
- A tweet’s cumulative retweet count at \(t=1\) hour is 400. At \(t=2\) hours it is 420. What does the growth rate suggest about where the tweet is on its epidemic curve, and what action would you recommend?
Prof. Xuhu Wan · HKUST · Foundations of Network and Text Data · 2026 Edition