# How citation boosts promote scientific paradigm shifts and Nobel Prizes
Amin Mazloumian,¹ Young-Ho Eom,² Dirk Helbing,¹ Sergi Lozano,¹ and Santo Fortunato²
¹ ETH Zürich, CLU E1, Clausiusstrasse 50, 8092 Zürich, Switzerland
² Complex Networks & Systems Lagrange Laboratory, ISI Foundation, Turin, Italy
Nobel Prizes are commonly seen to be among the most prestigious achievements of our times. Based on mining several million citations, we quantitatively analyze the processes driving paradigm shifts in science. We find that groundbreaking discoveries of Nobel Prize Laureates and other famous scientists are acknowledged not only by many citations of their landmark papers: surprisingly, they also boost the citation rates of their previous publications. Given that innovations must outcompete the rich-gets-richer effect for scientific citations, it turns out that they can make their way only through citation cascades. A quantitative analysis reveals how and why such cascades happen. Science appears to behave like a self-organized critical system, in which citation cascades of all sizes occur, from continuous scientific progress all the way up to scientific revolutions, which change the way we see our world. By measuring the 'boosting effect' of landmark papers, our analysis reveals how new ideas and new players can make their way and finally triumph in a world dominated by established paradigms. The underlying 'boost factor' is also useful for discovering scientific breakthroughs and talents much earlier than classical citation analysis, which has by now become a widespread method to measure scientific excellence, influencing scientific careers and the distribution of research funds. Our findings reveal patterns of collective social behavior, which are also interesting from an attention economics perspective. Understanding the origin of scientific authority may therefore ultimately help to explain how social influence comes about and why the value of goods depends so strongly on the attention they attract.
PACS numbers: 89.75.-k
## I. INTRODUCTION
Ground-breaking papers are extreme events [1] in science. They can transform the way in which researchers do science in terms of the subjects they choose, the methods they use, and the way they present their results. The related spreading of ideas has been described as an epidemic percolation process in a social network [2]. However, the impact of most innovations is limited. Only a few ideas gain attention all over the world and across disciplinary boundaries [3]. Typical examples are elementary particle physics, the theory of evolution, superconductivity, neural networks, chaos theory, systems biology, nanoscience, or network theory.
It is still a puzzle, however, how a new idea and its proponent can succeed, given that they must beat the rich-gets-richer dynamics of already established ideas and scientists. According to the Matthew effect [4-7], famous scientists receive an amount of credit that may sometimes appear disproportionate to their actual contributions, to the detriment of younger or lesser-known scholars. This implies that a small number of scientists, together with the scholars working with them, hold great authority, which is reflected by the large attention received by their work and ideas [8].
How, then, can a previously unknown scientist establish a high scientific reputation and authority at all, if those who get many citations receive even more over time? Here we shed light on this puzzle. The following results for 124 Nobel Prize Laureates in chemistry, economics, medicine and physics suggest that innovators can gain reputation and innovations can successfully spread mainly because a scientist's body of work as a whole enjoys a greater impact after the publication of a landmark paper. Not only do colleagues notice the ground-breaking paper, but the latter also draws attention to older publications of the same author (see Fig. 1). Consequently, future papers have an impact on past papers, as the relevance of the past papers is reweighted.
We focus here on citations as an indicator of scientific impact [9-13], studying data from the ISI Web of Science, but the use of click streams [14] would be conceivable as well. It is well known that the relative number of citations correlates with research quality [15-17]. Citations are now regularly used in university rankings [18], in academic recruitment, and in the distribution of funds among scholars and scientific institutions [19].
## II. RESULTS
We evaluated data for 124 Nobel Prize Laureates who received the award in the last two decades (1990-2009), comprising about 2 million citations. For all of them, and for other internationally established experts as well, we find peaks in the changes of their citation rates (Figs. 2 and 3).
Moreover, these peaks can always be attributed to landmark papers (Fig. 4), which have received hundreds of citations within a decade. Such landmark papers are rare even in the careers of the most excellent scientists, but some authors have several such peaks.
Technically, we detect a groundbreaking article $a$ published at time $t = t_a$ by comparing the citation rates of the earlier papers before and after $t_a$. The analysis proceeds as follows. Given a year $t$ and a time window $w$, we take all papers of the studied author that were published from the beginning of his/her career until year $t$. The citation rate $R_{<t,w}$ measures the average number of citations per paper per year received by these papers in the period from $t-w+1$ to $t$. Similarly, the citation rate $R_{>t,w}$ measures the average number of citations per paper per year received by the same publications between $t+1$ and $t+w$ (or 2009, if $t+w$ exceeds 2009). The ratio $R_w(t) = R_{>t,w}/R_{<t,w}$, which we call the 'boost factor', detects critical events in the life of a scientist: sudden increases in the citation rates (as illustrated by Fig. 1) show up as peaks in the time-dependent plot of $R_w(t)$.
Figure 1: Illustration of the boosting effect. Typical citation trajectories of papers, here for Nobel Prize Laureate John Bennett Fenn, who received the award in chemistry in 2002 for the development of the electrospray ionization technique used to analyze biological macromolecules. The original article, entitled *Electrospray ionization for mass spectrometry of large biomolecules*, coauthored by M. Mann, C. K. Meng, S. F. Wong and C. M. Whitehouse, was published in Science in 1989 and is the most cited work of Fenn, with over 3,000 citations at the time of writing. The diagram shows the growth over time of the total number of citations received by this landmark paper (blue solid line) and by six older papers. The number of citations of the landmark paper literally exploded in the first years after its appearance. Moreover, after its publication in 1989, a number of other papers also enjoyed a much higher citation rate. Thus, a sizeable part of the previous scientific work reached a big impact after the publication of the landmark paper. We found that the occurrence of this boosting effect is characteristic of successful scientific careers.
[Figure 1 plot: cumulative number of citations versus year (1980 to 2009) for the landmark paper by J.B. Fenn et al. (1989) and six older papers.]
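As an illustration, the plain boost factor $R_w(t)$ defined above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: the input layout (a hypothetical dictionary mapping each paper to its publication year and its citations per year) is our own, every paper published up to year $t$ counts in both averages (even one too young to be cited early in the first window), and the generalized factor $R'_w(t)$ from Materials and Methods is not reproduced.

```python
LAST_YEAR = 2009  # last year covered by the citation data in the text


def boost_factor(papers, t, w, last_year=LAST_YEAR):
    """Plain boost factor R_w(t) = R_{>t,w} / R_{<t,w}.

    papers: hypothetical layout {paper_id: (publication_year, {citation_year: count})}.
    Returns None when the ratio is undefined.
    """
    # Only papers published from the start of the career until year t count.
    included = [cites for (year, cites) in papers.values() if year <= t]
    if not included:
        return None
    n = len(included)

    before_years = range(t - w + 1, t + 1)                # t-w+1 ... t
    after_years = range(t + 1, min(t + w, last_year) + 1)  # t+1 ... t+w, capped at last_year
    if len(after_years) == 0:
        return None

    before = sum(c.get(y, 0) for c in included for y in before_years)
    after = sum(c.get(y, 0) for c in included for y in after_years)

    # Average citations per paper per year in each window.
    r_before = before / (n * len(before_years))
    r_after = after / (n * len(after_years))
    if r_before == 0:
        return None  # no citations before t: ratio undefined
    return r_after / r_before


papers = {
    "A": (1985, {1988: 2, 1989: 2, 1990: 10, 1991: 12}),
    "B": (1986, {1988: 1, 1989: 1, 1990: 6, 1991: 8}),
}
print(boost_factor(papers, t=1989, w=2))  # 6.0
```

With these toy numbers the two older papers are cited six times more often per year in 1990 to 1991 than in 1988 to 1989, so $R_2(1989) = 6$; a peak of this kind in the time series of $R_w(t)$ is what the text calls a citation boost.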
In our analysis we used the generalized boost factor $R'_w(t)$, which reduces the influence of random variations in the citation rates (see Materials and Methods).
Figure 2: Typical time evolutions of the boost factor. Temporal dependence of $R'_w(t)$ for Nobel Laureates [here for (a) Mario R. Capecchi (Medicine, 2007), (b) John C. Mather (Physics, 2006), (c) Roger Y. Tsien (Chemistry, 2008) and (d) Roger B. Myerson (Economics, 2007)]. Sharp peaks indicate citation boosts in favor of older papers, triggered by the publication and recognition of a landmark paper. Insets: the peaks persist (though somewhat smaller) even if the landmark paper is skipped in the determination of the citation counts $c_{p,t}$; the landmark paper is defined as the paper whose exclusion from the computation of the boost factor produces the largest reduction in the peak size. We conclude that the observed citation boosts are mostly due to a collective effect involving several publications rather than to the high citation rate of the landmark paper itself.
[Figure 2 plot: four panels, boost factor versus year for (a) M.R. Capecchi, (b) J.C. Mather, (c) R.Y. Tsien and (d) R.B. Myerson, each with an inset showing the boost factor with the landmark paper excluded.]
Figure 2 shows typical plots of the boost factors $R'_w(t)$ of four Nobel Prize Laureates. Interestingly, peaks are found even when the papers that contribute most to them are excluded from the analysis (see insets of Fig. 2). That is, the observed increases in the citation rates are not just due to the landmark papers themselves, but rather to a collective effect, namely an increase in the citation rates of previously published papers. This increase results from the greater visibility that the scientist's body of work receives after the publication of a landmark paper, and it establishes an increased scientific impact ('authority'). From the perspective of attention economics [20], it may be interpreted as a herding effect resulting from the way in which relevant information is collectively discovered in an information-rich environment. Notably, we have found that the older papers receiving a boost are not always related to the topic of the landmark paper.
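The leave-one-out check behind the insets of Fig. 2 can be sketched as follows: recompute the boost factor at the peak year with each paper removed, and take as landmark paper the one whose removal lowers the peak most. This is a hypothetical sketch, not the authors' implementation; it uses the plain $R_w(t)$ rather than the generalized $R'_w(t)$, and the data layout, function names and toy data are our own assumptions.

```python
def boost_factor(papers, t, w, last_year=2009):
    """Plain R_w(t); papers = {id: (pub_year, {year: count})} (hypothetical layout)."""
    included = [cites for (year, cites) in papers.values() if year <= t]
    after_years = range(t + 1, min(t + w, last_year) + 1)
    if not included or len(after_years) == 0:
        return None
    n = len(included)
    before = sum(c.get(y, 0) for c in included for y in range(t - w + 1, t + 1))
    after = sum(c.get(y, 0) for c in included for y in after_years)
    r_before = before / (n * w)
    r_after = after / (n * len(after_years))
    return r_after / r_before if r_before else None


def landmark_paper(papers, t_peak, w):
    """Leave-one-out search for the paper whose removal lowers R_w(t_peak) the most.

    Returns (paper_id, full_peak, peak_without_that_paper), or None.
    """
    full = boost_factor(papers, t_peak, w)
    if full is None:
        return None
    best = None
    for pid in papers:
        rest = {k: v for k, v in papers.items() if k != pid}
        reduced = boost_factor(rest, t_peak, w)
        if reduced is None:
            continue
        # Largest reduction of the peak = smallest remaining peak.
        if best is None or reduced < best[2]:
            best = (pid, full, reduced)
    return best


papers = {
    "A": (1985, {1988: 2, 1989: 2, 1990: 10, 1991: 12}),
    "B": (1986, {1988: 1, 1989: 1, 1990: 6, 1991: 8}),
    "L": (1989, {1989: 1, 1990: 40, 1991: 60}),  # hypothetical landmark paper
}
pid, full, without = landmark_paper(papers, t_peak=1989, w=2)
print(pid, round(full, 2), without)  # L 19.43 6.0
```

In this toy example, removing the hypothetical landmark paper L lowers the peak from about 19.4 to 6.0, but a peak well above 1 remains: the older papers A and B were boosted too, which is the collective effect described in the text.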
Traditional citation analysis does not reveal such crucial events in the life of a scientist very well. Figure 3 shows the time history of three classical citation indices: the average number of citations per paper $\langle c(t)\rangle$, the cumulative number $C(t)$ of citations, and the Hirsch index [21] ($h$-index) $H(t)$ in year $t$. For comparison, the evolution of the boost factor $R'_w(t)$ is depicted as well. All indices were divided by their maximum value in order to normalize them and plot them on the same scale. The profiles of the classical indices are rather smooth in most cases, and it is often very hard to see any significant effects of landmark papers. This is not surprising, however, as the boost factor is designed to capture abrupt variations in the citation rates, while both $C(t)$ and $H(t)$ reflect the overall production of a scientist and are therefore less sensitive to extreme events.
Figure 3: Dynamics of the boost factor $R'_w(t)$ versus traditional citation variables. Each panel displays the time histories of four variables: the boost factor $R'_w(t)$, the average number of citations per paper $\langle c(t)\rangle$, the cumulative number of citations $C(t)$, and the $h$-index $H(t)$ earned until year $t$ [21]. The panels refer to the same Nobel Laureates as displayed in Fig. 2. The classical indices have relatively smooth profiles, i.e. they are not very sensitive to extreme events in the life of a scientist, such as the publication of landmark papers. An advantage of the boost factor is that its peaks allow one to identify scientific breakthroughs earlier.
[Figure 3 plot: four panels, (a) M.R. Capecchi, (b) J.C. Mather, (c) R.Y. Tsien and (d) R.B. Myerson; normalized index value versus year t for the boost factor, the average number of citations per paper, the h-index and the cumulative number of citations.]
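For comparison, the three classical indices of Fig. 3 and the normalization by their maximum values can be sketched as below. This is an illustrative sketch, not the authors' code: the data layout (a hypothetical dictionary mapping each paper to its publication year and per-year citation counts) is our own, and counting only citations received up to year $t$ by papers published up to year $t$ is our reading of "earned until year $t$".

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def classical_indices(papers, t):
    """Return (<c(t)>, C(t), H(t)) from citations received up to year t.

    papers: hypothetical layout {id: (pub_year, {citation_year: count})}.
    """
    totals = [
        sum(n for year, n in cites.items() if year <= t)
        for (pub_year, cites) in papers.values()
        if pub_year <= t
    ]
    if not totals:
        return 0.0, 0, 0
    cumulative = sum(totals)
    return cumulative / len(totals), cumulative, h_index(totals)


def normalize(series):
    """Divide by the maximum so that different indices share a 0-1 scale."""
    peak = max(series)
    return [v / peak for v in series] if peak else [0.0 for _ in series]


papers = {
    "A": (1985, {1988: 2, 1989: 2, 1990: 10, 1991: 12}),
    "B": (1986, {1988: 1, 1989: 1, 1990: 6, 1991: 8}),
    "L": (1989, {1989: 1, 1990: 40, 1991: 60}),
}
h_series = [classical_indices(papers, year)[2] for year in range(1988, 1992)]
print(h_series)             # [1, 2, 3, 3]
print(normalize(h_series))  # each value divided by the maximum, 3
```

Because $C(t)$ and $H(t)$ accumulate over a whole career, the jump in 1990 changes them only gradually, whereas a ratio of windowed rates such as $R_w(t)$ reacts sharply; this is the contrast the paragraph above draws.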
To gain a better understanding of our findings, Figs. 4 and 5 present a statistical analysis of the boosts observed for Nobel Prize Laureates. Figure 4 shows that pronounced peaks are indeed related to highly cited papers. Figure 5 analyzes the size distribution of the peaks. The distribution looks like a power law for all choices of the parameters $w$ and $k$ (at least within the relevant range of small values). This suggests that the bursts are produced by citation cascades, as they would occur in a self-organized critical system [22]. In fact, power laws have been found to result from human interactions in other contexts as well [23-25].
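The Fig. 5 caption states that the power-law fits use the maximum likelihood method [26]. A minimal sketch of the standard continuous estimator, $\hat\alpha = 1 + n\left[\sum_{i=1}^{n}\ln(x_i/x_{\min})\right]^{-1}$ with standard error $(\hat\alpha-1)/\sqrt{n}$, together with the Kolmogorov-Smirnov (KS) distance used to choose the lower cutoff $x_{\min}$, is given below. We assume this is the procedure popularized by Clauset, Shalizi and Newman; whether it matches [26] in every detail is an assumption, and the function names are hypothetical.

```python
import math


def fit_power_law(data, xmin):
    """Continuous power-law MLE for the tail x >= xmin (xmin > 0).

    Returns (alpha, sigma, ks): density exponent, its standard error,
    and the KS distance between the tail data and the fitted CDF.
    """
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if n < 2 or log_sum <= 0:
        raise ValueError("need at least two tail values, some above xmin")
    alpha = 1.0 + n / log_sum  # p(x) ~ x**(-alpha) for x >= xmin
    sigma = (alpha - 1.0) / math.sqrt(n)

    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        ks = max(ks, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return alpha, sigma, ks


def best_fit(data, min_tail=10):
    """Scan candidate cutoffs; keep the one with the smallest KS distance."""
    best = None
    for xmin in sorted(set(data)):
        if sum(1 for x in data if x >= xmin) < min_tail:
            break  # tails only shrink as xmin grows
        try:
            alpha, sigma, ks = fit_power_law(data, xmin)
        except ValueError:
            continue
        if best is None or ks < best[3]:
            best = (xmin, alpha, sigma, ks)
    return best


# Synthetic check: evenly spaced quantiles of a power law with alpha = 2.5, xmin = 1.
n, alpha_true = 1000, 2.5
sample = [(1 - (i + 0.5) / n) ** (-1 / (alpha_true - 1)) for i in range(n)]
alpha, sigma, ks = fit_power_law(sample, xmin=1.0)
print(round(alpha, 2))  # 2.5
```

The synthetic sample stands in for the measured peak heights; the caption's "direct distribution" exponents correspond to $\hat\alpha$ here, while the cumulative distribution plotted in Fig. 5 falls off with exponent $\hat\alpha - 1$.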
The mechanism underlying citation cascades is the discovery of new ideas, which colleagues refer to in the references of their papers. Moreover, according to the rich-gets-richer effect, successful papers are cited more often, partly because authors also hope to raise the success of their own work by citing them. Innovations may even cause scientists to change their research direction or approach. Apparently, such feedback effects can create citation cascades, which are ultimately triggered by landmark papers.
Figure 4: Correlation between papers and the local maxima ('peaks') of $R'_w(t)$. We first ranked all papers of an author by the total number of citations received up to and including 2009. We then determined the rank of the publication that contributed most to the peak, measured by the reduction in the height of the peak when the paper was excluded from the calculation of the boost factor (as in the insets of Fig. 2). The distribution of the ranks of these 'landmark papers' is dominated by low values, implying that they are indeed among the top publications of their authors.
[Figure 4 plot: histogram of the percentage of scientists versus the top percentage of the landmark paper's rank.]
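The ranking procedure described in the Figure 4 caption amounts to sorting an author's papers by total citations and locating the landmark paper. The sketch below expresses the rank as a percentage of the author's papers, which is our assumption about how the 'top percentage' axis is defined; the input dictionary and the function name are hypothetical.

```python
def top_percentage(total_citations, landmark_id):
    """Rank of a paper among its author's papers by total citations.

    total_citations: hypothetical {paper_id: citations up to and including 2009}.
    Returns (rank, rank as a percentage of the author's papers).
    Ties keep dictionary order (sorted() is stable), an arbitrary choice.
    """
    ranked = sorted(total_citations, key=total_citations.get, reverse=True)
    rank = ranked.index(landmark_id) + 1
    return rank, 100.0 * rank / len(ranked)


totals = {"A": 30, "B": 120, "L": 900, "D": 5}
print(top_percentage(totals, "L"))  # (1, 25.0)
```

A histogram of these percentages over all laureates that piles up at small values would correspond to the statement that landmark papers are among the top publications of their authors.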
Finally, it is important to check whether the boost factor can distinguish exceptional scientists from average ones. Since any criterion used to define 'normal scientists' may be questioned, we assembled a set of scientists taken at random. Scientists were chosen among those who published at least one paper in the year 2000. We selected 400 names for each of four fields: Medicine, Physics, Chemistry and Economics. After discarding those with no citations, we ended up with 1361 scientists. In Fig. 6 we plot each scientist of our random sample (empty circles) on a two-dimensional plane, together with the Nobel Prize Laureates considered (full circles). The two dimensions are the value of the boost factor and the average number of citations of a scientist. A cluster analysis separates the populations in the proportions of 79% to 21%. The separation is significant, but the two datasets overlap, for two main reasons. First, when picking a large number of scientists at random, as we did, there is a finite probability of also choosing outstanding scholars. We have verified that this is the case. Therefore, some of the empty circles deserve to sit in the top-right part of the diagram, like many Nobel Prize Laureates. The second reason is that we are considering
Figure 5: Cumulative probability distribution of peak heights in the boost factor curves of Nobel Prize Laureates. The four panels correspond to different choices of the parameters $k$ and $w$. The power-law fits (lines) are performed with the maximum likelihood method [26]. The exponents for the direct distribution (of which the cumulative distribution is the integral) are: 3.63 ± 0.16 (top left), 2.93 ± 0.16 (bottom left), 1.63 ± 0.05 (top right), 1.41 ± 0.05 (bottom right). The best fits have the following lower cutoffs and values of the Kolmogorov-Smirnov (KS) statistic: 1.06, 0.0289 (top left), 1.15, 0.0264 (bottom left), 13.1, 0.038 (top right), 24.7, 0.0462 (bottom right). The KS values support the power-law ansatz for the shape of the curves. Still, we point out that in the left plots the data span just one decade in the variable, so one has to be careful about claiming the existence of power laws there.
<details>
<summary>Image 5 Details</summary>

[Figure 5 plot: four log-log panels of cumulative probability versus peak size, with power-law fits (lines); k = 1, w = 2 (top left); k = 4, w = 2 (top right); k = 1, w = 5 (bottom left); k = 4, w = 5 (bottom right).]
* Peak size ≈ 10^5, Cumulative probability ≈ 10^-1.5 (approximately 0.03)
### Key Observations
* All four plots exhibit a power-law relationship between peak size and cumulative probability, as evidenced by the approximately linear trend on the log-log scale.
* Increasing 'k' (from 1 to 4) results in a less steep slope, indicating a slower decrease in cumulative probability with increasing peak size.
* Increasing 'W' (from 2 to 5) appears to have a minor effect on the slope, but the data is more scattered for W=5.
* The data points are relatively well-fitted by the red lines, suggesting a good model fit.
### Interpretation
The plots demonstrate the cumulative distribution of peak sizes for different parameter settings (k and W). The power-law behavior suggests that large peaks are relatively rare, while small peaks are more common. The parameter 'k' appears to control the rate at which the probability of observing a peak decreases with increasing peak size. A higher 'k' value implies a slower decay, meaning larger peaks are more likely to occur. The parameter 'W' may influence the overall distribution shape, but its effect is less pronounced and the data is more variable. These plots could be used to characterize the distribution of events in a system where peak sizes are important, such as earthquake magnitudes, financial market fluctuations, or network traffic bursts. The differences in the slopes for different 'k' values suggest that the underlying process generating these peaks is sensitive to the 'k' parameter. The fact that the lines are approximately linear on a log-log scale indicates that the distribution follows a power law, which is often observed in complex systems.
</details>
scholars from different disciplines, which generally have different citation frequencies. This affects particularly the average number of citations of a scientist, but also the value of the boost factor. In this way, the position in the diagram is affected by the specific research topic, and the distribution of the points in the diagram of Fig. 6 is a superposition of field-specific distributions. Nevertheless, the two datasets, though overlapping, are clearly distinct. Adding further dimensions could considerably improve the result. In this respect, the boost factor can be used together with other measures to better specify the performance of scientists.
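The text does not say which clustering algorithm produced the 79%/21% split, nor exactly how those proportions are computed. As one hypothetical way to run such an analysis, the sketch below applies Lloyd's k-means with two clusters to points in log coordinates (log10 of the average number of citations, log10 of the boost factor). The log transform, the farthest-point initialisation, and the function names `two_means`, `log_points` and `cluster_shares` are our assumptions, not the authors' method.

```python
import math


def two_means(points, max_iter=100):
    """Split 2-D points into two clusters with Lloyd's k-means (k = 2).

    Assumed initialisation: the first point and the point farthest from it
    serve as the two starting centroids. Returns (labels, centroids).
    """
    first = points[0]
    far = max(points, key=lambda p: math.dist(p, first))
    centroids = [first, far]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min((0, 1), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


def log_points(avg_citations, boost_factors):
    """Map (average citations, boost factor) pairs to log10 coordinates."""
    return [(math.log10(c), math.log10(b))
            for c, b in zip(avg_citations, boost_factors)]


def cluster_shares(labels):
    """Fraction of all points assigned to cluster 0 and to cluster 1."""
    n = len(labels)
    return [labels.count(0) / n, labels.count(1) / n]
```

With made-up points, `two_means([(0, 0), (0.1, 0), (2, 3), (2.1, 3)])` puts the first two points in one cluster and the last two in the other. Because field-specific citation levels shift whole groups of points, a plain Euclidean k-means like this one would inherit the overlap described above.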
## III. DISCUSSION
In summary, groundbreaking scientific papers have a boosting effect on previous publications of their authors, bringing them to the attention of the scientific community and establishing their 'authority'. We have provided the first quantitative characterization of this phenomenon by introducing a new variable, the 'boost factor', which is sensitive to sudden changes in the citation
rates. The fact that landmark papers trigger the collective discovery of older papers amplifies their impact and tends to generate pronounced spikes long before the paper receives full recognition. The boost factor can therefore serve to discover new breakthroughs and talents more quickly than classical citation indices. It may also help to assemble good research teams, which play a pivotal role in modern science [27-29].

Figure 6: Two-dimensional representation of our collection of Nobel Prize Laureates and of a set of 1361 randomly selected scientists. On the x-axis we report the average number of citations of a scientist, on the y-axis his/her boost factor. On average, Nobel Prize winners clearly perform better. However, a Nobel Prize is not determined solely by the average number of citations and the boost factor, but also by further factors, such as the degree of innovation or quality, which are hard to quantify.

[Image: scatter plot on logarithmic axes of the Nobel Prize Laureates and the random sample, with the cluster proportions 79% and 21% marked.]
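The boost factor discussed above is defined in Materials and Methods as a ratio of (weighted) citation counts after and before year t, restricted to papers published before t. A minimal sketch for a single scientist follows. The data layout (a dictionary of per-year citation counts per paper), the function name `boost_factor`, and the exact window bounds (years t to t+w-1 versus t-w to t-1) are assumptions for illustration; the defaults k = 4 and w = 5 are the values reported in Materials and Methods.

```python
def boost_factor(citations, pub_year, t, w=5, k=4):
    """Sketch of the weighted boost factor R'_w(t) for one scientist.

    citations: {paper_id: {year: number of citations received that year}}
    pub_year:  {paper_id: publication year}
    Assumed windows: years t .. t+w-1 (after) vs. t-w .. t-1 (before).
    Returns None when no citations fall in the "before" window.
    """
    after = 0.0   # numerator: sum of c_{p,t'}**k for t <= t' < t + w
    before = 0.0  # denominator: sum of c_{p,t'}**k for t - w <= t' < t
    for paper, per_year in citations.items():
        if pub_year[paper] >= t:
            continue  # only papers published before year t contribute
        for year, count in per_year.items():
            weight = count ** k  # k > 1 gives more weight to highly cited papers
            if t <= year < t + w:
                after += weight
            elif t - w <= year < t:
                before += weight
    return after / before if before > 0 else None
```

For example, with w = 2 and k = 1, a scientist whose older papers receive 3 citations in the two years before t and 6 in the two years from t onward gets a boost factor of 2; setting k = 1 corresponds to the unweighted definition of the main text.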
The power-law behavior observed in the distribution of peak sizes suggests that science progresses through phase transitions [30], with citation avalanches on all scales, from small cascades reflecting quasi-continuous scientific progress all the way up to scientific revolutions, which fundamentally change our perception of the world. While this provides new evidence for sudden paradigm shifts [31], our results also give a better idea of why and how they happen.
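The power-law fits in Fig. 5 rely on the maximum likelihood method of Ref. [26]. A minimal sketch, assuming the continuous estimator alpha = 1 + n / sum(ln(x_i / x_min)) with standard error (alpha - 1) / sqrt(n), and choosing x_min as the cutoff that minimises the Kolmogorov-Smirnov distance, is shown below. The function names and the `min_tail` threshold are hypothetical, and the bootstrap goodness-of-fit test that Ref. [26] also describes is omitted. Note that alpha is the exponent of the density p(x) ∝ x^(-alpha); the cumulative distribution P(X ≥ x) ∝ x^(-(alpha - 1)), so a density exponent of 3.63 corresponds to a slope of about -2.63 in a log-log cumulative plot.

```python
import math


def fit_alpha(data, xmin):
    """Continuous power-law MLE for the density exponent alpha (x >= xmin).

    Returns (alpha, standard_error, tail)."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + n / log_sum
    sigma = (alpha - 1.0) / math.sqrt(n)  # asymptotic standard error
    return alpha, sigma, tail


def ks_distance(tail, xmin, alpha):
    """KS distance between the tail's empirical CDF and the fitted CDF
    P(x) = 1 - (x / xmin) ** (1 - alpha). Ties are not treated specially."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d


def fit_power_law(data, min_tail=10):
    """Scan candidate lower cutoffs and keep the one with the smallest KS
    distance. Returns (xmin, alpha, sigma, ks), or None if no cutoff fits."""
    best = None
    for xmin in sorted(set(data))[:-1]:  # keep at least one larger value
        alpha, sigma, tail = fit_alpha(data, xmin)
        if len(tail) < min_tail:
            continue
        ks = ks_distance(tail, xmin, alpha)
        if best is None or ks < best[3]:
            best = (xmin, alpha, sigma, ks)
    return best
```

The scan over every distinct value is quadratic in the sample size, which is acceptable for a few hundred peak heights per panel but would call for a coarser grid of candidate cutoffs on larger samples.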
It is noteworthy that similar feedback effects may determine the social influence of politicians, or the prices of stocks and products (and, thereby, the value of companies). In fact, despite the long history of research on these subjects, such phenomena are still not fully understood. There is evidence, however, that the power of a person or the value of a company increases with the level of attention they enjoy. Consequently, our study of scientific impact is likely to shed new light on these puzzles as well.
## IV. MATERIALS AND METHODS
The basic goal is to improve the signal-to-noise ratio of the citation rates in order to detect sudden changes in them. An effective way to reduce the influence of papers with strongly fluctuating citation rates is to give more weight to highly cited papers. This can be achieved by raising the number of citations to the power $k$, with $k > 1$. Our formula for $R'_w(t)$ therefore reads:
$$R'_w(t) = \frac{\sum_{p} \sum_{t'=t}^{t+w-1} c_{p,t'}^{\,k}}{\sum_{p} \sum_{t'=t-w}^{t-1} c_{p,t'}^{\,k}}$$
Here, $c_{p,t'}$ is the number of citations received by paper $p$ in year $t'$. The sum over $p$ includes all papers published before the year $t$, and $w$ is the time window selected to compute the boosting effect. For $k = 1$ we recover the original definition of $R_w(t)$ (see main text). For the analysis presented in the paper we have used $k = 4$ and $w = 5$, but our conclusions are not very sensitive to the choice of smaller values of $k$ and $w$.
## V. ACKNOWLEDGMENTS
We acknowledge the use of ISI Web of Science data of Thomson Reuters for our citation analysis. A.M., S.L. and D.H. were partially supported by the Future and Emerging Technologies programme FP7-COSI-ICT of the European Commission through the project QLectives (grant no.: 231200). Y.-H.E. and S.F. gratefully acknowledge ICTeCollective, grant 238597 of the European Commission.
- [1] Albeverio S, Jentsch V, Kantz H, eds. (2006) Extreme Events in Nature and Society. Berlin, Germany: Springer.
- [2] Bettencourt LM, Cintrón-Arias A, Kaiser DI, Castillo-Chávez C (2006) The power of a good idea: Quantitative modeling of the spread of ideas from epidemiological models. Physica A 364: 513-536.
- [3] Davenport TH, Beck JC (2001) The Attention Economy: Understanding the New Currency of Business. Boston, USA: Harvard Business School Press.
- [4] Merton RK (1968) The Matthew effect in science: The reward and communication systems of science are considered. Science 159: 56-63.
- [5] Merton RK (1988) The Matthew effect in science, II: Cumulative advantage and the symbolism of intellectual property. ISIS 79: 606-623.
- [6] Scharnhorst A (1997) Characteristics and impact of the Matthew effect for countries. Scientometrics 40: 407-422.
- [7] Petersen AM, Jung WS, Yang JS, Stanley HE (2011) Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proc Natl Acad Sci USA 108: 18-23.
- [8] Malmgren RD, Ottino JM, Nunes Amaral LA (2010) The role of mentorship in protégé performance. Nature 465: 622-626.
- [9] Garfield E (1955) Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas. Science 122: 108-111.
- [10] Garfield E (1979) Citation Indexing. Its Theory and Applications in Science, Technology, and Humanities. New York, USA: Wiley.
- [11] Egghe L, Rousseau R (1990) Introduction to Informetrics: Quantitative Methods in Library, Documentation and Information Science. Amsterdam, The Netherlands: Elsevier.
- [12] Amsterdamska O, Leydesdorff L (1989) Citations: indicators of significance. Scientometrics 15: 449-471.
- [13] Petersen AM, Wang F, Stanley HE (2010) Methods for measuring the citations and productivity of scientists across time and discipline. Phys Rev E 81: 036114.
- [14] Bollen J, Van de Sompel H, Smith JA, Luce R (2005) Toward alternative metrics of journal impact: A comparison of download and citation data. Information Processing & Management 41: 1419-1440.
- [15] Trajtenberg M (1990) A penny for your quotes: Patent citations and the value of innovations. RAND Journal of Economics 21: 172-187.
- [16] Aksnes DW (2006) Citation rates and perceptions of scientific contribution. J Am Soc Inf Sci Technol 57: 169-185.
- [17] Moed HF (2005) Citation Analysis in Research Evaluation. Berlin, Germany: Springer.
- [18] Van Raan AJF (2005) Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods. Scientometrics 62: 133-143.
- [19] Boyack KW, Börner K (2003) Indicator-assisted evaluation and funding of research: visualizing the influence of grants on the number and citation counts of research papers. J Am Soc Inf Sci Technol 54: 447-461.
- [20] Wu F, Huberman BA (2007) Novelty and collective attention. Proc Natl Acad Sci USA 104: 17599-17601.
- [21] Hirsch JE (2005) An index to quantify an individual's scientific research output. Proc Natl Acad Sci USA 102: 16569-16572.
- [22] Bak P, Tang C, Wiesenfeld K (1987) Self-organized criticality: An explanation of the 1/f noise. Phys Rev Lett 59: 381-384.
- [23] Barabási AL (2005) The origin of bursts and heavy tails in human dynamics. Nature 435: 207-211.
- [24] Oliveira JG, Barabási AL (2005) Human dynamics: The correspondence patterns of Darwin and Einstein. Nature 437: 1251.
- [25] Malmgren RD, Stouffer DB, Campanharo ASLO, Amaral LAN (2009) On Universality in Human Correspondence Activity. Science 325: 1696-1700.
- [26] Clauset A, Shalizi CR, Newman MEJ (2007) Power-law distributions in empirical data. SIAM Review 51: 661-703.
- [27] Guimerà R, Uzzi B, Spiro J, Amaral LAN (2005) Team Assembly Mechanisms Determine Collaboration Network Structure and Team Performance. Science 308: 697-702.
- [28] Wuchty S, Jones BF, Uzzi B (2007) The Increasing Dominance of Teams in Production of Knowledge. Science 316: 1036-1039.
- [29] Jones BF, Wuchty S, Uzzi B (2008) Multi-University Research Teams: Shifting Impact, Geography, and Stratification in Science. Science 322: 1259-1262.
- [30] Stanley HE (1987) Introduction to Phase Transitions and Critical Phenomena. New York, USA: Oxford University Press.
- [31] Kuhn TS (1962) The Structure of Scientific Revolutions. Chicago, USA: University of Chicago Press.