Redundancy, Publication Overlap, and Other Forms of Duplication

Although the prevalence of blatant duplicate publications varies across disciplines, their overall prevalence is relatively low (see Larivière & Gingras, 2010), and their impact on the integrity of science is likely minor, particularly when the published papers are truly identical (i.e., same title, abstract, author list). However, other forms of duplication exist, and these are often classified with terms such as redundant publication or overlapping publication (see p. 148 of Iverson et al., 2007, for additional descriptive terms). As indicated earlier, these types of self-plagiarism are more prevalent and likely more detrimental to science because they involve the dissemination of previously published data that are presented as new data, thereby skewing the scientific record. Bruton (2014) and others (e.g., von Elm, Poglia, Walder & Tramer, 2004) have discussed various other types of duplication. Below are some of the most common forms.

Data Aggregation/Augmentation. In this type of duplication, data that have already been published are published again with some additional new data (see Smolčić & Bilić-Zulle, 2013). The resulting representation of the aggregated data is likely to be conceptually consistent with the original data set, but it will have different numerical outcomes (e.g., means and standard deviations), figures, and graphs (see Bonnell, Hafner, Hersam, Kotov, Buriak, Hammond, Javey, Nordlander, Parak, Schaak, Wee, Weiss, Rogach, Stevens & Willson, 2012 for an example). This type of publication is highly problematic when the author presents the data in a way that misleads the reader into believing that the entire data set is independent of the data that had originally been published. That is, the reader is never informed that a portion of the data being described had already been published, or the presentation is ambiguous enough that the reader cannot discern the true nature of the data.

Data Disaggregation. As the label suggests, data disaggregation occurs when data from a previously published study are published again minus some data points and with no indication or, at best, an ambiguous indication of their relationship to the originally published paper. The new study may consist of the original data set minus a few data points now considered outliers, or minus data points at both ends of the range that happen to lie outside a newly established criterion for inclusion in the new analyses, or it may result from some other procedure that excludes some of the data points appearing in the original study. As with data augmentation, the new publication with the disaggregated data will contain different numerical outcomes (e.g., means and standard deviations), figures, and graphs. However, the underlying data are largely the same as the previously published data and are presented in a way that misleads the reader into interpreting the ‘new’ data as having been independently collected.
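The arithmetic behind these two forms of duplication can be illustrated with a minimal sketch. The participant IDs, scores, and outlier cutoff below are hypothetical and invented purely for illustration; they are not drawn from any of the studies cited. The sketch shows how an augmented or disaggregated data set yields different means and standard deviations even though most or all of its cases have already been published.

```python
from statistics import mean, stdev

# Hypothetical participant IDs and scores, invented purely for illustration.
original = {"p01": 4.1, "p02": 5.0, "p03": 5.2, "p04": 5.6, "p05": 6.3, "p06": 9.8}
new_cases = {"p07": 5.4, "p08": 5.9}

# Aggregation/augmentation: the previously published data plus a few new cases.
augmented = {**original, **new_cases}

# Disaggregation: the previously published data minus cases above a
# hypothetical, newly chosen inclusion cutoff.
CUTOFF = 9.0
disaggregated = {pid: score for pid, score in original.items() if score <= CUTOFF}


def overlap_fraction(later, earlier):
    """Share of cases in the later data set that already appeared in the earlier one."""
    return len(later.keys() & earlier.keys()) / len(later)


def summarize(label, data):
    """Print sample size, mean, SD, and the share of previously published cases."""
    scores = list(data.values())
    print(
        f"{label}: n={len(scores)}, mean={mean(scores):.2f}, sd={stdev(scores):.2f}, "
        f"previously published={overlap_fraction(data, original):.0%}"
    )


for label, data in [
    ("original", original),
    ("augmented", augmented),
    ("disaggregated", disaggregated),
]:
    summarize(label, data)
```

The summary statistics differ across the three data sets, which is why a reader who sees only the later paper may not recognize the overlap unless the authors disclose it.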

Data Segmentation. Also known as Salami Publication or Least Publishable Unit, data segmentation is a practice that is often subsumed under the heading of self-plagiarism but which, technically, is not necessarily a form of duplication or of redundancy, as Bruton (2014) has correctly pointed out. It is usually mentioned in the context of self-plagiarism because the practice often does include a substantial amount of text overlap, and possibly some data overlap as well, with earlier publications by the same author(s). Consider the examples provided by Kassirer and Angell (1995), former editors of The New England Journal of Medicine:

Several months ago, for example, we received a manuscript describing a controlled intervention in a birthing center. The authors sent the results on the mothers to us, and the results on the infants to another journal. The two outcomes would have more appropriately been reported together. We also received a manuscript on a molecular marker as a prognostic tool for a type of cancer; another journal was sent the results of a second marker from the same pathological specimens. Combining the two sets of data clearly would have added meaning to the findings. (p. 450)

In some cases, the segmenting of a large study into two or more publications may, in fact, be the most meaningful approach to reporting the results of that research. Longitudinal studies are an example of this type of situation. However, dividing a study into smaller segments must always be done with full transparency, showing exactly how the data being reported in the later publication are related to the earlier publication. An often-stated rationale used by some authors for not disclosing the relationship between related publications, or for other forms of covert overlap between publications, is that both reports are prepared and submitted simultaneously to different journals (see, for example, Katsnelson, 2015). However, this should not be considered an acceptable excuse for failing to disclose any overlap between studies, especially to the editors of the journals. Authors should describe how the study data being reported are related to a larger project. They can always provide a footnote, author note, or some other indication that manuscripts describing the other portions of the data set are in preparation or under consideration, whichever the case may be. The important point is that readers need to be made aware that the data being reported were collected in the context of a larger study. As with other forms of redundancy and actual duplication, salami slicing can distort the literature by leading unsuspecting readers to believe that the data presented in each salami slice (i.e., journal article) are independently derived from a different data collection effort or subject sample.

Guideline 9: Authors of complex studies should heed the advice previously put forth by Angell & Relman (1989). If the results of a single complex study are best presented as a ‘cohesive’ single whole, they should not be partitioned into individual papers. Furthermore, if there is any doubt as to whether a paper submitted for publication represents fragmented data, authors should enclose other papers (published or unpublished) that might be part of the paper under consideration (Kassirer & Angell, 1995).

Other forms of redundancy with or without text or data duplication.

Reanalysis of the same data. There may be occasions in which previously published data can be analyzed using a novel technique not available at the time of publication. Or perhaps the authors thought of a new way to analyze the data using an existing technique. Both of these scenarios, and perhaps still others, may warrant a re-examination of the data. However, authors need to be fully transparent with their readers by indicating that earlier analyses of the data have already been published.

Same data; different conclusions. von Elm et al. (2004) described various other forms of redundancy. For example, a related practice occurs when authors publish the same data with a somewhat different textual slant within the body of the paper and, again, with ambiguous or nonexistent acknowledgment of the earlier publication. Such redundant papers may contain a slightly different interpretation of the data, or the introduction to the paper may be framed in a somewhat different theoretical, empirical, or perhaps subject sample context. Sometimes, additional data or somewhat different analyses of the same, previously published data are reported in the redundant paper.