Summary and Info
Plagiarism is a problem with far-reaching consequences for the sciences. However, even today’s best software-based systems can only reliably identify copy & paste plagiarism. Disguised plagiarism forms, including paraphrased text, cross-language plagiarism, as well as structural and idea plagiarism often remain undetected. This weakness of current systems results in a large percentage of scientific plagiarism going undetected. Bela Gipp provides an overview of the state-of-the art in plagiarism detection and an analysis of why these approaches fail to detect disguised plagiarism forms. The author proposes Citation-based Plagiarism Detection to address this shortcoming. Unlike character-based approaches, this approach does not rely on text comparisons alone, but analyzes citation patterns within documents to form a language-independent "semantic fingerprint" for similarity assessment. The practicability of Citation-based Plagiarism Detection was proven by its capability to identify so-far non-machine detectable plagiarism in scientific publications.