Detecting Plagiarism

M. Kandan

Kandan works in the NIIT-Centre for Research in Cognitive Systems (NIIT-CRCS). He is interested in the psychology of on-line learning.

Internet evangelists tell us that if only we were younger (like, say, born yesterday), all our schooling and education could happen via the Internet. However, to make this a reality, several new technologies will have to be developed. For one thing, exams at present have to be taken under human supervision. Otherwise, couldn't a student ask his smart sister to take his math tests? Further, the Internet already offers so many resources, on any topic whatsoever, that it is easy for students to copy others' work. Human instructors can still make out if a student has plagiarized, but online education will only work if human effort is minimized. The work described in this article gives a nice review of the ideas and the software systems being developed to detect this kind of plagiarism. In conjunction with other technologies, these systems may also solve the impersonation problem and make online testing a real possibility.

The Problem: When someone's literary work is reproduced without acknowledging the source, it amounts to plagiarism. The level of plagiarism can vary from copying the whole work to copying a single sentence. The problem is most prevalent in academic institutions, where students submit assignments, test papers, software programs, etc. With the advent of the Internet, it is easy for students to find material on any subject, cut and paste it into their own document, and submit it as their own work. The problem is to detect whether a submission is the student's own work or whether parts of it have been copied.

The Approach: Clough reviews the work of several researchers who have proposed systems to tackle this problem. The approach followed in these systems is known as stylometry. Stylometry employs descriptive statistics to quantify the similarity or variance between two documents.
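To make the idea of quantifying style with descriptive statistics concrete, here is a minimal Python sketch. The three features (average sentence length, average word length, and the ratio of distinct words to total words) and the function names style_profile and style_distance are assumptions chosen for illustration; they are not taken from Clough's review or from any of the tools it covers, which may use quite different measures.

```python
import re


def style_profile(text):
    """Return a few simple descriptive statistics for a text.

    The feature set is an illustrative assumption, not a standard.
    """
    # Split into rough sentences on ., ! or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lower-cased alphabetic words (apostrophes kept for contractions).
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


def style_distance(text_a, text_b):
    """Mean relative difference between two profiles (0.0 = identical)."""
    pa, pb = style_profile(text_a), style_profile(text_b)
    diffs = [abs(pa[k] - pb[k]) / max(pa[k], pb[k]) for k in pa]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    known = "The cat sat. The dog ran. The bird flew away."
    disputed = "Notwithstanding considerable reluctance, exams are increasingly given online."
    print(style_distance(known, disputed))
```

A larger distance only suggests that the two texts differ in style. With so few features and such short samples it proves nothing about authorship on its own.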
If the variance is large, then the probability is high that the documents were written by two different authors. Suppose that person A claims to have written a document, and we want to verify this. The idea is to compare it with other documents written by A for style and consistency. For example, we can compare two documents using heuristics such as the vocabulary used, the phrases used, and the sentence length preferred by A. By comparing plots of how often these features occur in the documents, we can judge whether the document submitted by A was actually written by A. The accuracy of the judgment depends on the number of variables used, the amount of data compared, and the statistical techniques used.

These techniques are embedded in software tools that can be used to detect plagiarism. The author mentions several such tools, including CopyCatch, the Glatt screening program, and the WordCheck keyword software. There are also systems that detect plagiarism in software code. Students disguise copied code in various ways: changing the comments, renaming variables, changing data types, changing the structure of selection statements, etc. The idea for detecting code plagiarism is the same as that used for text, but the heuristics may be different.

Consequences: Such systems may help in detecting plagiarism in exams conducted online. They may also help resolve disputes involving copyright infringement. One reason online assessment is not valued is the difficulty of certifying that the person who took the exam is the same person who enrolled for the course. By using these techniques in conjunction with others, some cases of impersonation can also be detected. This would allow reputed institutions and boards to administer their exams over the Internet, and thus increase the value of the degrees and diplomas available through on-line education.

Reference: P. Clough, Plagiarism in natural and programming languages: an overview of current tools and technologies, preprint, 2000. Available at: http://www.dcs.shef.ac.uk/~cloughie/plagiarism/HTML_Version/
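As an illustration of the code plagiarism case discussed above, the following Python sketch handles two of the disguises mentioned, changed comments and renamed variables. It assumes the submissions are themselves Python programs; the function names normalized_tokens and code_similarity are hypothetical, and this is not the method of any tool named in the article. Comments are discarded, every identifier is replaced by the same placeholder, and the resulting token sequences are compared.

```python
import difflib
import io
import keyword
import tokenize


def normalized_tokens(source):
    """Tokenize Python source, dropping comments and layout, masking names."""
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")        # every identifier looks the same
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")       # literal values are masked too
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # keywords and operators kept as-is
    return out


def code_similarity(source_a, source_b):
    """Ratio in [0, 1] of matching normalized tokens."""
    a, b = normalized_tokens(source_a), normalized_tokens(source_b)
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    original = "def total(values):\n    # sum the list\n    s = 0\n    for v in values:\n        s += v\n    return s\n"
    copied = "def add_all(items):\n    # my own work\n    acc = 0\n    for x in items:\n        acc += x\n    return acc\n"
    print(code_similarity(original, copied))
```

Restructured selection statements or changed data types would not be caught by this sketch, and a high score is only a prompt for a human to look more closely.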
Copyright © 2000 NIIT Ltd. All trademarks acknowledged. Site last updated on Wednesday, January 10, 2001.