Grade Inflation or Compression?

Back in December, the Dean of Undergraduate Education at Harvard was quoted from a meeting of the Faculty of Arts & Sciences saying, “The median grade in Harvard College is indeed an A-. The most frequently awarded grade in Harvard College is actually a straight A.” This statistic shocked the general public (or at least the general media). Yale itself moved last year to address the problem when it turned out that 62% of grades given to undergraduates in a two-year period were A-minuses. Just a few weeks ago, the Teaching Center at Yale hosted a day-long seminar entitled “Are All Yale Students ‘A’ Students? A Forum on Grading.” Most recently, Rebecca Schuman published a piece on grading at Slate entitled “Confessions of a Grade Inflator.” However, rather than seeing what has happened only as the inflation of individual students’ grades, we should also see it, from the instructor’s perspective, as a compressing of the grading scale itself. Doing so reveals multiple repercussions for both students and faculty that the individualized, student-centered notion of “grade inflation” misses. We need to keep in mind that grade inflation or compression does not just benefit unworthy students; it actually has negative effects on both students and faculty, and those effects should be the real reasons for wanting to address the problem.

Aside from the general outrage people feel for undeserving students receiving “high” grades, the practical effect of the grading problem has been to compress the grading scale. The traditional grading scale had 12 degrees (not counting “A+”) including three each at the B, C, and D levels. Generally speaking, the most common understanding was that an average or merely sufficient performance earned a “C,” while the B-level was reserved for those who performed better than average. And, of course, the A-level was reserved for those who performed exceptionally.

Now, however, we have a situation (seemingly in many schools) where 50-60% of grades fall within only the top two degrees and almost all of the rest fall within the next two degrees. This means that the grading scale has been compressed by at least two-thirds for many faculty (see graphic below for one take on the relationship between traditional and compressed grade scales). The traditional scale allowed faculty to make subtle distinctions in their evaluations of student performance, making grades more meaningful in what they represent and also making the process of final evaluation or grading more meaningful for faculty.

[Figure: Grade-scale compression. Compressed grading scale with traditional scale equivalents.]
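The “two-thirds” figure can be checked with a little arithmetic. What follows is only a rough sketch: it assumes the twelve traditional degrees run from A down to F (three each at the B, C, and D levels, plus A, A-, and F), and that the four degrees still in regular use are A, A-, B+, and B. The names in the code are illustrative, not taken from any grading system.

```python
from fractions import Fraction

# Assumed reading of the traditional 12-degree scale (A+ not counted).
TRADITIONAL_SCALE = ["A", "A-", "B+", "B", "B-", "C+",
                     "C", "C-", "D+", "D", "D-", "F"]

# Assumed reading of the compressed scale: the top two degrees
# plus the next two, where almost all grades now fall.
DEGREES_IN_USE = ["A", "A-", "B+", "B"]


def compression(full_scale, used):
    """Return the fraction of the full scale that has dropped out of use."""
    return 1 - Fraction(len(used), len(full_scale))


print(compression(TRADITIONAL_SCALE, DEGREES_IN_USE))  # 2/3
```

Four of twelve degrees remain, so eight of twelve, or two-thirds, of the scale is effectively gone.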

This grading problem has significantly changed the relationship between grades and both students’ performance and the faculty’s process of evaluation. Of course, it has distorted students’ ability to evaluate their own work with any subtlety, but I imagine that, to many, the final evaluation process does not seem to be a very important aspect of teaching a course. I am not arguing that the evaluation process should be central to what faculty want to get out of teaching a course. However, because the new grading scale is so compressed, faculty have lost the opportunity to meaningfully evaluate students, which, even if not central, is still a relatively important part of teaching a course. I suspect grade compression has also alienated many faculty from the evaluation process.

But the loss of subtlety in the evaluation process has not only fundamentally changed the way faculty approach an important part of teaching; it has also negatively affected two particular groups of students more than others, namely those at the upper and lower echelons of the grading scale. In her article, Schuman wrote, “Of my current 33 students, 20 are getting either A’s or A-minuses.” For those at the lower end of the scale, their “B+” doesn’t seem all that far off from the “A” that the best student in the class received. Hence, compression can make it impossible for a student to get a genuine sense of their own progress (or lack thereof).

This process also works in reverse. When half of students are getting a straight “A,” the value and worth of that “A” can vary widely, so, for those at the very top, the top grade of “A” fails to deliver the distinction that their performance deserved. It makes it harder for those looking at comparisons “on paper” to figure out which students performed exceptionally as opposed to those who performed well enough to get better than the A-minus the majority were getting. So the overall effect of grade compression on students has been to weaken their ability to have documented proof of exceptional performance and to make valuable self-evaluation harder for students at the bottom of the grading scale (i.e., the B to B-minus range).

Talking about the grading problem in terms of “grade inflation” foregrounds the end-result of higher grades for undeserving students. However, if we frame the problem in terms of “grade compression,” we can begin to think about its effects on faculty and the non-majority of students at the top and the bottom who are affected the most.

22 responses

  1. Michael, I’m sure we’d all love to hear your solutions! Maybe another post soon. But meanwhile, here’s my question (as a fortunate outsider who doesn’t yet have to deal with much of this) — what’s the role of student evaluations (i.e. students evaluating the teacher/teaching), both formal and informal (rate my professor), on this issue of grading?

    It’s hard for me to see where the pressure towards inflation is coming from. Are student evals one answer? Or is it more to do with competition between institutions? Do US universities measure their overall grade point averages against each other? Do people take those measurements seriously?

    I can see how increasing student fees/debts would relate to both these issues. Students might want better grades for their increased investment, and punish teachers who don’t give them. Or they might be more discerning at the application stage, and prefer institutions with higher over-all GPAs.

    • Rebecca Schuman’s article discusses a lot of plausible culprits, including the draft during the Vietnam War. My hunch, though, is that the biggest problem today is that tenured full-time faculty are now (as of 2011) just 17% of American educators. Almost everybody else is keeping student evaluations at least dimly in mind when grading, since there’s no other widely used measure of teaching effectiveness in American higher education.

      I don’t think it has much to do with cross-institution comparisons. When schools make those comparisons, they generally look at things like graduation rates (which might provide incentive to inflate Ds, but not necessarily Bs), law school admissions, job placement rates, etc., which don’t have a really clear connection to average grades. But it might have something to do with the generally explosive growth of jobs for college administrators whose sole purpose is to keep customers happy; the overall atmosphere of the institution doesn’t provide much moral support for demanding professors.

      • I think you’re spot on again Jonathan about faculty contingency playing a role. That said, at Yale at least, students have to first submit their course evals before they can see their grade for the course. That gets around the problem to some extent but not completely.

        • But presumably they have already been graded on quizzes, or papers, or midterms, or at least *something* to give them some idea of how they are doing?

          My first year in the US, I was shocked by how big a ‘swing’ it would have taken on the final exam, even when it was a significant chunk of the grade, to alter the final grade by even a + or a minus.

            • Indeed – but students are evaluating you after they’ve seen *something* of your grading. And so in some ways they are anticipating their grade.

              That said, the one thing I found interesting in evaluations is when students are asked to rate their own preparation/motivation for the course. Almost invariably the answer is somewhere around a 4 out of 5. That’s been the same across institutions that have very different evaluating criteria.

  2. My own preferred grading curve in large survey courses yielded an average grade of B-. I’ve handed out F grades when appropriate.

    But this has caused a host of problems. Although my grading curve was typical in that department, it was clearly not typical at that university. My students felt, with some justification, that they were being penalized for taking my course. And I felt under pressure to raise their grades or have enrollment in my classes drop, which, my not having tenure, would have been a bad thing.

    My own experience has been that my students fell into four groups, sorted by success on exams and papers. I’ve often wished for a proper 4-point scale. However, to use the existing 4.0 scale common in many universities would have exacerbated the problems because of how those values are interpreted.

    • A commitment to high standards for grades really does have to be made at the department level at least; I don’t think there’s anything to be gained (for anyone) when an individual instructor tries to ignore the conventions of the rest of a school. If grades don’t at least roughly compare across different sections of the same course, they’re pretty much useless except as a fear tactic.

      • Right. I pulled a few sentences on this from the final paragraph just before posting it. It’s an institutional problem in the broadest sense. For me, making myself feel better about grading is not worth handicapping a student and making them less competitive. I didn’t address potential solutions, but I tend to think that there’s no going back to a traditional grading-scale. So I would be in favor of the development of new forms of evaluation and final assessment that go beyond just a symbolic letter, the value of which no one really knows.

  3. I don’t know what’s going on in the rarified atmosphere of the Ivies, but here is a thought from the provinces… We probably assign more “gimme” points now than in the past. Those relatively easy points boost the overall grade.

    In the old days, we assigned a midterm, final, and a “term paper” with, perhaps, a small percentage for “participation”. The exams, esp. at the junior/senior level, were essay. Students had to perform consistently at a high level in order to obtain the top grades. One average assignment could “blow” the grade.

    Now, the education experts tell us, we should provide more feedback and more frequent feedback. Some of those extra assignments still require analysis and good writing skills (ex: reflection papers). Many assignments, however, require only straight memorization skills or moderate diligence to complete (ex: map quizzes, worksheets, etc.). These are the “gimme” points. Depending on the weights in the overall grades, a student with “B” exam grades can move up to the B+ or A- range. Sometimes exams/research papers are no more than half or 2/3 of the overall grade.

    One more comment: As someone who has sat on tenure committees, I can tell you that, yes, student evaluations count but not as much as you might think (unless your committee has a scoring rubric it has to follow). A careful tenure committee can spot the anomaly of high student evaluations when the other materials indicate an average or poor teacher.

    • Another thought: How many of those A’s at the Ivies are really scores awarded by TA’s on behalf of the faculty?

      • This is pure anecdote, but in my limited, non-Ivy experience, I’ve generally found TAs inclined to be more rigorous than professors, especially untenured professors. TAs often have less to fear from poor student evaluations than untenured professors do, and they tend to transfer the high standards they have for their own work to the standards they have for other students. In almost every case I remember seeing, professors who revised TA grades in any way revised them upward.

        • Interesting. I stand corrected. I was never a TA for a faculty member. My program threw me in the deep end by giving me a course after I passed my prelims.

        • The same goes for my experience at Yale. How much grading Teaching Fellows do here depends on the professor and the size of the class. Some professors have TF’s do all the grading including the final grades, which makes sense if they are the ones grading the exams and papers. Others delegate the grading of papers and exams but then determine the final grades themselves. I’m sure it’s the same at non-Ivies of similar size.

        • That was my experience. I even had a supervising professor tell me to give a plagiarizing student a “C”–said student had plagiarized the preface to John Demos’ Unredeemed Captive, including the acknowledgements section.

  4. Our department has a stricter grading scale than the rest of the university (7-pt. vs. 10-pt. scale), so students already know that they are being held to a higher standard.

    Honestly, though, most of the students who don’t do well in my classes are the ones who simply don’t do the work. I even show students data at the beginning of each course: every student who has not passed my course has either plagiarized, had excessive absences, not turned in work, not purchased the required texts, or has not studied their notes for in-class quizzes/exams. (Usually, it’s a combination.) That warning doesn’t help some students, though.

  5. The UK system has been interesting because although we’re still on a 100-point scale, it’s rare to get above a 70 and almost impossible to get an 80. The guidelines are also different; over an 80 indicates that the student has submitted an essay that is publishable. There is also, however, the same tendency to compress. Most marks don’t fall below a 40, so in effect the scale stretches from 40 to 70, with 60-69 signalling very good work. Averages usually end up in the 63 range.

    I’m pasting our marking criteria here because I’ve found the divergence between the two systems so interesting:

    1st A successful combination of direct focus on the question and discussion of the wider issues raised and implied by the question; secure grasp of context and fluidity and flair in presenting an argument. A mark of 80% or over signifies exceptional work judged against all criteria; near-flawless performance of task with significant originality of approach which challenges existing historiography and pushes the boundaries of the course material, and near-faultless presentation. A mark of 75-79% signifies outstanding work; a task completed with originality and attention to detail, exceptional research and near-flawless argument and presentation. A mark of 70-74% signifies excellent work judged by all criteria; some originality of approach and secure argument supported by significant bibliographical research.

    2.1 (60-69) Directly focuses on the question and has some awareness of the wider issues raised; linking central argument with context in places.

    2.2 (50-59) Recognises issues raised by question, but does not maintain its focus and may drift into narrative or neglect the context.

    3rd (40-49) Misses some key points raised by question; substitutes generalisation for accurate focus on the problem set.

    Fail Neglect of the question, or approach so confused as to be unintelligible. A mark of 30-39 acknowledges that there is some material relevant to the subject, but that the work is not structured sufficiently, or offers unsubstantiated opinion in place of evidence-based argument. A mark of 29 and below will be given if the essay contains major errors or omissions, or substantially irrelevant material, or uncritically relies on its material, verging at worst on plagiarism. Expression may be in part unintelligible and sources are unacknowledged.

    • I tend to find that the UK and US grading scales are *similar*, with a couple of caveats. One is that what would be a good 2.i (67 or more) in the UK is considered to be work that is A standard in the US.

      The other is that, as has been noted above, there aren’t as many ‘gimme’ points in a UK examination (hence the rigorous focus on answering the question in the grading criteria above). Thus in the US there’s more of an ability to fudge a little on, say, participation, to reward the student who made class more productive, even if they weren’t performing as well at writing.

      I’d argue that’s a good thing – it allows college professors to reward a greater level of skills than the UK system does. But it comes at the risk of not finishing skills as much as they might otherwise be.

      In general, though, my rule of thumb is to think of what I’d give a UK student, and add about 25 to the overall grade. In the US, the scale of acceptable work is essentially 75-100; in the UK, it’s 50-75. I never looked at a paper given a 3rd class without thinking that really, it deserved to be a failure, but had enough relevance to the question through happenstance to justify a pass on the criteria given. The downside is the ability to reward truly excellent work in the US.

  6. The average for most of my classes is somewhere around 82 or 83. We do not have plus or minus grades at UNC Charlotte so it is just an A, B, etc. I would say I grade similarly to the way my colleagues likely did in the 1970s. Most of my students get Bs or Cs, a couple get As and Ds, and one or two usually fail each semester. I don’t think I’ve ever had half my students get As; there is usually a fairly even distribution outside of the majority right in the middle. I feel a little pressure from individual students to “bump them up” every semester (which I usually do if they participated and are within 0.5 points), but little pressure from my department or institution. And on the whole, most of my students seem to be fairly happy with a B or C. Those who do worse, as Mark notes above, generally understand why.

    • My institution is the same – only whole grades for final grades. And yes, I find I have a similar low B average. I tend to give a higher proportion of Bs in my ULs; meanwhile, I wind up with a pretty even distribution of B/Cs in the survey (with a couple As, some Ds, and always a couple Fs).

      On the one hand, it’s insanely frustrating including a 70 and 79 in the same grade bracket. And yet I have found it oddly freeing, especially when it comes to the 88-91 bracket. I have colleagues who only give As to a 92.5+ and I have followed suit. This means that an A is an A. The B becomes the largest grade bracket (79.5-92). I also do some bumping and sometimes decide at the end of the term, when I see the overall grade distribution, to expand the A bracket to include 90-91 (something I’m more likely to do in ULs). And yes, most students are pretty happy with a B or C. The problem is the high-B students who think they deserve an A, but you point out that 92.5 cutoff and remind them that an A is for exceptional work (though we do get complaints on evals that that is an unreasonably high cutoff).

      • “I have colleagues who only give As to a 92.5+ and I have followed suit.”

        Don’t you think it’s a bit dubious for a professor to decide this on his or her own authority? Shouldn’t the number be the same college-wide? What gives a particular professor the right to change the criteria?

  7. There are often slight variations to grading scales across a faculty. The most important thing is that you are upfront with your students about the scale — that it is established from day 1 on the syllabus so that students know how final grades will be both assessed and assigned. Not all courses are designed the same way or assessed the same way.
