In a modest classroom tucked away in southern France, a seasoned philosophy teacher named Stéphane Bonnery faced an unusual challenge. Before him lay a handwritten baccalauréat (final high school exam) essay. There was only one twist: it wasn’t written by a student, but by ChatGPT—OpenAI’s powerful language model. Bonnery, a veteran examiner used to scrutinizing adolescent logic and prose, approached the task with a mix of curiosity and skepticism. Could an artificial intelligence really pass the famously challenging French philosophy exam?
France’s baccalauréat philosophy exam is not just a test; it’s a rite of passage. Known for its abstract prompts and rigorous demands for logic, critical thinking, and structured argumentation, it intimidates even the brightest minds. For those unfamiliar, the exam typically lasts four hours and asks students to respond to broad prompts like “Is desire the mark of our imperfection?” or “Does culture make us more human?” Crafting a compelling, coherent, and deeply reasoned answer is no small feat—even for a human.
Bonnery decided to test ChatGPT with this daunting challenge—to evaluate how well artificial intelligence could mimic a high school graduate’s reasoning, philosophical awareness, and writing skills. The results, while fascinating, raise pressing questions about the future of education and machine intelligence. Are we entering an age where machines can write essays indistinguishable from human submissions? Or are there fundamental elements of human thinking that AI still cannot grasp?
Summary of ChatGPT’s baccalauréat philosophy exam performance
| Aspect | Detail |
|---|---|
| Subject | Philosophy (French Baccalauréat) |
| Exam Type | Handwritten essay on a classic philosophical prompt |
| Evaluator | Stéphane Bonnery, Philosophy Teacher & Examiner |
| Score | 11 out of 20 |
| Notable Strengths | Linguistic fluidity, grammatical accuracy, linear logic |
| Major Weaknesses | Lack of real-world contextual reasoning, limited depth of philosophical thinking |
Why this unusual test is gaining attention
In many countries, using AI to generate exam essays might be dismissed as novelty. But in France, where philosophy is considered an essential component of national identity, the implications are deeply resonant. Bonnery’s experiment isn’t merely academic—it’s a cultural litmus test. It invites a broader dialogue on what constitutes original thinking and whether machines can emulate it authentically.
Stéphane Bonnery submitted ChatGPT’s handwritten essay without disclosing its source, slipping it in among a batch of anonymous student submissions. His reaction to its score was a perplexed nod to both achievement and failure: 11 out of 20, not a failing grade, but not a strong one either. The result places it squarely in the “average” zone, a stark reminder that AI can write, but perhaps not yet think like us.
“It was clear, well-written, and followed logical structure. But it lacked nuance. It lacked contradiction—something that makes us human.”
— Stéphane Bonnery, Philosophy Teacher & Baccalaureate Examiner
How the AI-generated essay was crafted and presented
To add realism to the test, Bonnery employed another educator to handwrite ChatGPT’s answer. This step was crucial: submitting a typed document would have been an immediate red flag. The prompt given to ChatGPT was identical to those used in the 2023 French Baccalauréat. Its performance would hinge not only on content, but on how convincingly human it could appear when stripped of digital fonts and predictive spell checkers.
ChatGPT’s response was stylistically impeccable. It displayed sentence fluency, vocabulary breadth, and tidy transitions. However, the content—though coherent—betrayed its artificial origin. According to Bonnery, it read like a series of connected Wikipedia entries: encyclopedic, factually arranged, but devoid of the soul-searching that defines good philosophy.
What ChatGPT got right in the philosophy essay
The AI’s performance wasn’t without merit. Bonnery acknowledged that the essay would be taken seriously in a real examination setting, and might even have stood out against weaker human submissions. There were clear strengths:
- Grammatical precision – Zero spelling or syntax issues.
- Clear logic – Progression of ideas followed a structured path.
- Appropriate vocabulary – Use of technical philosophy terms and correct definitions.
These traits reflect how well training data and algorithmic design can replicate form. For example, ChatGPT referenced Immanuel Kant and Plato accurately, something many students struggle with. Yet as Bonnery noted, knowledge isn’t the only currency in philosophy. The true test is interpretation, not memorization.
Where ChatGPT fell short in philosophical reasoning
Despite the superficial fluency, Bonnery’s evaluation zeroed in on what was missing beneath the polish. His core criticism was the AI’s inability to engage in real philosophical argumentation. The essay didn’t wrestle with the contradictions inherent in the prompt, nor did it offer examples grounded in lived experience. It offered no personal reflection, which, ironically, is exactly what the French system prizes.
Bonnery described the AI’s performance as “assembled from fragments,” comparing it to a collage of ideas rather than a thesis-driven exploration. It lacked reflection. It lacked synthesis. It lacked, in essence, the mental messiness that marks genuine philosophical inquiry.
“ChatGPT reasons cleanly, almost too cleanly. But philosophy is not accounting. It demands struggle, exception, uncertainty.”
— Stéphane Bonnery
Who wins and who loses with AI-written student essays?
| Winners | Losers |
|---|---|
| Students who use AI as a study tool or writing assistant | Teachers assessing critical thinking and originality |
| Developers building educational AI models | Admissions and evaluation systems relying on essays |
| Average-performing students using AI for writing tips | Curriculum frameworks based on creativity and introspection |
Why this experiment matters for the future of education
This test is not an isolated curiosity; it’s a glimpse into a rapidly changing educational landscape. Using AI to produce academic work is no longer futuristic; it is current reality. But Bonnery’s results show something equally important: while machines can master syntax, they cannot yet emulate soul.
The real lesson lies in rethinking how essays are evaluated. If fluency and logical cohesion can be outsourced to machines, then higher-order thinking, such as self-reflection, contradiction, and moral reasoning, becomes more valuable than ever before. Teachers, then, must evolve into evaluators of human thought, not just of form. As education adapts, the line between mechanical writing and authentic thinking will define our pedagogical future.
What schools, teachers, and parents should consider next
What can institutions do in the wake of this experiment? First, educators need better tools to identify machine-generated content. Second, evaluation methods may require redesign: oral defenses, spontaneous assignments, or real-time debates may take center stage. Third, schools should educate students on *how* to use AI thoughtfully, training them not just in consumption, but in discernment.
Much like calculators changed math education, language models may change the way we write and think. But just as calculators didn’t eliminate the need to understand arithmetic, AI won’t replace deep thought—it only raises the bar. The words may come from a machine, but the ideas must still come from the mind.
Frequently asked questions about AI in academic evaluations
Can ChatGPT pass high school-level exams?
ChatGPT can pass certain exams at a basic level, especially those that primarily test grammar and logical structure. However, it struggles with deep reasoning, personal reflection, and nuance, particularly in subjects like philosophy.
Does French education consider AI-written papers cheating?
Yes. Submitting AI-written work as a student’s own would be considered academic dishonesty. That said, using AI as a study aid or brainstorming tool is not explicitly banned—yet.
What does an average score of 11/20 mean in the French Baccalauréat?
It means the essay is average but passes the minimum quality bar. It suggests competence in structure and clarity, but deficiencies in insight and critical reasoning.
How did the teacher ensure the AI’s essay was anonymous?
He had the essay handwritten by a colleague, eliminating traces of digital authorship and making it indistinguishable from student submissions.
What limitations does ChatGPT still face in academia?
Major limitations include lack of real-world context, inability to reflect or contradict itself meaningfully, and poor performance on highly subjective tasks.
Could AI ever match human philosophical reasoning?
Currently, no. Philosophical reasoning requires internal contradictions, emotional resonance, and subjective interpretation—things AI cannot yet replicate authentically.
How can educators adapt to AI in classrooms?
Educators can prioritize oral exams, in-class assessments, spontaneous writing tasks, and collaboration to gauge true student understanding and deter AI misuse.
Is using GPT to study ethical?
Using AI as a tool for learning, idea generation, or editing can be ethical if properly cited or disclosed. Problems arise when students present generated content as their own thinking.