Research Analysis

Voices from the Page

A data-driven study of award-winning essays from Vietnam's National Excellent Student Competition in Literature, covering both Literary Essays and Social Essays and tracing patterns across six decades, 1961 to 2024.

About this competition

First held in 1961, this national competition is for high school students (grades 10–12). Participants go through intense training and advance successively through school, district, provincial, and national rounds. A standard exam has two parts: a Social Essay (8 points) on societal issues and a Literary Essay (12 points) on literary topics. Award-winning students are nationally recognized and often receive direct university admission.

Phạm Hoàng Linh Anh & Vũ Duy Tùng
VinUniversity
Our team members previously participated in provincial teams for the National Excellent Student Examination, and share a deep love for literature — from the fairy tales of Thạch Sanh and Tấm Cám told by our mothers in childhood, to the gentle romanticism of Thạch Lam, the harsh realities in Nam Cao’s works, and the vibrant, life-filled poetry of Xuân Diệu.
63 Years of data
114 Essays analyzed
~650 A4 pages
77/37 Literary / Social
✦   ✦   ✦
"Trăm năm trong cõi người ta,
Chữ tài chữ mệnh khéo là ghét nhau."
Nguyễn Du — Truyện Kiều

In this project, we analyze the winning essays from the National Excellent Student Competition in Literature. Unlike other subjects, Literature has no fixed grading rubric; it demands deep critical thinking and advanced analytical skills, so evaluation is often qualitative in nature. This made us curious: over the 63 years of this competition, what patterns and differences emerge when these top-scoring national essays are analyzed with a combination of qualitative and quantitative methods?

This analysis not only helps students preparing for the competition gain deeper insights and better anticipate exam topics, but also serves as a resource for teachers and the academic community by revealing which authors and works are most frequently cited and what themes tend to appear again and again in award-winning essays. These patterns can help educators understand what the exam values over time, and offer a starting point for deeper discussions about how literary merit is assessed in the national contest.

We analyze high-scoring essays across two complementary dimensions: horizontal (across all years) and vertical (temporal trends).

Breadth

Horizontal Analysis

  • Most frequently cited authors, literary works, and themes in Literary Essays and Social Essays
  • Differences between first-, second-, and third-prize essays
Time

Vertical Analysis

  • Changes in cited authors, works, and literary traditions in Literary Essays over time
  • Shifts from textbook or Vietnamese sources to non-textbook or international references
  • Evolution of themes, perspectives, and social concerns in Social Essays over the years

Most Cited Authors, Figures & Works

Each entry below gives the citation count for a name and explains why it appears so frequently in award-winning literary essays.

Authors
Nam Cao (201): A major realist writer whose works focus on peasants and marginalized intellectuals, highlighting their psychological struggles and social conditions. Widely regarded as one of the most important figures in modern Vietnamese literature.
✦ Why cited: His writing captures universal human conditions (dignity, alienation, and suffering) while reflecting historical realities, making it extremely versatile across essay topics.
Nguyễn Du (196): A classical Vietnamese poet best known for The Tale of Kiều, a foundational work of Vietnamese literature. His writing reflects deep humanism and explores themes of fate, morality, and compassion.
✦ Why cited: As a central figure in the literary canon, his work offers rich philosophical depth and universal themes, allowing students to demonstrate cultural literacy.
Tố Hữu (131): A revolutionary poet whose works reflect Vietnam's socialist ideology and wartime experiences. His poetry combines political commitment with strong emotional expression.
✦ Why cited: His works are commonly taught and provide clear, accessible themes such as patriotism and collective identity, which are easy to integrate into essays on national history.
Figures
Hồ Chí Minh – "Bác" (113): The founding leader of modern Vietnam and also a writer and poet, with works like Prison Diary. His writing blends simplicity with strong moral and ideological messages.
✦ Why cited: Multiple of his works are in the curriculum. Citing "Bác" adds moral authority and strengthens arguments related to patriotism, ethics, and resilience.
Liên, Hai đứa trẻ (51): The central character in Hai đứa trẻ, representing a quiet, observant perspective on rural life. Her character reflects subtle emotions and a deep awareness of surrounding stagnation.
✦ Why cited: Useful for analyzing mood, symbolism, and inner consciousness. Her perspective allows exploration of nostalgia, hope, and quiet despair.
Kim Trọng, Truyện Kiều (51): Kiều's first love and the ideal Confucian scholar: loyal, moral, and devoted. He symbolizes enduring love within a turbulent society.
✦ Why cited: Often used to support arguments about fidelity, virtue, and moral responsibility. Familiar and adaptable for discussions on ethics and human relationships.
Works
Truyện Kiều (170): The Tale of Kiều tells the tragic story of Thúy Kiều and explores themes of fate, sacrifice, and human dignity. It is considered the most important work in Vietnamese literature.
✦ Why cited: Taught through multiple excerpts across grades. Its thematic richness makes it applicable to a wide range of essay topics.
Chí Phèo (66): A realist short story about a peasant who is dehumanized by society. It critically portrays social injustice and loss of identity.
✦ Why cited: Provides clear and powerful evidence for themes like class oppression and morality. Its narrative clarity makes it easy to analyze.
Tây Tiến (53): A poem about soldiers during the anti-French resistance, blending romanticism with realism. It captures both hardship and heroism.
✦ Why cited: Vivid imagery and emotional tone make it effective for patriotism and sacrifice themes. Its brevity allows efficient quotation under exam constraints.
Overall Pattern: Across all categories, there is a strong tendency to cite works and figures from the official curriculum. This suggests that high-performing essays rely heavily on familiar, institutionalized sources that are both accessible and culturally valued.
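As a rough illustration of how citation counts like those above could be tallied, the sketch below counts name mentions across essay texts. The function name `count_mentions`, the substring-matching rule, and the sample sentences are assumptions for illustration; the source does not describe its exact counting procedure.

```python
import unicodedata
from collections import Counter


def _normalize(text):
    # OCR and web text can mix composed and decomposed Vietnamese diacritics,
    # so normalize to NFC before comparing; casefold makes matching case-insensitive.
    return unicodedata.normalize("NFC", text).casefold()


def count_mentions(texts, names):
    """Count how often each name occurs (as a substring) across all texts."""
    counts = Counter({name: 0 for name in names})
    for text in texts:
        normalized = _normalize(text)
        for name in names:
            counts[name] += normalized.count(_normalize(name))
    return counts


# Hypothetical sample sentences, not taken from the corpus.
sample_texts = [
    "Nam Cao viết Chí Phèo. Nam Cao là nhà văn hiện thực.",
    unicodedata.normalize("NFD", "Nguyễn Du và Truyện Kiều"),
]
sample_counts = count_mentions(sample_texts, ["Nam Cao", "Nguyễn Du", "Tố Hữu"])
```

Plain substring matching is a simplification: it cannot tell an author's name from an identical phrase used in another sense, so a real tally would likely need manual review.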

Most Written Theoretical Themes

Numbers reflect how many essays feature each theme. Multiple themes can appear within a single essay, as prompts typically require students to address more than one argumentative dimension.

Literary Essays
The Writer and the Creative Process: 70 essays
Literature Reflects Life: 68 essays
Literary Reception: 65 essays
The Relationship between Form and Content: 65 essays
Functions of Literature: 50 essays
Authorial Style: 45 essays
Literature as the Art of Language: 27 essays
"The Writer and the Creative Process" is the most versatile theme — applicable broadly across texts, contexts, and genres. "Literature as the Art of Language" appears least frequently, reflecting exam design that favors thematic and contextual analysis over purely stylistic focus.
Social Essays
Self – Family – Community: 37 essays
Creativity – Innovation – Initiative: 24 essays
Positive Lifestyle – Cultural Conduct: 20 essays
Honesty – Self-respect: 19 essays
Environment – Nature Protection: 18 essays
Technology – Social Media – Information: 7 essays
"Self – Family – Community" dominates, reflecting Vietnam's collectivist and socialist orientation. "Technology – Social Media – Information" appears least frequently, suggesting exams still prioritize enduring moral values over rapidly evolving contemporary issues.

Essay Length by Prize Level

Does length equal quality? The answer differs dramatically between literary and social essays.

[Charts: average word count by prize level, shown separately for Literary Essays and Social Essays]
"More words" doesn't always mean a better grade.

For literary essays, lengths are consistently high across all prize levels — approximately 2,500–2,900 words. Depth, detailed textual analysis, and extensive evidence are essential regardless of ranking.

For social essays, a striking contrast emerges: First and Second prize winners average only about 1,500 words, while Third prize winners average about 2,850 words, nearly double.

This suggests that greater length does not guarantee higher quality in Social Essays; instead, top-performing essays tend to be more concise, focused, and selective in argumentation.
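The averages discussed above could be computed along the following lines. This is a minimal sketch under stated assumptions: the `type` field and the toy records are hypothetical, and splitting on whitespace counts Vietnamese syllables rather than dictionary words, which may differ from the counting rule actually used in the study.

```python
from collections import defaultdict
from statistics import mean


def average_length_by_prize(essays):
    """Average whitespace-token count per (essay type, award) group."""
    lengths = defaultdict(list)
    for essay in essays:
        # str.split() with no arguments splits on any run of whitespace.
        token_count = len(essay["solution"].split())
        lengths[(essay["type"], essay["award"])].append(token_count)
    return {group: mean(values) for group, values in lengths.items()}


# Toy records for illustration only; the real essays are far longer.
toy_essays = [
    {"type": "social", "award": "first", "solution": "a b c"},
    {"type": "social", "award": "first", "solution": "a b c d e"},
    {"type": "social", "award": "third", "solution": "a b"},
]
toy_averages = average_length_by_prize(toy_essays)
```

Grouping by both essay type and award keeps literary and social essays separate, which matters here because the two types show different length patterns.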

Trends Across Six Decades

How cited authors, literary traditions, and intellectual frameworks evolved from 1961 to 2024.

1961
Cold War alignment: References to the Soviet Union (USSR) reflect the strong ideological alignment of the era; citing a state that no longer exists today was, at the time, contextually appropriate and politically meaningful.
1978
Ideology + identity: Dominance of Marxism–Leninism alongside Vietnamese folklore (Thạch Sanh, Thánh Gióng), showing a blend of political ideology and national cultural identity.
1980
Authority consolidation: Marked increase in citations of Hồ Chí Minh, signaling consolidation of political and moral authority in literary discourse.
1984
Global openness: Frequent references to foreign authors, indicating early openness to global literary influences beyond national borders.
1987
Post-Đổi Mới reappraisal: Reappearance of Tự Lực Văn Đoàn, suggesting a shift toward reevaluating pre-revolution literary movements previously marginalized.
1990
Historicization: Growth of interdisciplinary knowledge, with increased mention of early modernist figures (Nhất Linh, Khái Hưng, Hoàng Đạo), reflecting deeper literary historicization.
1992
Expanding frameworks: Emergence of religious references (God, the Bible), indicating expanding intellectual and cultural frameworks beyond socialist orthodoxy.
2008
Modern poetry: Continued diversification with mentions of poets such as Hoàng Cầm and Tế Hanh, reinforcing the prominence of modern Vietnamese poetry in the canon.
2023
Interdisciplinary fusion: Inclusion of references such as "Sonata Ánh Trăng" (Moonlight Sonata) reflects a modern trend of integrating literature with other art forms, signaling more flexible and creative analytical approaches.
Constant through the decades: Canonical works like Chí Phèo and The Tale of Kiều remain consistently cited across all periods, highlighting their enduring centrality. Students also show a persistent preference for poetry over prose — poetry's brevity enables efficient quotation under exam time constraints, while its emotional and symbolic flexibility allows it to be applied across a wider range of argumentative prompts.

Digitalization & Preprocessing Steps

114 award-winning essays (77 literary, 37 social) — collected, digitized, and analyzed across 63 years.

Primary Sources

Because competition essays are unofficial and not publicly released, only selected award-winning essays have been preserved by educational organizations as teaching materials for future participants. In this project, we keep all excerpts in the original language to preserve the integrity of the authors' intended meaning and to avoid translation inaccuracies caused by cultural and linguistic differences.

Source 3 — Image-Based
Because we crawled data from multiple Vietnamese sources—both text-based and image-based—with diverse noise levels and structures, we developed a custom digitalization pipeline to accurately extract meaningful text before performing statistical analysis.
[Figure: digitalization and preprocessing pipeline diagram]
01
Line Bounding Boxes

Tesseract detects word-level boxes on each page. Invalid detections are filtered out, and the remaining boxes are merged into line-level regions, sorted top-to-bottom, and padded to avoid clipping characters at the edges.
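The merging step could look roughly like the sketch below. It assumes the word boxes arrive in the dictionary layout produced by `pytesseract.image_to_data` with dictionary output (parallel lists keyed by `left`, `top`, `width`, `height`, `conf`, `text`, `block_num`, `par_num`, `line_num`). The function name, the padding of 4 pixels, and the confidence threshold are hypothetical choices, not values from the project.

```python
from collections import defaultdict


def merge_words_into_lines(data, page_width, page_height, pad=4, min_conf=0.0):
    """Merge word boxes into padded (left, top, right, bottom) line boxes."""
    groups = defaultdict(list)
    for i, raw_text in enumerate(data["text"]):
        w, h = data["width"][i], data["height"][i]
        # Skip invalid detections: empty text, degenerate boxes, low confidence.
        if not raw_text.strip() or w <= 0 or h <= 0 or float(data["conf"][i]) < min_conf:
            continue
        x, y = data["left"][i], data["top"][i]
        # Words sharing block, paragraph, and line numbers belong to one text line.
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        groups[key].append((x, y, x + w, y + h))

    lines = []
    for boxes in groups.values():
        left = min(b[0] for b in boxes) - pad
        top = min(b[1] for b in boxes) - pad
        right = max(b[2] for b in boxes) + pad
        bottom = max(b[3] for b in boxes) + pad
        # Clamp so the padding never extends past the page edges.
        lines.append((max(0, left), max(0, top),
                      min(page_width, right), min(page_height, bottom)))

    # Reading order: top-to-bottom, then left-to-right for ties.
    lines.sort(key=lambda box: (box[1], box[0]))
    return lines


# Hand-made sample in the assumed layout; the third entry is a non-word row.
sample_data = {
    "text": ["Chí", "Phèo", "", "Nam"],
    "left": [10, 60, 0, 12],
    "top": [100, 102, 0, 20],
    "width": [40, 50, 500, 45],
    "height": [20, 18, 500, 20],
    "conf": [95, 90, -1, 88],
    "block_num": [1, 1, 0, 1],
    "par_num": [1, 1, 0, 1],
    "line_num": [2, 2, 0, 1],
}
sample_lines = merge_words_into_lines(sample_data, page_width=200, page_height=300)
```

Returning boxes as (left, top, right, bottom) tuples matches the argument expected by `PIL.Image.Image.crop`, which makes the later cropping step straightforward.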

02
VietOCR Recognition

Each line region is cropped from the page image and passed to VietOCR for text recognition. Empty or invalid predictions are discarded to ensure clean output.
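A minimal sketch of this crop-and-recognize loop follows. To stay independent of any particular OCR setup, the recognizer is passed in as a callable; `recognize_lines` and its parameters are hypothetical names, and `page_image` is only assumed to offer a `crop(box)` method in the style of `PIL.Image.Image`.

```python
def recognize_lines(page_image, line_boxes, recognize):
    """Crop each line region and keep only non-empty string predictions."""
    texts = []
    for box in line_boxes:
        # box is (left, top, right, bottom), the order PIL's crop() expects.
        crop = page_image.crop(box)
        prediction = recognize(crop)
        # Discard empty or invalid predictions (None, non-strings, whitespace).
        if isinstance(prediction, str) and prediction.strip():
            texts.append(prediction.strip())
    return texts
```

With VietOCR, `recognize` could plausibly be the `predict` method of a `vietocr.tool.predictor.Predictor` instance, though the source does not show the configuration that was used.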

03
Concatenation

Recognized lines are joined with newlines per page, then appended in document order across all pages to produce the complete extracted text.

04
Noise Removal

Watermarks, page numbers, and embedded links are stripped from the OCR output — e.g. "Chuyên trang ôn Văn", "TaiLieuOnThi.Net", residual page numbers.
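One way to express this cleanup is with a few regular expressions, as in the sketch below. The two watermark strings come from the examples above; the URL pattern, the page-number rule (a line holding only one to three digits, optionally preceded by "Trang"), and the function name are assumptions rather than the project's actual rules.

```python
import re

WATERMARKS = ("Chuyên trang ôn Văn", "TaiLieuOnThi.Net")
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Assumed rule: a line that is only a short number is a residual page number.
# Limiting to three digits keeps four-digit years such as 1961 intact.
PAGE_NUMBER_RE = re.compile(r"^\s*(?:Trang\s+)?\d{1,3}\s*$", re.IGNORECASE)


def remove_noise(text):
    """Strip watermarks, links, and page-number lines from OCR output."""
    cleaned = []
    for line in text.splitlines():
        for mark in WATERMARKS:
            line = re.sub(re.escape(mark), "", line, flags=re.IGNORECASE)
        line = URL_RE.sub("", line)
        if PAGE_NUMBER_RE.match(line):
            continue
        line = line.strip()
        # Lines left empty after stripping noise are dropped entirely.
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


raw_sample = "\n".join([
    "Chuyên trang ôn Văn",
    "Trăm năm trong cõi người ta,",
    "12",
    "TaiLieuOnThi.Net Chữ tài chữ mệnh",
    "https://example.com/abc",
    "Năm 1961",
])
cleaned_sample = remove_noise(raw_sample)
```

A rule this simple can also delete a legitimate line that happens to contain only a short number, so spot checks against the scanned pages remain worthwhile.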

05
Structured Data

Each cleaned document is converted into structured JSON entries with four fields: year, problem, solution, award.
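A possible shape for these entries is sketched below. Only the four field names come from the description above; the field types, the class name `EssayRecord`, and the placeholder values are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EssayRecord:
    year: int        # competition year (type assumed)
    problem: str     # exam prompt, kept in Vietnamese
    solution: str    # cleaned essay text
    award: str       # prize level label (format assumed)


def to_json(records):
    # ensure_ascii=False keeps Vietnamese diacritics readable in the output.
    return json.dumps([asdict(r) for r in records], ensure_ascii=False, indent=2)


# Placeholder values for illustration, not real corpus content.
example_record = EssayRecord(year=2023, problem="(đề bài)", solution="(bài làm)", award="first")
example_json = to_json([example_record])
```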

Figure: Scanned original document
Figure: Tesseract word-level bounding boxes
Figure: Word boxes merged into line-level bounding boxes for VietOCR