AI-assisted Exam Analysis

Feb 2025 – Jun 2025 · HU University of Applied Sciences Utrecht

Overview

A tool to support exam quality review at HU University. Lecturers export a graduation assessment form as structured JSON, upload it, and the system uses a locally running LLM to evaluate each feedback sheet: assessing its tone, assigning a quality score, and comparing that score against the teacher's own grade. The goal was to strengthen the "Check" phase of the PDCA cycle without relying on external APIs or sending student data to third-party services.

Approach

I built the entire model service in Python. On first launch it downloads DeepSeek-R1-Distill-Llama-8B from Hugging Face, converts it to GGUF via llama.cpp, and starts a WebSocket server (Litestar + uvicorn). When the frontend sends a JSON payload, the service splits it into individual assessment sheets, filters out non-gradeable ones (instructions, rubric, session form), and groups each sheet's rows into a single prompt. An optional rubric-injection step matches sheet criteria codes like "1a" or "2d" and appends the relevant rubric rows to each prompt for more grounded scoring. Each sheet is dispatched as an async task; the results are collected, cleaned, and merged in a second summarisation query that produces the final tone, grade comparison, and a summary of the model's reasoning. Progress updates stream back to the frontend over the same WebSocket connection throughout.
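The filtering, grouping, and rubric-injection steps can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the JSON shape (sheet name mapped to a list of row dicts with "criterion" and "text" keys), the rubric row keys ("code", "description"), the sheet names in NON_GRADEABLE, and every function name are hypothetical.

```python
import re

# Hypothetical sheet names; the real export may label these differently.
NON_GRADEABLE = {"instructions", "rubric", "session form"}

# Criteria codes such as "1a" or "2d": digits followed by one lowercase letter.
CODE_RE = re.compile(r"\b(\d+[a-z])\b")


def gradeable_sheets(form: dict[str, list[dict]]) -> dict[str, list[dict]]:
    """Drop sheets that carry no feedback to grade."""
    return {
        name: rows
        for name, rows in form.items()
        if name.strip().lower() not in NON_GRADEABLE
    }


def build_prompt(name: str, rows: list[dict], rubric: list[dict]) -> str:
    """Group one sheet's rows into a single prompt and append matching rubric rows."""
    body = "\n".join(f"{r['criterion']}: {r['text']}" for r in rows)
    # Collect the criteria codes that actually occur on this sheet.
    codes = set(CODE_RE.findall(" ".join(r["criterion"] for r in rows)))
    # Keep only the rubric rows for those codes so the prompt stays small.
    matched = [r for r in rubric if r["code"] in codes]
    prompt = f"Sheet: {name}\n{body}"
    if matched:
        rubric_text = "\n".join(f"{r['code']}: {r['description']}" for r in matched)
        prompt += f"\n\nRelevant rubric rows:\n{rubric_text}"
    return prompt
```

Matching on codes rather than appending the whole rubric is what keeps each prompt short: a sheet that only mentions "1a" receives one rubric row, not the full table.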

Highlights

The service is fully self-contained: model download, GGUF conversion, and server startup all happen in one main.py entry point, so there is no manual setup and no dependency on external inference APIs.
Rubric injection was a tricky feature to get right. The code regex-matches criteria codes from each sheet against the rubric sheet and appends only the matching rows, keeping prompt sizes manageable while giving the model the context it actually needs.
Blocking llama-cpp inference runs inside asyncio.to_thread, keeping the event loop free to handle WebSocket messages and send progress updates while the model is generating.
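A minimal sketch of the asyncio.to_thread pattern mentioned above. The blocking model call is replaced by a time.sleep stand-in, and the asyncio.Lock used to serialise access to the single model instance is an assumption about how that could be done, not a confirmed detail of the service. All function names are hypothetical.

```python
import asyncio
import time


def blocking_generate(prompt: str) -> str:
    """Stand-in for a blocking llama-cpp call; sleeps instead of running a model."""
    time.sleep(0.05)
    return f"result for {prompt!r}"


async def analyse_sheet(prompt: str, model_lock: asyncio.Lock) -> str:
    # Assumption: one model instance, so a lock serialises inference.
    # The blocking call runs in a worker thread, leaving the event loop free.
    async with model_lock:
        return await asyncio.to_thread(blocking_generate, prompt)


async def heartbeat(ticks: list[int], stop: asyncio.Event) -> None:
    # Stand-in for progress updates sent over the WebSocket.
    while not stop.is_set():
        ticks.append(len(ticks))
        await asyncio.sleep(0.01)


async def run(prompts: list[str]) -> tuple[list[str], int]:
    model_lock = asyncio.Lock()
    stop = asyncio.Event()
    ticks: list[int] = []
    beat = asyncio.create_task(heartbeat(ticks, stop))
    # gather preserves input order even though tasks finish one at a time.
    results = await asyncio.gather(*(analyse_sheet(p, model_lock) for p in prompts))
    stop.set()
    await beat
    return list(results), len(ticks)
```

Calling asyncio.run(run(["sheet 1", "sheet 2"])) returns the results in input order together with the number of heartbeat ticks, which shows the loop kept running while the blocking calls were in progress.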

Outcome

The service ran fully locally, which satisfied the data-privacy requirement without any extra policy work. A full analysis of a typical exam form took around three minutes on CPU-only hardware. True concurrency was not achievable with one model instance, so requests are effectively serialised; that was an accepted trade-off given the hardware. If I had more time I would have added hash-based deduplication so re-uploading the same form is caught before it hits the model.
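The hash-based deduplication was not built, so the following is only a sketch of one way it might work: hash a canonical JSON encoding of the uploaded form and look the digest up before dispatching anything to the model. The function and class names are hypothetical, and the in-memory dict stands in for whatever store a real service would use.

```python
import hashlib
import json
from typing import Optional


def form_fingerprint(form: dict) -> str:
    """Hash a canonical JSON encoding so key order and whitespace do not matter."""
    canonical = json.dumps(
        form, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory cache keyed by form fingerprint (illustrative only)."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}

    def get(self, form: dict) -> Optional[dict]:
        # Returns the stored analysis, or None if this form is new.
        return self._results.get(form_fingerprint(form))

    def put(self, form: dict, result: dict) -> None:
        self._results[form_fingerprint(form)] = result
```

With an analysis taking around three minutes on CPU, a cache hit on a re-uploaded form would avoid the entire model run.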

Stack

Python · WebSocket · Docker · GGUF · llama.cpp · Scrum · Git · PostgreSQL

Role

Backend Developer · Scrum Master

Links

Source code is private, but selected parts can be shared on request.