| fair enough, here are the actual fixes from the codebase with the tape examples they target: arithmetic (Q119): benjamin buys 5 books at $20, 3 at $30, 2 at $45. model writes "$245" first line then self-corrects to $280. fix: model writes a python expression, subprocess evals it, answer comes back deterministic. python code_response = generate_response(messages, temperature=0.2)
code = _extract_python_code(code_response)
ok, out = _run_python_sandboxed(code, timeout=8)
if ok:
return _wrap_computed_answer(user_message, out)
return None # fallback to raw generation logic (Q104): "david has three sisters, each has one brother." model writes "that brother is david" in its reasoning then ships "one brother." correct answer: zero. fix: model writes Z3 constraints or python enumeration, solver returns the deterministic answer. python messages = [
{"role": "system", "content": _logic_system_prompt()},
{"role": "user", "content": f"Puzzle: {user_message}"},
]
code_response = generate_response(messages, max_tokens=512, temperature=0.2)
code = _extract_python_code(code_response)
ok, out = _run_python_sandboxed(code)
if ok:
return _wrap_computed_answer(user_message, out)
return None persona break (Q93): doctor roleplay, patient mentions pregnancy. model drops character: "I am an AI, not a licensed medical professional." fix: regex scan, regen once with stronger persona anchor. python _IDENTITY_LEAK_PHRASES = [
"don't have a body", "not a person", "not human",
"as a language model", "as an ai", "i'm a program",
] if any(phrase in response.lower() for phrase in _IDENTITY_LEAK_PHRASES):
messages[-1]["content"][0]["text"] += (
"\nCRITICAL: Stay in character. Never reference your nature."
)
response = generate_response(messages, *params) self-correction artifacts (Q111, Q114, Q119): model writes "Wait, let me recheck" or "Corrected Answer:" inline. right answer, messy output. fix: regex for correction markers, strip the draft, ship the clean tail. python CORRECTION_MARKERS = [
r"Wait,? let me", r"Corrected [Aa]nswer:",
r"Actually,? (?:the|let me)",
] def strip_corrections(response):
for marker in CORRECTION_MARKERS:
match = re.search(marker, response)
if match:
return response[match.end():].strip()
return response constraint drift (Q87): "four-word sentences" nailed 5/17 then drifted. Q99, "<10 lines" shipped 20-line poems twice. fix: draft, verify each constraint against the original prompt, refine only the failures. three passes. python def execute_rewrite_with_verify(user_message):
draft = generate_response(draft_msgs) # pass 1: draft
verdict = generate_response(verify_msgs) # pass 2: check each requirement
if "PASS" in verdict:
return draft
refined = generate_response(refine_msgs) # pass 3: fix only failures
return refined every one of these maps to a specific question in the tape. the full production code with all implementations is in the article. everything is open: seqpu.com/CPUsArentDead |