AI Agent Course - Build a Language‑Learning Agent with OpenAI, LangGraph, Ollama & MCP
One Sentence Summary
A hands-on guide to building a LangGraph-based AI agent for language learning that cleans data, translates vocabulary, and creates Anki flashcards automatically.
Main Points
LangGraph ReAct agent overview: multi-step reasoning with tools and memory for language tasks.
Data cleaning workflow: lemmatization, Zipf frequency, and JSON-ready final word lists.
Open-source vs proprietary models: compare GPT-4/Claude-style models with Ollama-backed local LLMs.
Custom tools and prompts: design tools with docstrings and system prompts for reliable tool use.
Translation toolchain: integrate a translation model to convert word lists into target languages.
MCP integration: connect to external servers (e.g., Clanky) to generate Anki flashcards.
Debugging and observability: PyCharm AI Agents Debugger visualizes agent traces and graph structure.
Translation-enabled workflows: chain tasks (random words, translate, then flashcards) via MCP.
Security considerations: prompt injections, data leakage, and multi-agent sandboxing strategies.
Practical workflow: end-to-end from data preparation to flashcards and deployment readiness.
Takeaways
Start with clean, language-aware word lists before building agent workflows to avoid quality issues.
Use lemmatization and frequency-based filtering to reduce vocabulary to core, teachable terms.
Test models locally (Ollama) to save costs and improve privacy, while benchmarking against paid models.
When adding tools, document inputs/outputs precisely and embed tool descriptions in the system prompt.
Plan for security: consider multi-agent architectures and sandboxing to mitigate prompt-injection risks.
Summary
This tutorial walks through building a language-learning AI agent from scratch that can generate vocabulary lists, optionally filter by difficulty, translate them with a dedicated translation model, and finally create Anki flashcards via an MCP server. The core stack is Python + LangGraph (ReAct agent) with both proprietary LLMs (OpenAI GPT-4o) and local open-source models via Ollama (reasoning: Qwen 3 8B, translation: Llama 3.2 3B). Data prep is done with spaCy lemmatization + wordfreq Zipf frequency binning and exported as JSON for agent consumption.
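The spaCy + wordfreq data-prep step described above can be sketched as follows. This is a minimal, hypothetical version of the cleaning notebook: the helper names, the default spaCy model, and the Zipf thresholds are illustrative assumptions, not literals from the transcript.

```python
import json

def difficulty_from_zipf(zipf: float) -> str:
    """Map a Zipf frequency score to a difficulty bin.
    Thresholds are illustrative assumptions; tune them per language."""
    if zipf >= 5.0:
        return "beginner"      # very common words
    if zipf >= 3.5:
        return "intermediate"
    return "advanced"          # rare words

def clean_wordlist(raw_words, spacy_model="en_core_web_sm", out_path=None):
    """Lemmatize raw words and attach a frequency-based difficulty label."""
    # Third-party imports are kept local so difficulty_from_zipf stays
    # usable without the NLP stack installed.
    import spacy
    from wordfreq import zipf_frequency

    # Disable unused pipeline components to speed up lemmatization.
    nlp = spacy.load(spacy_model, disable=["parser", "ner", "textcat"])
    cleaned = {}
    for doc in nlp.pipe(raw_words):
        lemma = doc[0].lemma_.lower() if len(doc) else ""
        if not lemma or lemma in cleaned:
            continue
        z = zipf_frequency(lemma, nlp.lang)
        cleaned[lemma] = {"word": lemma, "word_difficulty": difficulty_from_zipf(z)}
    if out_path:
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(cleaned, f, ensure_ascii=False, indent=2)
    return cleaned
```

The JSON layout (a dict keyed by lemma, with "word" and "word_difficulty" fields) matches what the tools in tools.py expect to read back.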
wordfreq.zipf_frequency for frequency + difficulty binning
Output format:
cleaned vocab stored as JSON for agent consumption
Flashcards:
Anki + AnkiConnect
Clanky MCP server
Agent state fields added over time
messages
source_language
number_of_words
word_difficulty
target_language
Tooling contracts
LangGraph tools require:
strict type hints
strong docstrings (agent uses these to decide tool usage)
stable argument names/types
Pro Tips
Use UV (or similarly fast installers) to speed dependency iteration.
In spaCy, disable unused pipeline components (parser, ner, textcat) to speed up lemmatization.
When forcing structured output from an LLM, always:
demand strict JSON
implement a fallback parser (regex brace extraction)
validate completeness (all inputs translated)
Put “valid values” directly in tool docstrings (e.g., beginner/intermediate/advanced) so the agent can map synonyms like “basic/average/hard”.
Use PyCharm’s AI Playground to compare model behavior (verbosity, correctness) before swapping into the agent runtime.
Prefer a specialized model for translation (Llama 3.2 3B) instead of a reasoning model to reduce hallucinations and cost.
For MCP workflows, provide few-shot examples with explicit tool order; ReAct agents can otherwise mis-order steps.
Keep the agent’s “working memory” explicit by adding state fields whenever the tool interface expands.
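The strict-JSON tip above (demand JSON, fall back to brace extraction, validate completeness) can be unit-tested in isolation from any LLM. A minimal sketch, with hypothetical helper names:

```python
import json
import re

def extract_json_object(raw: str):
    """Parse strict JSON; if the model wrapped it in prose, extract the
    span between the first '{' and the last '}' and parse that instead."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
        if match is None:
            return None
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None

def missing_translations(words, parsed):
    """Completeness check: return the input words that got no translation."""
    done = {t.get("source") for t in (parsed or {}).get("translations", [])}
    return [w for w in words if w not in done]
```

Running these against canned model outputs (clean JSON, JSON wrapped in chatter, no JSON at all) makes the failure modes visible before they hit the agent loop.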
Potential Limitations/Warnings
Crowdsourced dataset quality varies wildly by language (missing data, over-generated inflections, typos). Word count sanity checks are necessary but not sufficient.
Lemmatization out of context can be wrong (spaCy errors on rare/proper nouns); expect occasional bad lemmas.
Filtering on zipf_frequency == 0 can be too aggressive for some languages (e.g., Polish ended up far below expected vocab size).
ReAct nondeterminism: the agent may call tools in the wrong way (e.g., using placeholder inputs) and then self-correct. Add validations/guardrails.
MCP introduces real security risk:
Agents can be vulnerable to prompt injection
“lethal trifecta” risk when an agent has sensitive data access + untrusted inputs + outbound capability
Never commit .env or API keys; rotate keys immediately if leaked.
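On the zipf_frequency warning above: wordfreq returns 0.0 for any word absent from its frequency table, which in smaller-corpus languages includes rare but perfectly valid vocabulary, so dropping zeros silently over-prunes. A sketch of a safer filter that takes the frequency function as a parameter (names are illustrative):

```python
def filter_by_zipf(words, freq, min_zipf=0.0, keep_unknown=True):
    """Keep words whose Zipf score exceeds min_zipf.

    freq: callable word -> Zipf score (e.g. wordfreq.zipf_frequency with
    the language code bound via functools.partial).
    keep_unknown=True retains score-0 words instead of silently dropping
    out-of-vocabulary entries.
    """
    kept = []
    for w in words:
        z = freq(w)
        if z > min_zipf or (keep_unknown and z == 0.0):
            kept.append(w)
    return kept
```

Spot-check the size of the filtered list against the raw list per language; a large drop (as with Polish in the transcript) is a signal to loosen the filter.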
Recommended Follow-Up Resources
Hugging Face Agents Course (ReAct + broader agent patterns; referenced in transcript)
LangGraph tutorials on structured workflows + productionization testing
spaCy language models + lemmatization docs (for model selection + pipeline tuning)
Below is a reproducible project blueprint (folder layout + files + commands) that matches what’s described in the transcript. I’m keeping it execution-first, with exact file paths, tool signatures, and CLI commands where the transcript provided them. Where the transcript didn’t give exact literals (e.g., the exact repo URL or exact Ollama model tag spelling), I put it under Assumption: Standard/Typical Setup.
Optional MCP integration for Anki flashcards via Clanky + AnkiConnect
Detailed Step-by-Step Breakdown
0) Project folder layout (final state)
Create/maintain this structure:
language_learning_agent/
main.py
tools.py
.env # secrets (DO NOT COMMIT)
raw_word_lists/ # copied from the GitHub dataset repo
English.csv # naming depends on repo; may be language-named files
Spanish.csv
German.csv
...
data/ # cleaned JSON outputs
English/
wordlist_clean.json
Spanish/
wordlist_clean.json
German/
wordlist_clean.json
notebooks/
clean_wordlists.ipynb
1) Create the PyCharm project + virtualenv
In PyCharm → New Project
Name: language learning agent
Interpreter: Project venv
Choose UV interpreter if available (as in transcript)
Assumption: Standard/Typical Setup
If you don’t have UV, use regular venv. Nothing else changes.
2) Clone dataset repo + copy into raw_word_lists/
In PyCharm Terminal:
cd ..
git clone <DATASET_REPO_URL>
cd language_learning_agent
mkdir -p raw_word_lists
Copy the dataset contents into your project’s raw_word_lists/.
The transcript describes using cp -r with curly-brace expansion to copy multiple language directories at once. Since the exact dataset layout/paths weren’t fully specified, adapt the copy command to your repo’s structure.
The translate_words tool that follows enforces structured output in three layers:
demand strict JSON from the model
fallback: regex-extract the content between the first { and the last }
validate that all input words were translated
Here is a complete tools.py that matches the described behavior:
import json
import os
import random
import re
from typing import Dict, List
from langchain_core.tools import tool
from langchain_ollama import ChatOllama
# Translation model (local)
# Assumption: Standard/Typical Setup:
# - Ollama model tag is "llama3.2:3b" (tag spelling may vary; use `ollama list` to confirm).
TRANSLATION_LLM = ChatOllama(model="llama3.2:3b")
def _wordlist_path(language: str) -> str:
"""Build the path to the cleaned wordlist JSON for a given language."""
return os.path.join("data", language, "wordlist_clean.json")
@tool
def get_n_random_words(language: str, n: int) -> List[str]:
"""
Return a list of n random words in the specified language.
Args:
language: Language folder name under ./data (e.g., "English", "Spanish", "German").
n: Number of random words to return.
Returns:
A list of n random words (strings) from the cleaned wordlist JSON.
"""
path = _wordlist_path(language)
with open(path, "r", encoding="utf-8") as f:
data = json.load(f)
# The JSON is expected to contain entries with at least "word" and "word_difficulty".
# We will sample from all entries.
    rows = list(data.values()) if isinstance(data, dict) else list(data)
    all_words = [row["word"] for row in rows]
return random.sample(all_words, k=min(n, len(all_words)))
@tool
def get_n_random_words_by_difficulty_level(language: str, n: int, difficulty_level: str) -> List[str]:
"""
Return a list of n random words in the specified language filtered by difficulty level.
Valid difficulty_level values are: "beginner", "intermediate", "advanced".
Args:
language: Language folder name under ./data (e.g., "English", "Spanish", "German").
n: Number of random words to return.
difficulty_level: Difficulty filter. Must be one of "beginner", "intermediate", "advanced".
Returns:
A list of n random words (strings) from the cleaned wordlist JSON filtered by difficulty_level.
"""
difficulty_level = difficulty_level.strip().lower()
if difficulty_level not in {"beginner", "intermediate", "advanced"}:
raise ValueError('difficulty_level must be one of: "beginner", "intermediate", "advanced".')
path = _wordlist_path(language)
with open(path, "r", encoding="utf-8") as f:
data = json.load(f)
rows = list(data.values()) if isinstance(data, dict) else list(data)
filtered = [r["word"] for r in rows if str(r.get("word_difficulty", "")).lower() == difficulty_level]
if not filtered:
return []
return random.sample(filtered, k=min(n, len(filtered)))
@tool
def translate_words(random_words: List[str], source_language: str, target_language: str) -> Dict:
"""
Translate a list of words from a source language to a target language using a translation LLM.
The tool attempts to force the LLM to output strict JSON. If the LLM outputs extra text,
the tool extracts the first JSON object found and parses it.
Args:
random_words: List of words to translate.
source_language: Language of the input words.
target_language: Language to translate the words into.
Returns:
A dict of the form:
{"translations": [{"source": "<word>", "target": "<translation>"}, ...]}
"""
prompt = (
"You are a precise translation engine.\n"
f"You will be given a list of words to translate from {source_language} to {target_language}.\n\n"
"Only return valid JSON with exactly this structure:\n"
'{"translations": [{"source": "<SOURCE_WORD>", "target": "<TARGET_WORD>"}]}\n'
"No explanations, no extra fields, no markdown.\n\n"
f"Words: {random_words}\n"
)
raw = TRANSLATION_LLM.invoke(prompt).content
# First attempt: direct JSON parse
try:
parsed = json.loads(raw)
except Exception:
# Fallback: extract content between the first '{' and last '}'.
match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
if not match:
return {"translations": []}
parsed = json.loads(match.group(0))
translation_list = parsed.get("translations", []) if isinstance(parsed, dict) else []
if not isinstance(translation_list, list):
translation_list = []
# Simplify and validate
simplified = {}
for item in translation_list:
if not isinstance(item, dict):
continue
src = item.get("source")
tgt = item.get("target")
if isinstance(src, str) and isinstance(tgt, str):
simplified[src] = tgt
# Ensure every word was translated
final_translations = []
for w in random_words:
if w in simplified:
final_translations.append({"source": w, "target": simplified[w]})
return {"translations": final_translations}
6) Implement agent runner: main.py
This file creates:
AgentState (messages + extracted fields)
assistant function + system prompt (includes tool descriptions + examples)
LangGraph graph: START → assistant ↔ tools
async invocation
Here is a complete main.py blueprint aligned to the transcript’s structure:
import asyncio
from typing import Annotated, Optional, TypedDict, List, Dict, Any

from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

from tools import (
    get_n_random_words,
    get_n_random_words_by_difficulty_level,
    translate_words,
)

# Optional MCP (only if you enable Anki/Clanky integration)
# from langchain_mcp_adapters.client import MultiServerMCPClient

load_dotenv()

class AgentState(TypedDict, total=False):
    # The add_messages reducer merges updates from the assistant and tool
    # nodes into one running history instead of overwriting the list.
    messages: Annotated[List[Any], add_messages]
    source_language: Optional[str]
    number_of_words: Optional[int]
    word_difficulty: Optional[str]
    target_language: Optional[str]
# Choose one:
USE_OPENAI = False # set True to use GPT-4o
OPENAI_MODEL = "gpt-4o"
OLLAMA_REASONING_MODEL = "qwen3:8b" # Assumption: tag name; verify with `ollama list`
def _tool_text_descriptions() -> str:
# In the transcript, this is copy/pasted tool signature + docstring into prompt.
# Here we provide a compact summary string that still conveys name + args + purpose.
return "\n\n".join(
[
"Tool: get_n_random_words(language: str, n: int) -> List[str]\n"
"Returns n random words from data/<language>/wordlist_clean.json.",
"Tool: get_n_random_words_by_difficulty_level(language: str, n: int, difficulty_level: str) -> List[str]\n"
'Returns n random words filtered by difficulty_level. Valid difficulty_level: "beginner", "intermediate", "advanced".',
"Tool: translate_words(random_words: List[str], source_language: str, target_language: str) -> Dict\n"
'Translates a list of words and returns {"translations":[{"source":..., "target":...}, ...]}.',
]
)
def _system_prompt() -> str:
tools_desc = _tool_text_descriptions()
return f"""
You are a helpful language learning assistant.
You have access to the following tools:
{tools_desc}
Your job is to:
1) Identify the source language.
2) Identify the number of words.
3) Identify whether the user wants a specific difficulty level (beginner/intermediate/advanced) or just random words.
4) Identify whether the user wants the words translated to a target language.
Examples:
Input: "get 20 random words in Spanish"
- source_language: Spanish
- number_of_words: 20
- word_difficulty: None
- target_language: None
Tool workflow: get_n_random_words
Input: "get 10 hard words in German"
- source_language: German
- number_of_words: 10
- word_difficulty: advanced
- target_language: None
Tool workflow: get_n_random_words_by_difficulty_level
Input: "get 15 easy words in English and translate them to Spanish"
- source_language: English
- number_of_words: 15
- word_difficulty: beginner
- target_language: Spanish
Tool workflow: get_n_random_words_by_difficulty_level -> translate_words
Input: "get 50 random words in German and translate them to English"
- source_language: German
- number_of_words: 50
- word_difficulty: None
- target_language: English
Tool workflow: get_n_random_words -> translate_words
When solving the task, decide which tool(s) to call and in what order.
""".strip()
def _llm():
if USE_OPENAI:
# Requires OPENAI_API_KEY in .env
return ChatOpenAI(model=OPENAI_MODEL)
return ChatOllama(model=OLLAMA_REASONING_MODEL)
def assistant(state: AgentState) -> Dict[str, Any]:
    """Single reasoning step: call the LLM with the system prompt + history."""
    llm = _llm().bind_tools(
        [get_n_random_words, get_n_random_words_by_difficulty_level, translate_words]
    )
    sys = SystemMessage(content=_system_prompt())
    messages = list(state.get("messages", []))
    resp = llm.invoke([sys] + messages)
    # Append the assistant response to the running message history.
    # (Stashing tool objects inside the state would fail, since only the
    # keys declared on AgentState are valid state updates.)
    return {"messages": messages + [resp]}
async def setup_tools():
# Local tools
local_tools = [get_n_random_words, get_n_random_words_by_difficulty_level, translate_words]
# MCP tools (optional)
# Assumption: Standard/Typical Setup:
# - You built Clanky and have build/index.js path.
#
# clanky_js = "/absolute/path/to/clanky/build/index.js"
# client = MultiServerMCPClient(
# {
# "clanky": {
# "command": "node",
# "args": [clanky_js],
# "transport": "stdio",
# }
# }
# )
# mcp_tools = await client.get_tools()
# return local_tools + mcp_tools
return local_tools
async def build_graph():
tools = await setup_tools()
graph = StateGraph(AgentState)
graph.add_node("assistant", assistant)
graph.add_node("tools", ToolNode(tools))
graph.add_edge(START, "assistant")
graph.add_conditional_edges("assistant", tools_condition)
graph.add_edge("tools", "assistant")
return graph.compile()
async def main():
app = await build_graph()
user_prompt = "Please get 10 basic words in German and translate them to English."
result = await app.ainvoke(
{
"messages": [HumanMessage(content=user_prompt)],
"source_language": None,
"number_of_words": None,
"word_difficulty": None,
"target_language": None,
}
)
# Print final agent message content (last message)
final_msg = result["messages"][-1]
print(final_msg.content if hasattr(final_msg, "content") else final_msg)
if __name__ == "__main__":
asyncio.run(main())
7) Local model setup with Ollama
Install Ollama (per transcript). Then:
ollama list
Pull models:
ollama pull qwen3:8b
ollama pull llama3.2:3b
Assumption: Standard/Typical Setup
Model tags may differ slightly (e.g., qwen3:8b vs qwen3:8b-instruct). Use ollama list and the Ollama library page variant name.
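Besides the CLI, Ollama exposes the installed model list over its local REST API at GET /api/tags (default port 11434), which is handy for verifying tags from Python before the agent starts. A small sketch; the assumption is the documented response shape where each entry carries a name field:

```python
import json
import urllib.request

def parse_model_tags(payload):
    """Extract tag names from an /api/tags response payload."""
    return [m.get("name", "") for m in payload.get("models", [])]

def list_ollama_models(base_url="http://localhost:11434"):
    """Return installed Ollama model tags via the local REST API."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return parse_model_tags(json.load(resp))
```

If a required tag (e.g. qwen3:8b) is missing from the list, pull it with `ollama pull <tag>` before running the agent.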
8) OpenAI model setup (optional)
If you set USE_OPENAI = True in main.py:
Create .env in project root:
OPENAI_API_KEY=your_key_here
Then main.py can use ChatOpenAI(model="gpt-4o").
9) MCP + Anki flashcards (optional integration)
From transcript, prerequisites:
Install Anki desktop app
Install AnkiConnect add-on
Clone Clanky MCP repo and build:
git clone <CLANKY_REPO_URL>
cd clanky
npm install
npm run build
Find:
clanky/build/index.js
Then in main.py, uncomment the MCP block in setup_tools() and set clanky_js to the absolute path of clanky/build/index.js.
See the Clanky + AnkiConnect documentation for detailed setup/debugging.
If you run into anything that doesn’t match your dataset’s exact file naming (CSV filenames, folder names), the only thing you’ll need to adjust is the raw ingestion path in the notebook and the language folder names under data/. Everything else stays the same.
FAQ
Do I need strong NLP background before starting this project?
No. Basic Python and API usage is enough to start, then you can deepen NLP knowledge as you build.
Why combine OpenAI with Ollama in the same workflow?
It gives you flexibility: hosted reasoning quality when needed, and local model control for cost/privacy-sensitive tasks.
What part fails most often in practice?
Data quality and tool orchestration. Clean your dataset early and log tool inputs/outputs during debugging.