GLM-5 in Verdent: Test Guide

By Hanks Engineer
How to Test GLM-5 in Verdent (Step-by-Step)

Opening: Real Developer Context

I'll be straight with you—I've been testing every new model that hits the market in Verdent for the past six months. Why Verdent specifically? Because it's the only platform I trust for honest model comparisons. The Git worktree isolation means I can run GLM-5 and Claude Opus 4.5 on the exact same task in parallel workspaces, then literally diff the results side-by-side. No cherry-picking. No "this one worked better on this task but that one..." nonsense.

When GLM-5 drops (expected mid-February 2026), you'll have maybe 48 hours before the entire dev Twitter explodes with hot takes. "Best coding model ever!" versus "Overhyped garbage!" I don't have time for that noise. I need structured evaluation based on my codebase, my patterns, my framework stack.

This guide walks you through exactly how to set up GLM-5 in both Verdent Desktop and VS Code extension, then run three specific tasks designed to stress-test the model's actual capabilities—not what the benchmark leaderboard says, but what happens when you ask it to refactor 8,000 lines of TypeScript with Next.js 15 App Router patterns. By the end, you'll have quantifiable data on whether GLM-5 is worth your subscription dollars.

Let's get into it.

Prerequisites — Verdent version, API key, model ID

Before you can test GLM-5, you need three things locked down. Miss any of these and you'll waste an hour troubleshooting.

System Requirements

Verdent Version:

  • Verdent Desktop: v1.12.2 or later (released Feb 5, 2026)
  • Verdent for VS Code: v1.0.3 or later
  • Verdent for JetBrains: v1.0.1 or later (if you're on IntelliJ/PyCharm)

Check your version:

  • Desktop: Click Verdent → About Verdent in the menu bar
  • VS Code: Open Command Palette (Cmd+Shift+P), type "Verdent: Version"

If you're on an older build, update from the official download page before proceeding. GLM-5 support requires the latest model routing infrastructure that shipped in v1.12.0.

Operating System:

  • macOS 11.0+ (Apple Silicon or Intel)
  • Windows 10/11 (x64)
  • Linux (Ubuntu 20.04+, tested on Debian-based distros)

Git Installation: Git must be installed and accessible via command line. Verdent uses Git worktrees for workspace isolation—this is non-negotiable.

# Verify Git is installed
git --version
# Should output: git version 2.39.0 or higher

If Git isn't installed, Verdent will prompt you with one-click installation (as of v1.12.2), but I recommend installing it manually to avoid permission issues.

API Keys You'll Need

Here's where it gets specific. Verdent supports multiple model providers, and GLM-5 will be available through Z.ai's API.

Required: Z.ai API Key

  1. Go to z.ai/model-api and sign in (or create an account)
  2. Navigate to API Keys section
  3. Click Create New Key
  4. Name it something identifiable like verdent-glm5-testing-feb2026
  5. Copy the key immediately—you won't see it again

Cost heads-up: Z.ai uses credit-based pricing. GLM-5 pricing isn't public yet, but based on GLM-4.7 ($0.10 per million tokens), budget $5-10 for serious testing. Your first $5 credit is often free for new accounts.

Optional but Recommended: Comparison Model Keys

To actually benchmark GLM-5, you need baseline models. I run three:

| Provider | Model | API Key Source | Why This Model |
|---|---|---|---|
| Anthropic | Claude Opus 4.5 | console.anthropic.com | Current SWE-bench leader (80.9%) |
| OpenAI | GPT-5.1-Codex | platform.openai.com | Industry standard (77.9%) |
| Z.ai | GLM-4.7 | z.ai | Direct predecessor for apples-to-apples comparison |

Pro tip: Create separate API keys for each testing session. Label them verdent-feb2026-glm5-eval so you can track spend per model and revoke access later without breaking production workflows.

Model ID Confirmation

Critical: GLM-5 won't be available until official launch (expected Feb 10-15, 2026). When it drops, the model ID will likely be:

  • API Model String: glm-5 or glm-5-latest
  • Verdent Display Name: "GLM-5" (appears in model picker dropdown)

To verify availability before wasting time on setup:

# Query Z.ai API directly (replace YOUR_API_KEY)
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.z.ai/api/paas/v4/models | jq '.[] | select(.id | contains("glm-5"))'

# If this returns empty JSON, GLM-5 isn't released yet
# If you see model details, proceed with setup

Step-by-step setup in Verdent Desktop and VS Code

I'm walking you through both interfaces because they serve different workflows. Desktop is better for managing multiple parallel evaluations; VS Code is better for iterative debugging within your existing editor.

Setup in Verdent Desktop (macOS/Windows)

Step 1: Download and Install

Installation takes ~60 seconds. On macOS, you may need to allow the app in Security & Privacy settings if it's blocked.

Step 2: First Launch and Authentication

  1. Open Verdent Desktop
  2. You'll see the welcome screen → Click Sign In
  3. Choose authentication method:
    GitHub OAuth (recommended—syncs your repos automatically)
    Email + Password

  4. Complete authentication in your browser
  5. Return to Verdent—you should see the main workspace interface

Step 3: Configure API Keys

Click the Settings icon (gear) in the top-right → API Keys

Add each provider:

Provider: Z.ai
API Key: [paste your glm-z-ai-key-here]
Label: GLM-5 Testing Feb 2026

Repeat for Anthropic and OpenAI if you're doing comparison testing.

Step 4: Verify Model Availability

  1. Create a new workspace: File → New Workspace
  2. In the chat input at the bottom, type: /models
  3. This lists all available models across your configured providers
  4. Look for GLM-5 in the Z.ai section

If you don't see GLM-5 yet (because it's not released), you should at least see GLM-4.7 and GLM-4.5 to confirm the API key works.

Step 5: Set Default Model (Optional)

Settings → Models → Default Model → Select your preference

I keep mine on Balance preset, which auto-routes to the best cost/performance model for each task type. But for testing, you'll want to manually select GLM-5 per task to control variables.

Setup in Verdent for VS Code

Step 1: Install Extension

Two ways:

Option A: VS Code Marketplace

  1. Open VS Code
  2. Press Cmd+Shift+X (macOS) or Ctrl+Shift+X (Windows/Linux)
  3. Search for "Verdent"
  4. Click Install on "Verdent for VS Code" by Verdent AI
  5. Reload VS Code when prompted

Option B: Direct Install

  1. Visit VS Code Marketplace - Verdent
  2. Click Install
  3. VS Code will launch and complete installation

Step 2: Initial Configuration

After installation, VS Code will show the Verdent sidebar icon (looks like a stack of parallel lines).

  1. Click the Verdent icon in the left sidebar
  2. Click Sign In in the panel
  3. Browser opens → authenticate with GitHub or email
  4. Return to VS Code—panel now shows "Connected"

Step 3: Add API Keys

In VS Code, open Settings (Cmd+, or Ctrl+,) → search for "Verdent"

Scroll to API Keys section and add:

Verdent: Z.ai API Key
[paste key]

Verdent: Anthropic API Key
[paste key]

Verdent: OpenAI API Key
[paste key]

Alternatively, use the Verdent panel:

  1. Click gear icon in Verdent sidebar
  2. Select API Keys
  3. Click + Add Key for each provider

Step 4: Configure Workspace Permissions

Critical setting: Verdent needs Git permissions to create worktrees. In the Verdent panel:

  • Permission Mode: Set to Auto-Run Mode (for testing, this lets agents execute freely)
  • Code Review: Enable Auto-fix common issues
  • Git Auto-commit: Disable (you want manual review during evaluation)

Step 5: Test Connection

Open any project folder in VS Code (or create a test repo).

In the Verdent chat panel:

  1. Type: @verdent which models are available?
  2. Hit Enter
  3. You should see a list including GLM-4.7, Claude Opus 4.5, etc.

If this works, your setup is complete. When GLM-5 launches, it will automatically appear in this list after you restart VS Code.

First 3 tasks to run (and why these specifically)

These aren't random tasks. I've run 50+ model evaluations, and these three consistently expose real capability differences that benchmarks miss.

Task 1 — Single-file generation (baseline)

Purpose: Establish floor capability. Can the model write clean, working code for a well-defined problem in isolation?

The Task:

Create a TypeScript utility function that:
1. Accepts an array of objects with { id: string, timestamp: Date, value: number }
2. Groups objects by day (ignoring time)
3. Calculates daily sum, average, min, max for 'value'
4. Returns a new array sorted by date ascending
5. Include JSDoc comments and handle edge cases (empty array, invalid dates)
6. Write 3 unit tests using Vitest
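
To calibrate your grading, here is roughly what a passing solution looks like. This is my own sketch, not model output; it buckets by UTC day, and depending on your product, grouping by the user's local day may be the right call:

```typescript
type Reading = { id: string; timestamp: Date; value: number };
type DailyStats = { date: string; sum: number; average: number; min: number; max: number };

// Groups readings by UTC calendar day and computes daily aggregates.
// Empty input returns []; entries with invalid dates are skipped.
function aggregateByDay(readings: Reading[]): DailyStats[] {
  const buckets = new Map<string, number[]>();
  for (const r of readings) {
    if (Number.isNaN(r.timestamp.getTime())) continue; // invalid Date guard
    const day = r.timestamp.toISOString().slice(0, 10); // "YYYY-MM-DD"
    buckets.set(day, [...(buckets.get(day) ?? []), r.value]);
  }
  return [...buckets.entries()]
    .map(([date, values]) => ({
      date,
      sum: values.reduce((a, b) => a + b, 0),
      average: values.reduce((a, b) => a + b, 0) / values.length,
      min: Math.min(...values),
      max: Math.max(...values),
    }))
    .sort((a, b) => a.date.localeCompare(b.date));
}
```

Two things to diff against the model's attempt: does it guard invalid dates before calling toISOString (which throws a RangeError on an invalid Date), and does the grouping container stay fully typed rather than falling back to any?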

Why This Exposes Model Quality:

  • Type safety: Does it use proper TypeScript generics or fall back to any?
  • Date handling: Does it correctly handle timezones, or will it break in production?
  • Edge cases: Does it test empty arrays, null values, single-item arrays?
  • Code style: Is the output idiomatic TypeScript or Java-esque verbose nonsense?

How to Run in Verdent:

  1. Create a new workspace in Verdent Desktop (or open Verdent panel in VS Code)
  2. Paste the task prompt exactly as written above
  3. Important: Before hitting Enter, select your model from the dropdown:
    First run: GLM-5
    Second run (new workspace): Claude Opus 4.5
    Third run: GPT-5.1-Codex

  4. Let the agent work autonomously—don't intervene
  5. Once complete, check the generated file

Evaluation Checklist:

  • Function compiles without TypeScript errors
  • Tests pass (npm test)
  • Handles all edge cases mentioned
  • Code is readable without excessive complexity
  • JSDoc is accurate and helpful

My Baseline: Claude Opus 4.5 nails this 95% of the time. GPT-5.1 scores ~88%. GLM-4.7 is at ~82%. If GLM-5 can't hit 85%+ here, the rest of the tests are pointless.

Task 2 — Multi-file refactor (complexity test)

Purpose: Test the model's ability to reason across files, maintain consistency, and handle framework-specific patterns.

Setup:

Create a simple Next.js 14 App Router project (or use an existing one). You need at least:

src/
  app/
    page.tsx          (homepage)
    layout.tsx        (root layout)
    api/
      users/
        route.ts      (API endpoint)
  components/
    UserCard.tsx      (display component)
  lib/
    db.ts             (database mock)

The Task:

Refactor this Next.js app to use Server Actions instead of API routes:

1. Remove the /api/users route.ts file
2. Create a new server action in lib/actions/user-actions.ts
3. Update UserCard to call the server action using useFormState
4. Ensure proper error handling and loading states
5. Add TypeScript types for all action inputs/outputs
6. Maintain existing functionality exactly—no behavior changes

Follow Next.js 15 best practices: use 'use server' directive, return serializable objects, handle form state properly.
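
It helps to know the shape you expect before grading. Below is a stripped-down sketch of the action's core logic; the 'use server' directive and the Next.js form wiring are omitted so it runs standalone, and the field names are hypothetical:

```typescript
// Shape of a form-state server action: takes previous state plus FormData,
// returns a plain serializable object (no class instances, no Dates).
type ActionState = { ok: boolean; error?: string; userId?: string };

async function createUser(_prev: ActionState, formData: FormData): Promise<ActionState> {
  const name = formData.get("name");
  if (typeof name !== "string" || name.trim() === "") {
    return { ok: false, error: "Name is required" }; // errors travel via state, not throws
  }
  // ...persist via the lib/db mock here...
  return { ok: true, userId: `user-${name.trim().toLowerCase()}` };
}
```

In the real file this sits under a 'use server' directive and gets wired into the component with useActionState (the Next.js 15 name) or useFormState. If the model's action throws raw errors or returns non-serializable objects, mark it down.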

Why This Is Hard:

  • Framework knowledge: Next.js Server Actions have specific requirements ('use server', serialization, form state)
  • Multi-file reasoning: Changes span 3-4 files with dependencies
  • Type propagation: Types need to flow from action → component → UI
  • Implicit patterns: Does the model understand hooks vs. server components?

How to Run in Verdent:

  1. Open your Next.js project in Verdent
  2. Create a new workspace with Git worktree isolation enabled (Verdent does this automatically)
  3. Paste the refactor task
  4. Select GLM-5 (or comparison model)
  5. Watch the agent's plan before it starts coding—does it understand the full scope?
  6. Let it run to completion
  7. Test manually: npm run dev and verify functionality

Evaluation Checklist:

  • Server Action created with proper 'use server' directive
  • API route deleted
  • Component correctly uses useFormState or useActionState
  • TypeScript types are accurate (no any escapes)
  • Error states handled properly
  • App runs without errors (npm run dev)
  • Behavior matches original (test the actual user flow)

My Baseline: Claude Opus 4.5 completes this correctly ~70% of the time (it sometimes messes up useFormState hook placement). GPT-5.1-Codex is at ~65%. GLM-4.7 struggled at ~52%—often deletes files but forgets to create replacements.

Task 3 — Bug diagnosis + fix (reasoning test)

Purpose: Test debugging capability—the most important real-world skill for a coding model.

Setup:

Intentionally introduce a subtle bug into any TypeScript/React codebase. Here's one I use:

// src/lib/utils/date-formatter.ts
export function formatRelativeTime(date: Date): string {
  const now = new Date();
  const diffMs = now.getTime() - date.getTime();
  const diffMins = Math.floor(diffMs / 60000);
  
  if (diffMins < 60) return `${diffMins}m ago`;
  if (diffMins < 1440) return `${Math.floor(diffMins / 60)}h ago`;
  return `${Math.floor(diffMins / 1440)}d ago`;
}

// Bug: This will break when user's timezone differs from server
// It also fails for future dates (shows negative time)

The Task:

Users are reporting that "time ago" displays are incorrect, especially for users in different timezones. The bug is in src/lib/utils/date-formatter.ts.

Diagnose the root cause, explain what's wrong, and fix it. Your fix should:
1. Handle timezone differences correctly
2. Handle future dates (show "in Xm" instead of negative)
3. Add proper date validation
4. Include unit tests covering edge cases
5. Explain the bug and your fix in comments

Why This Tests Real Intelligence:

  • Problem diagnosis: Can the model trace the timezone issue without explicit hints?
  • Reasoning depth: Does it understand why this breaks, or just pattern-match a fix?
  • Comprehensive solution: Does it address all requirements, or just the first one?

How to Run in Verdent:

  1. Commit the buggy code to a branch
  2. Create a new Verdent workspace from that branch
  3. Paste the diagnosis task
  4. Do NOT provide the file path—let the model find the bug (advanced test)
  5. Alternatively, provide the file path to focus on fix quality over search

Evaluation Checklist:

  • Correctly identifies the timezone issue
  • Correctly identifies the negative time issue
  • Fix handles both problems
  • Tests cover edge cases (future dates, timezone offsets, invalid dates)
  • Explanation in comments is accurate (not hallucinated)
  • Code is production-ready (no "TODO: fix later" comments)
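
For calibration against that checklist, here is one fix that ticks every box. This is my own version (with an injectable now parameter for testability), not the expected model output:

```typescript
function formatRelativeTime(date: Date, now: Date = new Date()): string {
  // Guard against invalid dates before doing arithmetic on NaN.
  if (Number.isNaN(date.getTime())) return "unknown";

  // Epoch-millisecond math is timezone-independent: getTime() always returns
  // milliseconds since the UTC epoch, so the diff never depends on local offset.
  const diffMs = now.getTime() - date.getTime();
  const isFuture = diffMs < 0;
  const mins = Math.floor(Math.abs(diffMs) / 60_000);

  const label =
    mins < 60 ? `${mins}m`
    : mins < 1440 ? `${Math.floor(mins / 60)}h`
    : `${Math.floor(mins / 1440)}d`;

  return isFuture ? `in ${label}` : `${label} ago`;
}
```

A model that only pattern-matches tends to fix the future-date branch but skip the invalid-date guard, or vice versa; a full-credit answer handles both and explains why epoch math sidesteps the timezone complaint.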

My Baseline: Claude Opus 4.5 typically finds and fixes this in 2-3 iterations (~80% success). GPT-5.1 is around 75%. GLM-4.7 finds the bug but often applies a partial fix that solves timezone but ignores future dates (~58% full success).

What to look for in outputs (quality signals)

Beyond "does it work," here are the quality signals I track during evaluation:

Code Quality Indicators

Strong Signals (model understands the domain):

  • Proper use of framework-specific patterns (e.g., Next.js 'use server', React hooks rules)
  • Type-safe without any or as unknown casts
  • Idiomatic code that matches ecosystem conventions
  • Edge case handling without being asked
  • Clear variable names and logical structure

Weak Signals (model is pattern-matching):

  • Works, but uses outdated patterns (class components, old API routes)
  • Over-engineered solutions (creates 5 files for a 10-line function)
  • Repetitive code that should be abstracted
  • Comments that explain what the code does (redundant) instead of why

Reasoning Depth Signals

Deep Reasoning:

  • Asks clarifying questions before starting (if in Plan Mode)
  • Identifies trade-offs in the plan ("Option A is faster but less maintainable")
  • Catches implicit requirements ("This needs error handling for network failures")

Shallow Reasoning:

  • Jumps straight to code without planning
  • Misses obvious edge cases
  • Generates code that works for the happy path only

Iteration Efficiency

  • Good: Converges to a working solution in 1-3 attempts
  • Mediocre: Needs 4-6 iterations with manual nudges
  • Bad: Thrashes between different broken approaches (7+ attempts)

Track this in a spreadsheet:

| Model | Task | Iterations to Working | Manual Corrections | Final Quality (1-10) |
|---|---|---|---|---|
| GLM-5 | Single-file | 1 | 0 | 9 |
| Claude Opus 4.5 | Single-file | 1 | 0 | 9 |

Comparing GLM-5 results to your current model in Verdent

Here's my exact workflow for head-to-head comparisons using Verdent's workspace isolation.

Parallel Workspace Method

  1. Create baseline workspace:
    Workspace 1: Run task with your current model (e.g., Claude Opus 4.5)
    Let it complete fully
    Note: completion time, iterations needed, final quality

  2. Create comparison workspace:
    Workspace 2: Same task, same codebase state, but with GLM-5
    Critical: Start from the exact same Git commit (Verdent handles this via worktrees)

  3. Side-by-side diff:
    Open both workspaces in separate windows
    Use git diff workspace-1 workspace-2 to see exact code differences
    Or use Verdent's built-in DiffLens view

Quantitative Comparison Metrics

| Metric | How to Measure | Weight (My Preference) |
|---|---|---|
| Correctness | Does it work? Pass all tests? | 40% |
| Code Quality | Readability, maintainability, follows conventions | 25% |
| Iteration Speed | Attempts needed to reach working solution | 15% |
| Cost | Total tokens consumed (check Verdent usage panel) | 10% |
| Reasoning Quality | Plan coherence, edge case coverage | 10% |

Example Scoring:

Task: Multi-file refactor

Claude Opus 4.5:
- Correctness: 9/10 (works, one edge case missed)
- Quality: 8/10 (good, slightly verbose)
- Speed: 2 iterations
- Cost: 12,450 tokens (~$0.15)
- Reasoning: 9/10 (great plan)
→ Weighted Score: 8.65

GLM-5:
- Correctness: 8/10 (works, same edge case missed)
- Quality: 7/10 (works but less idiomatic)
- Speed: 3 iterations
- Cost: 8,200 tokens (~$0.08)
- Reasoning: 7/10 (decent plan, missed one dependency)
→ Weighted Score: 7.70

Conclusion: Claude wins on quality, GLM-5 wins on cost. 
For this task type, stick with Claude. For simpler tasks, GLM-5 may be worth the savings.
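
A small helper keeps the arithmetic honest. The weights come from the metrics table above; converting iterations and token cost into 1-10 subscores is my own convention (subscores of 9/8 for Claude's speed/cost and 7/10 for GLM-5's reproduce the example numbers):

```typescript
type Scores = {
  correctness: number; // 1-10
  quality: number;     // 1-10
  speed: number;       // 1-10, derived from iteration count
  cost: number;        // 1-10, derived from token spend
  reasoning: number;   // 1-10
};

// Weights from the metrics table; they must sum to 1.0.
const WEIGHTS: Scores = { correctness: 0.4, quality: 0.25, speed: 0.15, cost: 0.1, reasoning: 0.1 };

function weightedScore(s: Scores): number {
  const keys = Object.keys(WEIGHTS) as (keyof Scores)[];
  const total = keys.reduce((acc, k) => acc + s[k] * WEIGHTS[k], 0);
  return Math.round(total * 100) / 100; // two decimals, matching the example
}
```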

Qualitative Comparison Questions

After running 3-5 tasks on each model, ask yourself:

  1. Would I trust this model's output in production without review? (Yes/No)
  2. How much time did I spend fixing vs. time saved? (Net time saved: positive or negative?)
  3. Did this feel like working with a skilled junior dev, or debugging a broken linter?

These qualitative judgments often matter more than benchmark scores.

Troubleshooting: common setup issues

I've helped 20+ devs set up Verdent for model testing. These are the issues that eat hours if you don't catch them early.

Issue 1: "Model not available in dropdown"

Symptom: GLM-5 doesn't appear in the model picker even though you added the API key.

Causes & Fixes:

  1. GLM-5 not released yet → Wait until official launch (Feb 10-15, 2026)
  2. API key invalid → Verify at z.ai/model-api that key has API access permissions
  3. Verdent cache issue → Restart Verdent Desktop or reload VS Code window (Cmd+Shift+P → "Reload Window")
  4. Old Verdent version → Update to v1.12.2+ which has GLM-5 model routing

Quick Test:

# Verify API key works directly
curl -H "Authorization: Bearer YOUR_KEY" https://api.z.ai/api/paas/v4/models
# Should return JSON with available models

Issue 2: "Git worktree creation failed"

Symptom: Error message when creating new workspace: "Failed to create isolated workspace: git worktree command failed"

Causes & Fixes:

  1. Git not installed → Install Git: brew install git (Mac) or sudo apt install git (Linux)
  2. Not in a Git repository → Initialize Git: git init in your project folder
  3. Uncommitted changes → Commit or stash changes before creating workspace
  4. Nested Git repos → Verdent doesn't support worktrees in submodules (move to root repo)

Quick Fix:

cd your-project
git status  # Check if you're in a Git repo
git add .
git commit -m "Checkpoint before Verdent testing"
# Then retry workspace creation in Verdent

Issue 3: "Agent stops mid-task without explanation"

Symptom: Agent starts coding, then stops with no error message or incomplete output.

Causes & Fixes:

  1. Rate limit hit → Z.ai API has request limits. Check usage at z.ai/dashboard
  2. Token context overflow → Task is too large for model's context window. Break into smaller subtasks.
  3. Permission mode set to Manual → Check Settings → set to Auto-Run Mode for testing
  4. API credit depleted → Add credits at z.ai/billing

Debug Steps:

  1. Open Verdent logs: Help → Show Logs
  2. Search for "API error" or "rate limit"
  3. If you see 429 errors → wait 5 minutes and retry
  4. If you see 402 errors → add credits

Issue 4: "Code compiles but behavior is wrong"

Symptom: Tests pass, no TypeScript errors, but the feature doesn't work as expected.

This is a model quality issue, not a setup issue. Document it in your evaluation:

  1. Note the specific failure mode
  2. Try the same task with Claude Opus 4.5 to see if it's model-specific
  3. If Claude also fails → your prompt needs more specificity
  4. If only GLM-5 fails → that's valuable data about model limitations

Issue 5: "VS Code extension not showing chat panel"

Symptom: Installed Verdent extension but no sidebar icon appears.

Fixes:

  1. Reload VS Code: Cmd+Shift+P → "Developer: Reload Window"
  2. Check extension is enabled: Extensions panel → search "Verdent" → should show "Enabled"
  3. Reinstall extension: Uninstall → restart VS Code → reinstall from marketplace
  4. Check VS Code version: Verdent requires VS Code 1.85.0+. Update if needed.

FAQ

Q: How long does a typical 3-task evaluation take?

Budget 2-3 hours for your first evaluation session:

  • 30 min: Setup and API key configuration
  • 45 min: Running Task 1 on 3 models
  • 45 min: Running Task 2 on 3 models
  • 30 min: Running Task 3 on 3 models

After the first round, subsequent evaluations take ~1 hour since setup is done.

Q: Can I test GLM-5 for free?

Z.ai typically offers $5-10 in free credits for new accounts. This is enough for ~50-100 coding tasks depending on complexity. After that, you'll need to add a payment method.

Q: Should I run evaluations in Plan Mode or Agent Mode?

Plan Mode if you want to see the model's reasoning process and verify it understands before coding. Agent Mode if you want to test autonomous capability without human intervention. I use Agent Mode for initial tests, then switch to Plan Mode if I see quality issues.

Q: How do I export my evaluation results from Verdent?

Verdent doesn't have built-in export, so I maintain a separate spreadsheet. After each task:

  1. Copy the final diff (right-click workspace → Copy Changes)
  2. Paste into a Google Sheet with columns: Model, Task, Iterations, Tokens, Quality Score, Notes
  3. At the end, sort by weighted score to see which model wins overall

Q: What if GLM-5 is significantly cheaper but slightly worse quality?

This is a business decision based on your workflow:

  • Prototype/MVP work: GLM-5 might be perfect (speed over perfection)
  • Production features: Stick with higher-quality models (Claude/GPT-5)
  • Hybrid approach: Use GLM-5 for scaffolding, Claude for refinement

Run the math: If GLM-5 costs 60% less but requires 20% more iterations, you're still saving money if your time isn't the bottleneck.
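
That break-even claim is easy to sanity-check. The per-attempt cost and iteration counts below are made up for illustration, not real pricing:

```typescript
// Effective API cost per completed task = cost per attempt × attempts needed.
const baselinePerRun = 0.15;   // e.g. $0.15/attempt for the pricier model
const baselineIterations = 2;

const cheaperPerRun = baselinePerRun * 0.4;          // 60% cheaper per attempt
const cheaperIterations = baselineIterations * 1.2;  // 20% more iterations

const baselineTotal = baselinePerRun * baselineIterations; // $0.30 per task
const cheaperTotal = cheaperPerRun * cheaperIterations;    // ~$0.144 per task, still ~52% cheaper
```

The extra iterations only flip the economics once your review time per iteration dominates the API spend.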

Key Takeaways

Testing GLM-5 in Verdent gives you structured, reproducible evaluation data that benchmarks can't provide. The three-task method—single-file baseline, multi-file complexity test, bug diagnosis—exposes real capability differences that matter for your specific codebase.

Setup is straightforward: Update to Verdent v1.12.2+, add your Z.ai API key, and you're ready to test as soon as GLM-5 launches (expected Feb 10-15, 2026).

Run comparisons in isolated workspaces to get clean head-to-head results without contamination. Track quantitative metrics (correctness, iterations, cost) and qualitative judgments (would you trust this in production?).

Most importantly: Don't trust the hype cycle. Test on your own code, with your own patterns, and make decisions based on data you generate yourself. That's what Verdent's multi-agent architecture enables—objective model evaluation at scale.

When GLM-5 drops, you'll be ready to test it properly in the first 48 hours while everyone else is still arguing on Twitter. That's the advantage.

Written by Hanks Engineer

As an engineer and AI workflow researcher, I have over a decade of experience in automation, AI tools, and SaaS systems. I specialize in testing, benchmarking, and analyzing AI tools, transforming hands-on experimentation into actionable insights. My work bridges cutting-edge AI research and real-world applications, helping developers integrate intelligent workflows effectively.