Rihpig

Posted on Apr 8 • Originally published at apidog.com

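How to Use the GLM-5.1 API: A Complete Guide with Code Examples

TL;DR

GLM-5.1 is available through the BigModel API at https://open.bigmodel.cn/api/paas/v4/. The API is fully OpenAI-compatible. It uses the same endpoint structure, request format, and streaming pattern. All you need is a BigModel account, an API key, and the model name glm-5.1. This guide walks through authentication, your first request, streaming, tool calling, and integration testing with Apidog, with a focus on working code.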


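Introduction

GLM-5.1 is Z.AI's flagship agentic model. Since its release in April 2026, it has ranked first on SWE-Bench Pro and leads across the major coding benchmarks. You can integrate it directly into a range of development scenarios, including AI coding assistants, autonomous agents, and long-running task execution.

If you already have code that talks to GPT-4 (or to Claude through an OpenAI-compatible client), you can switch to GLM-5.1 by changing only the base URL and the model name. There is no new SDK to learn and no change to response parsing.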
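To make "change only the base URL and model name" concrete, here is a minimal sketch that keeps both values in one config object so a single setting selects the backend. The `ProviderConfig` class, the `PROVIDERS` table, the `LLM_PROVIDER` environment variable, and the `gpt-4o` entry are hypothetical illustrations, not part of any SDK.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    base_url: str
    model: str
    api_key_env: str  # name of the environment variable that holds the key


# Hypothetical registry: switching providers only touches these fields
PROVIDERS = {
    "openai": ProviderConfig("https://api.openai.com/v1/", "gpt-4o", "OPENAI_API_KEY"),
    "bigmodel": ProviderConfig("https://open.bigmodel.cn/api/paas/v4/", "glm-5.1", "BIGMODEL_API_KEY"),
}


def select_provider(name=None):
    """Pick a provider by name, falling back to the LLM_PROVIDER env var."""
    name = name or os.environ.get("LLM_PROVIDER", "bigmodel")
    try:
        return PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None
```

With the OpenAI SDK you would then pass `cfg.base_url` as `base_url` and `cfg.model` as `model`; the rest of the calling code stays the same.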

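💡 Testing is central to any agentic API integration. Running workflows with hundreds of tool calls directly against the live API burns through quota quickly. Apidog's test scenario feature lets you mock every response in a sequence, so you can verify streaming, tool-calling, and error scenarios thoroughly before deployment. Install Apidog for free to follow along with the testing section below.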

Prerequisites

A BigModel account at bigmodel.cn (free sign-up)
An API key from the BigModel console
Python 3.8+ or Node.js 18+ (the examples below use Python and curl)
The OpenAI SDK, or plain requests/fetch (the GLM-5.1 API is OpenAI-compatible)

Store your API key in an environment variable:

export BIGMODEL_API_KEY="your_api_key_here"

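Never hard-code API keys in your source code.

Authentication

Every request must include a Bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

BigModel API keys have a two-part structure separated by a dot, such as xxxxxxxx.xxxxxxxxxxxxxxxx. This differs from OpenAI's sk- format, but authentication works the same way.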
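A small helper can build these headers from the environment variable and fail early when the key is missing. This is a sketch: `bigmodel_headers` is a hypothetical name, and the dot check is only a heuristic based on the key format described above, so drop it if it ever rejects a valid key.

```python
import os


def bigmodel_headers(api_key=None):
    """Build request headers, reading the key from BIGMODEL_API_KEY by default."""
    key = api_key or os.environ.get("BIGMODEL_API_KEY")
    if not key:
        raise RuntimeError("Set the BIGMODEL_API_KEY environment variable")
    # Heuristic only: BigModel keys look like "<id>.<secret>"
    key_id, sep, secret = key.partition(".")
    if not (sep and key_id and secret):
        raise ValueError("Expected a key of the form 'xxxxxxxx.xxxxxxxxxxxxxxxx'")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```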

Base URL

https://open.bigmodel.cn/api/paas/v4/

Chat completions endpoint:

POST https://open.bigmodel.cn/api/paas/v4/chat/completions
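If you build endpoint URLs yourself with `urllib.parse.urljoin`, the trailing slash on the base URL matters: without it, `urljoin` replaces the last path segment (`v4`) instead of appending to it. The `endpoint` helper below is a hypothetical sketch that normalizes both sides.

```python
from urllib.parse import urljoin

BASE_URL = "https://open.bigmodel.cn/api/paas/v4/"


def endpoint(path, base_url=BASE_URL):
    """Join a relative API path onto the base URL.

    urljoin drops the last path segment of a base that does not end
    with '/', so the base is normalized first.
    """
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, path.lstrip("/"))
```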

Your First Request

Using curl

curl https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Authorization: Bearer $BIGMODEL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
      }
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

Using Python (requests)

import os
import requests

api_key = os.environ["BIGMODEL_API_KEY"]

response = requests.post(
    "https://open.bigmodel.cn/api/paas/v4/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "glm-5.1",
        "messages": [
            {
                "role": "user",
                "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
            }
        ],
        "max_tokens": 1024,
        "temperature": 0.7
    }
)

response.raise_for_status()  # surface HTTP errors (401, 429, ...) instead of a KeyError below
result = response.json()
print(result["choices"][0]["message"]["content"])

Using the OpenAI SDK (recommended)

Because the API is OpenAI-compatible, you can use the official OpenAI Python SDK as-is:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BIGMODEL_API_KEY"],
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
        }
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)

The OpenAI SDK handles retries, timeouts, and response parsing for you. Pointing base_url at BigModel is all it takes.

Response Format

Responses follow the same structure as OpenAI's:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1744000000,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "def sieve_of_eratosthenes(n):\n    ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 215,
    "total_tokens": 247
  }
}

The response text is at result["choices"][0]["message"]["content"] (or response.choices[0].message.content when using the SDK).
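When working with the raw JSON, it helps to read the text, any tool calls, and the finish_reason in one place, because content is null when the model answers with a tool call instead of text. A minimal sketch with a hypothetical `extract_reply` helper:

```python
def extract_reply(result):
    """Return (text, tool_calls, finish_reason) from a chat.completion dict."""
    choice = result["choices"][0]
    message = choice["message"]
    # content can be null when the model requests a tool call instead
    text = message.get("content") or ""
    tool_calls = message.get("tool_calls") or []
    return text, tool_calls, choice.get("finish_reason")
```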

Use the usage field to track token consumption. GLM-5.1 consumes 3× quota during peak hours (14:00-18:00 UTC+8), so keep a close eye on token usage.
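A minimal sketch for tracking usage across many requests, assuming you keep the raw response dicts. `UsageTracker` is a hypothetical helper; it only sums the token counts reported in usage and does not model the peak/off-peak quota multipliers.

```python
from collections import Counter


class UsageTracker:
    """Accumulate the usage block from each chat.completion response."""

    FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")

    def __init__(self):
        self.totals = Counter()
        self.requests = 0

    def record(self, result):
        # A missing usage block counts as zero tokens
        usage = result.get("usage") or {}
        for field in self.FIELDS:
            self.totals[field] += usage.get(field, 0)
        self.requests += 1
        return dict(self.totals)
```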

Streaming Responses

For long code generations, streaming lets you receive each token as soon as it is generated.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BIGMODEL_API_KEY"],
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

stream = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {
            "role": "user",
            "content": "Explain how a B-tree index works in a database, with a code example."
        }
    ],
    stream=True,
    max_tokens=2048
)

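for chunk in stream:
    # Skip chunks without choices (e.g. a trailing usage-only chunk)
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # newline once streaming finishes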

Each chunk carries only the delta of newly generated tokens. The final chunk's finish_reason is "stop" (or "length" if the token limit was reached).
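Often you also want the complete text once streaming ends, for example to save generated code. A sketch that assembles the deltas and records the finish_reason, assuming each chunk has already been parsed into a dict with the chat.completion.chunk shape (`collect_stream` is a hypothetical name). The guard for empty choices covers a trailing usage-only chunk, which some OpenAI-compatible servers send; whether BigModel does is an assumption.

```python
def collect_stream(chunks):
    """Join streamed deltas into one string and capture the finish_reason.

    Each chunk is a dict shaped like a parsed chat.completion.chunk.
    """
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.get("choices"):
            continue  # e.g. a usage-only chunk at the end of the stream
        choice = chunk["choices"][0]
        content = choice.get("delta", {}).get("content")
        if content:
            parts.append(content)
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```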

Streaming with raw requests

You can also make the request directly, without the OpenAI SDK:

import os
import json
import requests

api_key = os.environ["BIGMODEL_API_KEY"]

response = requests.post(
    "https://open.bigmodel.cn/api/paas/v4/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "glm-5.1",
        "messages": [{"role": "user", "content": "Write a merge sort in Python."}],
        "stream": True,
        "max_tokens": 1024
    },
    stream=True
)

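response.raise_for_status()

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = line[6:]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            if not chunk.get("choices"):
                continue
            delta = chunk["choices"][0]["delta"]
            # content may be missing or null (e.g. in role-only or final chunks)
            if delta.get("content"):
                print(delta["content"], end="", flush=True)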
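Separating SSE parsing from the HTTP call makes it testable without the network: the generator below accepts `response.iter_lines()` or a list of canned lines. It is a sketch with a hypothetical name (`iter_sse_content`); it follows the same `data:` and `[DONE]` conventions as the loop above, and also skips blank lines and SSE comment lines (lines starting with `:`).

```python
import json


def iter_sse_content(lines):
    """Yield content deltas from raw SSE lines (bytes or str).

    Stops at the 'data: [DONE]' sentinel; ignores blank lines, comments,
    and chunks that carry no content.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

Usage with the request above: `for text in iter_sse_content(response.iter_lines()): print(text, end="", flush=True)`.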

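Tool Calling

GLM-5.1 supports tool calling, which lets the model ask for a function to be executed mid-conversation. This is central to agent workflows such as running code, querying a database, or calling external APIs.

Defining tools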

import os
import json
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BIGMODEL_API_KEY"],
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return the output. Use this to test, profile, or benchmark code.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The Python code to execute"
                    }
                },
                "required": ["code"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "File path to read"
                    }
                },
                "required": ["path"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {
            "role": "user",
            "content": "Write a function to compute Fibonacci numbers, test it for n=10, and show me the output."
        }
    ],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message
print(f"Finish reason: {response.choices[0].finish_reason}")

if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"\nTool called: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
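As the printed output above shows, `function.arguments` arrives as a JSON-encoded string, not a dict. One way to route calls is a registry keyed by tool name that parses the arguments and turns bad input into an error string the model can read and correct. This is a sketch: `TOOL_REGISTRY`, the `tool` decorator, and the `word_count` tool are hypothetical.

```python
import json

# Hypothetical registry: maps a tool name to a plain Python callable
TOOL_REGISTRY = {}


def tool(name):
    """Decorator that registers a function under the given tool name."""
    def register(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return register


@tool("word_count")
def word_count(text):
    return str(len(text.split()))


def dispatch(name, arguments_json):
    """Parse the JSON-encoded arguments and call the registered handler."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        # Report malformed JSON back to the model instead of crashing the loop
        return f"Error: invalid arguments ({exc.msg})"
    try:
        return handler(**args)
    except TypeError as exc:
        # Wrong or missing parameter names end up here
        return f"Error: bad arguments for {name}: {exc}"
```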

Handling tool call responses

When the model calls a tool, run the actual function and send the result back to the model.

import subprocess

def execute_tool(tool_call):
    """Run the requested tool and return its result as a string."""
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    if name == "run_python":
        # Runs the model-generated code in a separate Python process
        try:
            result = subprocess.run(
                ["python3", "-c", args["code"]],
                capture_output=True,
                text=True,
                timeout=10
            )
        except subprocess.TimeoutExpired:
            return "Error: execution timed out after 10 seconds"
        return result.stdout or result.stderr

    elif name == "read_file":
        try:
            with open(args["path"]) as f:
                return f.read()
        except FileNotFoundError:
            return f"Error: file {args['path']} not found"

    return f"Unknown tool: {name}"


def run_agent_loop(user_message, tools, max_iterations=20):
    """Run the full agent loop, including tool calls."""
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="glm-5.1",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_tokens=4096
        )

        message = response.choices[0].message
        messages.append(message.model_dump(exclude_none=True))  # omit null fields from the history

        if response.choices[0].finish_reason == "stop":
            return message.content

        if response.choices[0].finish_reason == "tool_calls":
            for tool_call in message.tool_calls:
                tool_result = execute_tool(tool_call)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": tool_result
                })

    return "Max iterations reached"


result = run_agent_loop(
    "Write a quicksort implementation, test it with a random list of 1000 integers, and report the time.",
    tools
)
print(result)

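This pattern automates the whole cycle: the model requests tools, your code runs them, and the results go back to the model until it returns a final answer or the iteration limit is reached.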
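The same loop can be exercised offline by making the model call injectable. In the sketch below, `run_loop` works on plain message dicts and takes any `model_step` callable, so a scripted fake can stand in for GLM-5.1 in unit tests. `run_loop` and `scripted_model` are hypothetical names; the scripted reply mirrors the tool-call response shape used in the Apidog mocks.

```python
def run_loop(model_step, execute, user_message, max_iterations=5):
    """Drive a tool-calling loop with any callable that returns a message dict.

    model_step(messages) -> (message_dict, finish_reason)
    execute(tool_call_dict) -> str
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        message, finish_reason = model_step(messages)
        messages.append(message)
        if finish_reason == "tool_calls":
            # Answer every requested call before asking the model again
            for call in message["tool_calls"]:
                messages.append({
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": execute(call),
                })
            continue
        return message.get("content"), messages
    raise RuntimeError("Max iterations reached")


def scripted_model(replies):
    """Fake model that returns pre-recorded replies in order (no network)."""
    queue = list(replies)

    def step(messages):
        return queue.pop(0)

    return step
```

Unlike `run_agent_loop`, this version returns on any finish_reason other than "tool_calls" (including "length"), so a truncated answer surfaces instead of triggering another request.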

Key Parameters

Parameter	Type	Default	Description
`model`	string	required	Use `"glm-5.1"`
`messages`	array	required	Conversation history
`max_tokens`	integer	1024	Maximum number of tokens to generate (up to 163,840)
`temperature`	float	0.95	Randomness; lower values are more deterministic
`top_p`	float	0.7	Nucleus sampling threshold
`stream`	boolean	false	Enable streaming responses
`tools`	array	null	Function definitions for tool calling
`tool_choice`	string/object	"auto"	"auto", "none", or a specific tool
`stop`	string/array	null	Custom stop sequences
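A sketch of a payload builder that drops unset parameters and checks a couple of limits from the table. `build_payload` is hypothetical, and its checks are assumptions drawn only from the table: it rejects tool_choice strings other than "auto" and "none", even though some OpenAI-compatible APIs also accept values such as "required", because this guide does not say whether BigModel does.

```python
MAX_OUTPUT_TOKENS = 163_840  # upper bound for max_tokens listed in the table
VALID_TOOL_CHOICE_STRINGS = {"auto", "none"}


def build_payload(messages, model="glm-5.1", **params):
    """Assemble a chat/completions body, dropping parameters set to None."""
    if not messages:
        raise ValueError("messages must not be empty")
    max_tokens = params.get("max_tokens")
    if max_tokens is not None and not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    tool_choice = params.get("tool_choice")
    if isinstance(tool_choice, str) and tool_choice not in VALID_TOOL_CHOICE_STRINGS:
        raise ValueError("tool_choice must be 'auto', 'none', or a tool object")
    body = {"model": model, "messages": messages}
    # None means "use the server default", so leave it out of the body
    body.update({k: v for k, v in params.items() if v is not None})
    return body
```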

Recommended values for coding tasks:

{
    "model": "glm-5.1",
    "temperature": 1.0,
    "top_p": 0.95,
    "max_tokens": 163840  # maximum output length, for long-running tasks
}

For more deterministic code output, lower temperature to 0.2-0.4.
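To switch between the coding recommendation and a more deterministic setup without editing every call, you can keep named presets. A minimal sketch; the preset names and `with_preset` are hypothetical, and 0.2 is simply the low end of the range above.

```python
# Sampling presets based on the recommendations above; names are hypothetical
PRESETS = {
    "coding": {"temperature": 1.0, "top_p": 0.95, "max_tokens": 163840},
    "deterministic": {"temperature": 0.2},
}


def with_preset(payload, preset):
    """Return a copy of payload with the preset's sampling values applied."""
    merged = dict(payload)  # leave the caller's payload untouched
    merged.update(PRESETS[preset])
    return merged
```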

Using GLM-5.1 with Coding Assistants

With the Z.AI coding plan, you can route AI coding assistants such as Claude Code, Cline, and Kilo Code to GLM-5.1 through the BigModel API, getting a strong coding model at low cost.

Claude Code setup

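Claude Code speaks the Anthropic Messages API rather than the OpenAI format, so it is pointed at BigModel's Anthropic-compatible endpoint through environment variables. Add the following to ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://open.bigmodel.cn/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your_bigmodel_api_key"
  },
  "model": "glm-5.1"
}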

Cline / Roo Code setup

In VS Code, add the following to the Cline extension settings:

{
  "cline.apiProvider": "openai",
  "cline.openAIBaseURL": "https://open.bigmodel.cn/api/paas/v4/",
  "cline.openAIApiKey": "your_bigmodel_api_key",
  "cline.openAIModelId": "glm-5.1"
}

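Quota Consumption

GLM-5.1 is not billed per token; usage draws on the Z.AI quota system:

Peak hours (14:00-18:00 UTC+8): 3× quota per request
Off-peak hours: 2× quota per request
Promotion through April 2026: 1× quota during off-peak hours

Schedule large agent jobs for off-peak hours to stretch your quota.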
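A sketch of the schedule above as code, for example to decide whether to defer a batch job. It assumes the peak window runs from 14:00 inclusive to 18:00 exclusive in UTC+8, that the promotion applies only when you pass `promo=True`, and that callers pass timezone-aware datetimes; `quota_multiplier` and `next_off_peak` are hypothetical helpers.

```python
from datetime import datetime, time, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))
PEAK_START, PEAK_END = time(14, 0), time(18, 0)


def quota_multiplier(when, promo=False):
    """Quota multiplier per request at a given moment (timezone-aware datetime).

    e.g. quota_multiplier(datetime.now(timezone.utc))
    """
    local = when.astimezone(UTC8)
    if PEAK_START <= local.time() < PEAK_END:
        return 3
    return 1 if promo else 2


def next_off_peak(when):
    """Earliest moment at or after `when` that falls outside the peak window."""
    local = when.astimezone(UTC8)
    if PEAK_START <= local.time() < PEAK_END:
        return local.replace(hour=18, minute=0, second=0, microsecond=0)
    return local
```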

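Testing the GLM-5.1 API with Apidog

To validate an agentic API integration, you need to test every state: normal responses, streaming, tool calls, and errors. Calling the live API for all of this consumes quota and requires a network connection.

Apidog's Smart Mock feature lets you simulate every response state without calling the real API.

Setting up the mock endpoint

Create a new endpoint in Apidog: POST https://open.bigmodel.cn/api/paas/v4/chat/completions
Register a mock for the standard success response: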

{
  "id": "chatcmpl-test123",
  "object": "chat.completion",
  "created": 1744000000,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "def sieve(n): ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 120,
    "total_tokens": 152
  }
}

Register a mock for the tool-call response:

{
  "id": "chatcmpl-tool456",
  "object": "chat.completion",
  "created": 1744000001,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc",
            "type": "function",
            "function": {
              "name": "run_python",
              "arguments": "{\"code\": \"print(2+2)\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 35,
    "total_tokens": 83
  }
}

Add a rate-limit (429) mock response:

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
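With these three mocks in place, your client code needs to branch correctly on each shape. A minimal sketch of that check (`classify_response` is hypothetical); it assumes errors use the `error` object shown in the 429 mock, which may not match every real error body from BigModel.

```python
def classify_response(status, body):
    """Label a chat/completions response: 'text', 'tool_calls', or 'error:<type>'."""
    if status != 200 or "error" in body:
        err = body.get("error", {})
        # Fall back to the HTTP status when the body has no error type
        return f"error:{err.get('type', status)}"
    choice = body["choices"][0]
    if choice.get("finish_reason") == "tool_calls" or choice["message"].get("tool_calls"):
        return "tool_calls"
    return "text"
```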

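Testing the Full Agent Loop

Use an Apidog test scenario to chain several requests and validate a realistic agent loop.

Step 1: POST the initial message to /chat/completions; assert HTTP 200 and finish_reason == "tool_calls"
Step 2: POST again with the tool result added to the messages; assert HTTP 200 and finish_reason == "stop"
Step 3: Extract the code from the final response and validate its content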

Switch the mock response to 429 to verify your retry logic and other error handling.

In multi-step workflows, Apidog's variables let you pass values from one step to the next, so the integration test behaves like a real agent loop.
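The key value handed from Step 1 to Step 2 is the tool call id: the tool message in the second request must echo it back as tool_call_id. The same hand-off can be sketched in Python; `build_followup_messages` is a hypothetical helper, and `tool_output` stands in for whatever actually runs the tool.

```python
import json


def build_followup_messages(messages, step1_body, tool_output):
    """Build the Step 2 request messages from the Step 1 tool-call response.

    tool_output(name, args_dict) -> str runs the tool (hypothetical hook).
    """
    assistant = step1_body["choices"][0]["message"]
    followup = list(messages) + [assistant]
    for call in assistant.get("tool_calls", []):
        args = json.loads(call["function"]["arguments"])
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],  # the value passed between steps
            "content": tool_output(call["function"]["name"], args),
        })
    return followup
```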

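Error Handling

The API returns standard HTTP status codes:

Status	Meaning	Action
200	Success	Process normally
400	Bad request	Check the request format
401	Authentication failed	Check your API key
429	Rate limited	Wait for the `Retry-After` duration, then retry
500	Server error	Retry with exponential backoff
503	Service unavailable	Retry with exponential backoff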
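The table can be expressed as a single function that returns how long to wait, or None when retrying will not help. This sketch swaps in "full jitter" (a random delay between 0 and the exponential cap) for the 5xx case, a common way to keep many clients from retrying in lockstep; `retry_delay` is a hypothetical name, and the 60-second fallback for 429 matches the default used in the retry example that follows.

```python
import random


def retry_delay(status, attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying, or None if the status is not retryable.

    429 honors a numeric Retry-After value; 500/503 use exponential
    backoff with full jitter: a random delay in [0, min(cap, base * 2**attempt)].
    """
    if status == 429:
        if retry_after is not None and str(retry_after).isdigit():
            return float(retry_after)
        return cap
    if status in (500, 503):
        return random.uniform(0, min(cap, base * 2 ** attempt))
    return None  # 400/401 and others: fix the request instead of retrying
```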

A practical example with retries:

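import os
import time
import requests

API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"
RETRYABLE_STATUSES = {500, 503}


def call_with_retry(payload, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                API_URL,
                headers={"Authorization": f"Bearer {os.environ['BIGMODEL_API_KEY']}",
                         "Content-Type": "application/json"},
                json=payload,
                timeout=120
            )
        except requests.exceptions.Timeout:
            wait = 2 ** attempt
            print(f"Timeout on attempt {attempt + 1}. Retrying in {wait}s...")
            time.sleep(wait)
            continue

        if response.status_code == 429:
            # Honor Retry-After when it is given in seconds; otherwise wait 60s
            retry_after = response.headers.get("Retry-After", "60")
            wait = int(retry_after) if retry_after.isdigit() else 60
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
            continue

        if response.status_code in RETRYABLE_STATUSES:
            # Server-side errors: retry with exponential backoff (1s, 2s, 4s, ...)
            wait = 2 ** attempt
            print(f"Server error {response.status_code}. Retrying in {wait}s...")
            time.sleep(wait)
            continue

        # 400, 401, and other errors are not retryable: raise immediately
        response.raise_for_status()
        return response.json()

    raise Exception("Max retries exceeded")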

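Agent runs can take a long time, so set a generous timeout (120-300 seconds).

Conclusion

Because GLM-5.1's API is OpenAI-compatible, projects built on GPT (or on Claude through an OpenAI-compatible client) can switch to GLM-5.1 in minutes. The only differences are the endpoint (open.bigmodel.cn) and the quota-based billing.

When building long-running agent apps with hundreds of tool calls, pair GLM-5.1's optimization for long-horizon tasks with Apidog's Smart Mock and test scenarios to cover the edge cases.

For a detailed model comparison, see the GLM-5.1 model overview; for building and testing agent workflows with Apidog, see How AI agent memory works.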

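Frequently Asked Questions

Is the GLM-5.1 API OpenAI-compatible?

Yes. The request format, response structure, streaming, and tool calling all match the OpenAI Chat Completions API. Set the base URL to https://open.bigmodel.cn/api/paas/v4/ and it works with the official OpenAI SDK and any compatible client.

What model name should I use in API requests?

The model name is "glm-5.1". Use exactly "glm-5.1", not a longer full version string.

How is the GLM-5.1 API priced?

The BigModel API is quota-based. GLM-5.1 consumes 3× quota during peak hours (14:00-18:00 UTC+8) and 2× off-peak. Through April 2026, a promotion lowers off-peak consumption to 1×.

What are the maximum context length and output token limit?

Input context is up to 200,000 tokens and output up to 163,840 tokens. For long tasks, set max_tokens to 32,768 or higher.

Does it support function calling / tool use?

Yes. As with OpenAI, you define tools with the type: "function" schema, pass them in the tools array, and handle finish_reason: "tool_calls".

How can I test the API without consuming quota?

Use Apidog's Smart Mock to define mock responses for every state: success, tool calls, rate limiting, and errors. Use mocks during development and testing, and reserve the real API for final verification before deployment.

Where can I get the GLM-5.1 model weights?

Open-source weights are available on Hugging Face at zai-org/GLM-5.1 under the MIT license, and can be served locally with vLLM, SGLang, and similar inference engines.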
