Gemini Prompt Management: Serve Dynamic Prompts via API

PromptForge Team · 5 min read
Tags: Gemini prompt management, Google Gemini, systemInstruction, prompt versioning, Vertex AI

Google's docs call it systemInstruction. Your codebase calls it a multiline string in config/gemini.ts. Product updates the instruction for Gemini 2.0 Flash. Engineering ships it next Tuesday. Meanwhile staging still runs the Pro-tuned wording on Flash because someone reused the wrong constant.

Gemini prompt management puts systemInstruction text in a versioned registry, serves it through a REST API with dynamic {{variables}}, and lets you promote changes to production without a deploy.

Gemini's system instruction model

Unlike OpenAI's system role message, Gemini uses a dedicated systemInstruction parameter when you create a model instance. Google's system instructions guide documents it for the Gemini API and Vertex AI.

The instruction shapes persona, task context, safety boundaries, and output format for every generateContent or sendMessage call made through that model instance.

That text is natural language. It changes often. It belongs in a prompt registry, not buried in a constants file.

Hub: LLM-Specific Prompt Management.

Serve dynamic prompts via API

"Serving dynamic prompts via API" means three things:

  1. Storage outside your application repository
  2. Interpolation of {{variables}} at fetch time
  3. Version channels (stable, latest, pinned) controlling what production receives

Your Gemini application calls PromptForge, receives a fully rendered string, and passes it to getGenerativeModel({ systemInstruction: content }).

Template example

You are a {{domain}} expert explaining concepts to {{audience}}.
Use plain language. Include one practical example per answer.
If you lack information, say so. Do not speculate about medical outcomes.
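To make "interpolation at fetch time" concrete, here is a minimal local sketch of what rendering a {{variables}} template involves. It is not PromptForge's implementation (the real rendering happens server-side); renderPromptTemplate is a hypothetical helper, and failing on a missing variable is an assumption about sensible behavior rather than documented behavior.

```typescript
// Hypothetical local renderer for illustration only. PromptForge does the real
// interpolation server-side; this just shows what "rendering" means.
function renderPromptTemplate(
  template: string,
  variables: Record<string, string>,
): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => {
    // Fail loudly on a missing variable rather than sending "{{audience}}" to the model.
    if (!Object.prototype.hasOwnProperty.call(variables, key)) {
      throw new Error(`Missing template variable: ${key}`);
    }
    return variables[key];
  });
}

const renderedInstruction = renderPromptTemplate(
  "You are a {{domain}} expert explaining concepts to {{audience}}.",
  { domain: "cardiology", audience: "patients" },
);
console.log(renderedInstruction);
// You are a cardiology expert explaining concepts to patients.
```

The same template yields a different instruction for each variables object, which is the point of keeping the wording in one place.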

Fetch and use with Google Generative AI SDK

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

async function fetchGeminiInstruction(domain: string, audience: string) {
  const res = await fetch(
    "https://www.promptforge-app.com/api/v1/prompts/your-prompt-id",
    {
      method: "POST",
      headers: {
        Authorization: "Bearer pfk_your_api_key",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        version: "stable",
        variables: { domain, audience },
      }),
    },
  );
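  // Surface HTTP failures instead of destructuring an error payload.
  if (!res.ok) {
    throw new Error(`PromptForge fetch failed: ${res.status} ${res.statusText}`);
  }
  const { content, version } = await res.json();
  return { content: content as string, version: version as number };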
}

export async function explainTopic(topic: string, domain: string, audience: string) {
  const { content, version } = await fetchGeminiInstruction(domain, audience);

  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash",
    systemInstruction: content,
  });

  const result = await model.generateContent(
    `Explain the following topic: ${topic}`,
  );

  console.log({ promptVersion: version, model: "gemini-2.0-flash" });
  return result.response.text();
}

Change domain or audience by updating variables at fetch time, not by editing code. Change the base instruction by promoting a new version in PromptForge.
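If you want the request and response handling to be testable without a network call, one option is to split them into two small pure functions. This is a sketch under assumptions: buildPromptRequest and parseRenderedPrompt are hypothetical helper names, and the { content, version } response shape is taken from the example above rather than from a published schema.

```typescript
interface RenderedPrompt {
  content: string;
  version: number;
}

// Builds the URL and fetch options used in the example above. Hypothetical helper.
function buildPromptRequest(
  promptId: string,
  apiKey: string,
  variables: Record<string, string>,
  version: string = "stable",
) {
  return {
    url: `https://www.promptforge-app.com/api/v1/prompts/${encodeURIComponent(promptId)}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ version, variables }),
    },
  };
}

// Validates the parsed JSON instead of trusting a type cast.
function parseRenderedPrompt(payload: unknown): RenderedPrompt {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("PromptForge response is not a JSON object");
  }
  const { content, version } = payload as { content?: unknown; version?: unknown };
  if (typeof content !== "string" || typeof version !== "number") {
    throw new Error("PromptForge response is missing content or version");
  }
  return { content, version };
}
```

In the fetch function you would then call fetch(url, init) with the built values and pass the result of res.json() through parseRenderedPrompt, so a malformed response fails before it reaches Gemini.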

Gemini Flash vs Pro: separate prompts

Flash and Pro have different context budgets and capability profiles. Flash often needs shorter, more direct instructions. Pro can handle longer rule sets and nested formatting requirements.

Create separate PromptForge prompts:

  • gemini-flash-support
  • gemini-pro-analyst

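Version and promote each independently. Select the prompt ID in code alongside the model ID, so that "gemini-2.0-flash" always pairs with the Flash prompt and the Pro model ID (written here as "gemini-2.0-pro") always pairs with the Pro prompt.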
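One way to avoid the "wrong constant" problem from the introduction is to keep the model ID and prompt ID in a single lookup, so they cannot be chosen separately. A minimal sketch, assuming the prompt IDs listed above; GEMINI_TIERS and configForTier are hypothetical names, and the Pro model ID is the one this article uses, so check it against the models available to your project.

```typescript
// Each tier carries both identifiers, so a Flash model cannot pick up the Pro prompt.
const GEMINI_TIERS = {
  flash: { model: "gemini-2.0-flash", promptId: "gemini-flash-support" },
  pro: { model: "gemini-2.0-pro", promptId: "gemini-pro-analyst" },
} as const;

type GeminiTier = keyof typeof GEMINI_TIERS;

function configForTier(tier: GeminiTier) {
  return GEMINI_TIERS[tier];
}

const flashConfig = configForTier("flash");
console.log(flashConfig.model, flashConfig.promptId);
// gemini-2.0-flash gemini-flash-support
```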

Multi-turn chat sessions

For chat sessions, fetch systemInstruction once before creating the model:

const { content } = await fetchGeminiInstruction("healthcare", "patients");
const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  systemInstruction: content,
});
const chat = model.startChat({ history: [] });
// subsequent sendMessage calls use the same instruction

The PromptForge fetch is one-time per session creation. If you need mid-session instruction updates (rare), recreate the model with a fresh fetch.

Multimodal Gemini apps

Images, video, and audio go in the contents array. systemInstruction is always text. Store and version that text in PromptForge. Your application assembles multimodal contents as before.

Prompt management covers the instructional layer, not binary media handling.
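As a rough illustration of that split, the sketch below assembles an image-plus-question contents array while the instruction text stays elsewhere. The local types mirror the role/parts/inlineData shape of Gemini request content as I understand it; treat them as assumptions and prefer the SDK's own exported types in real code. buildImageQuestion is a hypothetical helper.

```typescript
// Local stand-ins for the SDK's content types (assumed shape, not imported).
interface TextPromptPart {
  text: string;
}
interface InlineDataPromptPart {
  inlineData: { mimeType: string; data: string }; // data is base64-encoded
}
type PromptPart = TextPromptPart | InlineDataPromptPart;

interface UserTurn {
  role: "user";
  parts: PromptPart[];
}

// Media and the user's question travel in contents; systemInstruction is not involved.
function buildImageQuestion(
  imageBase64: string,
  mimeType: string,
  question: string,
): UserTurn[] {
  return [
    {
      role: "user",
      parts: [{ inlineData: { mimeType, data: imageBase64 } }, { text: question }],
    },
  ];
}

const imageContents = buildImageQuestion("aGVsbG8=", "image/png", "What does this chart show?");
```

The resulting array is what you would pass as contents to a generateContent call on a model that was created with the PromptForge-served systemInstruction.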

Vertex AI on Google Cloud

The Vertex AI SDK uses the same systemInstruction parameter:

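import { VertexAI } from "@google-cloud/vertexai";

// Project and location are placeholders; credentials come from ADC.
const vertexAI = new VertexAI({ project: "your-project-id", location: "us-central1" });
const model = vertexAI.getGenerativeModel({
  model: "gemini-2.0-flash-001",
  systemInstruction: content, // fetched from PromptForge server-side
});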

Vertex authentication (ADC, service account) is separate from your PromptForge API key. Fetch PromptForge from Cloud Run, GKE, or Cloud Functions with your server-side key. Never expose PromptForge credentials to the client.

Dynamic variables for multi-surface products

One Gemini-powered feature might serve consumers and clinicians from the same codebase:

You are assisting {{audience}} with {{domain}} questions.
Compliance mode: {{compliance_level}}.

Pass variables: { audience: "clinicians", domain: "cardiology", compliance_level: "HIPAA-aware" } at fetch time. One template, multiple surfaces. Update compliance wording once in PromptForge.
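A small typed lookup can keep the per-surface values out of call sites. This is a sketch only: the clinician values come from the example above, while the consumer values and the names ProductSurface, SURFACE_DEFAULTS, and variablesForSurface are placeholders invented for illustration.

```typescript
type ProductSurface = "consumer" | "clinician";

interface SurfaceVariables {
  audience: string;
  domain: string;
  compliance_level: string;
}

// Clinician values match the example above; consumer values are placeholders.
const SURFACE_DEFAULTS: Record<
  ProductSurface,
  Pick<SurfaceVariables, "audience" | "compliance_level">
> = {
  consumer: { audience: "consumers", compliance_level: "general" },
  clinician: { audience: "clinicians", compliance_level: "HIPAA-aware" },
};

// Returns the variables object to send with the PromptForge fetch.
function variablesForSurface(surface: ProductSurface, domain: string): SurfaceVariables {
  return { ...SURFACE_DEFAULTS[surface], domain };
}

const clinicianVariables = variablesForSurface("clinician", "cardiology");
```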

Version control and promotion

Gemini models are sensitive to instruction wording. Small edits change refusal rates and formatting.

Workflow:

  1. Save edits in PromptForge (each save creates a new version automatically)
  2. Test on staging with latest
  3. Diff against stable in the History tab
  4. Run eval set (even 20–30 representative queries helps)
  5. Promote to stable
  6. Log promptVersion with each generateContent call

Rollback: promote the previous stable version. No redeploy needed.
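The workflow and rollback above can be pictured with a toy in-memory model. This is not how PromptForge is implemented; PromptVersionHistory is a hypothetical class, and representing a pinned version as a plain number is an assumption made for the sketch. It only shows why latest and stable can point at different versions and why a rollback is just another promotion.

```typescript
type ChannelRef = "stable" | "latest" | number; // a number stands in for a pinned version

class PromptVersionHistory {
  private readonly contents: string[] = [];
  private stableVersion: number | null = null;

  // Every save creates a new immutable version, numbered from 1.
  save(content: string): number {
    this.contents.push(content);
    return this.contents.length;
  }

  // Promotion and rollback are the same operation: point "stable" at a version.
  promote(version: number): void {
    this.assertExists(version);
    this.stableVersion = version;
  }

  resolve(ref: ChannelRef): { content: string; version: number } {
    let version: number;
    if (ref === "latest") {
      version = this.contents.length;
    } else if (ref === "stable") {
      if (this.stableVersion === null) throw new Error("No stable version promoted yet");
      version = this.stableVersion;
    } else {
      version = ref;
    }
    this.assertExists(version);
    return { content: this.contents[version - 1], version };
  }

  private assertExists(version: number): void {
    if (!Number.isInteger(version) || version < 1 || version > this.contents.length) {
      throw new Error(`Unknown version: ${version}`);
    }
  }
}

const demoHistory = new PromptVersionHistory();
const firstVersion = demoHistory.save("Be concise.");
demoHistory.promote(firstVersion); // production now receives version 1
const secondVersion = demoHistory.save("Be concise. Cite sources.");
// Staging (latest) sees version 2 while production (stable) still sees version 1.
```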

Details: Stable vs latest vs pinned.

Why API delivery beats config files for Gemini

Config files in your repo still require deploys. Environment variables do not scale past a few prompts and hide diffs from non-engineers.

API-served prompts let product and domain experts iterate in a dashboard, support {{variables}} without templating code in your service, and give you immutable version history readable as plain language.

The fetch cost is one HTTP round trip per session or per request (your choice, with optional caching).
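If you choose per-request fetching, a short-lived in-process cache keeps the round trip off the hot path. The sketch below is one possible approach, not a PromptForge feature: withTtlCache is a hypothetical wrapper, the cache key is whatever string you choose (for example the prompt ID plus serialized variables), and the TTL you pick also bounds how long a promotion or rollback takes to reach a running instance.

```typescript
type CachedPrompt = { content: string; version: number };
type PromptFetcher = (key: string) => Promise<CachedPrompt>;

// Wraps a fetcher so repeated calls with the same key reuse one result until ttlMs passes.
function withTtlCache(
  fetcher: PromptFetcher,
  ttlMs: number,
  now: () => number = Date.now,
): PromptFetcher {
  const entries = new Map<string, { expiresAt: number; value: Promise<CachedPrompt> }>();

  return (key: string) => {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.value;
    }
    // Cache the promise itself so concurrent callers share one in-flight request.
    const value = fetcher(key);
    entries.set(key, { expiresAt: now() + ttlMs, value });
    // Drop failed fetches so the next call retries instead of reusing the error.
    value.catch(() => {
      if (entries.get(key)?.value === value) {
        entries.delete(key);
      }
    });
    return value;
  };
}
```

A TTL in the range of tens of seconds is a common starting point, but that number is a judgment call about how quickly you need promotions to take effect.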

Getting started

Move your highest-traffic systemInstruction to PromptForge. Use stable in production, latest in staging. Add promptVersion to logs.
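The channel choice can live in one small function so it is not repeated across call sites. A minimal sketch, assuming your deployment exposes an environment name such as "production" or "staging"; channelForEnvironment is a hypothetical helper, and defaulting unknown environments to stable is a deliberately conservative assumption.

```typescript
type PromptChannel = "stable" | "latest";

// Only staging opts in to unreleased wording; everything else gets the promoted version.
function channelForEnvironment(environment: string): PromptChannel {
  return environment === "staging" ? "latest" : "stable";
}

console.log(channelForEnvironment("production")); // stable
console.log(channelForEnvironment("staging")); // latest
```

Pass the returned value as the version field of the PromptForge request body.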

Examples and Vertex notes: Google / Gemini integration page. Ops foundation: Complete Guide to Prompt Management.