
Integration · Meta / Llama

Meta / Llama Prompt Management

Run Llama 4 Scout locally or in the cloud and keep your system prompts in one versioned place. PromptForge manages the prompt layer so you can focus on the model.

Integration Example

PromptForge + Meta / Llama in one file

Version, template, and serve Meta Llama prompts via a REST API. Whether you run Llama 4 Scout via Ollama, llama.cpp, or a hosted provider, PromptForge keeps your system prompts versioned and out of your source code.

  1. Fetch your versioned, interpolated prompt from the PromptForge REST API with a single fetch() call.
  2. Pass the returned content string directly to your Llama endpoint as the system message; no transformation is needed.
  3. Update the prompt in the PromptForge dashboard at any time. Running applications pick up the change the next time they fetch the prompt, with no redeployment required.
integration.ts (TypeScript)
// Meta Llama via Ollama's OpenAI-compatible API.
// The same request shape works with llama.cpp's server or any OpenAI-compatible
// hosted Llama endpoint; hosted providers also need their own API key header
// and model name.
// Uses top-level await, so run this file as an ES module.

async function fetchPrompt(context: string, language: string): Promise<string> {
  const res = await fetch(
    "https://www.promptforge-app.com/api/v1/prompts/your-prompt-id",
    {
      method: "POST",
      headers: {
        // Placeholder key: load it from an environment variable in production
        Authorization: "Bearer pfk_your_api_key",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        version: "latest",
        variables: { context, language },
      }),
    }
  );
  if (!res.ok) {
    throw new Error(`PromptForge request failed: ${res.status} ${res.statusText}`);
  }
  const { content } = (await res.json()) as { content: string };
  return content;
}

// Prompt template stored in PromptForge (example):
//   You are a {{context}} assistant. Respond in {{language}}.

// 1. Fetch your Llama system prompt with dynamic variables
const content = await fetchPrompt("code_review", "TypeScript");

// 2. Send it to Llama running locally via Ollama (pull the model first: `ollama pull llama4:scout`)
//    Swap the base URL for any hosted Llama endpoint (Together AI, Groq, etc.)
const response = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama4:scout",
    messages: [
      { role: "system", content },
      { role: "user", content: "Review this TypeScript function for bugs." },
    ],
  }),
});
if (!response.ok) {
  throw new Error(`Llama request failed: ${response.status} ${response.statusText}`);
}

const { choices } = await response.json();
console.log(choices[0].message.content);
How It Works

Meta / Llama + PromptForge in Three Steps

Add a prompt management layer to your Meta / Llama integration without refactoring your application.

Step 1

Write your Llama system prompt template

Llama chat models (Llama 3.x and Llama 4) use a standard system/user/assistant conversation format. Write your system prompt in PromptForge with variables such as {{context}}, {{language}}, or {{task}}; the template works the same whether you run locally or on a hosted provider.
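In practice the system/user/assistant format is just an ordered array of role-tagged messages. The following is a minimal sketch, assuming you keep earlier turns in memory; `ChatMessage` and `buildMessages` are hypothetical names, not part of any SDK. It shows where the rendered PromptForge prompt goes: always first, ahead of the conversation.

```typescript
// Role-tagged message shape accepted by OpenAI-compatible chat endpoints
// (Ollama, llama.cpp's server, and most hosted Llama providers).
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical helper: the PromptForge system prompt goes first,
// followed by earlier turns and then the new user message.
function buildMessages(
  systemPrompt: string,
  history: ChatMessage[],
  userMessage: string
): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    ...history,
    { role: "user", content: userMessage },
  ];
}

const messages = buildMessages(
  "You are a code_review assistant. Respond in TypeScript.",
  [
    { role: "user", content: "Hi" },
    { role: "assistant", content: "Hello! Paste the code you want reviewed." },
  ],
  "Review this TypeScript function for bugs."
);
console.log(messages.map((m) => m.role).join(","));
// prints "system,user,assistant,user"
```

Because the system prompt is just the first array entry, swapping PromptForge versions never changes how the rest of the conversation is assembled.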

Step 2

Version across model upgrades

Llama releases (3.1, 3.2, 3.3, 4) can respond differently to the same prompt wording. PromptForge lets you maintain separate versioned prompts per model, compare them side by side, and roll back if an upgrade regresses output quality.
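One way to wire per-model prompts into code is a lookup table from the model you serve to the prompt tuned for it. This is a sketch under stated assumptions: the prompt IDs and version values are placeholders, and it assumes the API's `version` field accepts a specific pinned version in place of `"latest"`. It builds the same POST request used in integration.ts.

```typescript
// Hypothetical mapping from served model to its PromptForge prompt.
// Prompt IDs and versions are placeholders; pinning assumes the API's
// `version` field accepts a specific version as well as "latest".
interface PromptRef {
  promptId: string;
  version: string;
}

const PROMPTS_BY_MODEL: Record<string, PromptRef> = {
  "llama3.3": { promptId: "code-review-llama33", version: "4" },
  "llama4:scout": { promptId: "code-review-llama4", version: "latest" },
};

// Builds the URL and JSON body for the PromptForge POST request.
function promptRequestFor(
  model: string,
  variables: Record<string, string>
): { url: string; body: string } {
  const ref = PROMPTS_BY_MODEL[model];
  if (!ref) {
    throw new Error(`No PromptForge prompt configured for model "${model}"`);
  }
  return {
    url: `https://www.promptforge-app.com/api/v1/prompts/${ref.promptId}`,
    body: JSON.stringify({ version: ref.version, variables }),
  };
}
```

With this layout, upgrading one model touches only its own entry, and a regression can be undone by rolling back that prompt without affecting the other model.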

Step 3

Fetch once, run anywhere

Call the PromptForge API to get the current system prompt, then pass it to your Ollama, llama.cpp, Together AI, or Groq endpoint. The fetch is decoupled from the inference call, so you can update the prompt in the dashboard and pick it up without restarting your service.
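If you would rather not call PromptForge on every inference request, a short-lived cache still lets dashboard edits propagate without a restart. The following is a minimal sketch; `cachedPrompt` is a hypothetical helper, and the injectable clock exists only to make it testable. Concurrent calls made while the cache is empty may each trigger a fetch, which is usually acceptable for prompt text.

```typescript
// Hypothetical wrapper that caches a fetched prompt for `ttlMs` milliseconds.
// Calls inside the window reuse the cached string; the first call after it
// expires fetches again, so dashboard edits show up without a restart.
type PromptLoader = () => Promise<string>;

function cachedPrompt(
  load: PromptLoader,
  ttlMs: number,
  now: () => number = Date.now
): PromptLoader {
  let cached: { value: string; fetchedAt: number } | undefined;
  return async () => {
    if (cached === undefined || now() - cached.fetchedAt >= ttlMs) {
      const value = await load();
      cached = { value, fetchedAt: now() };
    }
    return cached.value;
  };
}
```

Usage with the `fetchPrompt` function from integration.ts might look like `const getPrompt = cachedPrompt(() => fetchPrompt("code_review", "TypeScript"), 60_000);`, then `await getPrompt()` before each inference call.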

Features

Powerful Prompt Management Features for AI Developers

From simple prompt storage to production-ready APIs with version control, dynamic variables, rollback, and a public gallery.

Dynamic Variables

Use {{variable}} syntax to create reusable prompt templates. Pass different values via API for endless customization across any LLM.
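PromptForge performs the interpolation server-side, but a local preview can help when writing unit tests or reviewing templates. This is a sketch of plain `{{variable}}` substitution, not PromptForge's actual implementation; `renderTemplate` is a hypothetical name, and leaving unmatched placeholders untouched is an assumed behaviour chosen so gaps stay visible.

```typescript
// Hypothetical local preview of {{variable}} substitution.
// Placeholders with no matching value are left as-is so gaps are easy to spot.
function renderTemplate(
  template: string,
  variables: Record<string, string>
): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(variables, name)
      ? variables[name]
      : undefined;
    return value ?? match;
  });
}

const preview = renderTemplate(
  "You are a {{context}} assistant. Respond in {{language}}.",
  { context: "code_review", language: "TypeScript" }
);
console.log(preview);
// prints "You are a code_review assistant. Respond in TypeScript."
```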

Instant Prompt API

RESTful API endpoint ready in seconds. Fetch any prompt with version pinning and variable interpolation. No redeployment needed.

Rollback with Diff Checker

View a line-by-line diff of every change and roll back to any previous version in one click. Never lose a working prompt again.

Publish to Gallery

Share your best prompts with the community in the public gallery. Get discovered by other developers and grow your personal library.

Start for free, upgrade anytime

No credit card required to get started. Paid plans include a 14-day free trial.

Hobby
No credit card
  • 1k API requests/month
  • 1 prompt
  • Unlimited versions
  • Dynamic variables
  • Version pinning
  • API key management
Start Managing My Prompts
Starter
14-day free trial
$9/month

No charge until your trial ends

  • 10k API requests/month
  • 5 prompts
  • Unlimited versions
  • Dynamic variables
  • Version pinning
  • API key management
Start 14-day Free Trial
Most Popular
Pro
14-day free trial
$29/month

No charge until your trial ends

  • 100k API requests/month
  • 25 prompts
  • Unlimited versions
  • Dynamic variables
  • Version pinning
  • API key management
Start 14-day Free Trial
Business
14-day free trial
$49/month

No charge until your trial ends

  • 500k API requests/month
  • 100 prompts
  • Unlimited versions
  • Dynamic variables
  • Version pinning
  • API key management
Start 14-day Free Trial

Questions? Contact us

FAQ

Meta / Llama + PromptForge: Common Questions

Specific answers for developers integrating PromptForge with Meta / Llama.

How do I use PromptForge with self-hosted Llama models?

The PromptForge API call is a standard HTTPS request, so the only requirement is outbound internet access from your server to the PromptForge API host used in your fetch URL (`www.promptforge-app.com` in the integration example above). Your Llama inference server (Ollama, llama.cpp, vLLM) can run entirely on-premises or on a private network. Fetch the prompt from PromptForge at request time, then send it to your local inference endpoint as a normal HTTP request. No PromptForge agent or sidecar is needed.
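On a private network, outbound access can be slow or temporarily unavailable. One option, sketched here as an assumption rather than a PromptForge feature, is to race the PromptForge fetch against a timeout and fall back to a bundled default prompt so the local Llama server keeps answering. `promptOrFallback` is a hypothetical helper.

```typescript
// Hypothetical guard: if the prompt cannot be loaded within `timeoutMs`
// (or the request fails), return a bundled fallback prompt instead.
async function promptOrFallback(
  load: () => Promise<string>,
  fallback: string,
  timeoutMs: number
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("PromptForge timed out")), timeoutMs);
  });
  try {
    return await Promise.race([load(), timeout]);
  } catch {
    return fallback;
  } finally {
    // Stop the timer so it cannot fire after the race is settled
    clearTimeout(timer);
  }
}
```

A call such as `promptOrFallback(() => fetchPrompt("code_review", "TypeScript"), defaultPrompt, 2000)` keeps the example's `fetchPrompt` unchanged; `defaultPrompt` stands for whatever text you ship with the service.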

Can I manage Llama's special token format with PromptForge?

Llama 3 chat models use the OpenAI-compatible messages format via Ollama and most hosting providers, so you store the plain system-prompt text in PromptForge without special tokens. If you are using Llama's raw `<|begin_of_text|><|start_header_id|>system<|end_header_id|>` format directly (e.g. via llama.cpp's completion endpoint), store the full formatted string (including token delimiters) as the PromptForge template. PromptForge stores and returns it verbatim.
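For reference, the Llama 3 raw chat layout puts each turn between a role header and an `<|eot_id|>` terminator and ends with an open assistant header. The sketch below assembles that string in code, as an alternative to storing the fully formatted string in PromptForge; `llama3Prompt` is a hypothetical name. Check whether your server already prepends `<|begin_of_text|>` when tokenizing, since some do and the token would then appear twice.

```typescript
// Assembles a Llama 3 chat prompt in the raw special-token format.
// `system` would be the text returned by PromptForge.
function llama3Prompt(system: string, user: string): string {
  return (
    "<|begin_of_text|>" +
    `<|start_header_id|>system<|end_header_id|>\n\n${system}<|eot_id|>` +
    `<|start_header_id|>user<|end_header_id|>\n\n${user}<|eot_id|>` +
    // Open assistant header: the model continues from here
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
  );
}

// Example body for llama.cpp's /completion endpoint (n_predict caps output tokens)
const completionBody = JSON.stringify({
  prompt: llama3Prompt("You are a code_review assistant.", "Review this function."),
  n_predict: 256,
});
```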

Does PromptForge work with Ollama and llama.cpp?

Yes. Ollama and llama.cpp's server both expose an OpenAI-compatible `/v1/chat/completions` endpoint that accepts a plain-text system message, which is exactly what PromptForge returns. Fetch the prompt from PromptForge, place it in the `system` role message, and send the request to whichever local server you are running; the integration is the same regardless of which runtime serves the model. llama.cpp's raw `/completion` endpoint instead takes a single prompt string, so use the fully formatted template described in the previous answer there.
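Because both runtimes speak the same chat format, switching between them can come down to a base URL. A minimal sketch, assuming default local ports (11434 for Ollama, 8080 for llama.cpp's `llama-server`); `chatRequest` and `RUNTIMES` are hypothetical names. llama-server typically answers with whichever model it was started with, so the `model` field matters mainly for Ollama.

```typescript
// Default local base URLs; adjust if you changed the host or port.
const RUNTIMES = {
  ollama: "http://localhost:11434",
  llamacpp: "http://localhost:8080",
} as const;

type Runtime = keyof typeof RUNTIMES;

// Builds an identical chat request for either runtime; only the URL differs.
function chatRequest(runtime: Runtime, model: string, system: string, user: string) {
  return {
    url: `${RUNTIMES[runtime]}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [
          { role: "system", content: system },
          { role: "user", content: user },
        ],
      }),
    },
  };
}
```

A caller would then run `const { url, init } = chatRequest("llamacpp", "llama4:scout", content, question); await fetch(url, init);`, with `content` taken from PromptForge.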

How do I version prompts across different Llama model sizes (8B, 70B)?

Create separate PromptForge prompts for each model tier, for example, `llama-8b-summariser` and `llama-70b-analyst`. Larger models generally handle more complex, detailed instructions while smaller models need more concise prompts. Keeping them separate lets you tune each prompt to the model's capability and version them independently as you experiment.
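To pick the right tier automatically, you can route on the parameter count in the model tag. The following is a sketch under stated assumptions: it expects Ollama-style tags such as `llama3.1:8b`, and the prompt IDs and the 13B cut-off are hypothetical placeholders. Tags that carry no size (for example `llama4:scout`) need an explicit mapping instead.

```typescript
// Hypothetical router: reads the parameter count (in billions) from an
// Ollama-style tag and returns the prompt ID tuned for that tier.
function promptIdForModel(tag: string): string {
  const match = /:(\d+(?:\.\d+)?)b\b/i.exec(tag);
  if (!match) {
    throw new Error(`Cannot read a parameter count from "${tag}"`);
  }
  const billions = Number(match[1]);
  // Smaller models get the concise prompt, larger ones the detailed prompt
  return billions <= 13 ? "llama-small-summariser" : "llama-large-summariser";
}
```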