Question 1

How do I use PromptForge with self-hosted Llama models?

Accepted Answer

The PromptForge API call is a standard HTTPS request, so the only network requirement is outbound internet access from your server to `api.promptforge-app.com`. Your Llama inference server (Ollama, llama.cpp, vLLM) can run entirely on-premises or on a private network. At request time, fetch the prompt from PromptForge, then send it to your local inference endpoint as a normal HTTP request. No PromptForge agent or sidecar is needed.
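A minimal sketch of that flow in Python, using only the standard library. The PromptForge URL path, the bearer-token header, and the `content` response field are assumptions made for illustration, not the documented API, so check the PromptForge docs for the real shapes. The local endpoint assumes Ollama on its default port 11434.

```python
import json
import urllib.request

# ASSUMPTIONS (not taken from the PromptForge docs): the URL path,
# the auth header, and the "content" field of the response.
PROMPTFORGE_URL = "https://api.promptforge-app.com/v1/prompts/{name}"  # hypothetical path
LOCAL_CHAT_URL = "http://localhost:11434/v1/chat/completions"  # Ollama default port


def http_json(url, payload=None, headers=None):
    """Send a GET (no payload) or a JSON POST and return the decoded JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(url, data=data, headers=headers or {})
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def answer(question, prompt_name, api_key, model="llama3", transport=http_json):
    """Fetch the system prompt over the internet, then call the local model."""
    # Outbound HTTPS call: the only step that leaves your network.
    prompt = transport(
        PROMPTFORGE_URL.format(name=prompt_name),
        headers={"Authorization": f"Bearer {api_key}"},  # hypothetical auth scheme
    )
    # Local call: stays on-premises or on the private network.
    reply = transport(
        LOCAL_CHAT_URL,
        payload={
            "model": model,
            "messages": [
                {"role": "system", "content": prompt["content"]},  # hypothetical field
                {"role": "user", "content": question},
            ],
        },
    )
    return reply["choices"][0]["message"]["content"]
```

The `transport` parameter exists only so the two HTTP calls can be swapped for a stub in tests; in production the default `http_json` is used for both.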

Question 2

Can I manage Llama's special token format with PromptForge?

Accepted Answer

Llama 3 chat models are usually served through an OpenAI-compatible messages API (by Ollama and most hosting providers). In that case, store the plain system-prompt text in PromptForge, with no special tokens. If you instead send Llama's raw `<|begin_of_text|><|start_header_id|>system<|end_header_id|>` format directly (e.g. via llama.cpp's completion endpoint), store the full formatted string, token delimiters included, as the PromptForge template. PromptForge stores and returns it verbatim.
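To make the difference concrete, the sketch below builds the same conversation both ways. The token layout follows Meta's published Llama 3 instruct template; the function names are illustrative only. Some runtimes prepend `<|begin_of_text|>` themselves, so check whether yours does before storing it in the template.

```python
def as_messages(system_prompt: str, user_message: str) -> list[dict]:
    """Messages form: the system prompt is stored as plain text, no tokens."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


def as_raw(system_prompt: str, user_message: str) -> str:
    """Raw form: one string with Llama 3 delimiters, ending where the reply begins."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

With the messages form, only `system_prompt` lives in PromptForge. With the raw form, the stored template carries the delimiters as well, and your code fills in the user turn before sending.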

Question 3

Does PromptForge work with Ollama and llama.cpp?

Accepted Answer

Yes. Both Ollama and llama.cpp's server expose `/v1/chat/completions`, which accepts a plain-text system message, exactly what PromptForge returns. Fetch the prompt from PromptForge, place it in the `system` role message, and send the request to whichever local server you are running. The integration is the same regardless of which runtime serves the model. (llama.cpp's native `/completion` endpoint takes a single raw prompt string instead of messages, so it needs the token-formatted template.)
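As a rough illustration of "same integration, different runtime", the sketch below builds an identical chat request for either server and changes only the base URL. The ports are the usual defaults (11434 for Ollama, 8080 for llama.cpp's server); the `RUNTIMES` table and function name are made up for this example.

```python
import json
import urllib.request

# Default local ports; adjust if your server is configured differently.
RUNTIMES = {
    "ollama": "http://localhost:11434",
    "llamacpp": "http://localhost:8080",
}


def build_chat_request(runtime, model, system_prompt, user_message):
    """Build an OpenAI-style chat request for the chosen local runtime."""
    body = {
        "model": model,
        "messages": [
            # The text fetched from PromptForge goes in the system message.
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        RUNTIMES[runtime] + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it. Switching runtimes is then a one-word change in the `runtime` argument.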

Question 4

How do I version prompts across different Llama model sizes (8B, 70B)?

Accepted Answer

Create a separate PromptForge prompt for each model tier, for example `llama-8b-summariser` and `llama-70b-analyst`. Larger models generally handle more complex, detailed instructions, while smaller models need more concise prompts. Keeping the prompts separate lets you tune each one to its model's capability and version them independently as you experiment.
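One simple way to wire this up is a lookup from the model you are serving to the prompt you fetch. The prompt names come from the example above; the Ollama-style model tags and the helper name are illustrative assumptions.

```python
# Prompt names from the example above; the model tags are illustrative.
PROMPT_FOR_MODEL = {
    "llama3:8b": "llama-8b-summariser",
    "llama3:70b": "llama-70b-analyst",
}


def prompt_name_for(model: str) -> str:
    """Return the PromptForge prompt name tuned for the given model."""
    try:
        return PROMPT_FOR_MODEL[model]
    except KeyError:
        # Fail loudly rather than silently reusing a prompt tuned for another tier.
        raise ValueError(f"no PromptForge prompt configured for model {model!r}") from None
```

Raising on an unknown model is a deliberate choice here: a prompt tuned for 70B may be too long or too intricate for 8B, so a silent fallback could hide quality problems.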

Meta / Llama Prompt Management

PromptForge + Meta / Llama in one file

Meta / Llama + PromptForge in Three Steps

Write your Llama system prompt template

Version across model upgrades

Fetch once, run anywhere
