TutorialMarch 31, 20265 min read

10 Prompt Engineering Tricks to Cut Token Usage

Every unnecessary word in your system prompt costs you money with every API call. Here are 10 specific, testable ways to rewrite your prompts to save up to 50% on input tokens.

1. Remove "Please" and "Thank You"

AI models don't need politeness. Extra words just consume tokens.

Before (15 tokens)
Please summarize this text for me, thank you.
After (3 tokens)
Summarize this:

2. Use JSON Keys Effectively

When forcing JSON output, keep keys extremely short. Long keys are repeated for every item in an array, wasting massive amounts of output tokens (which cost 4x more than input tokens).

Before
{ "user_first_and_last_name": "...", "customer_account_identification": "..." }
After
{ "name": "...", "id": "..." }

3. Combine Multiple API Calls

Instead of doing one request to translate, and a second request to summarize, do both in one prompt. You save the overhead of repeating your system instructions and context.

4. Leverage Markdown Over XML/HTML

LLM tokenizers are highly optimized for Markdown. HTML and XML tags cost significantly more tokens because angle brackets and slashes often tokenize separately.

Before (13 tokens)
<h1>Title</h1> <ul><li>Item</li></ul>
After (4 tokens)
# Title - Item

5. Eliminate Explanations

Models love to yap. To save output tokens, strictly forbid prefixes and explanations.

Return ONLY the JSON. No introductory text. No explanations.

6. Rely on Few-Shot Examples (Instead of Long Instructions)

Models learn better from examples than complex rules. Replacing 200 tokens of complicated edge-case rules with two 30-token examples often improves accuracy while saving 140 tokens per call.

7. Strip Whitespace in Code/Data

Multiple spaces and deep indentation eat tokens rapidly. A tab character or sets of 4 spaces often count as distinct tokens. Minify your context data before injecting it.

8. Use English for System Prompts

Even if your application is in German or French, write your system-level instructions in English. GPT-4o's tokenizer (o200k_base) is highly optimized for English, making it significantly cheaper to instruct the model in English and ask for the output in the target language.

9. Declare Defaults Explicitly

If 90% of your data has a common default, tell the model to omit the field if it matches the default. This saves massive amounts of tokens in arrays of JSON objects.

If status is "active", do not include the "status" key.

10. Test and Measure Constantly

The only way to know if a prompt tweak saves money is to measure it. Keep our real-time token calculator open in another tab while you write prompts to see the impact of your edits instantly.