The Half-Penny gpt-image-2 Challenge - API-only Gallery

Background

976x704 is one of the smallest image sizes you can request now that gpt-image-2 accepts arbitrary sizes, and it is the size this challenge requires.

| size | quality | output tokens | output cost | aspect ratio |
|---|---|---|---|---|
| 960x720 | low | 130 | $0.00390 | 1.333 (4:3) |
| 976x704 | low | 129 | $0.00387 | 1.386 |

API Challenge (ChatGPT DQ’d)

Your remaining budget for text input after image output: $0.00113

Text input costs $0.000005 per token.

A prompt of up to 220 text tokens keeps the total at or below $0.005, half of one US cent, once an internal 6-token overhead is counted.
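The budget arithmetic above can be checked in a few lines. This is only a sketch: it uses the prices quoted in this thread ($5/1M text input, $30/1M output), treats the 6-token overhead as billed text input, and the names are made up for illustration. Everything is kept in whole micro-dollars so no float rounding creeps in.

```python
# Work in whole micro-dollars (1e-6 USD) so the arithmetic stays exact.
BUDGET_MICRO = 5_000        # $0.005, half of one US cent
OUTPUT_TOKENS = 129         # 976x704 at quality "low" (from the table above)
OUTPUT_PRICE_MICRO = 30     # $30 per 1M tokens = 30 micro-dollars per token
TEXT_PRICE_MICRO = 5        # $5 per 1M tokens = 5 micro-dollars per token
OVERHEAD_TOKENS = 6         # assumption: internal overhead billed as text input


def max_prompt_tokens(budget_micro=BUDGET_MICRO, output_tokens=OUTPUT_TOKENS):
    """Largest prompt (in text tokens) that keeps one image within budget."""
    remaining = budget_micro - output_tokens * OUTPUT_PRICE_MICRO
    billable = remaining // TEXT_PRICE_MICRO   # text tokens the remainder buys
    return max(billable - OVERHEAD_TOKENS, 0)


print(max_prompt_tokens())                   # 220 for 976x704
print(max_prompt_tokens(output_tokens=130))  # 214 for 960x720
```

The one extra output token of 960x720 costs six prompt tokens, which is part of why 976x704 is the size used here.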

Why size?

This particular aspect ratio nearly maximizes the area of an image shown natively on this forum under its downsizing rules.

976x704 → 690x497
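A rough sketch of that downsizing, to compare candidate sizes. The 690x500 bounding box and the rounding-down are assumptions, chosen because they reproduce the 690x497 shown above. The function name is hypothetical and this is not the forum's actual code.

```python
def forum_display_size(width, height, max_w=690, max_h=500):
    """Fit an image inside max_w x max_h, keeping its aspect ratio.

    Assumption: the forum scales down to a 690x500 box and rounds down.
    """
    if width <= max_w and height <= max_h:
        return width, height                      # small enough, shown as-is
    if width * max_h >= height * max_w:           # width is the limiting side
        return max_w, height * max_w // width
    return width * max_h // height, max_h         # height is the limiting side


for size in [(976, 704), (960, 720)]:
    w, h = forum_display_size(*size)
    print(size, "->", (w, h), "area", w * h)
```

Under these assumptions 976x704 displays at 690x497 (342,930 px), while 4:3 at 960x720 is height-limited to 666x500 (333,000 px).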

Your challenge

Awesomeness for the price of a rounding error.

Stay within 220 prompt text tokens, send
{"size":"976x704", "quality":"low"}
and fascinate us on the cheap.

No artifacts, not a preview: a pure, impressive demonstration that “mini” models can now be left behind, because gpt-image-2 gives quick, high-quality results even at the API’s low quality setting.

Elite entrants are in the $0.004 club: only 20 tokens of input.

https://platform.openai.com/tokenizer

Tip: when you upload an image to the forum in the markdown-based editor, you see:
![image|690x497](upload://eUgDKei2u9wFyZKQpJPShYYd5Qf.jpeg)

If the upload was not a paste, your filename appears in place of “image”. Replace that text with your prompt, and others can mouse-hover to see it.
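If you post several images, the swap can be scripted. This is a minimal sketch with a hypothetical helper name. It assumes the alt text is everything before `|WIDTHxHEIGHT` in the image tag, and it replaces characters that would end the alt text early with spaces.

```python
import re


def set_alt_text(markdown: str, prompt: str) -> str:
    """Put the prompt where the alt text ("image" or a filename) was."""
    # "[", "]", "|" and newlines would break the tag, so swap them for spaces.
    safe = re.sub(r"[\[\]|\n]", " ", prompt).strip()
    # Match "![<alt>|<W>x<H>]" and keep only the size part.
    pattern = r"!\[[^|\]]*(\|\d+x\d+)\]"
    return re.sub(pattern, lambda m: f"![{safe}{m.group(1)}]", markdown, count=1)


tag = "![image|690x497](upload://eUgDKei2u9wFyZKQpJPShYYd5Qf.jpeg)"
print(set_alt_text(tag, "moonlit samurai duel, giant koi spirits"))
```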

Prize

Stuff to look at. Proud you honestly did it.


OOPS - this is a $0.0058 image: its prompt is this post itself.

Does everyone have to show you the “entrance ticket” before entering/posting? :winking_face_with_tongue:

Noticing that “preview” at cheap prices isn’t the terrible warped output of previous gpt-image models, it’s for fun. So post away - no tricks or token count deception are needed to get pretty good pics.

(budget hint: batch API 20-token calls: 500 images for a dollar)
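A quick check of that hint, as a sketch only. The 50% batch discount is an assumption (it is the usual Batch API rate, so confirm it on the pricing page for this model). The per-token prices are the ones quoted in this thread, and the function name is illustrative.

```python
TEXT_PRICE_MICRO = 5          # $5 per 1M text input tokens, in micro-dollars per token
OUTPUT_PRICE_MICRO = 30       # $30 per 1M output tokens, in micro-dollars per token
BATCH_DISCOUNT_PERCENT = 50   # assumption: usual Batch API discount


def images_per_dollar(input_tokens=20, output_tokens=129,
                      discount_percent=BATCH_DISCOUNT_PERCENT):
    """Whole images one US dollar buys at the given token counts."""
    full_micro = input_tokens * TEXT_PRICE_MICRO + output_tokens * OUTPUT_PRICE_MICRO
    # Scale by 100 so the percentage discount stays in integers (units of 1e-8 USD).
    discounted = full_micro * (100 - discount_percent)
    return 100_000_000 // discounted


print(images_per_dollar())                     # 503 with the assumed discount
print(images_per_dollar(discount_percent=0))   # 251 at the regular price
```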

(compression hint: look at how words are tokenized to optimize to single-token concepts; lower-case, spacing)

Under-20 club


(polluted by the clumpy pattern symptom?)

Estimated cost breakdown:
  Input text    20 tokens × $5/1M = $0.000_100_000
  Input image   0 tokens × $8/1M = $0.000_000_000
  Output tokens 129 tokens × $30/1M = $0.003_870_000
  Total         $0.003_970_000

Jupyter notebook template

from IPython.display import Image, display
import base64
from openai import OpenAI

# OpenAI() reads the OPENAI_API_KEY environment variable by default;
# pass api_key="..." explicitly if you prefer.
client = OpenAI()

filename = "image-02.png"  # optional: set to "" to display without saving
prompt_text = "Original ancient wizard profile card, readable stats, cinematic lighting, detailed robes"


response = client.images.with_raw_response.generate(
  model="gpt-image-2",
  prompt=prompt_text,
  # moderation= 'low',
  n=1,
  size="976x704", #1536x1024
  quality="low"
)

result = response.parse()  # parse the raw HTTP response once
image_bytes = base64.b64decode(result.data[0].b64_json)
if filename:
    with open(filename, "wb") as f:
        f.write(image_bytes)

print("Image generation completed.", result.usage)


## helper functions

PRICE_PER_1M = {
    "input_text": 5,
    "input_image": 8,
    "output": 30,
}

TOKENS_PER_MILLION = 1_000_000


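def money(x, digits=9):
    # Groups the digits after the decimal point with "_", e.g. 0.003_870_000.
    # Grouping inside the precision needs a recent Python (3.14+, as far as I know).
    return f"{x:.{digits}_f}"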


def usage_cost(usage, *, verbose=True):
    tokens = {
        "input_text": usage.input_tokens_details.text_tokens,
        "input_image": usage.input_tokens_details.image_tokens,
        "output": usage.output_tokens,
    }

    costs = {
        key: tokens[key] * PRICE_PER_1M[key] / TOKENS_PER_MILLION
        for key in tokens
    }

    total = sum(costs.values())

    if verbose:
        print("Estimated cost breakdown:")
        for key, label in [
            ("input_text", "Input text"),
            ("input_image", "Input image"),
            ("output", "Output tokens"),
        ]:
            print(
                f"  {label:<13} "
                f"{tokens[key]:,} tokens × ${PRICE_PER_1M[key]}/1M = "
                f"${money(costs[key])}"
            )
        print(f"  {'Total':<13} ${money(total)}")

    return {
        "tokens": tokens,
        "costs": costs,
        "total": total,
    }
   

print(response.parse().usage)
usage_cost(response.parse().usage)
display(Image(data=image_bytes))
dict(response.headers)

AI prompted to choose the best ~15 token idea, under 220 total.

With n=6 for multiple images (each billed as if requested separately: no cache discount, no shared-input benefit), we can also see how often “the pattern” symptom shows up, and to what degree, across images from the same input and API call.

Prompt and one alternate chosen of six

You are competing in an art contest. You have themes with minimal descriptions as options you can choose from. You must select the most promising description from those below. The resulting image shall be dazzling and beyond the expectations of judges, so consider the composition of these possible candidates and go forward with robust and fulfilling presentation of the best idea you want to make as a hyper-realism image.


Massive canyon city carved into red cliffs, bridges, waterfalls, sunset

Venice carnival on another planet, floating masks, purple canals, twin suns

Dreamlike rainforest temple, colossal flowers, hummingbirds, hidden golden staircase

Shipwreck cathedral on the ocean floor, sunbeams, sharks, pearl altar

Glass desert with mirrored dunes, lone rider, enormous fractured moon

Castle above the clouds, dragon shadows, dawn trumpets, waterfalls falling into sky

Ancient observatory atop a colossal tortoise, constellations reflected in lake

Moonlit samurai duel in cherry blossom blizzard, giant koi spirits

In my experience, it seems to occur more often in highly stylized fantasy images.