Here are a few reasons this might be happening, and a few things you can try tweaking.
While many AI models have made bold claims about how smart they are, their capabilities may be exaggerated by flawed tests. So when you’re not seeing great outputs from AI tools, it’s very likely that it’s not you, it’s them.
We’ve tested our prompts to make sure they work, but they are likely far from perfect. Try providing more context, or try the opposite and simplify them, and see which works better.
One thing that has often worked for us is breaking up the prompts into multiple steps, so the AI can focus on one task at a time. This helps you catch mistakes and make adjustments as needed.
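If you’re scripting against a model API rather than working in a visual tool, the same multi-step idea can be sketched in a few lines. This is a minimal illustration, not any particular product’s API: the `ask` function below is a hypothetical stand-in for whatever model call your tool provides. The point is the structure, where each step gets one focused task and you can inspect the intermediate output before moving on.

```python
def ask(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; echoes the prompt for demo."""
    return f"[model response to: {prompt}]"

def run_steps(topic: str) -> list[str]:
    # One focused task per step, instead of one giant prompt.
    steps = [
        f"List the key points to cover about {topic}.",
        "Draft one paragraph for each point from the previous step.",
        "Review the draft and fix any factual or tone issues.",
    ]
    outputs = []
    context = ""
    for step in steps:
        prompt = (context + "\n\n" + step).strip()
        result = ask(prompt)
        outputs.append(result)  # inspect here and adjust before continuing
        context = result        # feed each result into the next step
    return outputs

outputs = run_steps("design systems")
print(len(outputs))  # one output per step
```

Because each step’s result is captured before it is fed forward, you can stop after any step, tweak the wording, and rerun just that step rather than the whole prompt.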
Different LLMs are good at different tasks. Here are some resources that compare them:
Figma Make
In Figma Make, you can select between Claude and Gemini models. Here’s what one Figma blog post says about what each model is good for.

If you’re using Claude in Figma Make, we recommend Opus 4.6 for better results.