Mikhail Zakharov, Senior Paid Social

Meta Ads | May 20, 2026

How to test creatives in Meta Ads in 2026: a working system in the era of Advantage+ and Andromeda

The creative-testing approach most of us have used for the past five years does not work well in 2026. Not because creative teams got worse or audiences burned out, but because Meta is fundamentally different under the hood: Andromeda at the retrieval stage, GEM in the auction, and Entity ID instead of the per-file uniqueness we were used to. The old A/B spreadsheets still look pretty, but the decisions they point to are more and more often useless.

In this article, we walk through how we at Affect rebuilt our creative testing process for the new engine. No fluff and no promises of a secret formula. This is what actually moves the needle in the accounts we run day to day.

Why the classic testing setup no longer works

The old logic was simple. You decided who saw your ads: you picked interests, lookalikes, and behaviors. Creative was the packaging on top of your audience. To figure out which banner pulled harder, you would launch two or three variants, wait, look at CTR and CPA, and keep the winner. Linear and easy to read.

Now Meta does the opposite. Andromeda decides first which creative even makes sense to show a specific person, and only then does the auction kick in. The filter fires before your bid ever enters the system. If your creative did not make it through retrieval, it does not exist. Doubling the budget will not change that.

The main consequence for us as media buyers is that “creative is the new targeting” is no longer a conference slide. It is literal mechanics. You used to assemble an audience around a creative. Now the algorithm assembles creatives around a specific user. That flip breaks half of the testing playbooks we all relied on.

The frustrating part is that nothing has changed on the surface. Same Ads Manager, same buttons, same reports. The rules of the game underneath them are different.

Three pillars of a testing system that holds up

Before we get into the Andromeda technicals, the foundation has to be in place. Without it, no tactic is going to save you.

Knowing what creative to make

You can hire the best designers in the world and still get weak videos out the door if the team does not have a sharp brief. Not “make it pretty and sales-y,” but specifically: what messages have worked, what did not land, and which hypotheses we want to test in the next iteration.

It helps to attach references. Not “this brand is cool,” but “we like how they go from problem to solution in the first two seconds.” Words like “dynamic” and “modern” mean something different to everyone, so it is better to avoid them.

A separate item is agreeing on what the creative is actually for. Looking respectable in the feed is one job. Pushing CPA below a specific number is another. Teams routinely mix the two up and then wonder why the beautiful creative does not sell.

Decisions on data, not on taste

Creative is subjective. Put ten people in a room and you will get ten opinions on which video is the best. The only honest judge is the numbers in the account.

Building hypotheses is useful. Debating them is useful too. But the final call, whether to pause, scale, or iterate, has to lean on the data. Not on “I feel like the second one has more energy.” Gut feel is fine as a starting point, but check it against what CPA says a week from now.

One more thing: do not look at data only inside Ads Manager. At the level of a pivot view across concepts and formats, you see a picture you cannot make out from inside a single campaign. For some teams that is Google Sheets, for others Looker, and for others an internal BI tool. The point is that the whole team is looking at the same view.
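As a minimal sketch of such a pivot view, assuming you can export one row per ad with a hand-assigned concept label, a format label, spend, and conversions, plain Python is enough to roll results up by concept and format. The field names and numbers below are hypothetical and are not an Ads Manager schema.

```python
from collections import defaultdict

# Hypothetical export rows: one row per ad, labeled by hand with concept and format.
rows = [
    {"concept": "tired_mom", "format": "ugc", "spend": 120.0, "conversions": 8},
    {"concept": "tired_mom", "format": "static", "spend": 80.0, "conversions": 2},
    {"concept": "pediatrician", "format": "ugc", "spend": 100.0, "conversions": 5},
    {"concept": "tired_mom", "format": "ugc", "spend": 60.0, "conversions": 4},
]


def pivot_cpa(rows):
    """Sum spend and conversions per (concept, format) and compute CPA."""
    totals = defaultdict(lambda: {"spend": 0.0, "conversions": 0})
    for row in rows:
        key = (row["concept"], row["format"])
        totals[key]["spend"] += row["spend"]
        totals[key]["conversions"] += row["conversions"]

    result = {}
    for key, t in totals.items():
        # CPA is undefined with zero conversions, so report None instead of dividing.
        cpa = t["spend"] / t["conversions"] if t["conversions"] else None
        result[key] = {**t, "cpa": cpa}
    return result


if __name__ == "__main__":
    for (concept, fmt), stats in sorted(pivot_cpa(rows).items()):
        print(concept, fmt, stats)
```

The same grouping works in Sheets, Looker, or a BI tool. What matters is that the grouping keys are concept and format rather than campaign or ad ID.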

Speed beats perfection

The most expensive resource in creative testing is not budget. It is time. While you spend two weeks debating which font to use, competitors are launching and closing three hypotheses.

It is better to ship a creative that gives you a read in a week than to polish a perfect one for a month. Every launch is one more thing you learn about your audience. The more iterations, the deeper your read on what actually works for your product.

That leads to a simple rule: every batch of creatives should be part variations on what already works, part new concepts that take real risk. If you only re-shoot winners, the ceiling keeps dropping. If you only ship experiments, the campaign is unstable. The balance is roughly 60/40 or 70/30 in favor of proven concepts, depending on the stage the account is at.

Testing concepts, not files: what Entity ID looks like in practice

This is probably the biggest shift of 2025–2026, and a lot of people still have not picked up on it.

When you upload an ad, Andromeda does not see it as “a file with a unique ID in the account.” The system breaks it down into meaningful elements: what is in the frame, what colors are used, who is speaking, what the text is about, and the tone of the audio. From all of that, it builds a digital fingerprint, an Entity ID. It is the Entity ID, not your ad ID, that competes for impressions.

What this means in practice. Say you are testing baby food ads. You launch 8 creatives: in each one, a mom holds a baby, a bright kitchen is in the background, and the copy reads “natural ingredients.” The background changes, the font changes, and a couple of words in the headline change. To you, these are 8 different tests. Andromeda sees one concept and assigns them a single Entity ID. From there, the 8 ads start fighting each other for the same auction slot. CPM climbs, real reach does not expand, and budget burns.

If those same 8 creatives were genuinely different, for example, a UGC video of an exhausted mom who finally found something her kid will eat without a meltdown, an infographic about the ingredient list, a clip with a pediatrician, and a “we need to feed the baby in the car right now” situation, that is 4–5 different Entity IDs. Each ad gets its own way into the auction and its own slice of the audience.

Before you push a new creative live, ask yourself: how is it different from the previous one in message, visual treatment, and format? If the answer is “it isn’t” on at least two of the three, Andromeda will roll it up with the existing one and the test will not give you any new information.
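That pre-launch question can be written down as a small checklist. The sketch below is only a manual aid: the `Creative` fields are labels your team assigns by hand (hypothetical names), and it says nothing about how Andromeda actually computes an Entity ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creative:
    # Hand-assigned labels; hypothetical fields, not anything Meta exposes.
    message: str
    visual: str
    format: str


def matching_dimensions(a: Creative, b: Creative) -> int:
    """Count how many of the three dimensions are the same for both creatives."""
    return sum([a.message == b.message, a.visual == b.visual, a.format == b.format])


def likely_same_concept(a: Creative, b: Creative) -> bool:
    """Rule of thumb from the text: no difference on two or more of the three."""
    return matching_dimensions(a, b) >= 2


if __name__ == "__main__":
    live = Creative("natural ingredients", "mom in bright kitchen", "static")
    recolor = Creative("natural ingredients", "mom in bright kitchen", "video")
    new_idea = Creative("time saved", "feeding in the car", "ugc video")
    print(likely_same_concept(live, recolor))   # same message and visual
    print(likely_same_concept(live, new_idea))  # differs on all three
```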

The takeaway: test concepts, not variations of one concept. Ten distinct ideas will give you more data and more impressions than fifty recolored versions of the same thing.

A useful framework for spinning up ideas across different Entity IDs:

Who we are talking to: mom with a toddler, busy dad, grandma who wants it “the way it used to be.”
What we are promising: natural ingredients, time saved, the baby actually sleeping, a doctor’s stamp of approval.
How we are saying it: UGC, expert format, product demo, slice-of-life moment.

Multiply the variations and you get a matrix of 20–30 genuinely different concepts. From there, it is just a question of what you shoot first.
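A minimal sketch of that multiplication, using the example values from the framework above. The full product of those lists is 48 combinations. The 20–30 figure presumably reflects what is left once pairings that make no sense for the product are dropped; that reading, and the `exclude` parameter used to model it, are assumptions.

```python
from itertools import product

personas = ["mom with a toddler", "busy dad", "grandma"]
promises = ["natural ingredients", "time saved", "baby sleeps", "doctor approved"]
formats = ["UGC", "expert", "product demo", "slice of life"]


def build_matrix(personas, promises, formats, exclude=()):
    """Return every (persona, promise, format) combination not explicitly excluded."""
    excluded = set(exclude)
    return [combo for combo in product(personas, promises, formats)
            if combo not in excluded]


if __name__ == "__main__":
    # Hypothetical pruning: drop a pairing the team decides is not worth shooting.
    skip = [("grandma", "time saved", "expert")]
    matrix = build_matrix(personas, promises, formats, exclude=skip)
    print(len(matrix), "candidate concepts")
    for concept in matrix[:3]:
        print(concept)
```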

How much budget to put into testing: why the 30% rule no longer works

For years, the line was: 70% of budget on proven creatives, 30% on tests. In 2026, that formula falls apart. Here is why.

Tests used to live in a separate campaign. There was a testing campaign with a small budget and a top performers campaign with a big one. A week or two in, you would compare and move winners up to the main campaign. Now Advantage+ and simplified account structures work differently: the algorithm decides how to split impressions between creatives inside a single ad set, based on which Entity IDs actually hook users.

If you keep carving out a separate testing campaign with 30% of the budget out of habit, you are effectively cutting your test creatives off from the very audience they need to learn on. Fewer signals mean slower learning, and conclusions arrive late.

The new model:

One main ad set, where old and new creatives live side by side. The algorithm decides on its own where to put more impressions.
Roughly every 7–14 days, refresh with a fresh batch. The trigger is signals rather than a fixed calendar: the share of first-time impressions drops, CPA climbs on a specific concept, or creative fatigue shows up. That is the moment to bring in new material.
At each refresh, the new share is roughly 20–30% of active ads. Out of 10 active creatives, 2–3 are new concepts and the rest are current leaders and solid mid-tier performers.
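The arithmetic behind the 20–30% share is easy to check for any ad set size. This is a small sketch with a hypothetical helper name; it uses integer math to avoid floating-point rounding surprises.

```python
from __future__ import annotations


def new_concept_slots(active_ads: int, low_pct: int = 20, high_pct: int = 30) -> tuple[int, int]:
    """Return the (min, max) number of new concepts for a refresh.

    The lower bound is rounded up and the upper bound is rounded down,
    so both stay inside the 20-30% band whenever the band contains an integer.
    """
    if active_ads <= 0:
        raise ValueError("active_ads must be positive")
    low = -(-active_ads * low_pct // 100)   # ceiling division
    high = active_ads * high_pct // 100     # floor division
    # For very small ad sets the band may contain no integer; fall back to `low`.
    return low, max(low, high)


if __name__ == "__main__":
    for n in (8, 10, 15):
        print(n, new_concept_slots(n))
```

For 10 active creatives this gives 2 to 3 new concepts, which matches the example above.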

Testing stops being a share of budget and becomes a built-in process inside the main campaign. It is faster, and the algorithm gets more data to allocate properly.

A standalone, isolated testing campaign still has its place in narrow cases. For example, when you are introducing a fundamentally new format, such as your first UGC or your first long-form video, and you do not want it competing in the main pot right away. Then you give it a couple of weeks in the incubator and move the winners into the main ad set.

There are other tactics too, but we cannot cover them all in one article. It would balloon into something nobody actually finishes reading. So we picked one of the simpler ones.

5 mistakes that regularly burn budget

1. Strict A/B on everything

Classic A/B works when you are testing one variable in clean conditions. In the context of Meta Ads with Advantage+, it is often a luxury you cannot afford. While you wait for statistical significance between two variants, the market has already moved on, and your winner is outdated.

Speed in testing matters more than academic purity. If a creative has had a week to run under reasonable conditions and the signal is clear, that is enough to make a call. Do not turn testing into a dissertation.

2. Meta’s AI optimizations left on by default

Ads Manager has picked up a bunch of checkboxes where Meta offers to improve your creative automatically: expand the image, swap the background, regenerate the copy, add music. It is always pitched as “more conversions.” In reality, the outcome ranges from “nothing changed” to “you would not recognize your brand in the feed.”

Our recommendation: by default, turn off anything that can change the creative without you knowing. If you want to test a specific AI feature, do it on purpose, on a limited ad set, against a control.

3. Tests with no plan

When tests get launched “as ideas come up,” the team works in fits and starts. Today we shoot three videos about a discount, next week four about sustainability, then we drop everything for a holiday push. The account is a mess, and the data is noise.

A test roadmap a month out saves an enormous amount of effort. You write down hypotheses, formats, which personas each is for, and the order they run in. The team can see what to shoot and why, production runs in batches, and after the fact the data tells a clean story. You can pivot, but as the exception, not the default.

4. Pausing a creative too soon

This one stings especially hard in 2026, because Andromeda and GEM both need time to learn. A creative killed on day three for a bad start would often have been profitable if you had given it at least 7 days.

Baseline rule: do not pause before a week is up, unless the numbers look catastrophic from day one. Every kill and relaunch resets the learning, and in the new system the learning phase affects results more than it used to.
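The baseline rule can be sketched as a guard function. The text does not define “catastrophic,” so the `catastrophic_multiple` of 3x target CPA below is a placeholder assumption, as are the function and parameter names. After the first week the sketch deliberately returns “review” and leaves the decision to the regular data review.

```python
def pause_decision(days_live: int, spend: float, conversions: int,
                   target_cpa: float, min_days: int = 7,
                   catastrophic_multiple: float = 3.0) -> str:
    """Return "pause", "wait", or "review" for a single creative."""
    # With zero conversions, spend so far is a lower bound on the eventual CPA.
    cpa_floor = spend / conversions if conversions > 0 else spend

    # Assumed definition of "catastrophic": CPA at or above N times the target.
    if cpa_floor >= catastrophic_multiple * target_cpa:
        return "pause"
    if days_live < min_days:
        # Too early to judge; killing now would reset the learning.
        return "wait"
    return "review"


if __name__ == "__main__":
    print(pause_decision(days_live=3, spend=90.0, conversions=2, target_cpa=30.0))
    print(pause_decision(days_live=2, spend=100.0, conversions=0, target_cpa=30.0))
    print(pause_decision(days_live=8, spend=90.0, conversions=2, target_cpa=30.0))
```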

5. Testing for the sake of testing

The paradox: in the same week you will hear “we do not really test, we just run the one thing that worked once” and “we ship 20 creatives a week, but they are all about the same thing.” Option one leads to a single burned-out creative. Option two leads to cannibalization through identical Entity IDs and budget poured down the drain.

A healthy cadence is new concepts every 1–2 weeks, not new creatives every day. This matters especially before seasonal peaks: by Black Friday, the holidays, or Mother’s Day, you should show up with a pool of concepts across segments, not a single working creative.

What this looks like in the account

To keep this from staying at the level of principles, here is a simplified structure you can copy.

One audience: broad, no hard interest layering. Under Advantage+, that is the baseline scenario. Inside it, one campaign and one or two ad sets where 8–15 creatives run at the same time.

How creatives are distributed inside the ad set:

4–6 proven leaders: concepts that hold CPA reliably.
3–5 solid mid-tier: creatives with acceptable performance that have not fatigued yet.
2–3 fresh tests: new concepts added this week.

Roughly every 7–14 days, review the account, and let signals rather than the calendar drive the decisions. A creative with high impressions but dropping CTR has burned out, so turn it off. A fresh test that is holding can get more weight, plus new variations. A concept that did not land gets paused, logged as a hypothesis, and replaced with the next one.
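The review pass above can be sketched as a simple triage function. The thresholds (`min_impressions`, a 30% relative CTR drop) and the `CreativeStats` fields are hypothetical; pick values that fit your own account.

```python
from dataclasses import dataclass


@dataclass
class CreativeStats:
    name: str
    impressions: int
    ctr_prev: float  # CTR in the previous review window
    ctr_now: float   # CTR in the current review window
    cpa: float
    is_new: bool     # added as a fresh test in this cycle


def triage(c: CreativeStats, target_cpa: float,
           min_impressions: int = 10_000, ctr_drop: float = 0.30) -> str:
    """Map one creative's numbers to a review action."""
    # Relative CTR change between windows; 0.0 if there is no baseline yet.
    ctr_change = (c.ctr_now - c.ctr_prev) / c.ctr_prev if c.ctr_prev > 0 else 0.0

    # High impressions plus a sharp CTR decline is treated as fatigue.
    if c.impressions >= min_impressions and ctr_change <= -ctr_drop:
        return "turn off (fatigued)"
    if c.is_new:
        if c.cpa <= target_cpa:
            return "scale and add variations"
        return "pause, log hypothesis, replace"
    return "keep"


if __name__ == "__main__":
    pool = [
        CreativeStats("old_leader", 50_000, 0.020, 0.012, 25.0, False),
        CreativeStats("new_hit", 4_000, 0.015, 0.016, 22.0, True),
        CreativeStats("new_miss", 4_000, 0.015, 0.014, 55.0, True),
    ]
    for c in pool:
        print(c.name, "->", triage(c, target_cpa=30.0))
```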

The bottom line

Creative testing in 2026 is not about launching 50 variants and picking the best one. It is about understanding how the algorithm sees your creatives and feeding it the right signals.

Old logic: “I picked the audience, packaged the creative, optimized the bid.”

New logic: “I uploaded a pool of different concepts and clean conversion data. The algorithm found the people on its own.”

That means the real product you ship as a marketer is not the settings inside Ads Manager. It is quality inputs: creatives that are genuinely different from each other, clean analytics, and a consistent strategy. The rest, including who sees what, when, and on which placement, Meta now does better than we do.

Teams that keep fighting the algorithm by the old rules, slicing audiences into micro-segments and churning out near-identical banners, will keep overpaying. Not because they are bad marketers, but because they are playing by rules that no longer exist.

The teams moving to “feed the algorithm strong concepts” are already seeing the results: lower CPA, more conversions, smoother scaling. There is no magic to it. They simply match the new mechanics Meta is running.

Andromeda is not the last update. There will be more AI, more automation, and fewer manual levers from here. But the logic stays the same: the algorithm finds the audience, while you feed it strong creatives and clean data. The sooner you build that into your process, the cheaper your conversions will be.