
Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 1d
Two words or two hundred, an embedding model squeezes your whole audience description into one fixed list of numbers. That single set is all it compares, and it keeps what you meant while quietly dropping the exact words you used to say it.
This is step four of how an embedding model turns text into numbers. Step three left every piece of your description sitting at its own spot, one that fits the context around it, but that is still a whole row of spots, one per piece (remember from step one that a word can be several pieces: "Coach handbags" is four of them, "Coach" plus "hand", "b", "ags"). To compare two descriptions you want one embedding each, not a row against a row. Step four is the squash. Its real name is pooling, which is just a word for combining many values into one. The model pools that row of per-piece spots into a single embedding for the entire phrase.
There are a few ways to pool (averaging, taking one chosen piece, keeping the strongest value in each slot). The two you actually meet on today's models: average every piece's spot into one, or take the last piece's, which in a left-to-right model has already read everything before it. e5-mistral, the open model I keep running, does the second. Either way, many become one, and the result is a fixed-length list of numbers, the embedding: 4,096 numbers for e5-mistral, 3,072 for Gemini's model. That length never changes. Two words and a full paragraph come out exactly the same size.
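The two pooling choices described above can be sketched in a few lines. This is a toy illustration, not e5-mistral's code: the function names are made up for the example, and the row holds four pieces of three numbers each instead of thousands.

```python
from typing import List

Vector = List[float]

def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average every piece's vector, slot by slot, into one vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def last_token_pool(token_vectors: List[Vector]) -> Vector:
    """Take the final piece's vector, which has already read everything before it."""
    return list(token_vectors[-1])

# Toy row of per-piece vectors, one per piece ("Coach", "hand", "b", "ags").
# The values are invented; real models use thousands of numbers per piece.
pieces = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [0.0, 4.0, 2.0],
    [4.0, 2.0, 4.0],
]

mean_vec = mean_pool(pieces)
last_vec = last_token_pool(pieces)
print(mean_vec)  # [2.0, 2.0, 2.0]
print(last_vec)  # [4.0, 2.0, 4.0]
```

Either way the output has the same length as a single piece's vector, no matter how many pieces went in.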
So you end up with one vector per description. To match two descriptions you compare their vectors, which gives one closeness score: a multiply-and-add across a few thousand numbers, microseconds rather than milliseconds (I clocked one comparison at well under a microsecond). It is the single vector that gets carried in the bid stream and scored against the campaign's vector, not the row of per-piece spots.
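A minimal sketch of that closeness score, assuming it is cosine similarity (the post only says "multiply-and-add", so the exact metric is an assumption). The two three-number vectors are invented stand-ins for real embeddings.

```python
import math
from typing import Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Multiply matching slots and add the products up."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by both lengths, so 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Invented stand-ins for an impression vector and a campaign vector.
impression = [0.6, 0.8, 0.0]
campaign = [0.8, 0.6, 0.0]

score = cosine_similarity(impression, campaign)
print(round(score, 2))
```

If both vectors are already scaled to length one, the division drops out and the score is the plain dot product.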
You can watch what the squash keeps and drops. I pooled three versions of the same audience through e5-mistral and scored them against the original. Reword it completely, "wealthy buyers who like Coach bags", and it lands at 0.962. Shuffle the words into a jumble, "Coach handbags interested in luxury shoppers", and it lands at 0.986. The single vector barely moves either way. The squash holds on to what the phrase means and lets go of the exact words and the order they came in.
That is also why two differently-worded descriptions of the same audience can still land in nearly the same place. "Luxury handbag shoppers" and "premium leather goods buyers" share no keywords, yet they pool close together. Pooling is the step that makes the match about meaning rather than the exact words. It is also what frees you from a fixed taxonomy. Instead of picking from buckets someone defined in advance, you can aim at anything you can put into a sentence, at whatever level of detail the words carry. How that stacks up against the IAB Tech Lab's taxonomies is a post of its own.
Bedrock Platform

View on LinkedIn

Stanislav Shelemekh
AI Engineer & Tech manager, Appliscale · 1d
The bottleneck was never typing.
When people say AI made them faster, they mean it generates faster. It does. But generation was rarely the slow part. The slow part is checking that what came back is actually right, and that's bounded by something the model can't hand you: how well you understand the thing already.
A METR study from last year stuck with me. Experienced developers using AI tools were about 19% slower on real tasks. They thought they'd been 20% faster. The gap between how fast it feels and how fast it is turns out to be huge.
Karpathy has a clean way to frame the work: you and the model run a generation-verification loop. It produces, you check and correct, repeat. Your speed is set by how fast that loop turns, and the verify half is entirely yours. If you don't understand the domain, you can't separate correct from merely plausible. So you either wave through bad output or slow to a crawl re-reading all of it.
The differentiator stopped being access to the tools. Everyone has the same tools. It's whether you can look at the output and know, fast, whether it's any good. That's just understanding, and understanding is still slow to buy. There's no prompt for it.
So the move that looks like a detour, actually learning the language and the system, is what lets you prompt narrowly, catch the wrong answer in two seconds instead of two hours, and hand more of the work to the model without getting burned. The hour you spend understanding buys back the afternoon you'd lose debugging something you didn't understand and shouldn't have shipped.
That's the part that still surprises me. The fastest way through is usually to slow down at the one step the model can't do for you.


Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 2d
It's no secret in AdTech that agentic advertising can't run on the live bidding path; the latency rules it out. The interesting question is why. An AI agent and the model that actually sets a bid are two different kinds of computation, and only one of them fits inside the auction's deadline.
An AI agent is an LLM running in a loop: it reads a goal, picks an action (a tool call, an API request), sees the result, reasons about the next move, and repeats until the job is done. That loop is what makes it capable, it can plan and chain many steps, and it's also what makes it slow. An LLM writes its answer one piece at a time, a word or part of a word, and can't start the next piece until the last is finished, so the steps run strictly in order. Each piece also reruns the whole model, reading every weight out of main memory, around a gigabyte for a small model and far more for a good one. So it's slow twice over, a long chain of steps, each one dragging the whole model out of memory. That floor is set by the design; no tuning gets under it.
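A back-of-the-envelope sketch of that floor: if every generated piece has to read all the weights once, time per piece cannot drop below weight size divided by memory bandwidth. The one-gigabyte model size comes from the paragraph above; the bandwidth figure and the answer length are illustrative assumptions, not measurements.

```python
def min_ms_per_piece(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on time per generated piece if every weight is read once."""
    return model_bytes / bandwidth_bytes_per_s * 1000.0

MODEL_BYTES = 1e9     # "around a gigabyte for a small model"
BANDWIDTH = 50e9      # ASSUMPTION: 50 GB/s of memory bandwidth, illustrative only
ANSWER_PIECES = 10    # ASSUMPTION: a short answer of ten pieces

per_piece_ms = min_ms_per_piece(MODEL_BYTES, BANDWIDTH)
answer_ms = per_piece_ms * ANSWER_PIECES  # pieces are generated strictly in order
print(f"{per_piece_ms:.0f} ms per piece, {answer_ms:.0f} ms for the answer")
```

Under these assumed numbers a ten-piece answer already costs about twice a 100-millisecond auction window, before any other work is counted.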
I tried it anyway. I gave one of the smallest usable LLMs, half a billion parameters, two bid-time jobs on one impression: pick the campaign that fits best out of five, and price the bid. Just reading the prompt took 27 milliseconds, already past the roughly 20 a small display auction leaves the bidder once the network trip is paid for, before it had decided anything. The full answer took 335, several times the entire 100-millisecond window. The match itself was right, the luxury EV campaign for someone reading an EV review. (Bedrock Platform is the exception: it runs inside the exchange, in Index Exchange's Index Cloud, so that round trip is gone and you have more time for decisioning.)
Cost compounds it. An LLM only runs anywhere near fast on a GPU, the expensive kind with the bandwidth to stream those weights, while the bid path runs on ordinary machines, at millions of requests a second. Across billions of auctions a day, that's a different universe of hardware and power.
What does run in the window is the opposite kind of model. Price optimization, deciding what to pay for the impression, is the clearest case: a fixed set of features in, one pass, a number out. No loop, no writing piece by piece. It's small enough to live in the chip's fast cache, so it isn't hauling gigabytes out of main memory for every decision, and it costs the same tiny amount every time. That fixed, cache-sized cost is exactly what a hard deadline needs, and exactly what the LLM, growing with its output and memory-bound at every step, can't offer.
And the gap is structural, not just today's hardware, so faster chips narrow it without closing it. Which puts the real competition in agentic advertising upstream of the bid: the campaigns the agent picks, the deals it strikes, the policy it hands the fast path to run a million times a second. The bid itself stays arithmetic.


Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 3d
An embedding model reads your text not once but many times over, one layer after another. Why so many passes? I tracked a single word through all of them to see what each layer actually adds.
This is step three of how an embedding model turns text into numbers. Step two was one round of attention: every word reads the others and adjusts itself. Step three is that the model never does that just once. It stacks the same read-and-adjust into a deep pile of layers, 32 in the model I ran, each one working on the output of the layer below. Different models stack different numbers of them.
Each layer does a different kind of job, and a classifier is the tool used to figure out which. A classifier is a small model that learns to label things, like tagging a word as a noun or a verb. Run one on a single layer's numbers, and if it can read a fact off them, that fact is already sitting in that layer.
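A minimal sketch of that check on synthetic data. Everything here is an assumption for illustration: the "layers" are random numbers, one built to carry the noun/verb fact and one built without it, and the classifier is a simple nearest-centroid rule rather than the logistic-regression probes usually used in practice.

```python
import random
from typing import Dict, List

Vector = List[float]

def centroid(rows: List[Vector]) -> Vector:
    """Average a list of vectors slot by slot."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train_probe(features: List[Vector], labels: List[str]) -> Dict[str, Vector]:
    """Learn one average vector per label."""
    by_label: Dict[str, List[Vector]] = {}
    for vec, label in zip(features, labels):
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(probe: Dict[str, Vector], vec: Vector) -> str:
    """Pick the label whose average vector is closest."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(probe, key=lambda label: sq_dist(probe[label], vec))

def accuracy(probe: Dict[str, Vector], features: List[Vector], labels: List[str]) -> float:
    hits = sum(predict(probe, v) == y for v, y in zip(features, labels))
    return hits / len(labels)

def fake_layer(labels: List[str], signal: float, rng: random.Random) -> List[Vector]:
    """Synthetic activations: slot 0 is shifted by the label, the rest is noise."""
    rows = []
    for label in labels:
        shift = signal if label == "noun" else -signal
        rows.append([shift + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
    return rows

rng = random.Random(0)
train_labels = ["noun", "verb"] * 100
test_labels = ["noun", "verb"] * 100

results = {}
for name, signal in (("layer_with_fact", 3.0), ("layer_without_fact", 0.0)):
    probe = train_probe(fake_layer(train_labels, signal, rng), train_labels)
    results[name] = accuracy(probe, fake_layer(test_labels, signal, rng), test_labels)
print(results)
```

The probe scores near perfect on the layer that carries the fact and near chance on the one that does not, which is the signal the layer-by-layer check relies on.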
Run that check on every layer and a clear progression appears. The early layers handle the surface, which pieces join into a word and whether each is a noun or a verb. The middle layers assemble the sentence, how the words fit together. The higher layers hold the meaning of the whole phrase, the part you actually care about. So the model climbs from raw pieces to real meaning. Near the very end, though, it does something different. This particular model was built to predict the next word, so its final layers swing back toward that job and smooth over some of the fine distinctions the middle layers had drawn.
This is the whole reason for the layers. A single layer is a simple transformation: it mostly lets each word pull in a little of its surroundings and adjust. In theory you could make one layer do far more, but it would have to be impossibly large. Stacking many simple layers is the efficient way to build that complexity, each one refining the last, until the stack does what no manageable single layer could.
You can watch it happen. I tracked one word, "Coach", through every layer and scored how similar its two senses still were at each one, where 1.0 means identical. They start identical. They separate hardest in the middle of the stack, dropping to 0.35, their most distinct anywhere in the run. Then the late layers pull them partly back together as they repackage for output, landing at 0.53. The journey is staged, not a straight slide. (That mid-then-settle shape is typical of decoder-built models like this one; other model classes climb toward the last layer instead.)
This is why depth, not size alone, is what lets a model tell fine things apart. Telling "mentions a car" from "in-market for a car" is built up across many layers, not decided in one pass. A deeper model can draw that line; a shallower one blurs the two together, no matter how carefully you word the audience. So the model you pick sets how fine your targeting can get.
Bedrock Platform


Maksymilian Wojczuk
Technical Engineering Manager @Appliscale | Co-founder @DiPA · 3d
Most of my work happens remotely.
Architecture discussions, product decisions, and problem-solving sessions - all through screens, across countries and time zones.
That’s why trips like this matter.
Over the past days, together with Magdalena Śleboda and Konrad Kaplita, I’ve had the opportunity to meet partners across the US West Coast and spend time discussing the challenges and opportunities shaping the GameTech industry.
Remote work is incredibly effective, but some conversations are simply better in person. The most valuable insights often come between meetings, over coffee, during a walk between offices, or when discussing challenges that weren’t on the original agenda.
Grateful for the conversations, perspectives, and relationships strengthened throughout the trip.
Magdalena Śleboda
Head of Operations | Scaling Tech Organizations with AI, Automation & Data-Driven Execution | Global Ops & Transformation Leader · 4d
LA and Seattle in one trip. If you’re already crossing the Atlantic, it adds up.
Day to day, we work side by side - async, remote, 9 hours apart. We’ve built the whole operation around not needing to be in the same room. We’re excellent at it.
And yet here we are - with Konrad Kaplita and Max Wojczuk, visiting our key business partners.
Strategy conversations hit differently in person. So do the ones that happen after the meeting ends.
Worth the flight every time.

Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 5d
An embedding looks like a wall of meaningless numbers. I have been calling it exactly that all the way through this series. That was the convenient version. Those numbers carry the original text back out, and it is worth knowing when that matters.
The premise of shipping an embedding instead of the text is that the numbers are safe to pass around: into a bid request, a vector database, a partner's system. You sent a representation, the thinking goes, not the data. So I tested it. I embedded three short lines with OpenAI's ada-002, threw the text away, and fed the bare vectors (1,536 numbers each) to vec2text, a published model trained to turn ada-002 vectors back into words. It never saw my text. What came back:
1) "high-income household in-market for an electric SUV, comparing lease offers" returned almost word for word.
2) "article reviewing the best noise-cancelling headphones under 300 dollars" returned exactly.
3) "Marta Kowalska, Gdansk, household income 250k, shopping for a premium electric saloon lease" returned as "Katarzyna Martowska, Gdansk, household income 250k, shopping for a premium electric saloon lease." The city, the income, and the buying intent came back exactly. Only the name garbled, swapped for a different Polish one, which is where these models slip, on rare proper nouns.
This is not the encoder run backwards, there is no reverse button. It takes a decoder trained for that one model, around five million texts and a couple of days on GPUs, and for popular models like ada-002 someone has already built it and put it online. Inversion only needs the ability to call the encoder, and an embedding deal hands every party exactly that, since comparing vectors at all means both sides run the same model.
What this costs you depends on what went in. Reverse an anonymous impression signal and you recover nothing private, there was never a person attached, and a segment ID was always an explicit label anyway. Put a name, a place, an income in, like that third line, and the vector hands them straight back, because the encoding never hid them.
So how private an embedding is comes down to which model made it and whether anyone hardened it. There are ways to make a vector hard to reverse: noise, differential privacy, newer defenses that cut reconstruction from over 90 percent to a few, each costing a little match accuracy. None of it is automatic. An embedding becomes private when you make it private, on a model you understand, not by virtue of arriving as numbers.
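A minimal sketch of the accuracy side of that trade, assuming the simplest defense: add Gaussian noise to the vector and see how far it drifts from the original. The vector is random stand-in data and the noise levels are arbitrary. This shows only what noise costs in match quality; it says nothing about how much a given noise level actually resists inversion, which needs its own testing.

```python
import math
import random
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def add_gaussian_noise(vec: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Perturb every slot independently; larger sigma means more distortion."""
    return [x + rng.gauss(0, sigma) for x in vec]

rng = random.Random(42)
# Stand-in for a real 1,536-number embedding (random values, not model output).
embedding = [rng.gauss(0, 1) for _ in range(1536)]

similarity_by_sigma = {}
for sigma in (0.0, 0.1, 0.5):  # ASSUMPTION: arbitrary noise levels
    noisy = add_gaussian_noise(embedding, sigma, rng)
    similarity_by_sigma[sigma] = cosine(embedding, noisy)
    print(sigma, round(similarity_by_sigma[sigma], 3))
```

The more noise goes in, the further the protected vector sits from the original, which is the match accuracy being spent.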
Bedrock Platform

Michał Nieć
CEO of Appliscale | AI in AdTech Expert | LP & Angel Investor in AdTech/GameTech · 1w
The role of a software engineer is shifting from writer, producing code line by line, to director, defining tasks, briefing agents, and reviewing output. Three changes follow from that.
Senior Engineers Become The Multiplier
The leverage goes to the senior engineer who actually sees the value in AI. Plenty still want to code the way they always have. But the ones who pair years of experience with knowing how to orchestrate and supervise agents can run several workstreams in parallel. They sometimes replace an entire team and ship faster than one would. The 10x engineer used to be a myth. With agents, 10x and even 20x are real. I have watched it happen.
One caveat. Output scales 10x, but maintenance does not. A solo senior who replaces a team is also a single point of failure, and when an AI-built system breaks at 3 AM, one person reading 50,000 lines they did not write is a real bottleneck. The multiplier is strongest on greenfield work, weakest on operations.
The Junior Path Is Changing
Juniors used to learn by writing simple, well-scoped code under review. Agents now do most of that work, so the old way in does not work the same way. And here is the paradox nobody has cracked: taste is earned by writing code and breaking it, by chasing a race condition or a memory leak for two days. If juniors never do that, where does the judgment to review an agent's work come from? This is the open problem most teams have not solved: where the next generation of seniors comes from.
At Appliscale we hire juniors with a strong CS background and real grounding in ML models, which universities now teach far more of. That gives them an edge in understanding how agentic development actually works, and keeps their minds open about how things should be done. We treat them as product engineers from the start, teaching system design and product thinking more than coding, and we pair them with senior engineers who show them how to deploy and run systems in production.
Role Boundaries Are Blurring
At Appliscale we are starting to work this way. We have our first teams running like this, where each engineer owns a slice of the system end to end, with AI assistance through the whole loop: breaking down requirements, design, implementation, and deployment. Because everyone takes a different part of the functionality, we hit far fewer conflicts when code lands. Our syncs changed too. We spend that time brainstorming new features, reviewing what shipped, and trading feedback. It feels more human, and motivation is high, because of how much gets done each day and the fact that teammates are already using it.
The shift is about as big as office workers getting computers in the 1980s. Most of the playbook has not been written yet.


Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 1w
Is "Coach" a designer label or a sports trainer? An embedding model has to tell them apart to target anything, but it starts blind: the word goes in as a single number, identical for both meanings. Here is how it figures out which one you meant.
That is step two of how an embedding model turns text into numbers, and it is the step where a generic word picks up the meaning you actually intended. Step one, from last time, chops your text into pieces and gives each piece a fixed starting number. "Coach" is a single token in this model, dictionary entry 24847, so it goes in as one number, and that number is identical whether you meant the brand or the trainer.
Then attention runs. Every word in the sentence looks at the others, and "Coach" is the one to follow. It sends out what the model calls a query, which is really just a question: which of you tell me what I mean here? Every other word answers with a key, a short summary of what it has to offer. "Coach" compares its query against each key with a dot product, a quick multiply-and-add that gives every word a relevance score, and a step called softmax then squeezes those scores so they add up to one, turning them into shares of attention. Finally, "Coach" rebuilds itself by taking a little of each word's value, the actual content that word carries, weighted by those shares. The words that scored high get pulled in hardest, while filler like "the" and "to" barely registers. Query, key, value: those three steps are the heart of attention in a transformer.
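The query, key, value steps above can be sketched for a single word. This is a toy under stated assumptions: the two-number vectors are invented, a real model derives queries, keys and values from learned projections, and the word also attends to itself. The division by the square root of the vector size is the standard scaling in transformers, which the description above leaves out.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    """Squeeze scores into positive shares that add up to one."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: Vector, keys: Dict[str, Vector],
           values: Dict[str, Vector]) -> Tuple[Dict[str, float], Vector]:
    """Score the query against every key, then mix the values by those shares."""
    words = list(keys)
    scale = math.sqrt(len(query))
    scores = [dot(query, keys[w]) / scale for w in words]
    shares = dict(zip(words, softmax(scores)))
    dim = len(next(iter(values.values())))
    rebuilt = [sum(shares[w] * values[w][i] for w in words) for i in range(dim)]
    return shares, rebuilt

# Invented vectors: slot 0 stands for "fashion", slot 1 for "filler".
coach_query = [1.0, 0.0]
keys = {"designer": [2.0, 0.0], "purses": [2.0, 0.0], "the": [0.0, 1.0]}
values = {"designer": [1.0, 0.0], "purses": [1.0, 0.0], "the": [0.0, 1.0]}

shares, new_coach = attend(coach_query, keys, values)
print({w: round(s, 3) for w, s in shares.items()})
print([round(x, 3) for x in new_coach])
```

"designer" and "purses" take most of the attention, "the" takes little, and the rebuilt "Coach" vector leans toward the fashion slot as a result.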
This does not happen once. The model stacks the same read-and-adjust step across 32 layers, and each layer pulls a little more of the surrounding context into the word, until "Coach" settles into the meaning its neighbours imply.
I ran two sentences through e5-mistral and pulled out its internal vector for "Coach" in each, once before the first layer and once after the last:
A: the boutique sells designer purses by Coach to wealthy shoppers
B: the football team listened closely to their Coach before the match
Before the layers, the two "Coach" vectors are identical, scoring 1.0, the same arrow exactly. After all 32 layers they score 0.534. The word moved. "Boutique, designer, purses" pulled it toward the fashion brand. "Football, team, listened" pulled it toward the sports trainer. Nobody labelled which Coach was meant, the neighbours settled it.
This is why two descriptions of the same audience can be worded completely differently and still land in the same place. "Luxury handbag shoppers" and "premium leather goods buyers" share almost no words, yet the words around them build the same meaning, so the model sets the two side by side. And it is why a lone tag carries far less than a full description: one word gives the model almost no context to read, and the context is what builds the meaning.
Bedrock Platform


Stanislav Shelemekh
AI Engineer & Tech manager, Appliscale · 1w
The model writes the code now. What it can't do is tell me whether the code was worth writing.
I spent this week running Claude's "ultrathink" mode on real work, and I'm still a little surprised by it. I expected help with features. It finished whole projects. So the constraint moved off the typing and onto a few quieter problems.
The first is context. The model is only as good as what you put in front of it, so most of my work now is deciding what it should see and what to keep out of the way.
The second is knowing what to build. When you can ship almost anything in an afternoon, choosing the right thing becomes the actual job. Getting to the wrong goal faster doesn't help.
The third one I didn't expect. It's knowing which tech debt to leave alone. Some messes aren't worth cleaning by hand anymore. The next model will probably clear them more cheaply than I can today, so the right call is sometimes to leave the debt and come back once the tooling has caught up. That feels strange to write, because every instinct I have says fix it now.
I don't have clean answers to any of these yet, and that's the part I find interesting right now.
Michał Nieć
CEO of Appliscale | AI in AdTech Expert | LP & Angel Investor in AdTech/GameTech · 1w
Lock-in in agentic coding is not one thing. It comes in levels, and they are not equally deep. Worth pulling them apart.
Model: The Shallowest
If your harness supports more than one provider, pointing it at a different model is close to a setting. The harness handles the model-specific prompting for you. The catch is that harnesses are tuned for a particular model, so the same harness on a different model often just performs worse, even when it runs. Still, this is the shallowest layer, and with providers leapfrogging monthly you want a harness that lets you move rather than one welded to a single model.
Tooling: Portable, To A Point
MCP servers, CLIs, and skills follow open conventions, so I can carry my usual set into a new harness and they plug in. But protocol portability is not behavioural portability. A tool tuned to one model's tool-use pacing and error-correction can loop, hallucinate arguments, or ignore constraints under another. The wiring moves with you. The reliability does not always come along.
Frameworks:
Agentic frameworks like Mastra or CrewAI add what teams need, evals, memory, orchestration, but they sit between you and the raw API. Anthropic's prompt caching, for instance, needs a byte-for-byte identical prefix. A framework that quietly injects a timestamp or a changing memory ID into the payload breaks that match, kills the cache, and your token bill jumps. We hit exactly this with Mastra. The abstraction that saves you work also strips the control you need for model-specific optimisations.
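A toy model of why an injected timestamp hurts. It is not Anthropic's implementation and not Mastra's code; the helper names and prompt strings are hypothetical. It only measures how much of the start of two consecutive requests is identical, which is the part a prefix cache could reuse.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters two requests have in common."""
    count = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        count += 1
    return count

# Hypothetical stable context sent at the start of every request.
SYSTEM = "You are a coding agent. Follow the repo conventions.\n"

def build_stable(user_turn: str) -> str:
    """Stable context first, changing content last."""
    return SYSTEM + user_turn

def build_with_timestamp(user_turn: str, timestamp: str) -> str:
    """What a wrapper does when it quietly prepends a changing value."""
    return f"[ts={timestamp}]\n" + SYSTEM + user_turn

stable_a = build_stable("Fix the failing test.")
stable_b = build_stable("Add a retry to the client.")
stamped_a = build_with_timestamp("Fix the failing test.", "2024-01-01T10:00:00")
stamped_b = build_with_timestamp("Add a retry to the client.", "2024-01-01T10:00:05")

reusable_stable = shared_prefix_len(stable_a, stable_b)
reusable_stamped = shared_prefix_len(stamped_a, stamped_b)
print(reusable_stable, reusable_stamped)
```

With the stable layout the whole system context matches across requests. With the timestamp in front, the match ends inside the timestamp and everything after it is recomputed.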
Harness Is The Biggest Deal
This is where it gets real. You build skills, tools, and shared routines around one harness, and commercial ones add team subscriptions on top. The deeper hook is state. The harness owns your codebase index, its embedded vector store, and the conversation history. Leave, and you do not just lose a UI or a subscription, you lose the agent's whole ingested memory of your repo and start again from zero context. This is where lock-in is strongest.
Licensing Tightens The Knot
The commercial terms pull the same way. A provider like Anthropic offers a genuinely generous subscription, but it steers you onto their own harness and tools, and the setup gets less transparent and harder to leave over time. The cheap, easy entry quietly becomes the thing you cannot exit.
Where This Leaves Me
Full disclosure, I am a big fan of Anthropic and a heavy Claude Code user. I have a very good setup with it. And that is exactly what makes me a little uneasy lately. I notice how reliant I have become on Claude Code and their model. The lock-in never announces itself. It just quietly becomes the way you work...


Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 1w
A display bidder gets about 100 milliseconds to respond. A podcast ad can be picked days before anyone hears it. Same programmatic plumbing underneath, so why does the time to decide swing that far? It has little to do with the medium and everything to do with how far ahead the system can see the impression coming.
Walk it from tightest to loosest. A web display slot resolves as the page paints with a person waiting, so the bidder gets only what a human tolerates, low hundreds of milliseconds, decide now. An in-app interstitial loosens it, preloaded into a cache before the moment it shows. CTV opens it further: the stream flags the break with a cue marker 6 to 15 seconds ahead (that is lead time for the pipeline, not the bidder's response, which still gets only a few hundred milliseconds), so the whole pod is decided before it starts. A podcast is the far end, stitched in at download, chosen Wednesday and played Friday.
So the window tracks anticipation, and the amount of warning sets what kind of bidding you can run. No warning means precompute, cache, and cheap models. Seconds to days of lead buys the work a display slot never could: bigger models, more lookups, the embedding match, a whole pod balanced across the break.
Live is the exception that proves it. A game's break fires with almost no warning, the cue generated on the spot, and the whole audience hits it at once, so a premium CTV stream suddenly behaves like the display slot. The format did not change, the anticipation did.
And a tight window is mostly gone before the bidder even thinks: on a 100 millisecond budget the network trip in and back eats 60 to 80 of it, leaving 20 to 40 to actually decide. Inside Index Exchange's Index Cloud the round trip is gone, response is sub-millisecond, and the whole window goes to the bid. When a live break collapses it for everyone at once, the bidder already sitting next to the auction is the one with time left to decide.
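The budget arithmetic above, written out. The 100-millisecond window and the 60 to 80 millisecond network trip are the figures from the paragraph; the 0.5 millisecond co-located round trip is an assumed stand-in for "sub-millisecond".

```python
def decision_budget_ms(window_ms: float, network_round_trip_ms: float) -> float:
    """Time left to actually decide once the network trip is paid for."""
    return max(0.0, window_ms - network_round_trip_ms)

WINDOW_MS = 100.0

remote_best = decision_budget_ms(WINDOW_MS, 60.0)
remote_worst = decision_budget_ms(WINDOW_MS, 80.0)
colocated = decision_budget_ms(WINDOW_MS, 0.5)  # ASSUMPTION: 0.5 ms stands in for "sub-millisecond"

print(remote_best, remote_worst, colocated)  # 40.0 20.0 99.5
```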
Bedrock Platform


Maksymilian Wojczuk
Technical Engineering Manager @Appliscale | Co-founder @DiPA · 1w
Loop 1 proves the agent did what it planned.
Loop 2 finds what it never planned at all.
Most engineers skip reading coverage reports by hand. Not because they're lazy - because building the mental model from a branch-by-branch breakdown is genuinely hard work. You know what you tested. You trust your own understanding of the service. You move on.
AI is genuinely better at reading structured coverage reports than most engineers.
Here's how Loop 2 works:
Coverage tooling runs simultaneously with the tests.
Against the live container, in real time.
It produces a structured XML report: which branches are hit, which aren't, which exception handlers are never triggered.
The AI agent parses that XML.
Not a human.
Not a script looking for a threshold.
The agent reads the branch-level detail and identifies the specific gaps - auth flows that weren't exercised, edge cases in response mapping, exception handlers that were never triggered.
Those gaps become a list. That list becomes new documented test cases. Those cases feed back into Loop 1 - tagged, verified, implemented, and re-run.
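A minimal sketch of the first step, turning the XML report into a list of gaps. The post does not name the report format; this assumes a Cobertura-style layout (the kind many coverage tools can export), and the sample report, file name and `find_gaps` helper are invented for illustration. In the loop described above an agent reads this detail; the script only shows what detail is there to read.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Invented sample in Cobertura-style XML: per-line hits plus branch coverage.
SAMPLE_REPORT = """
<coverage>
  <packages><package name="svc"><classes>
    <class name="AuthHandler" filename="auth.py">
      <lines>
        <line number="10" hits="4" branch="true" condition-coverage="50% (1/2)"/>
        <line number="11" hits="4"/>
        <line number="18" hits="0"/>
        <line number="25" hits="2" branch="true" condition-coverage="100% (2/2)"/>
      </lines>
    </class>
  </classes></package></packages>
</coverage>
"""

def find_gaps(xml_text: str) -> List[Tuple[str, int, str]]:
    """Return (file, line, reason) for lines never run or branches not fully taken."""
    root = ET.fromstring(xml_text)
    gaps = []
    for cls in root.iter("class"):
        filename = cls.get("filename", "")
        for line in cls.findall("lines/line"):
            number = int(line.get("number"))
            if int(line.get("hits", "0")) == 0:
                gaps.append((filename, number, "never executed"))
            elif line.get("branch") == "true":
                # condition-coverage looks like "50% (1/2)": taken / total branches
                match = re.search(r"\((\d+)/(\d+)\)", line.get("condition-coverage", ""))
                if match and int(match.group(1)) < int(match.group(2)):
                    gaps.append((filename, number,
                                 f"{match.group(1)} of {match.group(2)} branches taken"))
    return gaps

gaps = find_gaps(SAMPLE_REPORT)
for gap in gaps:
    print(gap)
# ('auth.py', 10, '1 of 2 branches taken')
# ('auth.py', 18, 'never executed')
```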
The agent now has eyes on its own blind spots. Not because it reviewed its own work, but because an external signal told it exactly where coverage was missing.
Loop 1 proves the agent did what it claimed.
Loop 2 finds what it missed.
Together: an agent that doesn't just say done. One that can show you the exact line it missed.
What does your feedback loop look like right now?

Michał Nieć
CEO of Appliscale | AI in AdTech Expert | LP & Angel Investor in AdTech/GameTech · 1w
The economics of agentic coding shift month to month. Treat any number here as a snapshot. This is how the bill works today.
An agent run costs real money, from a few cents for a simple edit to ten dollars or more for a multi-agent build. Three things drive the bill.
Model Size
• Frontier models, like Claude Opus or GPT-5 Pro. Best reasoning, roughly 10 to 15x the per-token cost of small models.
• Small models, like Haiku or Gemini Flash. Fast and cheap, capable enough for most execution steps.
Orchestration Pattern
• Single agent, long context. One conversation, the whole task in memory. Lowest cost, simplest.
• Planner, workers, reviewer. A planner splits the work, small agents execute, a reviewer checks. Two to five times the cost, better on hard tasks.
• Parallel exploration. Several agents try different approaches, you keep the best. Expensive, worth it for design work.
Prompt Caching
Reusing the same context, conventions, and prior turns costs a fraction of recomputing it. Done well it cuts long-session cost by 50 to 80 percent. Most teams underuse it.
Route by Complexity
• Plan with a frontier model, judgment matters.
• Execute with a small model, volume work.
• Review with a frontier model, catch the mistakes.
Sometimes called "small model on the inside, big model at the edges."
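A rough sketch of what routing by complexity does to the bill. All figures are hypothetical: the prices are arbitrary units per million tokens (with the frontier model at 12x, inside the 10 to 15x range quoted above) and the token counts per stage are invented.

```python
from typing import Dict

SMALL_PRICE = 1.0      # ASSUMPTION: arbitrary cost units per million tokens
FRONTIER_PRICE = 12.0  # ASSUMPTION: 12x, within the "10 to 15x" range above

# ASSUMPTION: invented token counts for one agent run.
STAGE_TOKENS = {"plan": 20_000, "execute": 400_000, "review": 30_000}

def run_cost(stage_tokens: Dict[str, int], stage_price: Dict[str, float]) -> float:
    """Sum tokens times price per million across every stage."""
    return sum(tokens * stage_price[stage] / 1_000_000
               for stage, tokens in stage_tokens.items())

all_frontier = run_cost(STAGE_TOKENS, {stage: FRONTIER_PRICE for stage in STAGE_TOKENS})
routed = run_cost(STAGE_TOKENS, {
    "plan": FRONTIER_PRICE,    # judgment matters
    "execute": SMALL_PRICE,    # volume work
    "review": FRONTIER_PRICE,  # catch the mistakes
})

print(f"all frontier: {all_frontier:.2f}, routed: {routed:.2f}")
```

With these made-up numbers the routed run costs under a fifth of the all-frontier run, because execution is where most of the tokens go.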
The Harness Matters Too
The same task costs different amounts depending on the harness. Claude Code is capable but not frugal, and a wave of harnesses, open-source and commercial, now claim to do the same work on a fraction of the tokens. The harness is a cost line item, not a neutral wrapper.
Two Ways to Pay, Both Moving
• Pay per token, the API. You pay for exactly what the agent consumes. Predictable, scales linearly with volume.
• Flat subscription, plans like Claude Max. One monthly price, far cheaper for heavy use, but capped by time-based usage limits, rolling windows and weekly ceilings, and often tied to the provider's own harness. Providers keep tightening these as agentic tools get hungrier. Your constraint stops being dollars and becomes how much you can run, and which tools you can run it in.
At team scale this is governance, not just billing. LLM gateways like TrueFoundry sit between your engineers and the providers to set per-team budgets and enforce token quotas. Expect that layer to become standard.
What About Running Your Own Model?
You can host an open model on your own hardware, and models like Qwen 3.6 are good enough for real coding now. The hardware is the problem. An Nvidia DGX Spark runs about $4,000. A specced-up MacBook Pro M5 Max runs well past $5,000. Either is a serious outlay, runs slower than the cloud, and still will not beat a $200-a-month Claude Max subscription on a hard coding task. Today self-hosting wins on privacy, compliance, or air-gapped work, not on cost or quality. That gap is closing, which is why it is worth watching.
A flat AI bill across a quarter usually means no one is optimising.


Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 1w
I wrote about embedding models "reading" your audience description. They don't. The first thing one does is tear the text into fragments, then rebuild what you meant from the pieces.
An embedding model takes an audience description and turns it into a long string of numbers. It gets there in five steps, and across this series I'm pulling each one apart, a post at a time. This is step one, the step nobody thinks about: before the model can make sense of anything, it has to break your words into pieces.
First comes the tokenizer. At heart it is a fixed dictionary: a long list of text-pieces, each one paired with a number. It breaks your text into pieces drawn only from that dictionary, then swaps each piece for its number, because the model works in numbers, not words. The tokenizer does no thinking and it is not the model; the model, the part that learned anything, only ever sees the numbers it hands over. Common words sit in the dictionary whole and pass straight through, while rarer ones get spelled out from smaller pieces that are in it. How big the dictionary is varies by model. e5-mistral, a free, open embedding model anyone can download and run, has about 32,000 pieces; Qwen's has about 150,000. So a word that stays whole on one model can break apart on another.
I ran "car buyers shopping for a used Saturn" through e5-mistral's own tokenizer. Seven words go in, eight pieces come out. The extra piece is "Saturn," the now-defunct GM car brand. It isn't on the list as a whole word, so the tokenizer builds it from "Sat" and "urn," which are. The brand never reaches the model as one word, just those two fragments.
What stays in one piece comes down to how often the word appeared in the model's training text, not how much it matters to you. Plain, common words survive: "car," "buyers," and "used" all stay whole. The terms you lean on often don't: "lookalike" comes back as "look," "al," "ike," and "HHI" splits into "H" and "HI." The model has no single piece for either, even though your team says them every day.
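A toy version of the dictionary lookup makes the splitting concrete. This is a minimal sketch, not e5-mistral's tokenizer: it uses greedy longest-match instead of the learned merge rules a real subword tokenizer applies, and the vocabulary and ids below are invented for illustration.

```python
# Toy vocabulary: text piece -> id. Pieces and ids are invented for illustration.
VOCAB = {"car": 0, "buyers": 1, "shopping": 2, "for": 3, "a": 4, "used": 5,
         "Sat": 6, "urn": 7}


def tokenize_word(word: str, vocab: dict[str, int]) -> list[str]:
    """Split one word into the longest pieces found in the vocabulary."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError(f"no piece in vocabulary covers {word[start:]!r}")
    return pieces


def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Break text into pieces, then swap each piece for its number."""
    pieces = [p for word in text.split() for p in tokenize_word(word, vocab)]
    return [vocab[p] for p in pieces]


ids = tokenize("car buyers shopping for a used Saturn", VOCAB)
print(ids)  # seven words in, eight ids out
```

The model downstream only ever receives the list of ids, never the words.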
A piece also starts the same way no matter the sentence. "Sat" and "urn" are identical in "a used Saturn," in "the planet Saturn," and in "Saturn's rings." At this stage the model holds the fragments but has no idea which Saturn you mean, the planet or the car. The surrounding words settle that later.
Here is the part worth holding onto. If two models don't even cut your audience into the same pieces, nothing downstream can line up either. That is the most basic reason a buyer and a seller doing embedding-based targeting have to run the exact same model. The gap between them opens right here, at step one, with the chopping, long before any of the clever math.
Bedrock Platform

Michał Nieć
CEO of Appliscale | AI in AdTech Expert | LP & Angel Investor in AdTech/GameTech · 1w
Security for coding harnesses is one of the least settled parts of agentic coding. Everyone agrees the attack surface is bigger. Almost no one agrees on what actually protects you.
Start with why it is bigger. A coding agent is not a chatbot. It reads and writes files, runs shell commands, makes network calls, installs packages, touches version control, and calls third-party services. Every one of those is a new way in.
The Attack Vectors
• Prompt injection from data. Hidden instructions buried in a file, a web page, a ticket, or a doc the agent reads. The model cannot reliably tell data from instructions.
• Compromised MCP or tool. A third-party server returns malicious output, or quietly exfiltrates the agent's context.
• Slopsquatting. The agent hallucinates a package name, an attacker registers it with malware, the next agent installs it.
• Runaway command. The agent runs something destructive, a mass delete or runaway cloud spend, with no human in the loop.
The Usual Defences
• Sandbox or container. Run the agent somewhere isolated, so the worst case is the container, not your machine or production.
• Permission prompts. The agent asks before destructive or networked actions.
• Read-only by default. Inspection is free, writes need an allowlist.
• Untrusted-input boundary. Treat external content as data, not instructions, and limit what the agent can do right after reading it.
• MCP allowlist. Treat MCP servers like browser extensions: audited, pinned, minimal permissions.
• Dependency hygiene. Lockfiles, scanners, private registries, no silent auto-install.
• Audit logs. Every tool call recorded, so you can reconstruct what happened.
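Several of these defences can be sketched as one small policy check. This is an illustrative sketch, not any harness's actual permission system: the tool names, the `Policy` class, and the `tainted` flag are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical tool names, grouped by how much trust they need.
READ_ONLY_TOOLS = {"read_file", "list_dir", "search"}   # inspection is free
WRITE_ALLOWLIST = {"write_file"}                        # writes explicitly granted
NEEDS_APPROVAL = {"run_shell", "http_request"}          # destructive or networked


@dataclass
class Policy:
    audit_log: list = field(default_factory=list)

    def decide(self, tool: str, tainted: bool = False) -> str:
        """Return 'allow', 'ask', or 'deny' for one tool call.

        tainted=True means the agent has just read untrusted external content.
        """
        if tool in READ_ONLY_TOOLS:
            decision = "allow"
        elif tainted:
            # Untrusted-input boundary: after reading external content,
            # nothing with side effects runs without a human.
            decision = "ask" if tool in WRITE_ALLOWLIST | NEEDS_APPROVAL else "deny"
        elif tool in WRITE_ALLOWLIST:
            decision = "allow"
        elif tool in NEEDS_APPROVAL:
            decision = "ask"
        else:
            decision = "deny"
        # Audit log: every decision is recorded so events can be reconstructed.
        self.audit_log.append((tool, tainted, decision))
        return decision


policy = Policy()
print(policy.decide("read_file"))
print(policy.decide("write_file", tainted=True))
print(policy.decide("delete_everything"))
```

A check like this narrows what the agent can do, but it cannot stop an attack that stays inside the allowed actions.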
The Theater Problem
Here is where opinions split hard. A lot of this is starting to feel like security theater. Permission prompts are the clearest case. You spend the day clicking allow and disallow, the friction is real, and yet prompt injection can still talk the agent into doing the wrong thing inside the permissions you already granted. An allowlist does not stop an attack that only uses allowed actions. Click fatigue makes people approve everything anyway.
That does not make the controls useless. It makes them necessary but not sufficient. Sandboxing limits the blast radius whether or not you trust the agent. The rest lowers risk without ever removing it. Anyone selling a fully solved story is selling theater.
A Useful Way to Think About It
The agent has the credentials of an administrator and the judgment of a new hire. Configure access accordingly, and assume the prompt can be turned against you.
Why It Matters Beyond Engineering
The blast radius is set by what the harness can touch, not by how clever the model is. If that includes production credentials or customer data, prompt injection becomes a path to real damage. The consensus on what works is still forming, so treat any "secure by default" claim as a starting point, not a guarantee.


Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 1w
Last Thursday I laid out why ARTF runs on gRPC and protobuf instead of the JSON over HTTP that the rest of programmatic still uses. This week I built the benchmark and measured it end to end.
The test is simple. One program plays the SSP and fires 300,000 real bid requests, sampled from our own bidstream. Another plays the DSP and answers, bidding on 1 in 100 and passing on the rest. Both sides use TLS, both gzip everything (every SSP and DSP does), and connections stay open and reused. I ran the exact same traffic two ways: JSON over HTTP/1.1, the way programmatic works today, and protobuf over gRPC, the way IAB Tech Lab's ARTF does it. Then I measured bytes on the wire, CPU, memory, and latency.
Four results:
1. Size. Raw, the protobuf request is about 40 percent smaller. But everyone gzips, and once you do, that gap drops to about 19 percent. The bid response gains less, single digits no matter how big the ad markup gets, because the response is mostly that markup and gzip squeezes it about the same in either format. gzip does most of the shrinking on its own.
2. CPU. Protobuf is faster to read and write, but with gzip in the path the compression step costs the most, and it costs the same either way. End to end, gRPC used about 15 percent less CPU.
3. Connections. This is the structural difference the rest follows from. HTTP/1.1 carries one request at a time per connection, so keeping thousands of requests in flight means holding thousands of connections open. gRPC multiplexes many requests over a handful of connections. That also saves memory, since every open connection carries its own buffers and gRPC holds far fewer.
4. Speed under load. At light load the two are even. As the load climbs they pull apart. Requests pile up behind each other on HTTP/1.1's held-open connections, so its latency rises faster, while gRPC keeps median latency 30 to 40 percent lower. It also keeps absorbing traffic after HTTP/1.1 has stopped gaining, about a quarter more throughput on the same hardware. On a bid path that is the result that counts, because a response that misses the auction window never bid.
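The connection count in result 3 can be estimated with Little's law (average requests in flight equals arrival rate times time in the system). This is a back-of-the-envelope sketch, not part of the benchmark: the request rate, latency, and streams-per-connection limit are hypothetical.

```python
import math


def in_flight(requests_per_second: int, latency_ms: int) -> float:
    """Little's law: average requests in flight = arrival rate x time in system."""
    return requests_per_second * latency_ms / 1000


def connections_needed(in_flight_requests: float, streams_per_connection: int) -> int:
    """Connections required if each one carries a fixed number of concurrent requests."""
    return math.ceil(in_flight_requests / streams_per_connection)


# Hypothetical load, for illustration only.
QPS = 50_000             # requests per second to one partner
LATENCY_MS = 40          # round-trip time per request
STREAMS_PER_CONN = 100   # assumed concurrent-stream limit per HTTP/2 connection

concurrent = in_flight(QPS, LATENCY_MS)
print("HTTP/1.1 connections:", connections_needed(concurrent, 1))
print("gRPC connections:", connections_needed(concurrent, STREAMS_PER_CONN))
```

With one request per connection, the connection count tracks the in-flight count directly; multiplexing divides it by the stream limit.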
Put together, the encoding is the smaller win and the connection model is the larger one. Protobuf trims bytes and CPU at the margins. gRPC's bigger advantage is that it carries many simultaneous requests over a handful of connections instead of one connection each, which is the shape of a bid path: large volumes of small requests, all at once.
A single large exchange runs tens of billions of bid requests a day. At that volume even a single-digit gap turns into real hardware: a fifth off the request bytes is terabytes of bandwidth a day, fifteen percent on CPU is a visible slice of the server fleet, and trading thousands of open connections per partner for dozens frees real memory. The same auctions, on less hardware.
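The bandwidth claim is simple multiplication. A rough sketch under stated assumptions: the daily request count and the average gzipped request size are hypothetical, and only the 19 percent gap comes from the measurement above.

```python
def daily_bytes_saved(requests_per_day: int, gzipped_request_bytes: int, saving: float) -> float:
    """Bytes per day no longer sent when each request shrinks by the given fraction."""
    return requests_per_day * gzipped_request_bytes * saving


REQUESTS_PER_DAY = 20_000_000_000   # "tens of billions"; exact figure assumed
GZIPPED_REQUEST_BYTES = 1_000       # hypothetical average gzipped JSON request size
SAVING = 0.19                       # request-size gap after gzip, from the benchmark

saved = daily_bytes_saved(REQUESTS_PER_DAY, GZIPPED_REQUEST_BYTES, SAVING)
print(f"{saved / 1e12:.1f} TB less request traffic per day")
```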
Bedrock Platform

Michał Nieć
CEO of Appliscale | AI in AdTech Expert | LP & Angel Investor in AdTech/GameTech · 1w
Code from an agent that compiles and passes tests is not automatically correct. Four failure modes show up often enough to name.
Four Failure Modes
1. Hallucination. The agent invents a function, API, or library that does not exist. The code looks plausible and can pass shallow tests until the missing piece is actually called.
2. Reinvention. The agent writes bespoke code for a problem a well-tested library in the project already solves. The maintenance cost compounds quietly.
3. Verbosity. Agents tend to produce more than necessary, extra files, abstractions, defensive branches. Review time and long-term load both go up.
4. Sycophancy. The agent agrees with the human's hypothesis even when it is wrong, and pursues the wrong direction confidently. Harder to spot than hallucination, harder to push back on.
How Teams Contain It
• Reviewer agent. A second agent reviews the work before it merges. Either in your own workflow, or an external service that reviews every pull request, like CodeRabbit, Greptile, or Qodo. It is independent of the agent that wrote the code, which is the point. Catches hallucinations and reinvention.
• Plan-and-diff review. The human reviews the plan up front and the diff at the end, not every line. Catches scope drift and wrong direction.
• Linters and type checks. The static analysis you already run. Catches syntax-level errors.
• Convention rules. Explicit rules in the conventions file, like "prefer the shorter version". Catches verbosity and off-pattern code.
• Evaluations. A fixed set of realistic tasks re-run on every model or harness upgrade. Catches capability regressions. This one is large enough to cover on its own.
Reference Benchmarks
• SWE-bench and SWE-bench Verified. Real GitHub issues fixed by agents.
• Terminal-bench. Agents working in a terminal environment.
• Internal evals. Team-specific task sets, the only reliable signal for your own codebase.
The Operational Reality
Teams shipping agent-written code without a reviewer agent, plan-and-diff review, and conventions are accumulating quality debt that surfaces three to six months later. Passing tests is the floor, not the proof.


Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 1w
By now the industry agrees on one rule for embedding-based targeting: both sides have to run the same model, or the numbers don't compare. But what actually is the model, and how does it work? The best ones today are the same kind of model that powers a chatbot, with the final step removed.
This is a recent shift. Until a couple of years ago, embedding models were smaller systems built for one job: turning text into numbers. The turning point came when researchers showed you get better numbers by starting from a full LLM, the kind that powers chatbots, and repurposing it for that same job instead. Google's current embedding model is built straight from Gemini, others from Qwen or Llama. Embedding models are ranked against each other on a public benchmark called MTEB, and its leaders today are almost entirely repurposed LLMs.
Underneath, turning text into an embedding runs as a fixed sequence of steps. Picture a data provider's audience description going in: "luxury car buyers looking at a Jaguar lease." Here is what the model does with it (working picture, simplified on purpose):
1. Break it into pieces. The text is split into words, and each word starts as a fixed list of numbers: a generic spot on a vast map of meaning, where things with similar meaning sit close together. On its own, "Jaguar" lands halfway between the car and the animal, because nothing has told the model which one is meant.
2. Let the words read each other. Every word looks at all the others and adjusts its spot to fit the context. Next to "lease" and "buyers," "Jaguar" slides into the luxury-car region of the map; in a sentence about rainforests it would slide toward wildlife instead. This context step is the part people call attention.
3. Repeat through many layers. The model does not do this once. It stacks the same read-and-adjust step dozens of times, and each layer sharpens the picture: the early layers catch which words go together, the later ones catch what the whole phrase is actually about.
4. Combine into one spot. After the last layer, every word sits at a location that fits the description. The model averages them all into a single location for the whole thing.
5. Read off the embedding. That single location is the embedding: a list of numbers, marking where the description sits on the map.
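The five steps above can be sketched in a few lines. This is a toy under heavy assumptions: the 2-D vectors are invented, and the attention step leaves out the learned projections, multiple heads, and feed-forward layers of a real model. It only shows the shape of the computation.

```python
import math

# Step 1: each word starts at a fixed, context-free spot (toy 2-D vectors, invented).
# Axis 0 stands for "cars", axis 1 for "animals".
STATIC = {
    "jaguar": [0.5, 0.5],      # ambiguous: halfway between car and animal
    "lease": [1.0, 0.0],
    "buyers": [0.9, 0.1],
    "rainforest": [0.0, 1.0],
}


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors):
    """Step 2: each word becomes a mix of all words, weighted by dot-product similarity."""
    out = []
    for query in vectors:
        weights = softmax([sum(q * k for q, k in zip(query, key)) for key in vectors])
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(len(query))])
    return out


def embed(words, layers=3):
    vectors = [STATIC[w] for w in words]
    for _ in range(layers):            # Step 3: repeat the read-and-adjust step
        vectors = attend(vectors)
    dims = len(vectors[0])
    # Steps 4 and 5: average every word's spot into one vector and return it.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


car = embed(["jaguar", "lease", "buyers"])
animal = embed(["jaguar", "rainforest"])
print("car context:", car)
print("animal context:", animal)
```

The same starting vector for "jaguar" ends up pulled toward the car axis in one phrase and toward the animal axis in the other.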
This is what makes embeddings useful for targeting. "Luxury car buyers," "premium auto shoppers," and "in-market for high-end vehicles" share almost no words, yet they describe the same people, so the model drops all three in nearly the same place. You match an audience by where it sits on the map instead of by keyword, and the exact wording stops mattering.
This is what ties back to where we started. Those numbers only mean something next to the exact model that produced them. Each model builds its own private map of meaning, so the same sentence lands in a different spot from one model to the next, and a position on one map says nothing on another.
Bedrock Platform

Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 2w
A sales agent claims to sell a big publisher's inventory. The buyer agent on the other side has never seen it before. How does it verify the claim?
AgenticAdvertising.org's AdCP answers with a file called adagents.json, hosted at /.well-known/adagents.json on the publisher's own domain. Anyone can fetch it without authentication. The trust comes from domain ownership: only the party that controls the domain can publish a file there, so whatever shows up at /.well-known/ carries the domain owner's authority. The pattern is well-established. ads.txt has used it to verify domain authorization for years. sellers.json uses it for supply chain provenance. brand.json uses it for brand identity. adagents.json is the same idea applied to which agents can sell what.
Three things sit inside the file. The publisher's contact information, with an optional sellers.json identifier and an optional TAG certification ID. The list of properties they own (websites by domain, mobile apps by bundle ID, CTV channels by their identifier). And the array of authorized agents. Each agent entry carries an API endpoint URL and a scope: which specific properties the agent is authorized for, which property tags, or which signals. A timestamp records when the file was last updated.
The verification flow is short. The buyer agent (or any party in the chain) fetches the publisher's adagents.json, looks for the sales agent's URL in the authorized_agents array, and checks the scope against the inventory being offered. If the agent is there with the right authorization, the claim is verified. If not, the agent gets treated as unauthorized. The same file extends to data providers. A data provider hosts adagents.json at its own domain listing which signal agents are authorized to resell which signals, by specific signal IDs or by tag-based groups.
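The flow can be sketched as a fetch plus a lookup. This is a minimal illustration, not the AdCP reference implementation: only the /.well-known/adagents.json path and the authorized_agents array come from the description above, while the `url` and `property_ids` field names, the example domain, and both function names are assumptions.

```python
import json
import urllib.request


def fetch_adagents(publisher_domain: str) -> dict:
    """Fetch the publisher's adagents.json from its well-known path (no auth needed)."""
    url = f"https://{publisher_domain}/.well-known/adagents.json"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def is_authorized(adagents: dict, agent_url: str, property_id: str) -> bool:
    """Check a sales agent's claim against a publisher's parsed adagents.json."""
    for entry in adagents.get("authorized_agents", []):
        if entry.get("url") != agent_url:          # "url" is an assumed field name
            continue
        # "property_ids" is an assumed field name for the agent's scope.
        if property_id in entry.get("property_ids", []):
            return True
    return False


# Hypothetical file contents, for illustration only.
EXAMPLE = {
    "authorized_agents": [
        {"url": "https://sales-agent.example/api",
         "property_ids": ["news-site", "sports-app"]},
    ]
}

print(is_authorized(EXAMPLE, "https://sales-agent.example/api", "news-site"))
print(is_authorized(EXAMPLE, "https://unknown-agent.example/api", "news-site"))
```

Anything not explicitly listed is treated as unauthorized, which matches the default in the flow described above.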
Verification is the obvious benefit. There's also a less obvious one for negotiation. Once the buyer knows the seller is verified, three things become possible. The buyer can refuse to negotiate with sellers that don't show up in any publisher's adagents.json. The buyer can remember how past negotiations with the same seller went, because the seller's URL stays the same across deals. And if a dispute comes up later, the adagents.json from the day of the deal shows exactly what the seller was authorized to do.
The pattern itself is old. It's been running on the inventory side of the supply chain for years. Bringing it to agents makes "I represent this publisher" a falsifiable claim instead of a free-for-all, and it gives the negotiation layer a stable identity to build on.

Michał Nieć
CEO of Appliscale | AI in AdTech Expert | LP & Angel Investor in AdTech/GameTech · 2w
Why are engineers suddenly talking about context management all the time?
Because an AI agent has no memory of your repository. Every conversation starts blank. Before it can do useful work, the relevant code, conventions, and history have to be loaded into its context, the working memory of the task at hand.
Getting this right is called context engineering, and it is one of the key problems every coding harness is trying to nail right now. It is the difference between an agent that feels like it knows your codebase and one that guesses. Each harness bets on a different approach, and the field is still being figured out.
The Tools in Use in 2026
• Direct file reading. The agent opens files and follows imports. Simple, slower on a large repo.
• Code search. Keyword match across the repo. Fast, misses anything not named the way it searched.
• Retrieval (RAG). Snippets indexed by meaning. Powerful, but the index drifts out of date as code changes.
• Code graph. A map of what calls what. Accurate for refactors, expensive to build and maintain.
• Compaction. When the window fills, older history is summarised instead of dropped.
• Sub-agents. Heavy exploration runs in its own window and reports back only the conclusion.
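Compaction is the easiest of these to show in code. This is a minimal sketch, not any harness's actual behaviour: the function name and parameters are hypothetical, words stand in for tokens, and the `summarise` callable is a stub where a real harness would call a model.

```python
def compact(history: list[str], budget: int, keep_recent: int, summarise) -> list[str]:
    """When history exceeds the budget, replace older turns with one summary."""
    # Crude size estimate: whitespace-separated words stand in for tokens.
    size = sum(len(turn.split()) for turn in history)
    if size <= budget or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    # Older turns are summarised instead of dropped; recent turns stay verbatim.
    return [summarise(older)] + recent


def stub_summary(turns: list[str]) -> str:
    """Placeholder for a model call that writes a real summary."""
    return f"[summary of {len(turns)} earlier turns]"


history = ["turn one here", "turn two here", "turn three here", "turn four here"]
print(compact(history, budget=10, keep_recent=2, summarise=stub_summary))
```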
How You Can Steer It
The harness handles the moment-to-moment work on its own. But it gives you levers to influence what it loads and when.
• Conventions file. Always-on rules the agent reads first, files like CLAUDE.md or AGENTS.md. The biggest behaviour change per hour invested.
• Skills. Conditional instructions loaded only when relevant, so a large library of know-how does not bloat every task. You are deciding not just what the agent sees, but under what conditions.
• Library knowledge. Context is not only your own code. A skill or a documentation MCP server can hand the agent the current docs for a framework, so it writes against the real API instead of reverse-engineering it from source. Mastra ships an MCP for this. A React skill does the same for the framework.
• Extensions. Some harnesses let you install context tooling outright. pi's context-mode sandboxes tool output into a local database and claims to save most of the context window. graphify turns a repo into a queryable code graph the agent reaches into instead of re-reading files.
Where the Judgment Lives
The levers you write, conventions and skills, are where your team's judgment goes, the kind that used to live only in senior engineers' heads. They are the difference between output that is technically correct and output indistinguishable from your team's. Same model, very different result.
Why It Matters Beyond Engineering
• Documentation stops being overhead and becomes a load-bearing asset.
• Institutional knowledge becomes writable, and re-applied to every agent task automatically.
• A new role, the context engineer, is starting to appear on team charts.

Magdalena Śleboda
Head of Operations | Scaling Tech Organizations with AI, Automation & Data-Driven Execution | Global Ops & Transformation Leader · 2w
In consulting, your CV is a sales document. Most of the time, it’s out of date. Not at Appliscale.
We automated the CV generation pipeline through Airtable, Gemini, and Google Apps Script.
A candidate submits their CV. At the right stage of the interview pipeline, it’s parsed, structured, and formatted automatically. Every time an Appliscale engineer is assigned to a new project, the CV updates again. No copy-pasting, no manual adjustments, no one owning “CV maintenance” as a task.
The output lands in Google Docs. Fully editable when human judgment matters.
For a 100+ person firm, this isn’t a nice-to-have. It’s hours back every week - and a profile that’s always ready to send.
Big thanks to our automation team!

Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 2w
The exchange calls the ARTF container over gRPC, with payloads encoded as protobuf. Both replace pieces of the JSON-over-HTTP stack the rest of programmatic still uses. Inside an exchange running tens of billions of bid requests a day, the replacement is worth understanding.
Most computer-to-computer communication runs on JSON over HTTP. One side sends a message in plain text with labeled fields, the other side parses it and sends one back. JSON is the format of the message, HTTP is the protocol that carries it (the same one your browser uses to load a web page). The strength is that it asks very little upfront: a person can read and debug a request without special tools. The cost: parsing text is slower than parsing binary, every label ships with every message, and there is no real contract unless both sides agree to one outside the protocol.
gRPC carries the call. It runs on HTTP/2 (a newer HTTP), which lets one open connection handle many calls at once, where HTTP/1.1 carries one request at a time per connection. The data on the wire is binary, not text. The shape of every call is generated from a single shared schema file, so both sides write code in their own language and don't hand-roll the message format.
Protobuf decides how the message looks on the wire. A single schema file describes every message once, each field carrying a number and a type. The number is what travels on the wire; the name is just for humans. Encoding writes a series of small records (one per field), no whitespace and no key names, so the payload is much smaller than the equivalent JSON.
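Encoding a small message by hand shows where the bytes go. This is a sketch of the protobuf wire format for two field kinds (varint and length-delimited), written without the protobuf library. The message and its field numbers are hypothetical and not taken from the ARTF or OpenRTB schemas.

```python
import json


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def int_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint): tag, then the value."""
    return varint((field_number << 3) | 0) + varint(value)


def string_field(field_number: int, value: str) -> bytes:
    """Wire type 2 (length-delimited): tag, byte length, then UTF-8 bytes."""
    data = value.encode("utf-8")
    return varint((field_number << 3) | 2) + varint(len(data)) + data


# Hypothetical message: field 1 = id (string), field 2 = width, field 3 = height.
message = {"id": "abc123", "width": 300, "height": 250}
wire = (string_field(1, message["id"])
        + int_field(2, message["width"])
        + int_field(3, message["height"]))
text = json.dumps(message, separators=(",", ":")).encode("utf-8")

print(len(wire), "bytes as protobuf vs", len(text), "bytes as compact JSON")
```

The field names never appear in the binary form; only the field numbers do, which is why both sides need the shared schema to read it.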
What that buys in the bid path. The message on the wire is much smaller: no key names, no whitespace, numbers packed into a handful of bytes instead of written out as text. Reading and writing happens faster, so more of the 100ms auction window goes to the bid decision instead of the parser. The connection between exchange and bidder stays open across requests instead of starting fresh each time, so no handshake delay before every bid. At exchange scale those savings turn into real money on bandwidth, headroom on the auction deadline, and resources for the model.
What that buys when traffic gets spiky. HTTP/2 has built-in flow control. When one side can't keep up, the other side gets told, instead of requests piling into a buffer until they time out. In live-event traffic, where the bid path is already under pressure, that backpressure signal is what lets the exchange react in time.
None of this is novel engineering. gRPC wasn't around when OpenRTB was written in 2010, so JSON over HTTP was the available choice at the time. ARTF is new enough to standardize on the more efficient tools that have come along since.
Bedrock Platform IAB Tech Lab

Michał Nieć
CEO of Appliscale | AI in AdTech Expert | LP & Angel Investor in AdTech/GameTech · 2w
Engineers use AI agents in four very different workflows. Same tools, dramatically different leverage. The easiest way to see them is to ask how you would delegate the work to a person, or a team.
1. The Intern. Small, well-scoped tasks on request.
The engineer is still writing the code and turns to the agent for the bits they would otherwise type. "Write a test for this." "Rename this everywhere." The engineer takes it from there.
Bottleneck: how fast one person can dispatch tasks.
Leverage: marginal, a 10 to 30 percent uplift.
Most non-engineers still think this is what AI coding is. It is the smallest of the four.
2. The Contractor. You write the brief, the agent delivers.
The engineer writes a specification of the change. The agent implements it end to end, writes the files, runs the tests, fixes its own mistakes, opens a pull request. The engineer reviews the result, not the keystrokes.
Bottleneck: the quality of the spec.
Leverage: high, often three to ten times once patterns are established.
3. The Partner. You bring the goal, you design together, the agent builds.
The engineer starts with an outcome, not a spec. The agent asks clarifying questions, proposes approaches with trade-offs, flags constraints. They agree on a plan. The agent builds.
Bottleneck: clarity of thinking about the problem.
Leverage: maximum on a single feature. The thinking happens with the agent in the loop, not after.
This is the mode senior engineers have been quietly converging on.
4. The Agency. Multiple agents, each with a role, talking to each other.
The engineer sets the goal and the budget, and the agents organise themselves. A planner decomposes the work. Workers pick up tasks in parallel. A reviewer checks the work before it reaches a human. They hand off to each other and escalate only on a real decision.
Bottleneck: orchestration design and reviewer quality.
Leverage: theoretically unbounded, practically capped by token cost and how easy the goal is to verify.
This is the frontier in 2026, source of both the spectacular demos and the spectacular failures.
Why This Matters If You Do Not Write Code
Each mode pushes the bottleneck further upstream.
• Intern: the keyboard. How fast can one person dispatch tasks.
• Contractor: the page. How clearly can you write a brief.
• Partner: the thinking. How clearly can you frame the problem.
• Agency: the orchestration. How well can you run a team of agents and trust the reviewer.
Each mode lifts the ceiling the previous one had. Same models, same headcount. Teams at the higher modes outproduce teams stuck at the lower ones by an order of magnitude.
So when someone says "we use AI for coding", the real question is which of the four. If they cannot answer, they are in mode one.

Michał Nieć
CEO of Appliscale | AI in AdTech Expert | LP & Angel Investor in AdTech/GameTech · 2w
"Coding harness" is becoming the most discussed topic in engineering as teams figure out how to work with AI-native development.
Here is a quick rundown of what it is, so you can see where software engineering is heading.
A coding harness is the program that runs the AI model and gives it tools.
The model on its own can only read and write text. The harness is what lets it act inside an engineering environment. Read files. Run terminal commands. Execute tests. Search the web. Commit code. Ask the human before doing something risky.
A typical harness provides:
• File read and write across a codebase
• Shell command execution
• Test runners and build tools
• Web search and documentation fetch
• Git operations (commit, branch, pull request)
• MCP integrations (Slack, Linear, databases, etc.)
• Permission prompts before destructive actions
• Conversation memory across turns
• Sub-agents and parallel task spawning
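Stripped to its core, a harness is a loop around the model. This is a toy sketch, not how any named harness is built: the tools are stubs, the tool names and action format are invented, and a scripted function stands in for the model.

```python
from typing import Callable


def read_file(path: str) -> str:
    return f"<contents of {path}>"       # stub; a real tool would open the file


def run_shell(command: str) -> str:
    return f"<output of {command}>"      # stub; a real tool would run a subprocess


TOOLS: dict[str, Callable[[str], str]] = {"read_file": read_file, "run_shell": run_shell}
RISKY = {"run_shell"}                    # actions that need a permission prompt


def run_harness(model, approve, max_turns: int = 10) -> list[str]:
    """Loop: ask the model for an action, run the tool, feed the result back."""
    transcript: list[str] = []           # conversation memory across turns
    for _ in range(max_turns):
        name, arg = model(transcript)    # e.g. ("read_file", "main.py")
        if name == "done":
            transcript.append(f"final: {arg}")
            break
        if name in RISKY and not approve(name, arg):
            result = "denied by user"
        else:
            result = TOOLS[name](arg)
        transcript.append(f"{name}({arg}) -> {result}")
    return transcript


def scripted_model(transcript: list[str]):
    """Stand-in for the model: replays a fixed plan, one step per turn."""
    steps = [("read_file", "main.py"), ("run_shell", "pytest"), ("done", "tests pass")]
    return steps[len(transcript)]


for line in run_harness(scripted_model, approve=lambda name, arg: False):
    print(line)
```

Everything the model can do is whatever the `TOOLS` table and the approval callback allow, which is why the choice of harness shapes the outcome.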
Three Categories in 2026
• IDE-integrated. Runs inside the code editor.
Cursor, Windsurf, GitHub Copilot Agent, Zed AI.
• CLI or terminal. Runs in a terminal window.
Claude Code, Codex CLI, opencode, aider, crush, pi (pi.dev).
• Autonomous or cloud. Runs on a remote server, often unattended.
Devin, Factory, Amp, Jules.
Licensing
Open-source. opencode, aider, crush, pi. Model-agnostic, auditable, swap providers freely.
Commercial. Claude Code, Codex, Cursor, Windsurf, Devin, Factory, Amp, Jules. Polished UX, opinionated defaults, vendor-managed.
Why "Which Harness" Matters More Than "Which Model"
The harness decides:
• What files and tools the agent can access
• Which actions require human approval
• How context is loaded and cached
• Whether code or prompts leave your network
• Which model(s) get called and when
The same model produces very different output depending on the harness around it.
Common Pattern
Most senior engineers run two or three harnesses in parallel and switch based on the task. One CLI for deep work. One IDE-integrated for editing. One cloud-autonomous for batched tickets.


Maksymilian Wojczuk
Technical Engineering Manager @Appliscale | Co-founder @DiPA · 2w
So the GSD AI tool got compromised.
Details in the thread (link in comments).
But honestly this isn't really about GSD.
It's about the fact that this is going to keep happening.
over and over.
to tools you use.
to dependencies buried 4 levels deep in your package.json or requirements.txt that you've never even heard of.
at some point we have to stop being surprised and start being prepared.
so - note to self, and maybe useful to others - here's where I think the baseline needs to be:
1. Stay informed. Subscribe to security advisories. Follow CVE feeds. Know when something you use gets flagged. OSV.dev, Snyk alerts, GitHub Dependabot - pick one and actually read it.
2. Scan your dependencies. All your projects. Not once - regularly. Run Trivy, Grype, Snyk, whatever fits your stack. Wire it into CI so it's not optional.
3. Audit what's actually in your supply chain. Do you know what your top-level dependencies pull in transitively? Most people don't. That's the attack surface.
4. There's definitely a 4 and a 5 and probably a 6. But those three feel like the floor.
The uncomfortable truth is that most of us - myself included - treat dependency security as someone else's problem until it isn't.
It's ours now.
What would you add to this list?


Damian Naglak
Head of Engineering | Bedrock Platform | AdTech · 2w
A bid request can carry both segment IDs and audience vectors. They describe the same user in different ways. The differences are in how you use them.
Segments work by ID. Someone defines an audience, gives it a code in a taxonomy like IAB Tech Lab's Audience Taxonomy v1.1, and ships the list of which users fall into it. At bid time the bid request carries the IDs the user has, the campaign lists the IDs it wants, and either there's overlap or there isn't. The match is yes or no, sub-microsecond per impression. Set logic is easy because IDs are atomic: A AND B, A AND NOT C work directly. When someone asks why an ad served, the segment name is the answer. Dedup across providers is automatic when they share the taxonomy.
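Segment matching maps directly onto set operations. A minimal sketch, with invented segment IDs and a hypothetical `matches` helper:

```python
def matches(user: set[str], require: set[str], exclude: frozenset = frozenset()) -> bool:
    """True when the user holds every required ID and none of the excluded ones."""
    return require <= user and not (exclude & user)


# Invented segment IDs carried in the bid request for one user.
user_segments = {"A", "B", "D"}

print(matches(user_segments, require={"A", "B"}))                        # A AND B
print(matches(user_segments, require={"A"}, exclude=frozenset({"C"})))   # A AND NOT C
print(matches(user_segments, require={"A"}, exclude=frozenset({"D"})))   # A AND NOT D
```

The answer is always a plain yes or no, and the IDs involved explain the decision on their own.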
Segments break when the audience isn't in the taxonomy. If no one defined the audience in advance, there's no ID for the campaign to ask for. Names also don't line up across providers: one provider's "luxury auto intender" isn't the same ID as another's "auto premium shopper" even when the underlying population matches. And there's lock-in: build against one provider's IDs and you're stuck with their taxonomy and their update schedule.
Embeddings work by vector. The audience is described in text, run through a model, and stored as an array of a few thousand numbers. At bid time the user side carries its own vector, the campaign side has its frozen audience vector, and the match is one number against a cutoff. Any audience you can describe in a sentence exists the moment you embed it. You get a score back instead of a yes or no. Different wordings of the same audience match cleanly, as long as both sides use the same model.
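The vector match can be sketched as one similarity score compared with a cutoff. Cosine similarity is assumed here as the closeness measure, since the post does not name one; the 3-number vectors and the cutoff value are invented, where real vectors run to a few thousand numbers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_match(user_vec: list[float], audience_vec: list[float], cutoff: float) -> bool:
    """The match is one number against a cutoff."""
    return cosine(user_vec, audience_vec) >= cutoff


# Invented toy vectors; both must come from the same model to be comparable.
audience = [0.9, 0.1, 0.4]
close_user = [0.8, 0.2, 0.5]
far_user = [0.1, 0.9, 0.0]
CUTOFF = 0.9   # hypothetical threshold

print(cosine(close_user, audience), is_match(close_user, audience, CUTOFF))
print(cosine(far_user, audience), is_match(far_user, audience, CUTOFF))
```

Unlike the segment check, the score is graded, so the cutoff becomes a tuning decision in its own right.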
Embeddings break on the model. Two sides using different models can't compare vectors, so they have to agree on one first. When the model ships a new version, older vectors stop matching the new ones, so everyone has to upgrade together. Set logic has no clean equivalent: you can't just subtract one audience from another. The audit trail is two arrays of numbers and a score, fine for an engineer, not useful for a compliance reviewer. And each match takes more compute than a quick lookup, which adds up at high QPS.
Each one fits different problems. Segments give clean set logic, readable audit, and easy dedup. Vectors handle plain-language audiences, rank by closeness, capture nuance taxonomies can't encode, and they have a higher performance ceiling. The personalization stacks behind most of the consumer internet (search, recommendations, feed ranking) all run on embeddings, and public reports show real CTR and engagement lifts over label-based ranking. The cost is operational: model agreement, version migrations, harder audit, more compute per match. Teams will pay that cost when the numbers justify it. The IAB Tech Lab spec keeps both in the bid request so the choice stays open.
Bedrock Platform
