ByteDance Prepares Its Own Arm and RISC-V CPUs to Regain Control Over Cost Per Token

With Doubao, ByteDance claims to process 120 trillion tokens per day. At this scale, the hardware challenge is no longer limited to Nvidia GPUs: server CPUs, long relegated to the background in the AI debate, become a strategic variable again. According to Reuters, the Chinese group is developing two families of in-house processors, one based on Arm and the other on RISC-V, to support the deployment of its AI agents via Coze and reduce its dependence on Intel and AMD.

ByteDance is said to have crossed an industrial threshold. In March 2026, Doubao was processing 120 trillion tokens per day (1.2 × 10^14, on the short scale), according to figures published by Volcano Engine and relayed by TechNode. Usage is said to have doubled in three months and to have increased a thousandfold since the model's public launch in May 2024.
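To give these figures a sense of scale, the short Python sketch below converts the daily volume into tokens per second and derives the average doubling time implied by a thousandfold increase. The 22-month span (May 2024 to March 2026, counted in whole months) and the assumption of smooth exponential growth are simplifications made for illustration, not figures from Volcano Engine.

```python
import math

# Figures cited in the article.
TOKENS_PER_DAY = 120e12        # March 2026
GROWTH_SINCE_LAUNCH = 1_000    # "thousandfold" since May 2024

# Assumption: May 2024 -> March 2026 counted as 22 whole months.
MONTHS_SINCE_LAUNCH = 22

SECONDS_PER_DAY = 86_400


def tokens_per_second(tokens_per_day: float) -> float:
    """Average throughput, ignoring daily peaks and troughs."""
    return tokens_per_day / SECONDS_PER_DAY


def average_doubling_months(growth_factor: float, months: float) -> float:
    """Doubling time under the assumption of smooth exponential growth."""
    return months / math.log2(growth_factor)


launch_tokens_per_day = TOKENS_PER_DAY / GROWTH_SINCE_LAUNCH
throughput = tokens_per_second(TOKENS_PER_DAY)
doubling = average_doubling_months(GROWTH_SINCE_LAUNCH, MONTHS_SINCE_LAUNCH)

print(f"Implied launch volume: {launch_tokens_per_day:.3g} tokens/day")
print(f"Average throughput:    {throughput:.3g} tokens/s")
print(f"Average doubling time: {doubling:.2f} months")
```

Under these assumptions, the average works out to roughly 1.4 billion tokens per second, and the implied average doubling time of about 2.2 months is somewhat shorter than the three-month doubling reported for the most recent period.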

At this level of traffic, the inference cost no longer depends solely on the price of AI accelerators. It also depends on the entire server stack: CPU, memory, orchestration, tool calls, database access, network, queues, latency, and availability. It is in this context that Reuters revealed on May 28, 2026, that ByteDance is developing its own central processors along two tracks: one based on the Arm architecture, whose owner Arm Holdings is controlled by SoftBank, and one based on RISC-V, an open instruction set.

The program is linked to the expanded deployment of AI agents via Coze, the group's agent platform. Its immediate motivation is as much economic as strategic: according to Reuters, Intel has notified its Chinese customers of delivery delays of up to six months on certain server CPUs, with price increases of 10 to 35% per quarter. For ByteDance, the issue is not only to do as the American hyperscalers do, but to secure the hardware foundation of an AI used on a massive scale.

The AI Battle Is Not Only About GPUs

For the past two years, the hardware debate on AI has focused on Nvidia, American export restrictions, H100/H200/B200 GPUs, and Chinese alternatives like Huawei Ascend. This framing is necessary but incomplete.

GPUs and AI accelerators remain central for training large models and for the most intensive inference workloads. But AI agents introduce another constraint. An agent does not simply generate a long response in one pass. It plans, calls tools, checks results, retries subtasks, queries document stores, executes code, interacts with APIs, and runs repeated reasoning loops.

In this type of workload, the server CPU becomes critical again. It does not replace the AI accelerator, but it shapes the full cost of inference: orchestration of calls, latency between components, session management, security, scheduling, preprocessing, post-processing, and execution of the functions that agents call.

This is the layer ByteDance seems to want to take back under control. The project revealed by Reuters should not be read as an attempt to directly replace Nvidia with in-house CPUs. Rather, it is a move towards vertical integration on the server foundation surrounding AI workloads, particularly agent inference workloads.

A Chinese Server Market Moving Away from Intel

The shift is not just about ByteDance. According to a UBS study from January 2026 cited by Business Times, Intel's market share on server processors in China is said to have dropped from over 90% in 2019 to about 60% in 2025. During the same period, AMD is said to have increased from about 5% to more than 20%.

This shift has two consequences. First, Intel no longer holds a near-monopoly in the Chinese server market. Second, large Chinese customers now have a stronger incentive to diversify their hardware stack, especially when lead times, prices, and geopolitical restrictions are all increasing at once.

China accounts for more than 20% of Intel's total revenue. But the shortage of fourth- and fifth-generation Xeon processors has made this dependency more costly for local customers. In this context, ByteDance's development of in-house CPUs is part of a broader movement: the gradual migration of major Chinese tech companies towards architectures they can better control, whether Arm, RISC-V, or designs from domestic suppliers.

The program, however, remains embryonic. ByteDance only formed its hardware design team in 2022. The group thus has limited experience compared to Apple, Google, Amazon, or Microsoft, which have been accumulating the necessary skills for developing their own chips for fifteen to twenty years.

The Precedent of Hyperscalers: A Traffic Threshold, Not Just a Reaction to Sanctions

The ByteDance move is reminiscent of the American hyperscalers. Google, AWS, and Microsoft did not develop their in-house chips solely for sovereignty or strategic communication reasons. They did so when a traffic, cost, or performance threshold made the standard purchase model insufficient.

At Google, the decision to develop a dedicated AI accelerator was triggered in 2013, when an internal projection showed that voice search could double the compute needs of its data centers. The TPU, designed for the search engine's internal workloads, was then developed and deployed at high speed, with massive gains over contemporary CPUs and GPUs on certain workloads.

AWS followed with Trainium, designed to reduce training costs compared to GPU instances. Microsoft made Azure Cobalt 100 generally available, an in-house Arm CPU intended to optimize general cloud workloads, with a better price/performance ratio than the previous generation of Arm instances.

The common point is not the exact nature of the chip. TPU and Trainium are AI accelerators; Cobalt 100 is an Arm CPU; ByteDance's projects involve Arm and RISC-V CPUs. The common point is deeper: when an actor reaches a sufficient scale, it seeks to internalize part of its silicon to optimize its own workloads rather than relying entirely on the standard market.

ByteDance is entering this logic. But its case differs on one essential point: American hyperscalers have been able to rely on TSMC and an advanced supply chain. The foundry for ByteDance's future CPUs has not been announced.

SMIC Is Not TSMC: A Structuring Hypothesis, Not a Detail

The foundry is the big blind spot in this story. Reuters does not specify who would manufacture ByteDance's future CPUs. Some analysts mention SMIC as a likely option, given export restrictions and the geopolitical context, but this hypothesis is not confirmed.

Yet it radically changes the economic calculation. The Google, AWS, and Microsoft precedents rely on access to TSMC's most advanced process nodes. If ByteDance had to rely on SMIC, the gap in yield, energy efficiency, and cost per wafer would become central.

In other words, vertical integration does not automatically guarantee a gain. It only makes sense if the total cost - design, manufacturing, yield, consumption, software maintenance, production volume, and data center integration - becomes lower or strategically preferable to buying Intel or AMD CPUs.
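The dependence on yield and volume can be made concrete with a deliberately simplified sketch. It covers only two of the cost items listed above (amortized design cost and manufacturing cost per good die) and ignores power consumption, software maintenance, and data center integration. Every number in it is a hypothetical placeholder, not an estimate for ByteDance, SMIC, TSMC, Intel, or AMD.

```python
def good_dies_per_wafer(dies_per_wafer: int, yield_rate: float) -> float:
    """Number of usable dies once defective ones are discarded."""
    if not 0 < yield_rate <= 1:
        raise ValueError("yield_rate must be in (0, 1]")
    return dies_per_wafer * yield_rate


def in_house_unit_cost(
    design_cost: float,
    wafer_cost: float,
    dies_per_wafer: int,
    yield_rate: float,
    volume: int,
) -> float:
    """Manufacturing cost per good die plus design cost spread over volume."""
    manufacturing = wafer_cost / good_dies_per_wafer(dies_per_wafer, yield_rate)
    amortized_design = design_cost / volume
    return manufacturing + amortized_design


# Hypothetical placeholder values, chosen only to show the mechanism.
DESIGN_COST = 500e6      # one-off design and tooling cost
WAFER_COST = 10_000      # cost of one processed wafer
DIES_PER_WAFER = 60      # candidate dies per wafer
VOLUME = 1_000_000       # chips produced over the program's life
PURCHASE_PRICE = 800     # price of a comparable merchant CPU

high_yield_cost = in_house_unit_cost(
    DESIGN_COST, WAFER_COST, DIES_PER_WAFER, 0.8, VOLUME
)
low_yield_cost = in_house_unit_cost(
    DESIGN_COST, WAFER_COST, DIES_PER_WAFER, 0.5, VOLUME
)

for label, cost in (("yield 80%", high_yield_cost), ("yield 50%", low_yield_cost)):
    verdict = "cheaper" if cost < PURCHASE_PRICE else "more expensive"
    print(f"{label}: {cost:.2f} per chip, {verdict} than buying")
```

With these placeholder inputs, the same design is cheaper than the merchant CPU at the higher yield and more expensive at the lower one, which is why the choice of foundry weighs so heavily on the calculation.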

In ByteDance's case, the motivation can be as defensive as it is offensive: securing supply, reducing dependency on Intel and AMD, adapting the CPU to internal workloads, but also accepting an initial overcost to gain control over time.

Key takeaway: SMIC remains a hypothesis, not an established fact. But if it is confirmed, the comparison with American hyperscalers will need heavy qualification: an in-house chip does not deliver the same gains with and without access to the world's most advanced process nodes.

A Hybrid Hardware Strategy, Not an Exit from Western Lock-In

The development of in-house CPUs does not mean ByteDance is exiting Western hardware lock-in. On the contrary, the available information outlines a much more hybrid strategy.

ByteDance is said to have raised its 2026 investment plan to 200 billion yuan, or about 29.4 billion dollars, up 25% from an initial envelope of 160 billion. In the initial plan, 85 billion yuan is said to have been earmarked for AI chips. But the detailed breakdown of the revised envelope has not been made public.
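The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below recomputes the percentage increase, the yuan-to-dollar rate implied by the two totals, and the share of the initial plan earmarked for AI chips. The exchange rate is derived from the article's own numbers rather than taken from a market source.

```python
# Figures cited in the article.
REVISED_PLAN_YUAN = 200e9
INITIAL_PLAN_YUAN = 160e9
REVISED_PLAN_USD = 29.4e9
AI_CHIPS_INITIAL_YUAN = 85e9

# Relative increase between the initial and revised envelopes.
increase = REVISED_PLAN_YUAN / INITIAL_PLAN_YUAN - 1

# Exchange rate implied by the article's yuan and dollar totals.
implied_yuan_per_usd = REVISED_PLAN_YUAN / REVISED_PLAN_USD

# Share of the initial plan earmarked for AI chips.
ai_chip_share_initial = AI_CHIPS_INITIAL_YUAN / INITIAL_PLAN_YUAN

print(f"Increase over initial plan: {increase:.0%}")
print(f"Implied exchange rate:      {implied_yuan_per_usd:.2f} yuan per dollar")
print(f"AI chip share (initial):    {ai_chip_share_initial:.1%}")
```

The numbers are internally consistent: the revision is a 25% increase, the implied rate is about 6.8 yuan per dollar, and AI chips represented a little over half of the initial envelope.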

In parallel, Bloomberg reported that Qualcomm had won a contract to supply millions of custom AI ASICs for ByteDance's data centers. The group is also devoting several billion dollars to Huawei Ascend chips. Nvidia, however, remains difficult to replace on large-scale pre-training workloads, despite export restrictions.

This combination contradicts the idea of a clean break. ByteDance does not seem to be choosing between Nvidia, Huawei, Qualcomm, Arm, RISC-V, and its own developments. It allocates spending across several hardware layers depending on the use: training, inference, agents, internal cloud, availability, cost, compliance, and geopolitical constraints.

The strategy resembles less a quest for autarky than an industrial insurance policy: no longer relying on a single supplier, a single architecture, or a single export regime.

Why Agentic Inference Changes the Calculation

The most important element of this story may be the least spectacular: agentic AI shifts the center of gravity of costs.

In a classic chatbot, most of the visible cost is related to the model and the accelerator that executes the inference. In an agentic system, each response can trigger a chain of actions: planning, searching, calling a tool, checking, intermediate generation, execution, correction, new request, and final output.

At large scale, these loops do not only consume GPUs. They mobilize the entire infrastructure. The CPU then becomes a central piece of the cost per task, not just a server commodity.

This is what makes ByteDance's case interesting. With Doubao and Coze, the group is not just trying to serve conversations. It is building an infrastructure for agents capable of acting, orchestrating services, and multiplying machine-to-machine interactions. At this level, hardware optimization no longer targets only raw performance. It targets the marginal cost of each agentic action.
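A toy cost model illustrates the shift. In the sketch below, a task is described by its number of model calls (run on the accelerator) and tool calls (run on the CPU), plus a fixed CPU overhead for scheduling and session handling. The `TaskProfile` type, the `cost_breakdown` function, and all cost values are hypothetical and expressed in arbitrary units. They are not measurements from Doubao or Coze.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """How many accelerator-side and CPU-side steps one task triggers."""
    model_calls: int
    tool_calls: int


def cost_breakdown(
    profile: TaskProfile,
    accel_cost_per_model_call: float,
    cpu_cost_per_tool_call: float,
    cpu_overhead_per_task: float,
) -> tuple[float, float]:
    """Return (total cost, CPU share of that cost) for one task."""
    accel = profile.model_calls * accel_cost_per_model_call
    cpu = profile.tool_calls * cpu_cost_per_tool_call + cpu_overhead_per_task
    total = accel + cpu
    return total, cpu / total


# Hypothetical profiles: a single-pass chatbot answer versus an agent
# that loops through planning, tool calls, and verification.
CHATBOT = TaskProfile(model_calls=1, tool_calls=0)
AGENT = TaskProfile(model_calls=6, tool_calls=5)

# Hypothetical unit costs in arbitrary units.
ACCEL_COST = 1.0
TOOL_COST = 0.5
OVERHEAD = 0.1

chatbot_total, chatbot_cpu_share = cost_breakdown(
    CHATBOT, ACCEL_COST, TOOL_COST, OVERHEAD
)
agent_total, agent_cpu_share = cost_breakdown(
    AGENT, ACCEL_COST, TOOL_COST, OVERHEAD
)

print(f"Chatbot: total {chatbot_total:.2f}, CPU share {chatbot_cpu_share:.0%}")
print(f"Agent:   total {agent_total:.2f}, CPU share {agent_cpu_share:.0%}")
```

With these placeholder values, the CPU share of the cost per task rises from under a tenth for the chatbot to roughly a third for the agent. The real proportions depend entirely on the workload, but the direction of the effect is the point the article makes.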

A Gamble Far from Being Won

The project, however, remains far from mature. Designing a competitive server CPU requires considerable hardware, software, and industrial expertise. The group must develop or adapt the cores, optimize power consumption, ensure software compatibility, maintain compilers, secure the manufacturing chain, guarantee volumes, and convince internal teams to migrate their workloads.

The great successes of in-house silicon rarely rely on the chip alone. They rely on a complete stack: hardware, low-level software, internal frameworks, stabilized workloads, massive volumes, and the ability to amortize costs over several years.

ByteDance has the volume. It is also under obvious economic pressure. But it has not yet demonstrated that it can turn these constraints into a hardware advantage comparable to that of Google, Amazon, or Microsoft.

The project should therefore be read for what it is: not an immediate revolution in the server CPU market, but a strategic signal. As agentic AI changes load profiles, major players can no longer be content to buy standard components. They seek to control the hardware layers that determine their cost per token, availability, and operational independence.

A Battle of Total Cost

The development of Arm and RISC-V CPUs by ByteDance marks a stage in the industrialization of AI on a very large scale. After the battle of models, then that of GPUs, another battle opens: that of the total execution cost.

In this battle, the winner will not only be the one with the best model or the best accelerator. It will be the one who knows how to align hardware architecture, software, orchestration, supply, and unit costs with its own uses.

ByteDance has not yet won this bet. But with Doubao, Coze, and its inference volumes, the group now has an economic reason to attempt it.