MiniMax M3 Blends Coding, 1M Context and Computer Use

MiniMax has released M3, describing it as the first open-weight model to combine frontier-tier software engineering capabilities with a one-million-token context window and native multimodal computer use. The model is built on the company's MiniMax Sparse Attention architecture, which underpins its ability to handle very long inputs efficiently.

Three Capabilities in One Model

M3's pitch rests on merging three features that have often appeared separately. The first is strong coding performance suited to real engineering work. The second is an extended context window able to ingest large volumes of text or code at once. The third is computer use: the model can operate a graphical interface by interpreting screens and taking actions rather than working only through text.

The Sparse Attention Foundation

Handling a million tokens is computationally expensive under conventional dense attention, whose cost grows quadratically with sequence length, so doubling the input roughly quadruples the attention work. MiniMax Sparse Attention, or MSA, is designed to reduce that burden by focusing computation on the most relevant parts of the input. Sparse attention approaches in general aim to preserve long-range reasoning while keeping inference tractable.
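
The scaling argument can be made concrete with a rough count of query-key score computations. The sketch below is illustrative only: it assumes a generic scheme in which each query attends to at most a fixed number of selected keys, and the budget K is a hypothetical value, not a published MSA setting. It also ignores the cost of choosing which keys to attend to.

```python
def dense_pairs(n: int) -> int:
    """Query-key score computations for full attention over n tokens."""
    return n * n


def sparse_pairs(n: int, k: int) -> int:
    """Score computations if each query attends to at most k selected keys."""
    return n * min(k, n)


K = 2048  # hypothetical per-query key budget, chosen only for illustration

for n in (8_000, 128_000, 1_000_000):
    d, s = dense_pairs(n), sparse_pairs(n, K)
    print(f"{n:>9,} tokens: dense {d:.2e}, sparse {s:.2e}, ratio {d / s:,.0f}x")
```

Under these assumptions, a one-million-token input needs about 10^12 score computations per head and layer with dense attention, against roughly 2 x 10^9 with the fixed budget, a gap of close to 490 times. The gap widens linearly as the input grows, which is why some form of sparsity matters most at the long end of the context range.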

Frontier-tier software engineering performance
One-million-token context window for large inputs
Native multimodal computer use across graphical interfaces
Built on the MiniMax Sparse Attention architecture

Why Computer Use Is Notable

Computer use extends a model beyond text generation into direct interaction with software. A system that can read a screen, locate elements and perform clicks or type text can automate workflows that lack clean programmatic interfaces. Combining that with a large context window means the model can hold extensive task instructions or reference material in context while it operates, a useful pairing for multi-step automation.
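
The read, locate and act cycle described above can be sketched as a small loop. Everything here is hypothetical and is not M3's interface: FakeScreen stands in for a real GUI, the elements are handed over as labelled records rather than recognized from a screenshot, and a fixed list of steps replaces the decisions a model would make.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Element:
    label: str
    x: int
    y: int


@dataclass
class FakeScreen:
    """Stand-in for a GUI: labelled elements plus a log of actions taken."""
    elements: List[Element]
    log: List[str] = field(default_factory=list)

    def observe(self) -> List[Element]:
        # A real agent would receive pixels; here the elements are given directly.
        return list(self.elements)

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def locate(elements: List[Element], label: str) -> Optional[Element]:
    """Return the first element whose label matches, or None."""
    for element in elements:
        if element.label == label:
            return element
    return None


def run_steps(screen: FakeScreen, steps: List[Tuple[str, Optional[str]]]) -> List[str]:
    """For each (label, text) step: observe the screen, locate the target, act."""
    for label, text in steps:
        target = locate(screen.observe(), label)
        if target is None:
            # Stop rather than act blindly when the expected element is absent.
            screen.log.append(f"missing({label})")
            break
        screen.click(target.x, target.y)
        if text is not None:
            screen.type_text(text)
    return screen.log


screen = FakeScreen([Element("Search box", 120, 40), Element("Go", 300, 40)])
print(run_steps(screen, [("Search box", "quarterly report"), ("Go", None)]))
```

The loop re-observes the screen before every action because interfaces change after each click. In a real deployment the step list would come from the model, and the task instructions held in its context window would guide each choice.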

Open Weights and Deployment

As an open-weight model, M3 can be downloaded and run outside a hosted API, which appeals to organizations that need control over deployment environments or wish to adapt the model to specialized tasks. That openness, paired with the breadth of capabilities, positions M3 among a growing cohort of models challenging closed systems on multiple fronts at once.

Computer use enables automation of interface-driven workflows
Large context lets the model retain lengthy instructions during tasks
Open weights support self-hosting and customization

The Broader Shift

M3 reflects a trend in which open-weight models increasingly bundle capabilities that were once split across specialized systems. Rather than choosing between a strong coder, a long-context reader and an interface-operating agent, developers can evaluate a single model spanning all three. Real-world value will depend on how reliably these capabilities hold up across the variety of tasks encountered in production, an assessment that typically requires hands-on testing beyond headline benchmarks.

For teams building agents that must both reason over large inputs and act within software, M3 represents a notable consolidation of features in the open-weight category. Whether that consolidation translates into fewer moving parts in production, or simply a larger model to manage, will depend on each team's infrastructure and the reliability of the computer-use component on real interfaces.
