Tiles

Models

Our approach to model selection is to use the most suitable model for each task, within the constraints of the supported hardware. Currently, Tiles is designed to handle everyday tasks, with plans to expand into more domain-specific use cases over time. We provide a carefully tested combination of system prompts, tools, and models, so users do not need to manage model selection or scaffolding themselves.

We use gpt-oss-20b as the primary model for everyday tasks, paired with the Harmony renderer and support for the Open Responses API for efficient inference. Its Mixture of Experts architecture activates only roughly 4B of its 21B total parameters per token, providing a strong balance between quality and efficiency.

Memory capacity and memory bandwidth, not peak FLOPs, are the primary constraints for local inference. Personal computing assistants typically operate at batch_size = 1, where throughput is limited by memory movement rather than raw compute.
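
As a rough illustration of why bandwidth dominates at batch_size = 1, the sketch below estimates an upper bound on decode speed by assuming each generated token requires reading every active weight from memory once. The function name, the 4 bits per parameter, and the 100 GB/s bandwidth figure are illustrative assumptions, not measurements of Tiles or of any specific Mac; the estimate also ignores KV cache reads and compute time, so real throughput will be lower.

```python
def decode_tokens_per_second_bound(active_params, bits_per_param, bandwidth_gb_per_s):
    """Upper bound on batch-size-1 decode speed when weight reads dominate.

    Assumes every active parameter is read from memory once per token and
    that nothing else (KV cache, activations, compute) costs any time.
    """
    bytes_per_token = active_params * bits_per_param / 8
    bandwidth_bytes_per_s = bandwidth_gb_per_s * 1e9
    return bandwidth_bytes_per_s / bytes_per_token


if __name__ == "__main__":
    # Roughly 4B active parameters (from the text above), with a
    # hypothetical 4-bit weight format and hypothetical 100 GB/s bandwidth.
    bound = decode_tokens_per_second_bound(4e9, 4, 100)
    print(f"{bound:.0f} tokens/s upper bound")
```

Under this model, doubling memory bandwidth doubles the bound, while extra FLOPs change nothing, which is the sense in which local inference is memory-bound.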

We started with Apple Silicon because its unified memory architecture offers more memory capacity and memory bandwidth per dollar than competing platforms.

This setup needs at least 16 GB of unified memory to run well. That requirement should not limit adoption, since every Mac shipped from late 2024 onward includes at least 16 GB of memory in its base configuration.
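
To make the 16 GB figure concrete, here is a back-of-the-envelope capacity check. It is only a sketch: the 4 bits per parameter and the 4 GB reserved for the operating system, KV cache, and other applications are hypothetical values chosen for illustration, and the helper names are not part of Tiles.

```python
def weights_gb(total_params, bits_per_param):
    """Approximate size of the model weights in decimal gigabytes."""
    return total_params * bits_per_param / 8 / 1e9


def fits_in_memory(total_params, bits_per_param, memory_gb, reserved_gb):
    """True if the weights plus a reserved budget fit in unified memory.

    reserved_gb is an assumed allowance for the OS, KV cache, and other
    applications; tune it for a real system.
    """
    return weights_gb(total_params, bits_per_param) + reserved_gb <= memory_gb


if __name__ == "__main__":
    total_params = 21e9  # total parameter count stated above
    print(f"weights: {weights_gb(total_params, 4):.1f} GB")
    for memory_gb in (8, 16):
        ok = fits_in_memory(total_params, 4, memory_gb, reserved_gb=4)
        print(f"{memory_gb} GB machine: {'fits' if ok else 'does not fit'}")
```

Note that capacity depends on all 21B parameters, since every expert must be resident in memory, whereas per-token bandwidth depends only on the active subset.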
