admin

MANTRA: The First MultiVM Blockchain For RWAs With Native EVM And CosmWasm Support

MANTRA, the Layer 1 blockchain purpose-built for real-world assets (RWAs), today announced that its latest mainnet upgrades are now live. With this release, MANTRA has become the first blockchain to support both EVM and CosmWasm smart contracts natively, making it the first true MultiVM Layer 1 built specifically for RWAs.

NVIDIA Run:ai Model Streamer Enhances LLM Inference Speed

Ted Hisokawa Sep 16, 2025 20:22

NVIDIA has introduced the Run:ai Model Streamer, which significantly reduces cold-start latency for large language models in GPU environments, improving both user experience and scalability.

NVIDIA's Run:ai Model Streamer is a major advancement in artificial intelligence deployment: it reduces cold-start latency during inference for large language models (LLMs). According to NVIDIA, it tackles one of the most critical problems AI developers face, namely the time required to load models into GPU memory.

Addressing Cold Start Latencies

Cold-start delays are a major bottleneck when deploying LLMs, especially in large-scale or cloud-based environments where models require large amounts of memory. These delays can significantly degrade the user experience as well as the scalability and performance of AI applications. NVIDIA's Run:ai Model Streamer reduces this latency by reading model weights from storage concurrently and streaming them into GPU memory.
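As a rough illustration of why concurrent reads help, the sketch below reads one weights file in fixed-size chunks from a thread pool, so several read requests are in flight at once. It is not the Model Streamer's code (which uses a C++ backend); the function name `read_concurrently`, the chunk size, and the worker count are hypothetical, and `os.pread` is only available on Unix-like systems.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_concurrently(path: str, num_workers: int = 8, chunk_size: int = 4 << 20) -> bytes:
    """Hypothetical helper: read a whole file using several concurrent chunk reads.

    Each worker issues positional reads (os.pread, Unix-only) on a shared file
    descriptor, so multiple requests can be outstanding against the storage at once.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    size = os.path.getsize(path)
    buffer = bytearray(size)  # destination buffer in host (CPU) memory

    fd = os.open(path, os.O_RDONLY)
    try:
        def read_chunk(offset: int) -> None:
            end = min(offset + chunk_size, size)
            pos = offset
            while pos < end:  # loop in case the OS returns a short read
                data = os.pread(fd, end - pos, pos)
                if not data:
                    raise EOFError(f"unexpected end of file at byte {pos}")
                buffer[pos:pos + len(data)] = data
                pos += len(data)

        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            # list() waits for every chunk and re-raises any worker exception
            list(pool.map(read_chunk, range(0, size, chunk_size)))
    finally:
        os.close(fd)
    return bytes(buffer)
```

On storage that rewards parallel requests, such as network or object storage, more workers can raise throughput; on a device that is already saturated, adding workers buys little.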

Benchmarking Model Streamer

The Run:ai Model Streamer was compared with other loaders, such as Hugging Face Safetensors and CoreWeave Tensorizer, across several storage types, including local SSDs and Amazon S3. By using concurrent streaming and optimizing storage throughput, the Model Streamer significantly reduced model loading times.
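A comparison like this comes down to timing each loader on the same model and storage. The harness below is a minimal, hypothetical sketch: each entry in `loaders` is any zero-argument callable that loads a model, and the result is the median wall-clock time over a few runs. For a genuine cold-start measurement, the operating system's page cache would also need to be cleared between runs, which this sketch does not do.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_loaders(loaders: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Hypothetical harness: median wall-clock seconds for each loader callable."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for name, load in loaders.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            load()  # e.g. a function that loads one model from one storage type
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results
```

Using the median rather than the mean keeps one unusually slow or fast run from skewing the comparison.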

Technical Insights

The Model Streamer uses a high-performance C++ backend to accelerate model loading from multiple storage sources. Multiple threads read tensors concurrently, so data can move from CPU memory to GPU memory while other tensors are still being read. This approach maximizes bandwidth and reduces the time spent loading models.
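The overlap of reading and transferring can be sketched as a sliding window of reads: up to `max_in_flight` tensor reads run on a thread pool while the calling thread hands finished buffers to a transfer step. This is an illustrative assumption about the pattern, not NVIDIA's implementation; `stream_tensors`, `read_fn`, and `transfer_fn` are hypothetical names, and `transfer_fn` merely stands in for a CPU-to-GPU copy.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple

# Hypothetical tensor descriptor: (name, byte offset, byte length)
TensorSpec = Tuple[str, int, int]


def stream_tensors(
    read_fn: Callable[[int, int], bytes],
    specs: Iterable[TensorSpec],
    transfer_fn: Callable[[str, bytes], None],
    num_readers: int = 4,
    max_in_flight: int = 8,
) -> None:
    """Overlap concurrent reads with a transfer step (illustrative sketch).

    read_fn(offset, length) fetches bytes from storage; transfer_fn(name, data)
    stands in for a CPU-to-GPU copy. At most max_in_flight reads are pending
    at any time, which bounds the CPU memory held by buffered tensors.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    with ThreadPoolExecutor(max_workers=num_readers) as pool:
        window = deque()  # pending (name, future) pairs in submission order
        for name, offset, length in specs:
            window.append((name, pool.submit(read_fn, offset, length)))
            if len(window) >= max_in_flight:
                done_name, future = window.popleft()
                # Transfer the oldest tensor while later reads keep running.
                transfer_fn(done_name, future.result())
        while window:  # drain the remaining reads
            done_name, future = window.popleft()
            transfer_fn(done_name, future.result())
```

Bounding the window caps how much CPU memory holds pending tensors, and transferring in submission order keeps the result deterministic even when reads finish out of order.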

Key features of the Model Streamer include support for multiple storage types, native Safetensors integration, and an easy-to-integrate Python API. Together, these make it a practical way to improve inference performance across different AI frameworks.
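Safetensors suits this kind of loader because the file header lists every tensor's byte range up front, so reads can be issued independently and in parallel. The function below parses that header following the published safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes). It is a standalone sketch, not the Model Streamer's Python API; `safetensors_layout` is a hypothetical name.

```python
import json
import struct
from typing import Dict, Tuple


def safetensors_layout(path: str) -> Dict[str, Tuple[str, list, int, int]]:
    """Hypothetical helper: {tensor name: (dtype, shape, absolute start, absolute end)}.

    Layout: an 8-byte little-endian unsigned header length N, then N bytes of
    JSON, then the raw tensor data. data_offsets in the JSON are relative to
    the start of the data section, so they are shifted to absolute file offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    data_start = 8 + header_len
    layout = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = info["data_offsets"]
        layout[name] = (info["dtype"], info["shape"], data_start + begin, data_start + end)
    return layout
```

Each `(start, end)` pair from this function is exactly the kind of independent byte range a concurrent reader can fetch on its own thread.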

Comparative Performance

In NVIDIA's experiments, increasing the Model Streamer's concurrency on gp3 SSD storage substantially reduced loading times, reaching the maximum throughput the storage medium can deliver. The Model Streamer also outperformed all the other loaders on io2 SSDs and on S3 storage.
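The pattern in these results, where more concurrency helps until the storage's maximum throughput is reached, can be captured with an idealized model: each stream adds bandwidth until the device caps it. The function and the numbers below are hypothetical; they are not NVIDIA's measurements or real gp3 limits.

```python
def estimated_load_seconds(
    model_bytes: int,
    concurrency: int,
    per_stream_bytes_per_s: int,
    storage_max_bytes_per_s: int,
) -> float:
    """Idealized load time under a hypothetical model (not measured data).

    Each concurrent stream adds per_stream_bytes_per_s of bandwidth until the
    storage device's maximum throughput caps the total.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    effective = min(concurrency * per_stream_bytes_per_s, storage_max_bytes_per_s)
    return model_bytes / effective


if __name__ == "__main__":
    # Hypothetical inputs: 16 GB model, 100 MB/s per stream, 1 GB/s storage cap.
    for c in (1, 4, 10, 16):
        t = estimated_load_seconds(16_000_000_000, c, 100_000_000, 1_000_000_000)
        print(f"concurrency={c:>2}: {t:.1f} s")
```

With these made-up inputs, the estimate falls from 160 s at concurrency 1 to 40 s at concurrency 4 and 16 s at concurrency 10, then stays at 16 s because the storage is saturated.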

Implications for AI Deployment

The Run:ai Model Streamer is a significant step forward for AI deployment. By reducing cold-start delays and optimizing model load times, it improves the scalability of AI systems.

The Model Streamer is particularly useful for developers and organizations that deploy large models or operate in cloud-based environments, where it can improve the speed and efficiency of inference. It integrates with existing frameworks such as vLLM, making it a straightforward upgrade to existing AI infrastructure.

NVIDIA’s Run:ai Model Streamer is well positioned to become an essential tool for AI practitioners who want to optimize model deployment and inference, resulting in faster and more efficient AI operations.



Image source: Shutterstock

BloFin Title Sponsors TOKEN2049 Singapore, Debuts Largest-Ever “Build” Booth and Afterparty Headlined by DJ BLOND:ISH for 1,800+ Attendees

BloFin, the leading crypto exchange, is delighted to become the Title Sponsor of TOKEN2049 Singapore, Asia’s flagship crypto event. TOKEN2049 will unite over 20,000 attendees from the digital asset, Web3, and institutional finance sectors. This year, BloFin unveils a refreshed brand identity alongside major platform upgrades, including grid trading, instant convert features, and premium trade

Innovation Leads, Stability Endures: Join Three Major HTX Events to Unlock the iPhone 17 and Rich Rewards

In the digital age, security and trust are the cornerstone linking users to the future. From the first-generation iPhone to today’s iPhone 17, every iteration has strengthened security and privacy, allowing users to enjoy the digital life with confidence and trust. Likewise, over 12 years of growth, HTX has continually refined its ecosystem, evolving from

VeChain Flips dApps Playbook With Launch of VeFounder

Launches VeFounder Program to Empower Web3 Builders with Operational Control and Eventual Ownership of Live dApps

VeChain, the leading real-world-application focused Layer 1, today announced the launch of the VeFounder Program, a first-of-its-kind initiative designed to revolutionize the dApps economy with a top-down approach to unlock untapped growth opportunities. The global dApps market has grown