<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Ray — CraftedSignal Threat Feed</title><link>https://feed.craftedsignal.io/tags/ray/</link><description>Trending threats, MITRE ATT&amp;CK coverage, and detection metadata — refreshed continuously.</description><generator>Hugo</generator><language>en</language><managingEditor>hello@craftedsignal.io</managingEditor><webMaster>hello@craftedsignal.io</webMaster><lastBuildDate>Fri, 24 Apr 2026 16:15:00 +0000</lastBuildDate><atom:link href="https://feed.craftedsignal.io/tags/ray/feed.xml" rel="self" type="application/rss+xml"/><item><title>Ray Data Remote Code Execution via Parquet Arrow Extension Type Deserialization</title><link>https://feed.craftedsignal.io/briefs/2026-04-ray-parquet-rce/</link><pubDate>Fri, 24 Apr 2026 16:15:00 +0000</pubDate><author>hello@craftedsignal.io</author><guid isPermaLink="true">https://feed.craftedsignal.io/briefs/2026-04-ray-parquet-rce/</guid><description>Ray Data is vulnerable to remote code execution via Parquet Arrow Extension Type Deserialization; specifically, a maliciously crafted Parquet file can trigger arbitrary code execution due to the unsafe deserialization of Arrow extension metadata, affecting Ray versions 2.49.0 through 2.54.0.</description><content:encoded><![CDATA[<p>Ray Data, a component of the Ray distributed computing framework, is susceptible to remote code execution (RCE) due to unsafe deserialization of Parquet file metadata. The vulnerability stems from Ray&rsquo;s registration of custom Arrow extension types (<code>ray.data.arrow_tensor</code>, <code>ray.data.arrow_tensor_v2</code>, <code>ray.data.arrow_variable_shaped_tensor</code>) within PyArrow. 
When a Parquet file containing one of these extension types is read, PyArrow invokes the type&rsquo;s <code>__arrow_ext_deserialize__</code> method, which passes the field&rsquo;s metadata to <code>cloudpickle.loads()</code> before any data is read, so a crafted payload executes arbitrary code. The issue was introduced in July 2025 via commit <code>f6d21db1a4</code> and affects Ray versions 2.49.0 through 2.54.0. Exploitation requires neither authentication nor network access to a Ray cluster; it only requires that an affected process read a maliciously crafted Parquet file, which can come from sources such as cloud storage, HuggingFace datasets, or shared file systems.</p>
<h2 id="attack-chain">Attack Chain</h2>
<ol>
<li>An attacker crafts a Parquet file containing a column with a <code>ray.data.arrow_tensor</code>, <code>ray.data.arrow_tensor_v2</code>, or <code>ray.data.arrow_variable_shaped_tensor</code> extension type.</li>
<li>The attacker places a malicious <code>cloudpickle</code>-serialized payload in the <code>ARROW:extension:metadata</code> field of the Parquet file.</li>
<li>The attacker places the crafted Parquet file in a location accessible to a Ray Data pipeline, such as a HuggingFace dataset, a shared filesystem, or a cloud storage bucket.</li>
<li>A process in which Ray Data has registered its extension types reads the Parquet file through <code>ray.data.read_parquet()</code>, <code>pyarrow.parquet.read_table()</code>, or <code>pandas.read_parquet()</code>.</li>
<li>During schema parsing, PyArrow encounters the custom Arrow extension type and automatically calls the <code>__arrow_ext_deserialize__</code> method.</li>
<li>The <code>__arrow_ext_deserialize__</code> method invokes <code>_deserialize_with_fallback()</code>, which attempts to deserialize the metadata using <code>cloudpickle.loads()</code>.</li>
<li>The <code>cloudpickle.loads()</code> function executes the attacker&rsquo;s arbitrary code from the crafted Parquet metadata.</li>
<li>The attacker achieves arbitrary command execution as the user running the Ray worker process, potentially leading to full server compromise.</li>
</ol>
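<p>The last two steps of the chain rest on a general property of pickle-based loaders: loading a payload can call a callable chosen by whoever produced it. The following is a minimal, benign sketch of that property, using the standard library <code>pickle</code> module as a stand-in for <code>cloudpickle</code>. The <code>Payload</code> class is hypothetical and is not taken from Ray or from any real exploit; the callable is the harmless built-in <code>len</code>.</p>

```python
import pickle


class Payload:
    """Hypothetical stand-in for attacker-controlled metadata (illustration only)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and invokes it at load time.
        # A real attacker would pick a dangerous callable; this sketch uses len.
        return (len, ("benign",))


# Bytes an attacker could embed wherever a pickle-based loader will read them.
blob = pickle.dumps(Payload())

# Loading calls len("benign") instead of rebuilding a Payload instance.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

<p>The value returned by the loader is whatever the embedded callable returned, which shows that control passed to the payload during deserialization rather than after it.</p>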
<h2 id="impact">Impact</h2>
<p>This vulnerability affects Ray versions 2.49.0 through 2.54.0 and any process using Ray Data that reads Parquet files. Because the extension types are registered globally in PyArrow, every Parquet read within an affected process is vulnerable, not only reads made through Ray Data. An attacker who can place a malicious Parquet file in a location that a Ray Data pipeline processes can execute arbitrary commands as the Ray worker process user, leading to full server compromise, without authentication or cluster access.</p>
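<p>As a rough aid for inventorying hosts, the sketch below checks whether a version string falls inside the affected range stated above. The helper names <code>parse_version</code> and <code>is_affected</code> are hypothetical, and the handling of pre-release or build suffixes (only the leading <code>major.minor.patch</code> numbers are compared) is an assumption rather than anything specified by the advisory.</p>

```python
import re

# Affected range as stated in this advisory (inclusive on both ends).
FIRST_AFFECTED = (2, 49, 0)
LAST_AFFECTED = (2, 54, 0)


def parse_version(text):
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: " + repr(text))
    return tuple(int(part) for part in match.groups())


def is_affected(version_text):
    """Return True when the version lies inside the advisory's affected range."""
    return FIRST_AFFECTED <= parse_version(version_text) <= LAST_AFFECTED
```

<p>In an environment where Ray is installed, the check could be run as <code>is_affected(ray.__version__)</code>.</p>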
<h2 id="recommendation">Recommendation</h2>
<ul>
<li>Upgrade Ray to a patched release later than 2.54.0, and confirm that the fix addresses the <code>cloudpickle.loads()</code> call in the deserialization path.</li>
<li>Implement strict input validation and sanitization for Parquet files before processing them with Ray Data to prevent the execution of malicious payloads embedded in the <code>ARROW:extension:metadata</code> field.</li>
<li>Monitor for suspicious child processes spawned by <code>python</code> processes that run Ray Data workloads, as these may indicate arbitrary code execution through <code>cloudpickle.loads()</code>.</li>
<li>Deploy the Sigma rule <code>Detect Ray Data Parquet Deserialization RCE</code> to detect exploitation attempts by monitoring for specific metadata within Parquet files.</li>
</ul>
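<p>One way to approach the input-validation recommendation is to inspect a pickle payload statically instead of loading it. The sketch below uses the standard library <code>pickletools.genops()</code> to list opcodes that import or call objects. It assumes the metadata bytes have already been extracted by some means that does not trigger extension type deserialization, which is not shown here. The <code>risky_opcodes</code> helper and the opcode list are illustrative assumptions, and legitimate metadata may also contain these opcodes, so the result is a triage signal rather than a verdict.</p>

```python
import pickle
import pickletools

# Opcodes that let a pickle import a name or call it while loading.
RISKY_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST",
    "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def risky_opcodes(blob):
    """Return the names of risky opcodes found in blob, without unpickling it."""
    found = set()
    try:
        # genops only parses the opcode stream; it never executes the pickle.
        for opcode, _arg, _pos in pickletools.genops(blob):
            if opcode.name in RISKY_OPCODES:
                found.add(opcode.name)
    except Exception:
        # Bytes that do not parse as a pickle are flagged separately.
        found.add("UNPARSEABLE")
    return found


class _Demo:
    """Hypothetical benign payload used only to exercise the scanner."""

    def __reduce__(self):
        return (len, ("x",))


plain = risky_opcodes(pickle.dumps({"shape": [2, 3]}))
suspicious = risky_opcodes(pickle.dumps(_Demo()))
print(sorted(plain), sorted(suspicious))
```

<p>A payload made only of plain containers and numbers yields an empty set, while a payload that asks the loader to call something reports <code>REDUCE</code> alongside an import opcode.</p>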
]]></content:encoded><category domain="severity">critical</category><category domain="type">advisory</category><category>remote-code-execution</category><category>parquet</category><category>deserialization</category><category>cloudpickle</category><category>ray</category></item></channel></rss>