{"description":"Trending threats, MITRE ATT\u0026CK coverage, and detection metadata — refreshed continuously.","feed_url":"https://feed.craftedsignal.io/tags/cloudpickle/","home_page_url":"https://feed.craftedsignal.io/","items":[{"_cs_actors":[],"_cs_cves":[],"_cs_exploited":false,"_cs_products":["Ray Data"],"_cs_severities":["critical"],"_cs_tags":["remote-code-execution","parquet","deserialization","cloudpickle","ray"],"_cs_type":"advisory","_cs_vendors":["Ray"],"content_html":"\u003cp\u003eRay Data, a component of the Ray distributed computing framework, is vulnerable to remote code execution (RCE) through unsafe deserialization of Parquet file metadata. The root cause is that Ray registers custom Arrow extension types (\u003ccode\u003eray.data.arrow_tensor\u003c/code\u003e, \u003ccode\u003eray.data.arrow_tensor_v2\u003c/code\u003e, \u003ccode\u003eray.data.arrow_variable_shaped_tensor\u003c/code\u003e) with PyArrow. When a Parquet file that uses one of these extension types is opened, PyArrow invokes the type\u0026rsquo;s \u003ccode\u003e__arrow_ext_deserialize__\u003c/code\u003e method, which passes the field\u0026rsquo;s extension metadata to \u003ccode\u003ecloudpickle.loads()\u003c/code\u003e and thereby executes arbitrary code before any data is read. The issue was introduced in July 2025 by commit \u003ccode\u003ef6d21db1a4\u003c/code\u003e and affects Ray versions 2.49.0 through 2.54.0. Exploitation requires neither authentication nor network access to a Ray cluster. 
Instead, it only requires that an affected process read a maliciously crafted Parquet file, which can come from sources such as cloud storage, Hugging Face datasets, or shared filesystems.\u003c/p\u003e\n\u003ch2 id=\"attack-chain\"\u003eAttack Chain\u003c/h2\u003e\n\u003col\u003e\n\u003cli\u003eAn attacker crafts a Parquet file containing a column with a \u003ccode\u003eray.data.arrow_tensor\u003c/code\u003e, \u003ccode\u003eray.data.arrow_tensor_v2\u003c/code\u003e, or \u003ccode\u003eray.data.arrow_variable_shaped_tensor\u003c/code\u003e extension type.\u003c/li\u003e\n\u003cli\u003eThe attacker stores a malicious payload, serialized with \u003ccode\u003ecloudpickle\u003c/code\u003e, in the \u003ccode\u003eARROW:extension:metadata\u003c/code\u003e field of that column.\u003c/li\u003e\n\u003cli\u003eThe attacker places the crafted Parquet file where a Ray Data pipeline will read it, such as a Hugging Face dataset, a shared filesystem, or a cloud storage bucket.\u003c/li\u003e\n\u003cli\u003eA process that has imported Ray Data reads the file, whether through \u003ccode\u003eray.data.read_parquet()\u003c/code\u003e or through plain \u003ccode\u003epyarrow.parquet.read_table()\u003c/code\u003e or \u003ccode\u003epandas.read_parquet()\u003c/code\u003e calls, because the extension types are registered globally in PyArrow.\u003c/li\u003e\n\u003cli\u003eDuring schema parsing, PyArrow encounters the custom Arrow extension type and automatically calls its \u003ccode\u003e__arrow_ext_deserialize__\u003c/code\u003e method.\u003c/li\u003e\n\u003cli\u003eThe \u003ccode\u003e__arrow_ext_deserialize__\u003c/code\u003e method calls \u003ccode\u003e_deserialize_with_fallback()\u003c/code\u003e, which passes the metadata to \u003ccode\u003ecloudpickle.loads()\u003c/code\u003e.\u003c/li\u003e\n\u003cli\u003eUnpickling the metadata with \u003ccode\u003ecloudpickle.loads()\u003c/code\u003e executes the attacker\u0026rsquo;s code.\u003c/li\u003e\n\u003cli\u003eThe attacker achieves arbitrary 
command execution as the user running the Ray worker process, potentially leading to full server compromise.\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2 id=\"impact\"\u003eImpact\u003c/h2\u003e\n\u003cp\u003eThis vulnerability affects Ray versions 2.49.0 through 2.54.0 and impacts any process that uses Ray Data and reads Parquet files. Because the extension types are registered globally in PyArrow, every Parquet read in such a process is exposed, not only reads made through Ray Data APIs. An attacker who can place a malicious Parquet file where an affected process will read it can execute arbitrary commands as the Ray worker process user, potentially leading to full server compromise, without authentication or cluster access.\u003c/p\u003e\n\u003ch2 id=\"recommendation\"\u003eRecommendation\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eUpgrade Ray to a fixed release later than 2.54.0, and confirm that the fix removes the \u003ccode\u003ecloudpickle.loads()\u003c/code\u003e call from the extension-type deserialization path.\u003c/li\u003e\n\u003cli\u003eTreat Parquet files from external or shared sources as untrusted and validate them before an affected process reads them, for example by checking their schema for the Ray extension types and their \u003ccode\u003eARROW:extension:metadata\u003c/code\u003e payloads in a separate process that does not import Ray.\u003c/li\u003e\n\u003cli\u003eMonitor for unexpected child processes, such as shells or network tools, spawned by \u003ccode\u003epython\u003c/code\u003e processes running Ray workers, which may indicate a payload executed through \u003ccode\u003ecloudpickle.loads()\u003c/code\u003e.\u003c/li\u003e\n\u003cli\u003eDeploy the Sigma rule \u003ccode\u003eDetect Ray Data Parquet Deserialization RCE\u003c/code\u003e to detect exploitation attempts by monitoring for specific metadata within Parquet files.\u003c/li\u003e\n\u003c/ul\u003e\n","date_modified":"2026-04-24T16:15:00Z","date_published":"2026-04-24T16:15:00Z","id":"/briefs/2026-04-ray-parquet-rce/","summary":"Ray Data 
in Ray versions 2.49.0 through 2.54.0 is vulnerable to remote code execution via Parquet Arrow extension type deserialization: a maliciously crafted Parquet file can trigger arbitrary code execution through unsafe deserialization of Arrow extension metadata.","title":"Ray Data Remote Code Execution via Parquet Arrow Extension Type Deserialization","url":"https://feed.craftedsignal.io/briefs/2026-04-ray-parquet-rce/"}],"language":"en","title":"CraftedSignal Threat Feed — Cloudpickle","version":"https://jsonfeed.org/version/1.1"}