All Products
Search
Document Center

E-MapReduce:Fusion engine

Last Updated:Jan 19, 2026

The Fusion engine is a high-performance vectorized SQL execution engine built into EMR Serverless Spark. It performs three times better than open source Spark in TPC-DS benchmark tests. The Fusion engine is fully compatible with open source Spark. You do not need to modify your existing code. In EMR Serverless Spark, you can enable the engine by turning on the Use Fusion Acceleration switch when you create a session.

Supported scenarios

The Fusion engine is suitable for Spark SQL and DataFrame jobs. The Fusion engine can be used to improve the performance of most operators, expressions, and data types. However, the Fusion engine cannot accelerate the execution of Resilient Distributed Dataset (RDD) jobs or the execution of jobs in which user-defined functions (UDFs) are used.

Storage formats

The Fusion engine supports the following data storage formats:

  • Parquet

  • Paimon

  • ORC (partial)

Operators

The Fusion engine provides acceleration for most common operators. The following table describes the operators.

Type

Operator List

Source

  • FileSourceScanExec

  • HiveTableScanExec

  • BatchScanExec

  • InMemoryTableScanExec

Sink

DataWritingCommandExec

Common operation

  • FilterExec

  • ProjectExec

  • SortExec

  • UnionExec

Aggregation

HashAggregateExec

Join

  • BroadcastHashJoinExec

  • ShuffledHashJoinExec

  • SortMergeJoinExec

  • BroadcastNestedLoopJoinExec

  • CartesianProductExec

Window

  • WindowExec

  • WindowTopK

Exchange

  • ShuffleExchangeExec

  • ReusedExchangeExec

  • BroadcastExchangeExec

  • CoalesceExec

Limit

  • GlobalLimitExec

  • LocalLimitExec

  • TakeOrderedAndProjectExec

Subquery

SubqueryBroadcastExec

Others

  • ExpandExec

  • GenerateExec

Expressions

The following table describes the expressions supported by the Fusion engine.

Type

Expression List

Comparison/Logic

=, ==, !=, <>, <, <=, >, >=, <=>, is null, is not null, between, and, or, ||, !, negative, null if

Arithmetic

%, +, -, *, /, isnan, mod, negative, not, positive, abs, acos, acosh, asin, asinh, atan, atan2, atanh, cbrt, ceil, ceiling, cos, cosh, degrees, e, exp, floor, ln, log, log10, log2, pi, pmod, pow, power, radians, rand, random, rint, round, shiftleft, shiftright, sign, signum, sin, sqrt, tan, and tanh

Bitwise

^, |, &, ~, bit_and, bit_count, bit_or, bit_xor, and bit_length

Conditional Expression

case, if, and when

Set

in and find_in_set

String calculation

ascii, char, chr, char_length, character_length, concat, instr, lcase, lower, length, locate, lower, lpad, and ltrim

overlay, replace, reverse, rtrim, split, split_part, substr, substring, trim, ucase, upper, like, regexp, regexp_extract, regexp_extract_all, regexp_like, regexp_replace, and rlike

Aggregation

aggregate, approx_count_distinct, avg, collect_list, collect_set, corr, count, covar_pop, covar_samp, first, first_value, kurtosis, last, last_value, max, max_by, mean, min, regr_avgx, regr_avgy, regr_count, regr_r2

regr_intercept, regr_slope, regr_sxy, regr_sxx, regr_syy, skewness, std, stddev, stddev_pop, stddev_samp, sum, var_pop, var_samp, and variance

Window

cume_dist, dense_rank, lag, lead, nth_value, ntile, percent_rank, rank, and row_number

Time

add_months, current_date, current_timestamp, current_timezone, date, date_add, date_format, date_from_unix_date, date_sub, datediff, day, dayofmonth, dayofweek, dayofyear, from_unixtime, from_utc_timestamp, hour, last_day, make_date, minute, month, next_day, now, quarter, second, timestamp_micros, timestamp_millis, to_date, to_unix_timestamp, unix_seconds, unix_millis, unix_micros, weekday, weekofyear, and year

JSON

get_json_object and json_array_length

Array

array, array_contains, array_distinct, array_except, array_intersect, array_join, array_max, array_min, array_position, array_remove, array_repeat, array_sort, arrays_overlap, arrays_zip, element_at, exists, filter, forall, flatten, shuffle, size, and sort_array

Map

map, get_map_value, map_from_arrays, map_keys, map_values, map_zip_with, named_struct, struct, and str_to_map

Encoding

crc32, hash, md5, sha1, and sha2

Others

current_catalog, current_database, greatest, least, monotonically_increasing_id, nanvl, spark_partition_id, stack, uuid, and rand

Data types

The Fusion engine supports the following data types:

  • Byte, Short, Int, and Long

  • Boolean

  • String and Binary

  • Decimal

  • Float and Double

  • Date and Timestamp

Unsupported scenarios

Operators

Type

Operator

Aggregation

  • ObjectHashAggregateExec

  • SortAggregateExec

Exchange

CustomShuffleReaderExec

Pandas

  • AggregateInPandasExec

  • FlatMapGroupsInPandasExec

  • ArrowEvalPythonExec

  • MapInPandasExec

  • WindowInPandasExec

Others

  • CollectLimitExec

  • RangeExec

  • SampleExec

Data types

  • Struct

  • Array

  • Map