Skip to content

Type Mapping

PGStyx uses type mappings when it creates a target table, compares an existing table to a DataFrame, and decides whether a schema change is compatible.

Spark typePostgreSQL type
ByteTypeSMALLINT
ShortTypeSMALLINT
IntegerTypeINTEGER
LongTypeBIGINT
FloatTypeREAL
DoubleTypeDOUBLE PRECISION
DecimalType(p, s)NUMERIC(p, s)
StringTypeTEXT
BooleanTypeBOOLEAN
DateTypeDATE
TimestampTypeTIMESTAMP
TimestampNTZTypeTIMESTAMP
BinaryTypeBYTEA
ArrayType(e)e[]
MapType(k, v)JSONB
StructTypeJSONB
anything elseTEXT
  • Arrays land as PostgreSQL arrays.
  • MapType and StructType land as JSONB.
  • NUMERIC[] columns read back with a generic decimal element type during schema comparison because array-element precision is not available from metadata.

When the target column expects types such as uuid, jsonb, inet, or enums, a StringType source can still be valid as long as PostgreSQL can parse the value.

PostgreSQL typeSpark type
smallintShortType
integer, serialIntegerType
bigint, bigserialLongType
realFloatType
double precisionDoubleType
numeric(p, s)DecimalType(p, s)
char, varchar, textStringType
dateDateType
timestamp, timestamptzTimestampType
booleanBooleanType
byteaBinaryType
uuid, json, jsonb, inetStringType

When schemaEvolution=addColumns, these widenings are accepted automatically:

SourceTargetWhy
ShortTypeIntegerType or LongTypeLossless widening
IntegerTypeLongTypeLossless widening
ByteTypeShortType, IntegerType, LongTypeLossless widening
FloatTypeDoubleTypeLossless widening
IntegerTypeDecimalType(p, s) where p ≥ 10Integer fits the target
LongTypeDecimalType(p, s) where p ≥ 19Long fits the target
VarcharType(n)StringTypeUnbounded text is wider
CharType(n)StringTypeUnbounded text is wider
Same typeSame typeNo promotion needed

Some widenings still require schemaEvolutionAllowTableRewrite=true. See Schema Evolution.

Changes that still need manual intervention

Section titled “Changes that still need manual intervention”
SourceTargetReason
StringTypeany numericRequires parsing, can fail
BooleanTypeIntegerTypeSemantics change
IntegerTypeFloatTypeLossy precision change
TimestampTypeDateTypeLoses time component
DateTypeTimestampTypeMidnight assumptions are easy to get wrong
DecimalType(p1, s1)DecimalType(p2, s2) where p2 < p1Precision loss

Any of these needs manual intervention in the target schema.

  • Be explicit about decimal precision and scale when the target table already enforces them.
  • Decide early whether nested data should land as JSONB or be flattened upstream.
  • Pair this page with Schema Evolution when schemas drift over time.