Type Mapping
PGStyx uses type mappings when it creates a target table, compares an existing table to a DataFrame, and decides whether a schema change is compatible.
Spark → PostgreSQL
Section titled “Spark → PostgreSQL”| Spark type | PostgreSQL type |
|---|---|
ByteType | SMALLINT |
ShortType | SMALLINT |
IntegerType | INTEGER |
LongType | BIGINT |
FloatType | REAL |
DoubleType | DOUBLE PRECISION |
DecimalType(p, s) | NUMERIC(p, s) |
StringType | TEXT |
BooleanType | BOOLEAN |
DateType | DATE |
TimestampType | TIMESTAMP |
TimestampNTZType | TIMESTAMP |
BinaryType | BYTEA |
ArrayType(e) | e[] |
MapType(k, v) | JSONB |
StructType | JSONB |
| anything else | TEXT |
Arrays and semi-structured values
Section titled “Arrays and semi-structured values”- Arrays land as PostgreSQL arrays.
- MapType and StructType land as
JSONB. NUMERIC[]columns read back with a generic decimal element type during schema comparison because array-element precision is not available from metadata.
String-backed PostgreSQL types
Section titled “String-backed PostgreSQL types”When the target column expects types such as uuid, jsonb, inet, or enums, a StringType source can still be valid as long as PostgreSQL can parse the value.
PostgreSQL → Spark during comparison
Section titled “PostgreSQL → Spark during comparison”| PostgreSQL type | Spark type |
|---|---|
smallint | ShortType |
integer, serial | IntegerType |
bigint, bigserial | LongType |
real | FloatType |
double precision | DoubleType |
numeric(p, s) | DecimalType(p, s) |
char, varchar, text | StringType |
date | DateType |
timestamp, timestamptz | TimestampType |
boolean | BooleanType |
bytea | BinaryType |
uuid, json, jsonb, inet | StringType |
Widenings accepted by addColumns
Section titled “Widenings accepted by addColumns”When schemaEvolution=addColumns, these widenings are accepted automatically:
| Source | Target | Why |
|---|---|---|
ShortType | IntegerType or LongType | Lossless widening |
IntegerType | LongType | Lossless widening |
ByteType | ShortType, IntegerType, LongType | Lossless widening |
FloatType | DoubleType | Lossless widening |
IntegerType | DecimalType(p, s) where p ≥ 10 | Integer fits the target |
LongType | DecimalType(p, s) where p ≥ 19 | Long fits the target |
VarcharType(n) | StringType | Unbounded text is wider |
CharType(n) | StringType | Unbounded text is wider |
| Same type | Same type | No promotion needed |
Some widenings still require schemaEvolutionAllowTableRewrite=true. See Schema Evolution.
Changes that still need manual intervention
Section titled “Changes that still need manual intervention”| Source | Target | Reason |
|---|---|---|
StringType | any numeric | Requires parsing, can fail |
BooleanType | IntegerType | Semantics change |
IntegerType | FloatType | Lossy precision change |
TimestampType | DateType | Loses time component |
DateType | TimestampType | Midnight assumptions are easy to get wrong |
DecimalType(p1, s1) | DecimalType(p2, s2) where p2 < p1 | Precision loss |
Any of these needs manual intervention in the target schema.
Planning tips
Section titled “Planning tips”- Be explicit about decimal precision and scale when the target table already enforces them.
- Decide early whether nested data should land as
JSONBor be flattened upstream. - Pair this page with Schema Evolution when schemas drift over time.