Write Modes

PGStyx supports three write modes: append, upsert, and overwrite. Choose based on how the target should change over time, not on what the current batch happens to contain.

Use append when each run only adds rows to the target.

  • New rows are written as-is.
  • Duplicate-key conflicts fail the write.
  • For insert-only feeds where duplicates should be skipped instead of failing, pair append with validationMode=warnAndFilter and a target-side uniqueness rule.

df.write
  .format("pgstyx")
  .option("url", "jdbc:postgresql://localhost:5432/warehouse")
  .option("dbtable", "events")
  .option("user", "postgres")
  .option("password", "secret")
  .option("writeMode", "append")
  .save()
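
The insert-only pairing mentioned in the last bullet might look like the sketch below. It assumes validationMode is passed as an ordinary writer option in the same way as writeMode, and that the target already has a uniqueness rule on its key column. The helper name appendSkippingDuplicates is hypothetical; the connection settings are copied from the example above.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: append rows, skipping duplicates instead of failing.
// Assumption: "validationMode" is set like any other writer option, and the
// target table enforces uniqueness on its key column.
def appendSkippingDuplicates(df: DataFrame): Unit =
  df.write
    .format("pgstyx")
    .option("url", "jdbc:postgresql://localhost:5432/warehouse")
    .option("dbtable", "events")
    .option("user", "postgres")
    .option("password", "secret")
    .option("writeMode", "append")
    .option("validationMode", "warnAndFilter") // skip duplicates instead of failing
    .save()
```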

Example progression:

Start

Target: 1: created

Append run 1

Incoming: 2: paid
Target after run: 1: created, 2: paid

Append run 2

Incoming: 3: shipped
Target after run: 1: created, 2: paid, 3: shipped
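
The progression above can be reproduced with a small in-memory model. This is only a sketch of the documented semantics, not PGStyx code: the target is modeled as a Map from key to status, and the exception type used for a duplicate key is an assumption.

```scala
object AppendModel {
  // Append writes each incoming row as-is and fails on a duplicate key.
  // IllegalStateException is a stand-in; the real error type is not documented here.
  def append(target: Map[Int, String], incoming: Seq[(Int, String)]): Map[Int, String] =
    incoming.foldLeft(target) { case (acc, (key, value)) =>
      if (acc.contains(key))
        throw new IllegalStateException(s"duplicate key $key")
      acc + (key -> value)
    }

  val start: Map[Int, String] = Map(1 -> "created")
  val afterRun1: Map[Int, String] = append(start, Seq(2 -> "paid"))
  val afterRun2: Map[Int, String] = append(afterRun1, Seq(3 -> "shipped"))
}
```

Calling AppendModel.append(AppendModel.afterRun2, Seq(2 -> "refunded")) throws in this model, which mirrors the rule that duplicate-key conflicts fail the write.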

Use upsert when rows should be created or refreshed based on a stable key.

  • mergeKeys is required.
  • Omitting mergeKeys throws IllegalArgumentException('mergekeys required for upsert mode').
  • Single-key upserts are available on Community. Composite keys require Pro.

df.write
  .format("pgstyx")
  .option("url", "jdbc:postgresql://localhost:5432/warehouse")
  .option("dbtable", "users")
  .option("user", "postgres")
  .option("password", "secret")
  .option("writeMode", "upsert")
  .option("mergeKeys", "user_id")
  .save()
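
To catch a missing mergeKeys before a job is submitted, a pipeline could run its own pre-flight check. The sketch below is a hypothetical helper, not part of PGStyx; it only mirrors the documented rule and reuses the documented error message.

```scala
object UpsertOptions {
  // Hypothetical pre-flight check that mirrors the documented failure:
  // upsert without mergeKeys raises IllegalArgumentException.
  def validate(options: Map[String, String]): Unit = {
    val isUpsert = options.get("writeMode").contains("upsert")
    val hasKeys = options.get("mergeKeys").exists(_.trim.nonEmpty)
    if (isUpsert && !hasKeys)
      throw new IllegalArgumentException("mergekeys required for upsert mode")
  }
}
```

For example, UpsertOptions.validate(Map("writeMode" -> "upsert")) throws, while adding "mergeKeys" -> "user_id" passes.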

Example progression:

Upsert run 2

Incoming: 2: bob@example
Target after run: 1: alice@new.example, 2: bob@example
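
Upsert run 2 can be modeled in memory as a keyed merge. This is a sketch of the semantics only: the Map stands in for the users table keyed by user_id, and the state before run 2 (a single row for key 1) is inferred from the result shown above.

```scala
object UpsertModel {
  // Upsert inserts rows with new keys and replaces rows whose key already exists.
  def upsert(target: Map[Int, String], incoming: Seq[(Int, String)]): Map[Int, String] =
    target ++ incoming

  // Assumed state before run 2, inferred from the documented result.
  val beforeRun2: Map[Int, String] = Map(1 -> "alice@new.example")
  val afterRun2: Map[Int, String] = upsert(beforeRun2, Seq(2 -> "bob@example"))

  // Replaying the same batch leaves the target unchanged.
  val afterReplay: Map[Int, String] = upsert(afterRun2, Seq(2 -> "bob@example"))
}
```

The replay step illustrates why upsert suits idempotent pipelines: rerunning a batch does not create duplicates or fail.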

Use overwrite for snapshots, full refreshes, or compacted rebuilds.

  • PGStyx replaces the table contents before loading the new dataset.
  • The table schema stays in place.
  • If other tables depend on the target through foreign keys, plan that refresh carefully.

df.write
  .format("pgstyx")
  .option("url", "jdbc:postgresql://localhost:5432/warehouse")
  .option("dbtable", "events")
  .option("user", "postgres")
  .option("password", "secret")
  .option("writeMode", "overwrite")
  .save()

Example progression:

Start

Target: jan, feb, mar

Overwrite run 1

Incoming: apr, may
Target after run: apr, may

Overwrite run 2

Incoming: jun
Target after run: jun
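
The same progression as a minimal in-memory model. It is a sketch of the semantics only and makes no claim about how PGStyx clears the table internally.

```scala
object OverwriteModel {
  // Overwrite discards the current rows and keeps only the incoming batch.
  def overwrite(target: Seq[String], incoming: Seq[String]): Seq[String] = incoming

  val start: Seq[String] = Seq("jan", "feb", "mar")
  val afterRun1: Seq[String] = overwrite(start, Seq("apr", "may"))
  val afterRun2: Seq[String] = overwrite(afterRun1, Seq("jun"))
}
```

Nothing from an earlier run survives, so each batch must carry the complete dataset the target should hold.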

If dbtable does not exist when the write starts, PGStyx creates it from the DataFrame schema.

  • A fresh table created with writeMode=upsert uses mergeKeys as the primary key.
  • A fresh table created with writeMode=append has no primary key by default.
  • After the first write, later writes expect the table to exist and follow the active schemaEvolution rules.
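
The first two rules can be pictured as a difference in the generated DDL. The sketch below is illustrative only: the Column type, the createTableSql helper, the column names, and the exact SQL text are assumptions, not the statements PGStyx actually issues.

```scala
object CreateTableSketch {
  // Hypothetical column description: a name plus a PostgreSQL type.
  final case class Column(name: String, sqlType: String)

  // Build a CREATE TABLE statement; only upsert mode adds a primary key,
  // taken from mergeKeys. Append mode produces no primary key.
  def createTableSql(
      table: String,
      columns: Seq[Column],
      writeMode: String,
      mergeKeys: Seq[String]
  ): String = {
    val columnDefs = columns.map(c => s"${c.name} ${c.sqlType}")
    val primaryKey =
      if (writeMode == "upsert" && mergeKeys.nonEmpty)
        Seq(s"PRIMARY KEY (${mergeKeys.mkString(", ")})")
      else Seq.empty[String]
    s"CREATE TABLE $table (${(columnDefs ++ primaryKey).mkString(", ")})"
  }
}
```

Under these assumptions, an upsert on users with mergeKeys user_id yields a statement ending in PRIMARY KEY (user_id), while an append on events yields one with no key clause.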

Mode       Best fit
append     Event ingest, raw landing, additive feeds
upsert     Dimension sync, idempotent pipelines, record-identity updates
overwrite  Snapshots, full rebuilds, compacted tables