
Install

PGStyx publishes signed artifacts to Maven Central. Pick the install path that matches your build environment, then run the verify snippet at the bottom to confirm the datasource is registered.

Spark   Scala  Artifact
3.3.4   2.12   com.pgstyx:pgstyx-spark3_2.12:1.0.0
3.5.8   2.13   com.pgstyx:pgstyx-spark3_2.13:1.0.0
4.1.1   2.13   com.pgstyx:pgstyx-spark4_2.13:1.0.0

The artifact pattern is pgstyx-spark{sparkMajor}_{scalaBinary}. Spark 3 artifacts target JDK 11+; Spark 4 artifacts target JDK 17+.
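The artifact pattern is regular enough to compute from the two version strings. The sketch below uses a hypothetical helper object, PgStyxCoordinate, which is not part of PGStyx. It takes a Spark version and a full Scala version and builds the matching coordinate from the matrix above. It also rejects combinations the matrix does not list, such as Spark 4 with Scala 2.12.

```scala
// Hypothetical helper, not part of PGStyx: builds a coordinate following the
// documented pattern com.pgstyx:pgstyx-spark{sparkMajor}_{scalaBinary}:{version}.
object PgStyxCoordinate {
  val Group = "com.pgstyx"
  val DefaultVersion = "1.0.0" // release shown in the matrix; adjust as needed

  // "3.5.8" -> "3", "4.1.1" -> "4"
  def sparkMajor(sparkVersion: String): String =
    sparkVersion.takeWhile(_ != '.')

  // "2.13.17" -> "2.13"
  def scalaBinary(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  def coordinate(sparkVersion: String, scalaVersion: String, version: String = DefaultVersion): String = {
    val major = sparkMajor(sparkVersion)
    val binary = scalaBinary(scalaVersion)
    require(major == "3" || major == "4", s"No PGStyx artifact line for Spark $sparkVersion")
    require(binary == "2.12" || binary == "2.13", s"Unsupported Scala binary version $binary")
    // Spark 4.x is Scala 2.13 only, so there is no _2.12 build for it.
    require(!(major == "4" && binary == "2.12"), s"Spark $sparkVersion has no Scala 2.12 build")
    s"$Group:pgstyx-spark${major}_$binary:$version"
  }
}
```

For example, PgStyxCoordinate.coordinate("3.5.8", "2.13.17") yields com.pgstyx:pgstyx-spark3_2.13:1.0.0, which is the second row of the matrix.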

With spark-submit:
spark-submit \
  --packages com.pgstyx:pgstyx-spark3_2.12:1.0.0 \
  my-job.jar

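For a locally built JAR, replace --packages with --jars target/scala-2.12/pgstyx-spark3_2.12-1.0.0.jar. Unlike --packages, --jars adds only the listed files and does not resolve transitive dependencies.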

In an sbt build:

libraryDependencies += "com.pgstyx" %% "pgstyx-spark3" % "1.0.0"

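The %% operator appends the project's Scala binary version to the artifact name (under Scala 2.12.x, pgstyx-spark3 resolves to pgstyx-spark3_2.12), so scalaVersion must match one of the published builds. For Spark 4, use pgstyx-spark4 and pin scalaVersion := "2.13.17".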
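For a Spark 4 project, a minimal build.sbt might look like the sketch below. It uses only the coordinates and the scalaVersion pin given above. The commented-out line shows what %% expands to under Scala 2.13; the two forms are interchangeable.

```scala
// build.sbt sketch for a Spark 4 job (versions taken from the matrix above).
scalaVersion := "2.13.17"

// %% appends the Scala binary suffix, so with scalaVersion 2.13.x this
// resolves to com.pgstyx:pgstyx-spark4_2.13:1.0.0 ...
libraryDependencies += "com.pgstyx" %% "pgstyx-spark4" % "1.0.0"

// ... which is equivalent to spelling the suffix out with a single %:
// libraryDependencies += "com.pgstyx" % "pgstyx-spark4_2.13" % "1.0.0"
```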

On Databricks, attach PGStyx as a cluster library so that every notebook and job on that cluster can resolve the datasource.

  1. Compute → your cluster → Libraries → Install new.
  2. Source: Maven.
  3. Coordinates: com.pgstyx:pgstyx-spark3_2.12:1.0.0. Adjust for the runtime’s Scala binary version.
  4. Install and restart the cluster.

Do not also add the JAR to your application separately; the cluster library already covers both the driver and the executors.

Pass --packages com.pgstyx:pgstyx-spark3_2.12:1.0.0 to your spark-submit step. In an EMR step definition, add it to Args. On Dataproc, set --properties spark.jars.packages=com.pgstyx:pgstyx-spark3_2.12:1.0.0 when you submit the job.

The _2.12 / _2.13 suffix must match the Spark runtime’s Scala binary version. Mixing them fails at class-loading time, not at dependency resolution.

  • Spark 3.x (from 3.2 onward) ships both Scala 2.12 and 2.13 builds.
  • Spark 4.x is Scala 2.13 only.

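If unsure, run spark.version and scala.util.Properties.versionNumberString in a spark-shell or notebook on the target cluster, then pick the artifact whose Spark major version and Scala binary suffix match those values.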
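A mismatch can also be caught before the first job runs. This sketch uses a hypothetical ScalaBinaryCheck object, which is not a PGStyx API. It compares the _2.12/_2.13 suffix of a coordinate with the Scala binary version reported by scala.util.Properties.versionNumberString. Run it in a spark-shell or notebook on the target cluster so that it sees the runtime's Scala library.

```scala
import scala.util.Properties

// Hypothetical preflight check, not part of PGStyx: compares the Scala binary
// suffix of an artifact coordinate with the Scala version on the classpath.
object ScalaBinaryCheck {
  // "2.12.18" -> "2.12"
  def binaryOf(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // "com.pgstyx:pgstyx-spark3_2.12:1.0.0" -> Some("2.12")
  def suffixOf(coordinate: String): Option[String] =
    coordinate.split(':') match {
      case Array(_, artifact, _) if artifact.contains("_") =>
        Some(artifact.substring(artifact.lastIndexOf('_') + 1))
      case _ => None
    }

  // Defaults to the running Scala library, i.e. the value of
  // scala.util.Properties.versionNumberString on the cluster.
  def matches(coordinate: String, scalaVersion: String = Properties.versionNumberString): Boolean =
    suffixOf(coordinate).contains(binaryOf(scalaVersion))
}
```

On a cluster whose runtime reports a 2.12.x Scala library, ScalaBinaryCheck.matches("com.pgstyx:pgstyx-spark3_2.12:1.0.0") returns true, and the _2.13 coordinate returns false.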

The smallest snippet that proves the datasource resolved:

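// In spark-shell, enter this with :paste; `spark` is the prebuilt SparkSession.
val df = spark.createDataFrame(Seq((1, "ok"))).toDF("id", "label")

// Point url, user, and password at a PostgreSQL instance you can write to.
df.write
  .format("pgstyx")
  .option("url", "jdbc:postgresql://localhost:5432/warehouse")
  .option("dbtable", "pgstyx_verify")
  .option("user", "postgres")
  .option("password", "secret")
  .save()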

A successful run creates pgstyx_verify with one row. If the write fails with Failed to find data source: pgstyx, the JAR is not on the classpath; recheck --packages or your cluster libraries.
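When the verify write fails, a small triage function can map the exception to the fixes described above. This is a hypothetical sketch: VerifyTriage is not a PGStyx API. It matches the "data source not found" text loosely because the exact wording may differ between Spark versions. It treats other class-loading errors as a likely _2.12/_2.13 mismatch.

```scala
// Hypothetical triage helper, not part of PGStyx: turns a failure from the
// verify write into the fix suggested in this guide.
object VerifyTriage {
  // Loose match: the wording of Spark's "data source not found" error can
  // differ between Spark versions, so only key fragments are checked.
  private def isMissingSource(e: Throwable): Boolean =
    Option(e.getMessage).exists { m =>
      m.contains("Failed to find") && m.contains("data source") && m.contains("pgstyx")
    }

  def hint(error: Throwable): String = error match {
    // Checked first, because this error can itself be a ClassNotFoundException.
    case e if isMissingSource(e) =>
      "PGStyx is not on the classpath: recheck --packages or the cluster libraries."
    case _: ClassNotFoundException | _: NoClassDefFoundError | _: NoSuchMethodError =>
      "Class-loading failure: check that the _2.12/_2.13 suffix matches the runtime's Scala binary version."
    case other =>
      s"Not a known install problem: ${other.getClass.getName}: ${other.getMessage}"
  }
}
```

In spark-shell, wrap the verify write in scala.util.Try and pass the exception from .failed to VerifyTriage.hint.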