I have two DataFrames, DataFrame1 and DataFrame2.
I need to merge these DataFrames to obtain the following:
+---+----+------------+-------------+
| id|name|has_bank_acc|has_email_acc|
+---+----+------------+-------------+
|  0| qwe|        true|         null|
|  1| asd|       false|         null|
|  2| rty|       false|         null|
|  3| tyu|        true|         null|
|  0| qwe|        null|         true|
|  5| hjk|        null|        false|
|  8| oiu|        null|        false|
|  7| nmb|        null|         true|
+---+----+------------+-------------+
4 Answers:
You cannot perform a union on DataFrames with different columns. If you add the missing columns and leave them as NULL, it will give a data type error.
So the only solution is a join.
scala> df1.show()
+---+----+------------+
| id|name|has_bank_acc|
+---+----+------------+
|  0| qwe|        true|
|  1| asd|       false|
|  2| rty|       false|
|  3| tyu|        true|
+---+----+------------+
scala> df2.show()
+---+----+-------------+
| id|name|has_email_acc|
+---+----+-------------+
|  0| qwe|         true|
|  5| hjk|        false|
|  8| oiu|        false|
|  7| nmb|         true|
+---+----+-------------+
scala> import org.apache.spark.sql.functions.lit
// Tag each DataFrame with a distinct "fid" so that no rows match in the full join.
scala> val df11 = df1.withColumn("fid", lit(1))
scala> val df22 = df2.withColumn("fid", lit(2))
// Row order in the output may vary.
scala> df11.alias("1").join(df22.alias("2"), List("fid", "id", "name"), "full").drop("fid").show()
+---+----+------------+-------------+
| id|name|has_bank_acc|has_email_acc|
+---+----+------------+-------------+
|  0| qwe|        true|         null|
|  1| asd|       false|         null|
|  2| rty|       false|         null|
|  3| tyu|        true|         null|
|  0| qwe|        null|         true|
|  5| hjk|        null|        false|
|  8| oiu|        null|        false|
|  7| nmb|        null|         true|
+---+----+------------+-------------+
A possible solution is below; let me know if it helps!
import org.apache.spark.sql.functions.lit

val data = Seq((0,"qwe","true"),(1,"asd","false"),(2,"rty","false"),(3,"tyu","true")).toDF("id","name","has_bank_acc")
scala> data.show
+---+----+------------+
| id|name|has_bank_acc|
+---+----+------------+
|  0| qwe|        true|
|  1| asd|       false|
|  2| rty|       false|
|  3| tyu|        true|
+---+----+------------+
val data2 = Seq((0,"qwe","true"),(5,"hjk","false"),(8,"oiu","false"),(7,"nmb","true")).toDF("id","name","has_email_acc")
scala> data2.show
+---+----+-------------+
| id|name|has_email_acc|
+---+----+-------------+
|  0| qwe|         true|
|  5| hjk|        false|
|  8| oiu|        false|
|  7| nmb|         true|
+---+----+-------------+
val data_cols = data.columns
val data2_cols = data2.columns
// Add each column that exists only in the other DataFrame as a typed null
// (a real null, not the string "null"), so both sides end up with the same schema.
val transformedData = data2_cols.diff(data_cols).foldLeft(data) {
  (df, newCol) => df.withColumn(newCol, lit(null).cast(data2.schema(newCol).dataType))
}
val transformedData2 = data_cols.diff(data2_cols).foldLeft(data2) {
  (df, newCol) => df.withColumn(newCol, lit(null).cast(data.schema(newCol).dataType))
}
// unionByName matches columns by name, so their order does not matter.
val finalData = transformedData2.unionByName(transformedData)
scala> finalData.show
+---+----+-------------+------------+
| id|name|has_email_acc|has_bank_acc|
+---+----+-------------+------------+
|  0| qwe|         true|        null|
|  5| hjk|        false|        null|
|  8| oiu|        false|        null|
|  7| nmb|         true|        null|
|  0| qwe|         null|        true|
|  1| asd|         null|       false|
|  2| rty|         null|       false|
|  3| tyu|         null|        true|
+---+----+-------------+------------+
"unionAll" with the missing columns added may help:
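import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.BooleanType

// Add the column missing on each side as a null of BooleanType, then union by column name.
dataframe1
  .withColumn("has_email_acc", lit(null).cast(BooleanType))
  .unionByName(dataframe2.withColumn("has_bank_acc", lit(null).cast(BooleanType)))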
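If you are on Spark 3.1 or later, the same union can probably be written without adding the null columns by hand, because unionByName accepts an allowMissingColumns flag there. The sketch below is only an illustration under that version assumption: the object name UnionMissingColumns, the helper merge, and the local SparkSession are made up for the example, and the sample rows are the ones shown in the answers above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object UnionMissingColumns {

  // Assumes Spark 3.1+: columns present on only one side are filled with null.
  def merge(left: DataFrame, right: DataFrame): DataFrame =
    left.unionByName(right, allowMissingColumns = true)

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-missing-columns")
      .getOrCreate()
    import spark.implicits._

    val dataframe1 = Seq((0, "qwe", true), (1, "asd", false), (2, "rty", false), (3, "tyu", true))
      .toDF("id", "name", "has_bank_acc")
    val dataframe2 = Seq((0, "qwe", true), (5, "hjk", false), (8, "oiu", false), (7, "nmb", true))
      .toDF("id", "name", "has_email_acc")

    // Eight rows are expected: four from each input, with null in the column
    // that the originating DataFrame did not have.
    merge(dataframe1, dataframe2).show()

    spark.stop()
  }
}
```

On older Spark versions the flag does not exist, so adding the typed null columns explicitly remains the way to go.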
What error do you get when doing the union and the join?