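How do you convert between rows and columns in Spark, that is, between wide and narrow tables? Many newcomers are unclear on this, so this article walks through both directions with PySpark.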
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import array, col, explode, struct, lit
# SparkSession is the entry point that replaces the deprecated SQLContext
spark = SparkSession.builder.appName("test").master("local[*]").getOrCreate()
sc = spark.sparkContext
# df is the source DataFrame; by lists the identifier columns that are kept and not unpivoted
def df_columns_to_line(df, by):
    # Cast every column to string so the unpivoted values share one type
    df_a = df.select([col(c).cast("string") for c in df.columns])
    # Split the non-identifier columns into column names and type descriptions
    cols, dtypes = zip(*((c, t) for (c, t) in df_a.dtypes if c not in by))
    # array() needs elements of a single type; the cast above guarantees it
    assert len(set(dtypes)) == 1, "All columns have to be of the same type"
    # Create and explode an array of (column_name, column_value) structs
    kvs = explode(array([
        struct(lit(c).alias("feature"), col(c).alias("value")) for c in cols
    ])).alias("kvs")
    return df_a.select(by + [kvs]).select(by + ["kvs.feature", "kvs.value"])
df = sc.parallelize([(1, 0.0, 0.6), (1, 0.6, 0.7)]).toDF(["A", "col_1", "col_2"])
df_row_data = df_columns_to_line(df, ["A"])
df.show()
df_row_data.show()
>>> df.show()
+---+-----+-----+
|  A|col_1|col_2|
+---+-----+-----+
|  1|  0.0|  0.6|
|  1|  0.6|  0.7|
+---+-----+-----+
>>> df_row_data.show()
+---+-------+-----+
|  A|feature|value|
+---+-------+-----+
|  1|  col_1|  0.0|
|  1|  col_2|  0.6|
|  1|  col_1|  0.6|
|  1|  col_2|  0.7|
+---+-------+-----+
Note that feature and value are the names of the two new columns defined when the original columns are converted into rows.
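To make the column-to-row step easier to follow, here is a minimal plain-Python sketch of the same reshaping on a list of dicts, with no Spark involved. The helper name melt_rows is hypothetical and only for illustration; unlike the PySpark function, it leaves the identifier column in its original type and casts only the values to strings.

```python
def melt_rows(rows, by):
    """Turn each non-identifier column of each row into its own output row."""
    out = []
    for row in rows:
        # Identifier columns are copied unchanged onto every output row
        ids = {k: row[k] for k in by}
        for name, value in row.items():
            if name in by:
                continue
            # One output row per (identifiers, column name, value);
            # the value is cast to str to mirror the cast("string") step
            out.append({**ids, "feature": name, "value": str(value)})
    return out


wide = [
    {"A": 1, "col_1": 0.0, "col_2": 0.6},
    {"A": 1, "col_1": 0.6, "col_2": 0.7},
]
narrow = melt_rows(wide, ["A"])
for r in narrow:
    print(r)
```

Each input row yields one output row per unpivoted column, which is roughly what exploding the array of (feature, value) structs does.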
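# Collect the distinct feature names that will become columns
df_features = df_row_data.select('feature').distinct().collect()
# pivot expects a list of values, so build one instead of a lazy map object
features = [r.feature for r in df_features]
df_column_data = df_row_data.groupby("A").pivot('feature', features).agg(F.first('value', ignorenulls=True))
df_column_data.show()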
+---+-----+-----+
|  A|col_2|col_1|
+---+-----+-----+
|  1|  0.6|  0.0|
+---+-----+-----+
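Converting rows back to columns is simpler: it works directly on the result above, and the key is the pivot function. Because both source rows share A = 1, the grouped result has a single row, and F.first keeps only one value per feature.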
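As a rough illustration of what the group-then-pivot step computes, the following plain-Python sketch rebuilds wide rows from narrow ones and keeps the first non-null value per feature. The helper name pivot_first is hypothetical. The sketch processes rows in input order, whereas in Spark the result of first depends on row order, which is not guaranteed after a shuffle.

```python
def pivot_first(rows, key, names_col, values_col):
    """Group rows by key and spread names_col values into columns."""
    result = {}
    for row in rows:
        # One output row per distinct key value
        out = result.setdefault(row[key], {key: row[key]})
        name, value = row[names_col], row[values_col]
        # Keep the first non-null value seen for each feature
        if out.get(name) is None and value is not None:
            out[name] = value
    return list(result.values())


narrow = [
    {"A": 1, "feature": "col_1", "value": "0.0"},
    {"A": 1, "feature": "col_2", "value": "0.6"},
    {"A": 1, "feature": "col_1", "value": "0.6"},
    {"A": 1, "feature": "col_2", "value": "0.7"},
]
wide = pivot_first(narrow, "A", "feature", "value")
print(wide)
```

Since every narrow row has the same key, the four rows collapse into one wide row, matching the single-row output shown above.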
Original link: https://my.oschina.net/u/3744350/blog/4678031