Unpivot df columns to multiple columns and rows
I have a df like this:
Country Industry 2011_0-9_AF 2011_0-9_AP
US AB 0 0
US AC 12.34 12.4
UK AB 1 2
UK AC 12 5
So, in my original dataframe I have 3 countries, for every country I have 4 industries, and I have 1120 columns like 2011_0-9_AF etc.
I need to transform the df like this:
Country Industry Year Group_Type Tags Value
US AB 2011 0-9 AF 0
US AB 2011 0-9 AP 0
US AC 2011 0-9 AF 12.34
US AC 2011 0-9 AP 12.4
And similarly for UK and other countries. So, I want each column to be split into 4: the text from the start of the column name to the 1st underscore as Year, then Group_Type, then Tags, and then the cell's value in the Value column.
I am able to create the same in PowerBI, but since it has 1120 columns, it has already taken more than 2 hours and is still running, and I have 5 files like this.
Is there a faster solution in Python?
Answer
You can try to `melt` it, and then `split` the variable column by `_`, as follows:
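import pandas as pd
# Turn every non-id column into (variable, value) rows
long_df = pd.melt(df, id_vars=['Country', 'Industry'])
# Split names like '2011_0-9_AF' into three new columns
long_df[['Year', 'Group_Type', 'Tags']] = long_df['variable'].str.split('_', expand=True)
long_df = long_df.drop('variable', axis=1)  # drop returns a copy, so assign it back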
# Country Industry value Year Group_Type Tags
#0 US AB 0.00 2011 0-9 AF
#1 US AC 12.34 2011 0-9 AF
#2 UK AB 1.00 2011 0-9 AF
#3 UK AC 12.00 2011 0-9 AF
#4 US AB 0.00 2011 0-9 AP
#5 US AC 12.40 2011 0-9 AP
#6 UK AB 2.00 2011 0-9 AP
#7 UK AC 5.00 2011 0-9 AP
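For anyone who wants the exact layout asked for in the question (Country, Industry, Year, Group_Type, Tags, Value, with each country/industry pair's rows kept together), here is a minimal self-contained sketch of the same melt-and-split idea. The sample frame is copied from the question. The helper column `index`, the `id_cols` list, the `Value` name passed to `value_name` and the `n=2` split limit are my own choices for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Country": ["US", "US", "UK", "UK"],
    "Industry": ["AB", "AC", "AB", "AC"],
    "2011_0-9_AF": [0, 12.34, 1, 12],
    "2011_0-9_AP": [0, 12.4, 2, 5],
})

id_cols = ["Country", "Industry"]

# reset_index() adds an 'index' column holding each row's original position,
# which lets us restore the source row order after melting.
long_df = df.reset_index().melt(
    id_vars=["index"] + id_cols,
    var_name="variable",
    value_name="Value",
)

# Split names like '2011_0-9_AF' into exactly three parts.
# n=2 is an assumption: it keeps any extra underscores inside Tags.
long_df[["Year", "Group_Type", "Tags"]] = (
    long_df["variable"].str.split("_", n=2, expand=True)
)

# A stable sort on the original position groups each source row's values
# together while keeping the left-to-right column order (AF before AP).
result = (
    long_df.sort_values("index", kind="mergesort")
    [id_cols + ["Year", "Group_Type", "Tags", "Value"]]
    .reset_index(drop=True)
)

print(result)
```

Note that Year stays a string after the split. If you need numbers, something like `result["Year"].astype(int)` should work as long as every column name starts with a numeric year.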
- Wow that's an amazing solution, I knew melt was needed but didn't think of using `id_vars`