Unpivot df columns to multiple columns and rows

I have a df like this:

Country  Industry   2011_0-9_AF    2011_0-9_AP
US        AB            0               0
US        AC           12.34           12.4
UK        AB            1               2
UK        AC            12              5

In my original dataframe I have 3 countries, 4 industries for every country, and 1120 columns named like 2011_0-9_AF.

I need to transform the df like this:

Country  Industry   Year   Group_Type    Tags    Value
US        AB         2011   0-9          AF      0
US        AB         2011   0-9          AP      0
US        AC         2011   0-9          AF      12.34
US        AC         2011   0-9          AP      12.4

And similarly for UK and the other countries. So I want each of those columns turned into 4 columns: the part of the column name up to the 1st underscore as Year, the next part as Group_Type, the last part as Tags, and the cell's value in a Value column.

I was able to do this in PowerBI, but with 1120 columns it has already been running for more than 2 hours and is still not finished, and I have 5 files like this.

Is there a faster way to do this in Python?

Answer

You can try to melt it, and then split the variable column on _:

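import pandas as pd

# Melt every column except the identifiers into (variable, value) pairs.
long_df = pd.melt(df, id_vars=['Country', 'Industry'])
# Split names like '2011_0-9_AF' into three new columns.
long_df[['Year', 'Group_Type', 'Tags']] = long_df.variable.str.split('_', expand=True)

# drop returns a new frame; assign the result if you want to keep it.
long_df.drop('variable', axis=1)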
#  Country Industry  value  Year Group_Type Tags
#0      US       AB   0.00  2011        0-9   AF
#1      US       AC  12.34  2011        0-9   AF
#2      UK       AB   1.00  2011        0-9   AF
#3      UK       AC  12.00  2011        0-9   AF
#4      US       AB   0.00  2011        0-9   AP
#5      US       AC  12.40  2011        0-9   AP
#6      UK       AB   2.00  2011        0-9   AP
#7      UK       AC   5.00  2011        0-9   AP
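
The output above comes out grouped by source column and keeps pandas' default `value` name, while the layout asked for in the question is grouped by row and uses `Value`. A minimal sketch of one way to get that exact layout, assuming the sample data from the question: the helper column `row` is introduced here only for illustration, to remember each source row's position so a stable sort can restore the row order after melting.

```python
import pandas as pd

# Sample frame copied from the question (two of the wide columns).
df = pd.DataFrame({
    "Country": ["US", "US", "UK", "UK"],
    "Industry": ["AB", "AC", "AB", "AC"],
    "2011_0-9_AF": [0, 12.34, 1, 12],
    "2011_0-9_AP": [0, 12.4, 2, 5],
})

id_cols = ["Country", "Industry"]

# Add a temporary "row" column (illustrative name) holding the original
# row position, then melt everything else into long form.
long_df = (
    df.rename_axis("row")
      .reset_index()
      .melt(id_vars=["row", *id_cols], var_name="variable", value_name="Value")
)

# Split names like "2011_0-9_AF" into Year, Group_Type and Tags.
long_df[["Year", "Group_Type", "Tags"]] = long_df["variable"].str.split("_", expand=True)

# A stable sort on "row" groups the output by source row while keeping
# the original column order (AF before AP) within each row.
long_df = (
    long_df.sort_values("row", kind="mergesort")
           .drop(columns=["row", "variable"])
           .reset_index(drop=True)
)

# Reorder the columns to match the layout requested in the question.
long_df = long_df[[*id_cols, "Year", "Group_Type", "Tags", "Value"]]
print(long_df)
```

Year stays a string here because it comes from the column name; `long_df["Year"].astype(int)` would convert it if every column name starts with a numeric year.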

  • Wow that's an amazing solution, I knew melt was needed but didn't think of using `id_vars`
