沧海拾珠

Using unstack and stack in pandas

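1. DataFrame.unstack(level=-1, fill_value=None). "unstack" literally means "to un-stack"; I think of it as spreading the data structure out: the chosen level of the row index is pivoted into columns.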

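level defaults to -1 (the innermost index level) and can be set to another value, such as 0. fill_value sets the value used for missing entries, e.g. fill_value=0. You can also pass the name of the index level to unstack, e.g. DataFrame.unstack('Medal').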

>>> s
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
>>> s.unstack()
       a    b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0  3.0
b  2.0  4.0
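To see level and fill_value side by side, here is a minimal sketch. It assumes a Series like the one above with one entry removed so that a gap appears; the level names "outer" and "inner" are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Same shape as the Series above, but ("two", "b") is deliberately missing.
index = pd.MultiIndex.from_tuples(
    [("one", "a"), ("one", "b"), ("two", "a")],
    names=["outer", "inner"],  # hypothetical level names
)
s = pd.Series([1.0, 2.0, 3.0], index=index)

wide_default = s.unstack()              # innermost level ("inner") becomes the columns
wide_outer = s.unstack(level="outer")   # by name; equivalent to level=0 here
wide_filled = s.unstack(fill_value=0)   # the missing ("two", "b") cell is filled with 0

print(wide_default)
print(wide_outer)
print(wide_filled)
```

Without fill_value the missing cell shows up as NaN; with it, the cell takes the given value instead.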

When a DataFrame is grouped with groupby on more than one key, the result comes back stacked under a multi-level index, for example:

df.groupby(['Athlete', 'Medal']).size()
Athlete                   Medal
AABYE, Edgar              Gold      1
AALTONEN, Arvo Ossian     Bronze    2
AALTONEN, Paavo Johannes  Bronze    2
                          Gold      3
AAMODT, Ragnhild          Gold      1
AANING, Alf Lied          Silver    1
AARDENBURG, Willemien     Bronze    1
AARDEWIJN, Pepijn         Silver    1

After applying unstack():

df.groupby(['Athlete', 'Medal']).size().unstack()
Medal                     Bronze  Gold  Silver
Athlete
AABYE, Edgar                 NaN   1.0     NaN
AALTONEN, Arvo Ossian        2.0   NaN     NaN
AALTONEN, Paavo Johannes     2.0   3.0     NaN
AAMODT, Ragnhild             NaN   1.0     NaN
AANING, Alf Lied             NaN   NaN     1.0
AARDENBURG, Willemien        1.0   NaN     NaN
AARDEWIJN, Pepijn            NaN   NaN     1.0

After unstack(), the Medal values are split into three separate columns (Bronze, Gold, Silver), which makes the data easier to plot.
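The groupby-then-unstack pattern can be reproduced on a small scale. The sketch below uses a made-up toy table, since the medals data itself is not shown in this post; the athlete names "A", "B", "C" are placeholders. Passing fill_value=0 is my own addition here, so that athletes with no medal of a given type get 0 rather than NaN.

```python
import pandas as pd

# Hypothetical toy data standing in for the medals table.
df = pd.DataFrame({
    "Athlete": ["A", "A", "A", "B", "C", "C"],
    "Medal": ["Gold", "Gold", "Bronze", "Silver", "Gold", "Silver"],
})

# size() returns a Series indexed by (Athlete, Medal): the "stacked" form.
counts = df.groupby(["Athlete", "Medal"]).size()

# unstack() moves the Medal level into columns, one column per medal type.
table = counts.unstack(fill_value=0)

print(counts)
print(table)
```

With one column per medal type, something like table.plot(kind="bar") (which needs matplotlib installed) should then draw one bar group per athlete.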

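2. DataFrame.stack(level=-1, dropna=True). This is the reverse operation: it moves column labels into the innermost level of the row index. I have not needed it yet.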

>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one  a    1
     b    2
two  a    3
     b    4
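As a quick check that stack() and unstack() undo each other, here is a minimal sketch built on the same 2x2 frame as above. The variable names are my own, and the dropna argument is left at its default.

```python
import pandas as pd

wide = pd.DataFrame(
    {"a": [1.0, 3.0], "b": [2.0, 4.0]},
    index=["one", "two"],
)

# stack(): column labels become the innermost level of the row index.
long = wide.stack()

# unstack(): the innermost index level goes back to being the columns.
restored = long.unstack()

print(long)
print(restored)
```

For a frame with no missing values like this one, the restored frame has the same rows, columns, and values as the original.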