Overview

Dataset statistics

Number of variables: 14
Number of observations: 87
Missing cells: 248
Missing cells (%): 20.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 59.7 KiB
Average record size in memory: 702.1 B
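
The percentage and per-record figures above follow from the raw counts. The sketch below redoes that arithmetic, assuming that "missing cells (%)" means missing cells divided by variables times observations, and that the average record size is the total memory divided by the number of rows. The variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 14
n_observations = 87
missing_cells = 248
total_size_kib = 59.7

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
# The KiB total is itself rounded, so this only approximates the reported 702.1 B.
print(f"average record size: about {avg_record_bytes:.1f} B")
```

The small gap between the recomputed record size and the reported 702.1 B is consistent with the total having been rounded to one decimal place.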

Variable types

Text: 4
Numeric: 3
Categorical: 7

Alerts

birth_year is highly overall correlated with sex and 2 other fields [High correlation]
films is highly overall correlated with sex [High correlation]
gender is highly overall correlated with sex [High correlation]
height is highly overall correlated with mass and 1 other fields [High correlation]
mass is highly overall correlated with height and 3 other fields [High correlation]
sex is highly overall correlated with birth_year and 5 other fields [High correlation]
skin_color is highly overall correlated with birth_year and 3 other fields [High correlation]
species is highly overall correlated with birth_year and 4 other fields [High correlation]
height has 6 (6.9%) missing values [Missing]
mass has 28 (32.2%) missing values [Missing]
hair_color has 5 (5.7%) missing values [Missing]
birth_year has 44 (50.6%) missing values [Missing]
sex has 4 (4.6%) missing values [Missing]
gender has 4 (4.6%) missing values [Missing]
homeworld has 10 (11.5%) missing values [Missing]
species has 4 (4.6%) missing values [Missing]
vehicles has 76 (87.4%) missing values [Missing]
starships has 67 (77.0%) missing values [Missing]
name has unique values [Unique]
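
The "Missing" alerts can be cross-checked against the overview. The sketch below takes the per-column counts from the alerts, recomputes each percentage against the 87 observations, and sums the counts. The dictionary and variable names are illustrative, not part of the report.

```python
# Missing-value counts per column, copied from the "Missing" alerts.
missing = {
    "height": 6, "mass": 28, "hair_color": 5, "birth_year": 44, "sex": 4,
    "gender": 4, "homeworld": 10, "species": 4, "vehicles": 76, "starships": 67,
}
n_observations = 87

# Share of rows missing in each column, most affected first.
missing_pct = {col: round(100 * n / n_observations, 1) for col, n in missing.items()}
for col, pct in sorted(missing_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:<12}{missing[col]:>3}  {pct:5.1f}%")

# The per-column counts should add up to the "Missing cells" total.
total_missing = sum(missing.values())
print("total missing cells:", total_missing)
```

Columns without a "Missing" alert (name, skin_color, eye_color, films) report zero missing values, so they contribute nothing to the total.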

Reproduction

Analysis started: 2023-12-30 08:20:07.955915
Analysis finished: 2023-12-30 08:20:09.120938
Duration: 1.17 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
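
The reported duration is the difference between the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block of this report.
started = datetime.fromisoformat("2023-12-30 08:20:07.955915")
finished = datetime.fromisoformat("2023-12-30 08:20:09.120938")

duration = (finished - started).total_seconds()
print(f"duration: {duration:.2f} s")
```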

Variables

name
Text

UNIQUE 

Distinct: 87
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 KiB

Length

Max length: 21
Median length: 16
Mean length: 10.287356
Min length: 3

Characters and Unicode

Total characters: 895
Distinct characters: 59
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
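
As an illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to tally the general category of each character in the five sample names listed for this variable. It is not the profiler's own implementation, and it covers only the sample rows, so the counts are smaller than the report's totals.

```python
import unicodedata
from collections import Counter

# The five sample names listed for the `name` variable in this report.
names = ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa"]

# Tally the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)
for code, count in category_counts.most_common():
    print(code, count)
```

The two-letter codes correspond to the category names used in the report: `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Zs` is Space Separator, `Nd` is Decimal Number, and `Pd` is Dash Punctuation.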

Unique

Unique: 87
Unique (%): 100.0%

Sample

1st row: Luke Skywalker
2nd row: C-3PO
3rd row: R2-D2
4th row: Darth Vader
5th row: Leia Organa

Value               Count  Frequency (%)
lars                    3   1.9%
skywalker               3   1.9%
organa                  2   1.3%
antilles                2   1.3%
fett                    2   1.3%
jar                     2   1.3%
darth                   2   1.3%
obi-wan                 1   0.6%
kenobi                  1   0.6%
biggs                   1   0.6%
Other values (138)    138  87.9%

Most occurring characters

Value              Count  Frequency (%)
a                     90  10.1%
(space)               70   7.8%
e                     62   6.9%
i                     55   6.1%
r                     54   6.0%
o                     49   5.5%
n                     47   5.3%
s                     38   4.2%
l                     37   4.1%
t                     35   3.9%
Other values (49)    358  40.0%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter    638  71.3%
Uppercase Letter    167  18.7%
Space Separator      70   7.8%
Decimal Number       11   1.2%
Dash Punctuation      9   1.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 90
14.1%
e 62
9.7%
i 55
 
8.6%
r 54
 
8.5%
o 49
 
7.7%
n 47
 
7.4%
s 38
 
6.0%
l 37
 
5.8%
t 35
 
5.5%
u 29
 
4.5%
Other values (15) 142
22.3%
Uppercase Letter
ValueCountFrequency (%)
S 13
 
7.8%
B 12
 
7.2%
P 12
 
7.2%
T 12
 
7.2%
W 11
 
6.6%
D 11
 
6.6%
A 10
 
6.0%
L 10
 
6.0%
R 9
 
5.4%
G 8
 
4.8%
Other values (15) 59
35.3%
Decimal Number
ValueCountFrequency (%)
8 3
27.3%
4 2
18.2%
2 2
18.2%
1 1
 
9.1%
3 1
 
9.1%
5 1
 
9.1%
7 1
 
9.1%
Space Separator
ValueCountFrequency (%)
70
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 805
89.9%
Common 90
 
10.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 90
 
11.2%
e 62
 
7.7%
i 55
 
6.8%
r 54
 
6.7%
o 49
 
6.1%
n 47
 
5.8%
s 38
 
4.7%
l 37
 
4.6%
t 35
 
4.3%
u 29
 
3.6%
Other values (40) 309
38.4%
Common
ValueCountFrequency (%)
70
77.8%
- 9
 
10.0%
8 3
 
3.3%
4 2
 
2.2%
2 2
 
2.2%
1 1
 
1.1%
3 1
 
1.1%
5 1
 
1.1%
7 1
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 891
99.6%
None 4
 
0.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 90
 
10.1%
70
 
7.9%
e 62
 
7.0%
i 55
 
6.2%
r 54
 
6.1%
o 49
 
5.5%
n 47
 
5.3%
s 38
 
4.3%
l 37
 
4.2%
t 35
 
3.9%
Other values (48) 354
39.7%
None
ValueCountFrequency (%)
é 4
100.0%

height
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 45
Distinct (%): 55.6%
Missing: 6
Missing (%): 6.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 174.60494
Minimum: 66
Maximum: 264
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 828.0 B

Quantile statistics

Minimum: 66
5th percentile: 96
Q1: 167
Median: 180
Q3: 191
95th percentile: 224
Maximum: 264
Range: 198
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 34.774157
Coefficient of variation (CV): 0.19915907
Kurtosis: 2.1268624
Mean: 174.60494
Median Absolute Deviation (MAD): 12
Skewness: -1.0854741
Sum: 14143
Variance: 1209.242
Monotonicity: Not monotonic
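
Several of these statistics are tied together by simple identities: the mean is the sum divided by the number of non-missing rows, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below re-derives them from the figures reported for height; the variable names are illustrative.

```python
# Values copied from the height statistics in this report.
total = 14143           # Sum
n_observations = 87
n_missing = 6
std_dev = 34.774157     # Standard deviation

n_valid = n_observations - n_missing      # rows with a recorded height
mean = total / n_valid
variance = std_dev ** 2
cv = std_dev / mean                       # coefficient of variation

print(f"mean     = {mean:.5f}")
print(f"variance = {variance:.3f}")
print(f"CV       = {cv:.8f}")
```

The same 81 non-missing rows explain the "Distinct (%)" figure for this variable: 45 distinct values out of 81 is 55.6%.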
Histogram with fixed size bins (bins=45)
Most frequent values:

Value              Count  Frequency (%)
183                    7   8.0%
188                    5   5.7%
180                    5   5.7%
170                    4   4.6%
178                    4   4.6%
196                    3   3.4%
191                    3   3.4%
175                    3   3.4%
193                    3   3.4%
150                    2   2.3%
Other values (35)     42  48.3%
(Missing)              6   6.9%

Smallest values:

Value  Count  Frequency (%)
66         1  1.1%
79         1  1.1%
88         1  1.1%
94         1  1.1%
96         2  2.3%
97         1  1.1%
112        1  1.1%
122        1  1.1%
137        1  1.1%
150        2  2.3%

Largest values:

Value  Count  Frequency (%)
264        1  1.1%
234        1  1.1%
229        1  1.1%
228        1  1.1%
224        1  1.1%
216        1  1.1%
213        1  1.1%
206        2  2.3%
202        1  1.1%
200        1  1.1%

mass
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 38
Distinct (%): 64.4%
Missing: 28
Missing (%): 32.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 97.311864
Minimum: 15
Maximum: 1358
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 828.0 B

Quantile statistics

Minimum: 15
5th percentile: 30.8
Q1: 55.6
Median: 79
Q3: 84.5
95th percentile: 136.4
Maximum: 1358
Range: 1343
Interquartile range (IQR): 28.9
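
The maximum of 1358 sits far above the third quartile, which suggests an outlier. One common way to make that concrete is Tukey's 1.5 x IQR rule. The report itself does not apply this rule; the sketch below is an illustration that uses the quartiles above together with the smallest and largest mass values listed in this variable's value tables.

```python
# Quartiles copied from the mass "Quantile statistics" block.
q1, q3 = 55.6, 84.5
iqr = q3 - q1

# Tukey's rule: flag values more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The largest and three smallest masses listed in this report's value tables
# (a value with count 2 appears twice).
largest = [1358, 159, 140, 136, 136, 120, 113, 112, 110, 102, 90]
smallest = [15, 17, 20]

flagged = [m for m in largest + smallest if m < lower_fence or m > upper_fence]
print(f"fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print("flagged:", flagged)
```

Under this rule only values at the high end are flagged, which matches the strong positive skewness reported for mass.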

Descriptive statistics

Standard deviation: 169.45716
Coefficient of variation (CV): 1.7413823
Kurtosis: 55.418003
Mean: 97.311864
Median Absolute Deviation (MAD): 11
Skewness: 7.3365961
Sum: 5741.4
Variance: 28715.73
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=38)
Most frequent values:

Value              Count  Frequency (%)
80                     6   6.9%
79                     4   4.6%
77                     3   3.4%
84                     3   3.4%
75                     3   3.4%
32                     2   2.3%
136                    2   2.3%
82                     2   2.3%
48                     2   2.3%
45                     2   2.3%
Other values (28)     30  34.5%
(Missing)             28  32.2%

Smallest values:

Value  Count  Frequency (%)
15         1  1.1%
17         1  1.1%
20         1  1.1%
32         2  2.3%
40         1  1.1%
45         2  2.3%
48         2  2.3%
49         1  1.1%
50         2  2.3%
55         2  2.3%

Largest values:

Value  Count  Frequency (%)
1358       1  1.1%
159        1  1.1%
140        1  1.1%
136        2  2.3%
120        1  1.1%
113        1  1.1%
112        1  1.1%
110        1  1.1%
102        1  1.1%
90         1  1.1%

hair_color
Categorical

MISSING 

Distinct: 11
Distinct (%): 13.4%
Missing: 5
Missing (%): 5.7%
Memory size: 5.2 KiB

Length

Max length: 13
Median length: 12
Mean length: 4.804878
Min length: 4

Characters and Unicode

Total characters: 394
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): 7.3%

Sample

1st row: blond
2nd row: none
3rd row: brown
4th row: brown, grey
5th row: brown

Common Values

Value          Count  Frequency (%)
none              38  43.7%
brown             18  20.7%
black             13  14.9%
white              4   4.6%
blond              3   3.4%
brown, grey        1   1.1%
auburn, white      1   1.1%
auburn, grey       1   1.1%
grey               1   1.1%
auburn             1   1.1%
(Missing)          5   5.7%

Length

Histogram of lengths of the category

Value   Count  Frequency (%)
none       38  44.7%
brown      19  22.4%
black      13  15.3%
white       5   5.9%
blond       3   3.5%
grey        3   3.5%
auburn      3   3.5%
blonde      1   1.2%
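
The word-level counts above differ from the Common Values table because multi-word entries such as "brown, grey" contribute to several words. The sketch below reproduces the word counts from the value counts, assuming a simple lowercase split on non-letter characters. That tokenisation is an assumption for illustration, not necessarily what the profiler does.

```python
import re
from collections import Counter

# Value counts copied from the hair_color "Common Values" table. The eleventh
# distinct value is not listed there; "blonde" is inferred from the word table.
value_counts = {
    "none": 38, "brown": 18, "black": 13, "white": 4, "blond": 3,
    "brown, grey": 1, "auburn, white": 1, "auburn, grey": 1,
    "grey": 1, "auburn": 1, "blonde": 1,
}

# Split each value into lowercase words and weight them by the value's count.
word_counts = Counter()
for value, count in value_counts.items():
    for word in re.findall(r"[a-z]+", value.lower()):
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word:<8}{count:>3}  {100 * count / total_words:4.1f}%")
```

With this split, "brown" rises from 18 to 19 and "white" from 4 to 5, and the percentages are taken over the 85 words rather than the 87 rows.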

Most occurring characters

ValueCountFrequency (%)
n 102
25.9%
o 61
15.5%
e 47
11.9%
b 39
 
9.9%
r 25
 
6.3%
w 24
 
6.1%
l 17
 
4.3%
a 16
 
4.1%
k 13
 
3.3%
c 13
 
3.3%
Other values (9) 37
 
9.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 388
98.5%
Other Punctuation 3
 
0.8%
Space Separator 3
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 102
26.3%
o 61
15.7%
e 47
12.1%
b 39
 
10.1%
r 25
 
6.4%
w 24
 
6.2%
l 17
 
4.4%
a 16
 
4.1%
k 13
 
3.4%
c 13
 
3.4%
Other values (7) 31
 
8.0%
Other Punctuation
ValueCountFrequency (%)
, 3
100.0%
Space Separator
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 388
98.5%
Common 6
 
1.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 102
26.3%
o 61
15.7%
e 47
12.1%
b 39
 
10.1%
r 25
 
6.4%
w 24
 
6.2%
l 17
 
4.4%
a 16
 
4.1%
k 13
 
3.4%
c 13
 
3.4%
Other values (7) 31
 
8.0%
Common
ValueCountFrequency (%)
, 3
50.0%
3
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 394
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 102
25.9%
o 61
15.5%
e 47
11.9%
b 39
 
9.9%
r 25
 
6.3%
w 24
 
6.1%
l 17
 
4.3%
a 16
 
4.1%
k 13
 
3.3%
c 13
 
3.3%
Other values (9) 37
 
9.4%

skin_color
Categorical

HIGH CORRELATION 

Distinct: 31
Distinct (%): 35.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB

Length

Max length: 19
Median length: 16
Mean length: 5.9310345
Min length: 3

Characters and Unicode

Total characters: 516
Distinct characters: 24
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): 18.4%

Sample

1st row: fair
2nd row: gold
3rd row: white, blue
4th row: white
5th row: light

Common Values

Value              Count  Frequency (%)
fair                  17  19.5%
light                 11  12.6%
green                  6   6.9%
grey                   6   6.9%
dark                   6   6.9%
pale                   5   5.7%
brown                  4   4.6%
none                   2   2.3%
white                  2   2.3%
white, blue            2   2.3%
Other values (21)     26  29.9%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
fair                  18  17.0%
grey                  12  11.3%
light                 11  10.4%
green                 10   9.4%
blue                   8   7.5%
brown                  7   6.6%
white                  7   6.6%
dark                   6   5.7%
pale                   5   4.7%
red                    5   4.7%
Other values (11)     17  16.0%

Most occurring characters

ValueCountFrequency (%)
e 71
13.8%
r 62
12.0%
i 37
 
7.2%
l 37
 
7.2%
g 37
 
7.2%
a 35
 
6.8%
n 30
 
5.8%
t 26
 
5.0%
w 19
 
3.7%
19
 
3.7%
Other values (14) 143
27.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 479
92.8%
Space Separator 19
 
3.7%
Other Punctuation 17
 
3.3%
Dash Punctuation 1
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 71
14.8%
r 62
12.9%
i 37
 
7.7%
l 37
 
7.7%
g 37
 
7.7%
a 35
 
7.3%
n 30
 
6.3%
t 26
 
5.4%
w 19
 
4.0%
o 19
 
4.0%
Other values (11) 106
22.1%
Space Separator
ValueCountFrequency (%)
19
100.0%
Other Punctuation
ValueCountFrequency (%)
, 17
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 479
92.8%
Common 37
 
7.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 71
14.8%
r 62
12.9%
i 37
 
7.7%
l 37
 
7.7%
g 37
 
7.7%
a 35
 
7.3%
n 30
 
6.3%
t 26
 
5.4%
w 19
 
4.0%
o 19
 
4.0%
Other values (11) 106
22.1%
Common
ValueCountFrequency (%)
19
51.4%
, 17
45.9%
- 1
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 516
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 71
13.8%
r 62
12.0%
i 37
 
7.2%
l 37
 
7.2%
g 37
 
7.2%
a 35
 
6.8%
n 30
 
5.8%
t 26
 
5.0%
w 19
 
3.7%
19
 
3.7%
Other values (14) 143
27.7%

eye_color
Categorical

Distinct: 15
Distinct (%): 17.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.1034483
Min length: 3

Characters and Unicode

Total characters: 444
Distinct characters: 22
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 8.0%

Sample

1st row: blue
2nd row: yellow
3rd row: red
4th row: yellow
5th row: brown

Common Values

Value             Count  Frequency (%)
brown                21  24.1%
blue                 19  21.8%
yellow               11  12.6%
black                10  11.5%
orange                8   9.2%
red                   5   5.7%
hazel                 3   3.4%
unknown               3   3.4%
blue-gray             1   1.1%
pink                  1   1.1%
Other values (5)      5   5.7%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
brown                21  23.6%
blue                 20  22.5%
yellow               12  13.5%
black                10  11.2%
orange                8   9.0%
red                   6   6.7%
hazel                 3   3.4%
unknown               3   3.4%
blue-gray             1   1.1%
pink                  1   1.1%
Other values (4)      4   4.5%

Most occurring characters

ValueCountFrequency (%)
l 59
13.3%
e 53
11.9%
b 52
11.7%
o 45
10.1%
n 40
9.0%
r 38
8.6%
w 37
8.3%
u 24
5.4%
a 23
 
5.2%
k 15
 
3.4%
Other values (12) 58
13.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 439
98.9%
Other Punctuation 2
 
0.5%
Space Separator 2
 
0.5%
Dash Punctuation 1
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 59
13.4%
e 53
12.1%
b 52
11.8%
o 45
10.3%
n 40
9.1%
r 38
8.7%
w 37
8.4%
u 24
5.5%
a 23
 
5.2%
k 15
 
3.4%
Other values (9) 53
12.1%
Other Punctuation
ValueCountFrequency (%)
, 2
100.0%
Space Separator
ValueCountFrequency (%)
2
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 439
98.9%
Common 5
 
1.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 59
13.4%
e 53
12.1%
b 52
11.8%
o 45
10.3%
n 40
9.1%
r 38
8.7%
w 37
8.4%
u 24
5.5%
a 23
 
5.2%
k 15
 
3.4%
Other values (9) 53
12.1%
Common
ValueCountFrequency (%)
, 2
40.0%
2
40.0%
- 1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 444
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 59
13.3%
e 53
11.9%
b 52
11.7%
o 45
10.1%
n 40
9.0%
r 38
8.6%
w 37
8.3%
u 24
5.4%
a 23
 
5.2%
k 15
 
3.4%
Other values (12) 58
13.1%

birth_year
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 36
Distinct (%): 83.7%
Missing: 44
Missing (%): 50.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.565116
Minimum: 8
Maximum: 896
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 828.0 B

Quantile statistics

Minimum: 8
5th percentile: 19
Q1: 35
Median: 52
Q3: 72
95th percentile: 191.2
Maximum: 896
Range: 888
Interquartile range (IQR): 37

Descriptive statistics

Standard deviation: 154.69144
Coefficient of variation (CV): 1.7665875
Kurtosis: 20.590786
Mean: 87.565116
Median Absolute Deviation (MAD): 20
Skewness: 4.4531193
Sum: 3765.3
Variance: 23929.441
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=36)
Most frequent values:

Value              Count  Frequency (%)
19                     2   2.3%
48                     2   2.3%
41.9                   2   2.3%
52                     2   2.3%
82                     2   2.3%
92                     2   2.3%
72                     2   2.3%
62                     1   1.1%
8                      1   1.1%
91                     1   1.1%
Other values (26)     26  29.9%
(Missing)             44  50.6%

Smallest values:

Value  Count  Frequency (%)
8          1  1.1%
15         1  1.1%
19         2  2.3%
21         1  1.1%
22         1  1.1%
24         1  1.1%
29         1  1.1%
31         1  1.1%
31.5       1  1.1%
33         1  1.1%

Largest values:

Value  Count  Frequency (%)
896        1  1.1%
600        1  1.1%
200        1  1.1%
112        1  1.1%
102        1  1.1%
92         2  2.3%
91         1  1.1%
82         2  2.3%
72         2  2.3%
67         1  1.1%

sex
Categorical

HIGH CORRELATION  MISSING 

Distinct: 4
Distinct (%): 4.8%
Missing: 4
Missing (%): 4.6%
Memory size: 5.2 KiB

Length

Max length: 14
Median length: 4
Mean length: 4.5060241
Min length: 4

Characters and Unicode

Total characters: 374
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 1.2%

Sample

1st row: male
2nd row: none
3rd row: none
4th row: male
5th row: female

Common Values

Value           Count  Frequency (%)
male               60  69.0%
female             16  18.4%
none                6   6.9%
hermaphroditic      1   1.1%
(Missing)           4   4.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value           Count  Frequency (%)
male               60  72.3%
female             16  19.3%
none                6   7.2%
hermaphroditic      1   1.2%
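
The two sex tables list the same counts but different percentages. The figures are consistent with the first table dividing by all 87 rows (missing included) and the second dividing by the 83 rows that have a recorded value. The sketch below checks that reading; it is an inference from the numbers, not something the report states.

```python
# Counts copied from the sex tables in this report.
counts = {"male": 60, "female": 16, "none": 6, "hermaphroditic": 1}
n_missing = 4

n_valid = sum(counts.values())   # rows with a recorded value
n_rows = n_valid + n_missing     # all rows in the dataset

print(f"{'value':<16}{'% of all rows':>15}{'% of recorded':>15}")
for value, count in counts.items():
    share_all = 100 * count / n_rows
    share_valid = 100 * count / n_valid
    print(f"{value:<16}{share_all:>14.1f}%{share_valid:>14.1f}%")
```

Note that "none" here is a literal category in the data, separate from the 4 missing entries.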

Most occurring characters

ValueCountFrequency (%)
e 99
26.5%
m 77
20.6%
a 77
20.6%
l 76
20.3%
f 16
 
4.3%
n 12
 
3.2%
o 7
 
1.9%
h 2
 
0.5%
r 2
 
0.5%
i 2
 
0.5%
Other values (4) 4
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 374
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 99
26.5%
m 77
20.6%
a 77
20.6%
l 76
20.3%
f 16
 
4.3%
n 12
 
3.2%
o 7
 
1.9%
h 2
 
0.5%
r 2
 
0.5%
i 2
 
0.5%
Other values (4) 4
 
1.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 374
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 99
26.5%
m 77
20.6%
a 77
20.6%
l 76
20.3%
f 16
 
4.3%
n 12
 
3.2%
o 7
 
1.9%
h 2
 
0.5%
r 2
 
0.5%
i 2
 
0.5%
Other values (4) 4
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 374
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 99
26.5%
m 77
20.6%
a 77
20.6%
l 76
20.3%
f 16
 
4.3%
n 12
 
3.2%
o 7
 
1.9%
h 2
 
0.5%
r 2
 
0.5%
i 2
 
0.5%
Other values (4) 4
 
1.1%

gender
Categorical

HIGH CORRELATION  MISSING 

Distinct: 2
Distinct (%): 2.4%
Missing: 4
Missing (%): 4.6%
Memory size: 5.6 KiB

Length

Max length: 9
Median length: 9
Mean length: 8.7951807
Min length: 8

Characters and Unicode

Total characters: 730
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: masculine
2nd row: masculine
3rd row: masculine
4th row: masculine
5th row: feminine

Common Values

Value      Count  Frequency (%)
masculine     66  75.9%
feminine      17  19.5%
(Missing)      4   4.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
masculine     66  79.5%
feminine      17  20.5%

Most occurring characters

ValueCountFrequency (%)
i 100
13.7%
n 100
13.7%
e 100
13.7%
m 83
11.4%
a 66
9.0%
s 66
9.0%
c 66
9.0%
u 66
9.0%
l 66
9.0%
f 17
 
2.3%
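
Because gender has only two values, the character table above can be derived directly from the value counts: each character of a value occurs once for every row holding that value. A small sketch of that derivation, with illustrative names:

```python
from collections import Counter

# Value counts copied from the gender "Common Values" table.
value_counts = {"masculine": 66, "feminine": 17}

# Each character in a value occurs once per row holding that value.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())

for ch, count in char_counts.most_common():
    print(f"{ch} {count:>3}  {100 * count / total_chars:4.1f}%")
print("total characters:", total_chars)
print(f"mean length: {mean_length:.7f}")
```

The same totals account for the "Total characters" and "Mean length" figures reported for this variable.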

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 730
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 100
13.7%
n 100
13.7%
e 100
13.7%
m 83
11.4%
a 66
9.0%
s 66
9.0%
c 66
9.0%
u 66
9.0%
l 66
9.0%
f 17
 
2.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 730
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 100
13.7%
n 100
13.7%
e 100
13.7%
m 83
11.4%
a 66
9.0%
s 66
9.0%
c 66
9.0%
u 66
9.0%
l 66
9.0%
f 17
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 730
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 100
13.7%
n 100
13.7%
e 100
13.7%
m 83
11.4%
a 66
9.0%
s 66
9.0%
c 66
9.0%
u 66
9.0%
l 66
9.0%
f 17
 
2.3%

homeworld
Text

MISSING 

Distinct: 48
Distinct (%): 62.3%
Missing: 10
Missing (%): 11.5%
Memory size: 5.3 KiB

Length

Max length: 14
Median length: 12
Mean length: 7.1428571
Min length: 4

Characters and Unicode

Total characters: 550
Distinct characters: 40
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 39
Unique (%): 50.6%

Sample

1st row: Tatooine
2nd row: Tatooine
3rd row: Naboo
4th row: Tatooine
5th row: Alderaan

Value              Count  Frequency (%)
naboo                 11  12.9%
tatooine              10  11.8%
alderaan               3   3.5%
coruscant              3   3.5%
kamino                 3   3.5%
mirial                 2   2.4%
ryloth                 2   2.4%
corellia               2   2.4%
kashyyyk               2   2.4%
bespin                 1   1.2%
Other values (46)     46  54.1%

Most occurring characters

ValueCountFrequency (%)
o 77
14.0%
a 73
13.3%
n 42
 
7.6%
i 40
 
7.3%
e 36
 
6.5%
r 30
 
5.5%
t 28
 
5.1%
l 27
 
4.9%
u 15
 
2.7%
T 14
 
2.5%
Other values (30) 168
30.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 456
82.9%
Uppercase Letter 86
 
15.6%
Space Separator 8
 
1.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 77
16.9%
a 73
16.0%
n 42
9.2%
i 40
8.8%
e 36
7.9%
r 30
 
6.6%
t 28
 
6.1%
l 27
 
5.9%
u 15
 
3.3%
s 14
 
3.1%
Other values (10) 74
16.2%
Uppercase Letter
ValueCountFrequency (%)
T 14
16.3%
N 13
15.1%
C 11
12.8%
K 7
8.1%
M 6
 
7.0%
S 6
 
7.0%
A 5
 
5.8%
D 3
 
3.5%
R 3
 
3.5%
I 3
 
3.5%
Other values (9) 15
17.4%
Space Separator
ValueCountFrequency (%)
8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 542
98.5%
Common 8
 
1.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 77
14.2%
a 73
13.5%
n 42
 
7.7%
i 40
 
7.4%
e 36
 
6.6%
r 30
 
5.5%
t 28
 
5.2%
l 27
 
5.0%
u 15
 
2.8%
T 14
 
2.6%
Other values (29) 160
29.5%
Common
ValueCountFrequency (%)
8
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 550
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 77
14.0%
a 73
13.3%
n 42
 
7.6%
i 40
 
7.3%
e 36
 
6.5%
r 30
 
5.5%
t 28
 
5.1%
l 27
 
4.9%
u 15
 
2.7%
T 14
 
2.5%
Other values (30) 168
30.5%

species
Categorical

HIGH CORRELATION  MISSING 

Distinct: 37
Distinct (%): 44.6%
Missing: 4
Missing (%): 4.6%
Memory size: 5.4 KiB

Length

Max length: 14
Median length: 5
Mean length: 6.1686747
Min length: 3

Characters and Unicode

Total characters: 512
Distinct characters: 44
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 29
Unique (%): 34.9%

Sample

1st row: Human
2nd row: Droid
3rd row: Droid
4th row: Human
5th row: Human

Common Values

Value              Count  Frequency (%)
Human                 35  40.2%
Droid                  6   6.9%
Gungan                 3   3.4%
Kaminoan               2   2.3%
Twi'lek                2   2.3%
Zabrak                 2   2.3%
Mirialan               2   2.3%
Wookiee                2   2.3%
Sullustan              1   1.1%
Vulptereen             1   1.1%
Other values (27)     27  31.0%
(Missing)              4   4.6%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
human                 35  40.7%
droid                  6   7.0%
gungan                 3   3.5%
kaminoan               2   2.3%
twi'lek                2   2.3%
zabrak                 2   2.3%
mirialan               2   2.3%
wookiee                2   2.3%
togruta                1   1.2%
tholothian             1   1.2%
Other values (30)     30  34.9%

Most occurring characters

ValueCountFrequency (%)
a 78
15.2%
n 67
13.1%
u 49
 
9.6%
m 40
 
7.8%
H 36
 
7.0%
o 31
 
6.1%
i 29
 
5.7%
e 24
 
4.7%
r 19
 
3.7%
l 15
 
2.9%
Other values (34) 124
24.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 420
82.0%
Uppercase Letter 85
 
16.6%
Other Punctuation 4
 
0.8%
Space Separator 3
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 78
18.6%
n 67
16.0%
u 49
11.7%
m 40
9.5%
o 31
 
7.4%
i 29
 
6.9%
e 24
 
5.7%
r 19
 
4.5%
l 15
 
3.6%
d 12
 
2.9%
Other values (11) 56
13.3%
Uppercase Letter
ValueCountFrequency (%)
H 36
42.4%
D 8
 
9.4%
T 7
 
8.2%
C 4
 
4.7%
M 4
 
4.7%
K 4
 
4.7%
G 4
 
4.7%
N 2
 
2.4%
W 2
 
2.4%
S 2
 
2.4%
Other values (11) 12
 
14.1%
Other Punctuation
ValueCountFrequency (%)
' 4
100.0%
Space Separator
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 505
98.6%
Common 7
 
1.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 78
15.4%
n 67
13.3%
u 49
9.7%
m 40
 
7.9%
H 36
 
7.1%
o 31
 
6.1%
i 29
 
5.7%
e 24
 
4.8%
r 19
 
3.8%
l 15
 
3.0%
Other values (32) 117
23.2%
Common
ValueCountFrequency (%)
' 4
57.1%
3
42.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 512
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 78
15.2%
n 67
13.1%
u 49
 
9.6%
m 40
 
7.8%
H 36
 
7.0%
o 31
 
6.1%
i 29
 
5.7%
e 24
 
4.7%
r 19
 
3.7%
l 15
 
2.9%
Other values (34) 124
24.2%

films
Categorical

HIGH CORRELATION 

Distinct: 24
Distinct (%): 27.6%
Missing: 0
Missing (%): 0.0%
Memory size: 8.2 KiB

Length

Max length: 137
Median length: 95
Mean length: 38.218391
Min length: 10

Characters and Unicode

Total characters: 3325
Distinct characters: 35
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 9.2%

Sample

1st row: A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith, The Force Awakens
2nd row: A New Hope, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith
3rd row: A New Hope, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith, The Force Awakens
4th row: A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith
5th row: A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith, The Force Awakens

Common Values

Value                                                          Count  Frequency (%)
Attack of the Clones                                              13  14.9%
The Phantom Menace                                                13  14.9%
The Phantom Menace, Attack of the Clones, Revenge of the Sith      8   9.2%
Attack of the Clones, Revenge of the Sith                          7   8.0%
The Force Awakens                                                  5   5.7%
Return of the Jedi                                                 5   5.7%
A New Hope                                                         4   4.6%
The Phantom Menace, Attack of the Clones                           4   4.6%
The Empire Strikes Back                                            3   3.4%
Revenge of the Sith                                                3   3.4%
Other values (14)                                                 22  25.3%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
the                 155  24.6%
of                   94  14.9%
attack               40   6.4%
clones               40   6.4%
phantom              34   5.4%
menace               34   5.4%
revenge              34   5.4%
sith                 34   5.4%
return               20   3.2%
jedi                 20   3.2%
Other values (8)    124  19.7%

Most occurring characters

ValueCountFrequency (%)
542
16.3%
e 495
14.9%
t 278
 
8.4%
h 223
 
6.7%
o 197
 
5.9%
n 173
 
5.2%
a 135
 
4.1%
c 101
 
3.0%
f 94
 
2.8%
i 86
 
2.6%
Other values (25) 1001
30.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2256
67.8%
Space Separator 542
 
16.3%
Uppercase Letter 441
 
13.3%
Other Punctuation 86
 
2.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 495
21.9%
t 278
12.3%
h 223
9.9%
o 197
 
8.7%
n 173
 
7.7%
a 135
 
6.0%
c 101
 
4.5%
f 94
 
4.2%
i 86
 
3.8%
k 83
 
3.7%
Other values (10) 391
17.3%
Uppercase Letter
ValueCountFrequency (%)
A 69
15.6%
T 61
13.8%
R 54
12.2%
S 50
11.3%
C 40
9.1%
M 34
7.7%
P 34
7.7%
J 20
 
4.5%
N 18
 
4.1%
H 18
 
4.1%
Other values (3) 43
9.8%
Space Separator
ValueCountFrequency (%)
542
100.0%
Other Punctuation
ValueCountFrequency (%)
, 86
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2697
81.1%
Common 628
 
18.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 495
18.4%
t 278
 
10.3%
h 223
 
8.3%
o 197
 
7.3%
n 173
 
6.4%
a 135
 
5.0%
c 101
 
3.7%
f 94
 
3.5%
i 86
 
3.2%
k 83
 
3.1%
Other values (23) 832
30.8%
Common
ValueCountFrequency (%)
542
86.3%
, 86
 
13.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3325
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
542
16.3%
e 495
14.9%
t 278
 
8.4%
h 223
 
6.7%
o 197
 
5.9%
n 173
 
5.2%
a 135
 
4.1%
c 101
 
3.0%
f 94
 
2.8%
i 86
 
2.6%
Other values (25) 1001
30.1%

vehicles
Text

MISSING 

Distinct: 10
Distinct (%): 90.9%
Missing: 76
Missing (%): 87.4%
Memory size: 3.3 KiB

Length

Max length: 36
Median length: 21
Mean length: 19.818182
Min length: 5

Characters and Unicode

Total characters: 218
Distinct characters: 38
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 81.8%

Sample

1st row: Snowspeeder, Imperial Speeder Bike
2nd row: Imperial Speeder Bike
3rd row: Tribubble bongo
4th row: Zephyr-G swoop bike, XJ-6 airspeeder
5th row: AT-ST

Value             Count  Frequency (%)
speeder               4  13.8%
bike                  4  13.8%
tribubble             2   6.9%
bongo                 2   6.9%
airspeeder            2   6.9%
imperial              2   6.9%
snowspeeder           2   6.9%
zephyr-g              1   3.4%
swoop                 1   3.4%
xj-6                  1   3.4%
Other values (8)      8  27.6%

Most occurring characters

ValueCountFrequency (%)
e 38
17.4%
18
 
8.3%
r 18
 
8.3%
i 13
 
6.0%
o 13
 
6.0%
p 13
 
6.0%
b 10
 
4.6%
d 9
 
4.1%
s 9
 
4.1%
l 7
 
3.2%
Other values (28) 70
32.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 167
76.6%
Uppercase Letter 23
 
10.6%
Space Separator 18
 
8.3%
Dash Punctuation 5
 
2.3%
Decimal Number 3
 
1.4%
Other Punctuation 2
 
0.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 38
22.8%
r 18
10.8%
i 13
 
7.8%
o 13
 
7.8%
p 13
 
7.8%
b 10
 
6.0%
d 9
 
5.4%
s 9
 
5.4%
l 7
 
4.2%
n 6
 
3.6%
Other values (11) 31
18.6%
Uppercase Letter
ValueCountFrequency (%)
S 6
26.1%
T 5
21.7%
B 2
 
8.7%
I 2
 
8.7%
E 1
 
4.3%
K 1
 
4.3%
F 1
 
4.3%
X 1
 
4.3%
A 1
 
4.3%
J 1
 
4.3%
Other values (2) 2
 
8.7%
Decimal Number
ValueCountFrequency (%)
6 2
66.7%
2 1
33.3%
Space Separator
ValueCountFrequency (%)
18
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5
100.0%
Other Punctuation
ValueCountFrequency (%)
, 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 190
87.2%
Common 28
 
12.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 38
20.0%
r 18
 
9.5%
i 13
 
6.8%
o 13
 
6.8%
p 13
 
6.8%
b 10
 
5.3%
d 9
 
4.7%
s 9
 
4.7%
l 7
 
3.7%
n 6
 
3.2%
Other values (23) 54
28.4%
Common
ValueCountFrequency (%)
18
64.3%
- 5
 
17.9%
, 2
 
7.1%
6 2
 
7.1%
2 1
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 218
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 38
17.4%
18
 
8.3%
r 18
 
8.3%
i 13
 
6.0%
o 13
 
6.0%
p 13
 
6.0%
b 10
 
4.6%
d 9
 
4.1%
s 9
 
4.1%
l 7
 
3.2%
Other values (28) 70
32.1%

starships
Text

MISSING 

Distinct: 15
Distinct (%): 75.0%
Missing: 67
Missing (%): 77.0%
Memory size: 3.8 KiB

Length

Max length: 104
Median length: 35
Mean length: 23.7
Min length: 6

Characters and Unicode

Total characters: 474
Distinct characters: 41
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): 60.0%

Sample

1st row: X-wing, Imperial shuttle
2nd row: TIE Advanced x1
3rd row: X-wing
4th row: Jedi starfighter, Trade Federation cruiser, Naboo star skiff, Jedi Interceptor, Belbullab-22 starfighter
5th row: Naboo fighter, Trade Federation cruiser, Jedi Interceptor

Value              Count  Frequency (%)
naboo                  6   9.7%
x-wing                 5   8.1%
falcon                 4   6.5%
jedi                   4   6.5%
starfighter            4   6.5%
millennium             4   6.5%
imperial               3   4.8%
shuttle                3   4.8%
fighter                3   4.8%
belbullab-22           2   3.2%
Other values (18)     24  38.7%

Most occurring characters

ValueCountFrequency (%)
42
 
8.9%
i 38
 
8.0%
e 38
 
8.0%
a 32
 
6.8%
r 30
 
6.3%
t 29
 
6.1%
l 26
 
5.5%
n 24
 
5.1%
o 21
 
4.4%
s 14
 
3.0%
Other values (31) 180
38.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 361 | 76.2%
Uppercase Letter | 45 | 9.5%
Space Separator | 42 | 8.9%
Other Punctuation | 11 | 2.3%
Dash Punctuation | 9 | 1.9%
Decimal Number | 6 | 1.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 38 | 10.5%
e | 38 | 10.5%
a | 32 | 8.9%
r | 30 | 8.3%
t | 29 | 8.0%
l | 26 | 7.2%
n | 24 | 6.6%
o | 21 | 5.8%
s | 14 | 3.9%
g | 13 | 3.6%
Other values (13) | 96 | 26.6%

Uppercase Letter
Value | Count | Frequency (%)
N | 7 | 15.6%
I | 6 | 13.3%
F | 6 | 13.3%
X | 5 | 11.1%
J | 4 | 8.9%
M | 4 | 8.9%
T | 3 | 6.7%
S | 3 | 6.7%
A | 2 | 4.4%
B | 2 | 4.4%
Other values (3) | 3 | 6.7%

Decimal Number
Value | Count | Frequency (%)
2 | 4 | 66.7%
1 | 2 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 42 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 11 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 9 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 406 | 85.7%
Common | 68 | 14.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 38 | 9.4%
e | 38 | 9.4%
a | 32 | 7.9%
r | 30 | 7.4%
t | 29 | 7.1%
l | 26 | 6.4%
n | 24 | 5.9%
o | 21 | 5.2%
s | 14 | 3.4%
g | 13 | 3.2%
Other values (26) | 141 | 34.7%

Common
Value | Count | Frequency (%)
(space) | 42 | 61.8%
, | 11 | 16.2%
- | 9 | 13.2%
2 | 4 | 5.9%
1 | 2 | 2.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 474 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 42 | 8.9%
i | 38 | 8.0%
e | 38 | 8.0%
a | 32 | 6.8%
r | 30 | 6.3%
t | 29 | 6.1%
l | 26 | 5.5%
n | 24 | 5.1%
o | 21 | 4.4%
s | 14 | 3.0%
Other values (31) | 180 | 38.0%

Interactions


Correlations

            birth_year   eye_color       films      gender  hair_color      height        mass         sex  skin_color     species
birth_year       1.000       0.000       0.280       0.000       0.093       0.147       0.148       0.560       0.578       0.707
eye_color        0.000       1.000       0.000       0.168       0.246      -0.071      -0.111       0.277       0.389       0.345
films            0.280       0.000       1.000       0.000       0.347       0.145      -0.086       0.507       0.000       0.000
gender           0.000       0.168       0.000       1.000       0.089       0.282       0.410       0.955       0.274       0.000
hair_color       0.093       0.246       0.347       0.089       1.000       0.235       0.022       0.000       0.000       0.000
height           0.147      -0.071       0.145       0.282       0.235       1.000       0.719       0.286       0.400       0.602
mass             0.148      -0.111      -0.086       0.410       0.022       0.719       1.000       0.682       0.732       0.687
sex              0.560       0.277       0.507       0.955       0.000       0.286       0.682       1.000       0.627       0.610
skin_color       0.578       0.389       0.000       0.274       0.000       0.400       0.732       0.627       1.000       0.578
species          0.707       0.345       0.000       0.000       0.000       0.602       0.687       0.610       0.578       1.000
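The High correlation alerts in the Overview can be cross-checked against this matrix. The sketch below lists, for each variable, the other variables whose absolute coefficient meets a cutoff. The cutoff of 0.5 is an assumption: it reproduces the alert lists in this report, but it is inferred from the numbers and not read from the report configuration. The function name `high_correlations` is hypothetical.

```python
names = ["birth_year", "eye_color", "films", "gender", "hair_color",
         "height", "mass", "sex", "skin_color", "species"]

# Coefficients copied from the correlation table in this report.
matrix = [
    [1.000, 0.000, 0.280, 0.000, 0.093, 0.147, 0.148, 0.560, 0.578, 0.707],
    [0.000, 1.000, 0.000, 0.168, 0.246, -0.071, -0.111, 0.277, 0.389, 0.345],
    [0.280, 0.000, 1.000, 0.000, 0.347, 0.145, -0.086, 0.507, 0.000, 0.000],
    [0.000, 0.168, 0.000, 1.000, 0.089, 0.282, 0.410, 0.955, 0.274, 0.000],
    [0.093, 0.246, 0.347, 0.089, 1.000, 0.235, 0.022, 0.000, 0.000, 0.000],
    [0.147, -0.071, 0.145, 0.282, 0.235, 1.000, 0.719, 0.286, 0.400, 0.602],
    [0.148, -0.111, -0.086, 0.410, 0.022, 0.719, 1.000, 0.682, 0.732, 0.687],
    [0.560, 0.277, 0.507, 0.955, 0.000, 0.286, 0.682, 1.000, 0.627, 0.610],
    [0.578, 0.389, 0.000, 0.274, 0.000, 0.400, 0.732, 0.627, 1.000, 0.578],
    [0.707, 0.345, 0.000, 0.000, 0.000, 0.602, 0.687, 0.610, 0.578, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, inferred from the alerts in this report


def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables with |coefficient| >= threshold."""
    result = {}
    for i, row_name in enumerate(names):
        partners = [names[j] for j, value in enumerate(matrix[i])
                    if j != i and abs(value) >= threshold]
        if partners:  # variables with no strong partner are omitted
            result[row_name] = partners
    return result


high = high_correlations(names, matrix, THRESHOLD)
for name, partners in high.items():
    print(f"{name}: {', '.join(partners)}")
```

With this cutoff, eye_color and hair_color have no partners, which matches their absence from the alert list.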

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
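A minimal sketch of nullity correlation, assuming it is computed as the Pearson correlation between presence indicators (1 if a value is present, 0 if missing), which is a common way to build such a heatmap. The data are the height, mass and birth_year values of the 20 rows shown in the Sample section of this report, so the results describe that excerpt only, not the full dataset. The helper names are hypothetical.

```python
from math import sqrt


def presence(column):
    """Turn a column into 0/1 indicators: 1 where a value is present."""
    return [0 if value is None else 1 for value in column]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a column is never or always missing
    return cov / sqrt(var_x * var_y)


# First ten and last ten rows from the Sample section (None marks NaN).
height = [172.0, 167.0, 96.0, 202.0, 150.0, 178.0, 165.0, 97.0, 183.0, 182.0,
          216.0, 234.0, 188.0, 178.0, 206.0, None, None, None, None, None]
mass = [77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.0,
        159.0, 136.0, 79.0, 48.0, 80.0, None, None, None, None, None]
birth_year = [19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, None, 24.0, 57.0,
              None, None, None, None, None, None, None, None, None, None]

r_height_mass = pearson(presence(height), presence(mass))
r_height_birth = pearson(presence(height), presence(birth_year))

print(f"height vs mass:       {r_height_mass:.3f}")
print(f"height vs birth_year: {r_height_birth:.3f}")
```

In this excerpt height and mass are missing in exactly the same rows, so their indicators are perfectly correlated, while birth_year is missing in additional rows and correlates more weakly.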

Sample

index | name | height | mass | hair_color | skin_color | eye_color | birth_year | sex | gender | homeworld | species | films | vehicles | starships
0 | Luke Skywalker | 172.0 | 77.0 | blond | fair | blue | 19.0 | male | masculine | Tatooine | Human | A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith, The Force Awakens | Snowspeeder, Imperial Speeder Bike | X-wing, Imperial shuttle
1 | C-3PO | 167.0 | 75.0 | NaN | gold | yellow | 112.0 | none | masculine | Tatooine | Droid | A New Hope, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith | NaN | NaN
2 | R2-D2 | 96.0 | 32.0 | NaN | white, blue | red | 33.0 | none | masculine | Naboo | Droid | A New Hope, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith, The Force Awakens | NaN | NaN
3 | Darth Vader | 202.0 | 136.0 | none | white | yellow | 41.9 | male | masculine | Tatooine | Human | A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith | NaN | TIE Advanced x1
4 | Leia Organa | 150.0 | 49.0 | brown | light | brown | 19.0 | female | feminine | Alderaan | Human | A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith, The Force Awakens | Imperial Speeder Bike | NaN
5 | Owen Lars | 178.0 | 120.0 | brown, grey | light | blue | 52.0 | male | masculine | Tatooine | Human | A New Hope, Attack of the Clones, Revenge of the Sith | NaN | NaN
6 | Beru Whitesun Lars | 165.0 | 75.0 | brown | light | blue | 47.0 | female | feminine | Tatooine | Human | A New Hope, Attack of the Clones, Revenge of the Sith | NaN | NaN
7 | R5-D4 | 97.0 | 32.0 | NaN | white, red | red | NaN | none | masculine | Tatooine | Droid | A New Hope | NaN | NaN
8 | Biggs Darklighter | 183.0 | 84.0 | black | light | brown | 24.0 | male | masculine | Tatooine | Human | A New Hope | NaN | X-wing
9 | Obi-Wan Kenobi | 182.0 | 77.0 | auburn, white | fair | blue-gray | 57.0 | male | masculine | Stewjon | Human | A New Hope, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones, Revenge of the Sith | Tribubble bongo | Jedi starfighter, Trade Federation cruiser, Naboo star skiff, Jedi Interceptor, Belbullab-22 starfighter

index | name | height | mass | hair_color | skin_color | eye_color | birth_year | sex | gender | homeworld | species | films | vehicles | starships
77 | Grievous | 216.0 | 159.0 | none | brown, white | green, yellow | NaN | male | masculine | Kalee | Kaleesh | Revenge of the Sith | Tsmeu-6 personal wheel bike | Belbullab-22 starfighter
78 | Tarfful | 234.0 | 136.0 | brown | brown | blue | NaN | male | masculine | Kashyyyk | Wookiee | Revenge of the Sith | NaN | NaN
79 | Raymus Antilles | 188.0 | 79.0 | brown | light | brown | NaN | male | masculine | Alderaan | Human | A New Hope, Revenge of the Sith | NaN | NaN
80 | Sly Moore | 178.0 | 48.0 | none | pale | white | NaN | NaN | NaN | Umbara | NaN | Attack of the Clones, Revenge of the Sith | NaN | NaN
81 | Tion Medon | 206.0 | 80.0 | none | grey | black | NaN | male | masculine | Utapau | Pau'an | Revenge of the Sith | NaN | NaN
82 | Finn | NaN | NaN | black | dark | dark | NaN | male | masculine | NaN | Human | The Force Awakens | NaN | NaN
83 | Rey | NaN | NaN | brown | light | hazel | NaN | female | feminine | NaN | Human | The Force Awakens | NaN | NaN
84 | Poe Dameron | NaN | NaN | brown | light | brown | NaN | male | masculine | NaN | Human | The Force Awakens | NaN | X-wing
85 | BB8 | NaN | NaN | none | none | black | NaN | none | masculine | NaN | Droid | The Force Awakens | NaN | NaN
86 | Captain Phasma | NaN | NaN | none | none | unknown | NaN | female | feminine | NaN | Human | The Force Awakens | NaN | NaN