Overview

Dataset statistics

Number of variables: 16
Number of observations: 16650
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 3746
Duplicate rows (%): 22.5%
Total size in memory: 1.9 MiB
Average record size in memory: 121.0 B

Variable types

Categorical: 9
Boolean: 3
Numeric: 4

Alerts

possui_celular has constant value "1" [Constant]
Dataset has 3746 (22.5%) duplicate rows [Duplicates]
qtd_filhos is highly correlated with qt_pessoas_residencia [High correlation]
qt_pessoas_residencia is highly correlated with qtd_filhos [High correlation]
idade is highly correlated with tempo_emprego [High correlation]
tempo_emprego is highly correlated with idade [High correlation]
possui_celular is highly correlated with educacao and 10 other fields [High correlation]
educacao is highly correlated with possui_celular [High correlation]
mau is highly correlated with possui_celular [High correlation]
posse_de_imovel is highly correlated with possui_celular [High correlation]
possui_fone_comercial is highly correlated with possui_celular [High correlation]
estado_civil is highly correlated with possui_celular [High correlation]
possui_email is highly correlated with possui_celular [High correlation]
sexo is highly correlated with possui_celular [High correlation]
posse_de_veiculo is highly correlated with possui_celular [High correlation]
tipo_residencia is highly correlated with possui_celular [High correlation]
possui_fone is highly correlated with possui_celular [High correlation]
tipo_renda is highly correlated with possui_celular [High correlation]
sexo is highly correlated with posse_de_veiculo [High correlation]
posse_de_veiculo is highly correlated with sexo [High correlation]
tipo_renda is highly correlated with idade [High correlation]
idade is highly correlated with tipo_renda [High correlation]
qtd_filhos has 11486 (69.0%) zeros [Zeros]

Reproduction

Analysis started: 2021-10-01 01:39:24.177542
Analysis finished: 2021-10-01 01:39:32.941231
Duration: 8.76 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

sexo
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
F: 11201
M: 5449

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: F
4th row: M
5th row: F

Common Values

Value | Count | Frequency (%)
F | 11201 | 67.3%
M | 5449 | 32.7%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
f | 11201 | 67.3%
m | 5449 | 32.7%


posse_de_veiculo
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.4 KiB

Value | Count | Frequency (%)
False | 10178 | 61.1%
True | 6472 | 38.9%

posse_de_imovel
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.4 KiB

Value | Count | Frequency (%)
True | 11176 | 67.1%
False | 5474 | 32.9%

qtd_filhos
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4331531532
Minimum: 0
Maximum: 14
Zeros: 11486
Zeros (%): 69.0%
Negative: 0
Negative (%): 0.0%
Memory size: 130.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 14
Range: 14
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7393953444
Coefficient of variation (CV): 1.707006723
Kurtosis: 16.00543616
Mean: 0.4331531532
Median Absolute Deviation (MAD): 0
Skewness: 2.331772945
Sum: 7212
Variance: 0.5467054754
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=8)]

Common values

Value | Count | Frequency (%)
0 | 11486 | 69.0%
1 | 3393 | 20.4%
2 | 1552 | 9.3%
3 | 189 | 1.1%
4 | 24 | 0.1%
14 | 2 | < 0.1%
7 | 2 | < 0.1%
5 | 2 | < 0.1%

Smallest values

Value | Count | Frequency (%)
0 | 11486 | 69.0%
1 | 3393 | 20.4%
2 | 1552 | 9.3%
3 | 189 | 1.1%
4 | 24 | 0.1%
5 | 2 | < 0.1%
7 | 2 | < 0.1%
14 | 2 | < 0.1%

Largest values

Value | Count | Frequency (%)
14 | 2 | < 0.1%
7 | 2 | < 0.1%
5 | 2 | < 0.1%
4 | 24 | 0.1%
3 | 189 | 1.1%
2 | 1552 | 9.3%
1 | 3393 | 20.4%
0 | 11486 | 69.0%
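
Several of the qtd_filhos statistics above follow from the others by simple identities (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, range = max - min, IQR = Q3 - Q1). The sketch below recomputes them from the reported figures as a consistency check. The dictionary keys and the function name are illustrative choices, not part of the report or of pandas-profiling.

```python
# Values copied from the qtd_filhos summary in this report.
reported = {
    "mean": 0.4331531532,
    "std": 0.7393953444,
    "sum": 7212,
    "n": 16650,          # number of observations
    "zeros": 11486,
    "min": 0,
    "max": 14,
    "q1": 0,
    "q3": 1,
}


def derived_statistics(r):
    """Recompute the quantities that follow from the other reported values."""
    return {
        "mean": r["sum"] / r["n"],
        "variance": r["std"] ** 2,
        "cv": r["std"] / r["mean"],
        "range": r["max"] - r["min"],
        "iqr": r["q3"] - r["q1"],
        "zeros_pct": 100 * r["zeros"] / r["n"],
    }


derived = derived_statistics(reported)
for name, value in derived.items():
    print(f"{name}: {value:.6f}")
```

The recomputed values agree with the reported Mean, Variance, CV, Range, IQR and Zeros (%) up to the rounding of the printed figures.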

tipo_renda
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
Working: 8565
Commercial associate: 3826
Pensioner: 2800
State servant: 1451
Student: 8

Length

Max length: 20
Median length: 7
Mean length: 10.84648649
Min length: 7

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Working
2nd row: Commercial associate
3rd row: Commercial associate
4th row: Working
5th row: Working

Common Values

Value | Count | Frequency (%)
Working | 8565 | 51.4%
Commercial associate | 3826 | 23.0%
Pensioner | 2800 | 16.8%
State servant | 1451 | 8.7%
Student | 8 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
working | 8565 | 39.1%
associate | 3826 | 17.4%
commercial | 3826 | 17.4%
pensioner | 2800 | 12.8%
servant | 1451 | 6.6%
state | 1451 | 6.6%
student | 8 | < 0.1%


educacao
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
Secondary / secondary special: 11245
Higher education: 4551
Incomplete higher: 649
Lower secondary: 188
Academic degree: 17

Length

Max length: 29
Median length: 29
Mean length: 24.80654655
Min length: 15

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Secondary / secondary special
2nd row: Secondary / secondary special
3rd row: Secondary / secondary special
4th row: Higher education
5th row: Incomplete higher

Common Values

Value | Count | Frequency (%)
Secondary / secondary special | 11245 | 67.5%
Higher education | 4551 | 27.3%
Incomplete higher | 649 | 3.9%
Lower secondary | 188 | 1.1%
Academic degree | 17 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
secondary | 22678 | 40.6%
special | 11245 | 20.2%
(empty) | 11245 | 20.2%
higher | 5200 | 9.3%
education | 4551 | 8.2%
incomplete | 649 | 1.2%
lower | 188 | 0.3%
degree | 17 | < 0.1%
academic | 17 | < 0.1%


estado_civil
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
Married: 11680
Single / not married: 2035
Civil marriage: 1283
Separated: 945
Widow: 707

Length

Max length: 20
Median length: 7
Mean length: 9.156876877
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Married
2nd row: Single / not married
3rd row: Single / not married
4th row: Married
5th row: Married

Common Values

Value | Count | Frequency (%)
Married | 11680 | 70.2%
Single / not married | 2035 | 12.2%
Civil marriage | 1283 | 7.7%
Separated | 945 | 5.7%
Widow | 707 | 4.2%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
married | 13715 | 57.1%
not | 2035 | 8.5%
(empty) | 2035 | 8.5%
single | 2035 | 8.5%
marriage | 1283 | 5.3%
civil | 1283 | 5.3%
separated | 945 | 3.9%
widow | 707 | 2.9%


tipo_residencia
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
House / apartment: 14974
With parents: 738
Municipal apartment: 520
Rented apartment: 227
Office apartment: 120

Length

Max length: 19
Median length: 17
Mean length: 16.81147147
Min length: 12

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: House / apartment
2nd row: House / apartment
3rd row: House / apartment
4th row: House / apartment
5th row: House / apartment

Common Values

Value | Count | Frequency (%)
House / apartment | 14974 | 89.9%
With parents | 738 | 4.4%
Municipal apartment | 520 | 3.1%
Rented apartment | 227 | 1.4%
Office apartment | 120 | 0.7%
Co-op apartment | 71 | 0.4%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
apartment | 15912 | 33.0%
(empty) | 14974 | 31.0%
house | 14974 | 31.0%
parents | 738 | 1.5%
with | 738 | 1.5%
municipal | 520 | 1.1%
rented | 227 | 0.5%
office | 120 | 0.2%
co-op | 71 | 0.1%


idade
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 5298
Distinct (%): 31.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.31951277
Minimum: 22.03013699
Maximum: 68.90958904
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 130.2 KiB

Quantile statistics

Minimum: 22.03013699
5-th percentile: 27.69315068
Q1: 34.8739726
median: 43.49315068
Q3: 53.4109589
95-th percentile: 63.14780822
Maximum: 68.90958904
Range: 46.87945205
Interquartile range (IQR): 18.5369863

Descriptive statistics

Standard deviation: 11.2288368
Coefficient of variation (CV): 0.2533610163
Kurtosis: -1.032189251
Mean: 44.31951277
Median Absolute Deviation (MAD): 9.250684932
Skewness: 0.1792052356
Sum: 737919.8877
Variance: 126.0867759
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
34.72876712 | 22 | 0.1%
40.18356164 | 22 | 0.1%
41.47945205 | 21 | 0.1%
45.93972603 | 20 | 0.1%
37.04109589 | 20 | 0.1%
57.87945205 | 19 | 0.1%
42.94520548 | 18 | 0.1%
34.2 | 18 | 0.1%
27.88219178 | 17 | 0.1%
43.39452055 | 17 | 0.1%
Other values (5288) | 16456 | 98.8%

Smallest values

Value | Count | Frequency (%)
22.03013699 | 1 | < 0.1%
22.07123288 | 1 | < 0.1%
22.22191781 | 1 | < 0.1%
22.41643836 | 1 | < 0.1%
22.56986301 | 2 | < 0.1%
22.66849315 | 1 | < 0.1%
22.86849315 | 3 | < 0.1%
22.88493151 | 1 | < 0.1%
23.00273973 | 3 | < 0.1%
23.12876712 | 2 | < 0.1%

Largest values

Value | Count | Frequency (%)
68.90958904 | 2 | < 0.1%
68.76438356 | 1 | < 0.1%
68.52054795 | 1 | < 0.1%
68.4109589 | 1 | < 0.1%
68.34520548 | 2 | < 0.1%
68.30684932 | 1 | < 0.1%
68.25753425 | 2 | < 0.1%
68.0630137 | 2 | < 0.1%
68.03013699 | 1 | < 0.1%
68.00273973 | 6 | < 0.1%

tempo_emprego
Real number (ℝ)

HIGH CORRELATION

Distinct: 3005
Distinct (%): 18.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -161.4164463
Minimum: -1000.665753
Maximum: 42.90684932
Zeros: 0
Zeros (%): 0.0%
Negative: 2793
Negative (%): 16.8%
Memory size: 130.2 KiB

Quantile statistics

Minimum: -1000.665753
5-th percentile: -1000.665753
Q1: 1.183561644
median: 4.691780822
Q3: 9.088356164
95-th percentile: 20.19452055
Maximum: 42.90684932
Range: 1043.572603
Interquartile range (IQR): 7.904794521

Descriptive statistics

Standard deviation: 376.8439122
Coefficient of variation (CV): -2.334606671
Kurtosis: 1.16175938
Mean: -161.4164463
Median Absolute Deviation (MAD): 3.831506849
Skewness: -1.777557843
Sum: -2687583.83
Variance: 142011.3342
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
-1000.665753 | 2793 | 16.8%
4.216438356 | 41 | 0.2%
4.797260274 | 29 | 0.2%
6.934246575 | 28 | 0.2%
5.216438356 | 28 | 0.2%
0.5479452055 | 28 | 0.2%
0.295890411 | 26 | 0.2%
1.260273973 | 26 | 0.2%
3.934246575 | 26 | 0.2%
3.602739726 | 25 | 0.2%
Other values (2995) | 13600 | 81.7%

Smallest values

Value | Count | Frequency (%)
-1000.665753 | 2793 | 16.8%
0.1178082192 | 1 | < 0.1%
0.1780821918 | 1 | < 0.1%
0.1917808219 | 1 | < 0.1%
0.2 | 9 | 0.1%
0.2164383562 | 1 | < 0.1%
0.2410958904 | 1 | < 0.1%
0.2438356164 | 4 | < 0.1%
0.2493150685 | 3 | < 0.1%
0.2520547945 | 2 | < 0.1%

Largest values

Value | Count | Frequency (%)
42.90684932 | 2 | < 0.1%
41.2 | 12 | 0.1%
40.78630137 | 3 | < 0.1%
40.57534247 | 6 | < 0.1%
40.47945205 | 1 | < 0.1%
39.82465753 | 4 | < 0.1%
39.65205479 | 4 | < 0.1%
39.48767123 | 3 | < 0.1%
39.28219178 | 1 | < 0.1%
38.70410959 | 1 | < 0.1%

possui_celular
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
1: 16650

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 16650 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
1 | 16650 | 100.0%


possui_fone_comercial
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
0: 12900
1: 3750

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 12900 | 77.5%
1 | 3750 | 22.5%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
0 | 12900 | 77.5%
1 | 3750 | 22.5%


possui_fone
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
0: 11727
1: 4923

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 11727 | 70.4%
1 | 4923 | 29.6%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
0 | 11727 | 70.4%
1 | 4923 | 29.6%


possui_email
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 130.2 KiB
0: 15170
1: 1480

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 15170 | 91.1%
1 | 1480 | 8.9%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value | Count | Frequency (%)
0 | 15170 | 91.1%
1 | 1480 | 8.9%


qt_pessoas_residencia
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.211891892
Minimum: 1
Maximum: 15
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 130.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 3
95-th percentile: 4
Maximum: 15
Range: 14
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9037546698
Coefficient of variation (CV): 0.4085889881
Kurtosis: 5.750133465
Mean: 2.211891892
Median Absolute Deviation (MAD): 0
Skewness: 1.191339607
Sum: 36828
Variance: 0.8167725032
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=9)]

Common values

Value | Count | Frequency (%)
2 | 9042 | 54.3%
1 | 3022 | 18.2%
3 | 2887 | 17.3%
4 | 1489 | 8.9%
5 | 180 | 1.1%
6 | 25 | 0.2%
15 | 2 | < 0.1%
9 | 2 | < 0.1%
7 | 1 | < 0.1%

Smallest values

Value | Count | Frequency (%)
1 | 3022 | 18.2%
2 | 9042 | 54.3%
3 | 2887 | 17.3%
4 | 1489 | 8.9%
5 | 180 | 1.1%
6 | 25 | 0.2%
7 | 1 | < 0.1%
9 | 2 | < 0.1%
15 | 2 | < 0.1%

Largest values

Value | Count | Frequency (%)
15 | 2 | < 0.1%
9 | 2 | < 0.1%
7 | 1 | < 0.1%
6 | 25 | 0.2%
5 | 180 | 1.1%
4 | 1489 | 8.9%
3 | 2887 | 17.3%
2 | 9042 | 54.3%
1 | 3022 | 18.2%

mau
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.4 KiB

Value | Count | Frequency (%)
False | 16260 | 97.7%
True | 390 | 2.3%

Interactions

[Figures: 16 interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
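
The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is a minimal illustration, not the code used to produce this report. The function names are hypothetical, and giving tied values the average of their ranks is an assumption of the sketch.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # The 1/n factors of covariance and variances cancel, so they are omitted.
    return cov / sqrt(var_x * var_y)


# A cubic relationship is nonlinear but monotonic, so the ranks agree exactly.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

The sketch raises ZeroDivisionError when either variable is constant, because the rank variance is then zero.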

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
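
As a minimal sketch of the formula above (the function name pearson_r and the sample data are illustrative, not taken from this report), r can be computed directly from the definition. The example also checks the invariance property for positive rescaling and shifting of each variable.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n factors cancel between numerator and denominator.
    return cov / sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([2 * a + 3 for a in x], [0.5 * b - 1 for b in y])

print(r_original, r_transformed)
```

Both calls return the same value up to floating-point error. A negative scale factor on one variable would flip the sign of r.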

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
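
The pair-counting definition above can be sketched as follows. The function name is hypothetical. This is the simple form of the coefficient (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly apply a tie correction (tau-b), so their results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move in the same direction
            concordant += 1
        elif direction < 0:    # the variables move in opposite directions
            discordant += 1
        # direction == 0 means a tie in x or y: counted in neither group
    return (concordant - discordant) / pairs


# Six pairs in total: five concordant, one discordant (the swapped middle values).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The loop visits all n(n-1)/2 pairs, so this sketch is quadratic in the number of observations and is meant for illustration only.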

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proven to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
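
A minimal sketch of a bias-corrected Cramér's V, following the correction usually attributed to Bergsma (2013): the chi-squared statistic is divided by n, an expected-bias term is subtracted, and the row and column counts are adjusted before normalising. The function name is hypothetical, and returning 0.0 for a constant column is a choice made for this sketch, not necessarily what the reporting tool does.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)
    r, k = len(rows), len(cols)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    if denominator <= 0:
        # Degenerate case, e.g. one variable is constant (assumption: report 0).
        return 0.0
    return sqrt(phi2_corr / denominator)


perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(perfect, independent)
```

In the first call each label of one variable determines the label of the other, and in the second the two variables are independent by construction, so the results sit at the two ends of the 0 to 1 range.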

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

(index) | sexo | posse_de_veiculo | posse_de_imovel | qtd_filhos | tipo_renda | educacao | estado_civil | tipo_residencia | idade | tempo_emprego | possui_celular | possui_fone_comercial | possui_fone | possui_email | qt_pessoas_residencia | mau
0 | M | Y | Y | 0 | Working | Secondary / secondary special | Married | House / apartment | 58.832877 | 3.106849 | 1 | 0 | 0 | 0 | 2.0 | False
1 | F | N | Y | 0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | 52.356164 | 8.358904 | 1 | 0 | 1 | 1 | 1.0 | False
2 | F | N | Y | 0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | 52.356164 | 8.358904 | 1 | 0 | 1 | 1 | 1.0 | False
3 | M | Y | Y | 0 | Working | Higher education | Married | House / apartment | 46.224658 | 2.106849 | 1 | 1 | 1 | 1 | 2.0 | False
4 | F | Y | N | 0 | Working | Incomplete higher | Married | House / apartment | 29.230137 | 3.021918 | 1 | 0 | 0 | 0 | 2.0 | False
5 | F | Y | N | 0 | Working | Incomplete higher | Married | House / apartment | 29.230137 | 3.021918 | 1 | 0 | 0 | 0 | 2.0 | False
6 | F | N | Y | 0 | Working | Secondary / secondary special | Married | House / apartment | 27.482192 | 4.024658 | 1 | 0 | 1 | 0 | 2.0 | False
7 | F | N | Y | 0 | Working | Secondary / secondary special | Married | House / apartment | 27.482192 | 4.024658 | 1 | 0 | 1 | 0 | 2.0 | False
8 | F | N | Y | 1 | Working | Secondary / secondary special | Single / not married | House / apartment | 30.049315 | 4.438356 | 1 | 0 | 0 | 0 | 2.0 | False
9 | F | N | Y | 1 | Working | Secondary / secondary special | Single / not married | House / apartment | 30.049315 | 4.438356 | 1 | 0 | 0 | 0 | 2.0 | False

Last rows

(index) | sexo | posse_de_veiculo | posse_de_imovel | qtd_filhos | tipo_renda | educacao | estado_civil | tipo_residencia | idade | tempo_emprego | possui_celular | possui_fone_comercial | possui_fone | possui_email | qt_pessoas_residencia | mau
16640 | M | N | N | 1 | Working | Secondary / secondary special | Married | Municipal apartment | 35.468493 | 6.624658 | 1 | 0 | 0 | 1 | 3.0 | True
16641 | M | Y | N | 0 | Working | Incomplete higher | Married | With parents | 24.997260 | 2.630137 | 1 | 1 | 0 | 0 | 2.0 | True
16642 | M | Y | N | 0 | Working | Incomplete higher | Married | With parents | 24.997260 | 2.630137 | 1 | 1 | 0 | 0 | 2.0 | True
16643 | F | N | Y | 0 | Pensioner | Secondary / secondary special | Married | House / apartment | 60.304110 | -1000.665753 | 1 | 0 | 0 | 0 | 2.0 | True
16644 | F | N | Y | 1 | Working | Secondary / secondary special | Single / not married | House / apartment | 34.857534 | 3.101370 | 1 | 1 | 1 | 0 | 1.0 | True
16645 | F | N | Y | 0 | Working | Secondary / secondary special | Civil marriage | House / apartment | 54.109589 | 9.884932 | 1 | 0 | 0 | 0 | 2.0 | True
16646 | F | N | Y | 0 | Commercial associate | Secondary / secondary special | Married | House / apartment | 43.389041 | 7.380822 | 1 | 1 | 1 | 0 | 2.0 | True
16647 | M | Y | Y | 0 | Working | Secondary / secondary special | Married | House / apartment | 30.005479 | 9.800000 | 1 | 1 | 0 | 0 | 2.0 | True
16648 | M | Y | Y | 0 | Working | Secondary / secondary special | Married | House / apartment | 30.005479 | 9.800000 | 1 | 1 | 0 | 0 | 2.0 | True
16649 | F | N | Y | 0 | Pensioner | Higher education | Married | House / apartment | 33.936986 | 3.630137 | 1 | 0 | 1 | 1 | 2.0 | True

Duplicate rows

Most frequently occurring

(index) | sexo | posse_de_veiculo | posse_de_imovel | qtd_filhos | tipo_renda | educacao | estado_civil | tipo_residencia | idade | tempo_emprego | possui_celular | possui_fone_comercial | possui_fone | possui_email | qt_pessoas_residencia | mau | # duplicates
1304 | F | N | Y | 0 | Working | Secondary / secondary special | Married | House / apartment | 37.041096 | 15.035616 | 1 | 0 | 0 | 0 | 2.0 | False | 20
2652 | M | N | N | 2 | Working | Higher education | Civil marriage | House / apartment | 45.939726 | 8.460274 | 1 | 1 | 0 | 0 | 4.0 | False | 20
1048 | F | N | Y | 0 | Pensioner | Secondary / secondary special | Widow | House / apartment | 57.879452 | -1000.665753 | 1 | 0 | 0 | 0 | 1.0 | False | 19
2298 | F | Y | Y | 0 | Working | Secondary / secondary special | Married | House / apartment | 46.561644 | 19.901370 | 1 | 0 | 0 | 0 | 2.0 | False | 16
2691 | M | N | Y | 0 | Commercial associate | Secondary / secondary special | Married | House / apartment | 34.200000 | 11.065753 | 1 | 0 | 0 | 0 | 2.0 | False | 16
3699 | M | Y | Y | 2 | Working | Higher education | Married | House / apartment | 36.093151 | 6.786301 | 1 | 0 | 0 | 0 | 4.0 | False | 16
687 | F | N | Y | 0 | Commercial associate | Secondary / secondary special | Married | House / apartment | 53.375342 | 7.884932 | 1 | 0 | 0 | 0 | 2.0 | False | 14
731 | F | N | Y | 0 | Commercial associate | Secondary / secondary special | Single / not married | Rented apartment | 42.517808 | 8.860274 | 1 | 0 | 0 | 0 | 1.0 | False | 14
1974 | F | Y | N | 0 | Working | Secondary / secondary special | Married | House / apartment | 53.263014 | 11.134247 | 1 | 0 | 0 | 0 | 2.0 | False | 14
972 | F | N | Y | 0 | Pensioner | Secondary / secondary special | Married | House / apartment | 65.123288 | -1000.665753 | 1 | 0 | 0 | 0 | 2.0 | False | 13