Chapter 3. Dataframe Transformation
using CSV, DataFrames, RDatasets, Statistics
df = dataset("datasets", "iris")
first(df, 5)
[Output: first 5 rows; columns SepalLength, SepalWidth, PetalLength, PetalWidth, Species]
1. Select a Subset (review)
df[:, [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]]
[Output: table with columns SepalLength, SepalWidth, PetalLength, PetalWidth]
[Output: table with columns SepalLength, PetalLength, PetalWidth, Species]
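A minimal sketch of two ways to arrive at column subsets like the ones above, assuming the `select` function and the `Not` selector from DataFrames.jl. The three-row table is a small stand-in for iris so that the example runs without RDatasets; its values are illustrative only.

```julia
using DataFrames

# Small stand-in for the iris table (illustrative values)
df = DataFrame(
    SepalLength = [5.1, 7.0, 6.3],
    SepalWidth  = [3.5, 3.2, 3.3],
    PetalLength = [1.4, 4.7, 6.0],
    PetalWidth  = [0.2, 1.4, 2.5],
    Species     = ["setosa", "versicolor", "virginica"],
)

# Keep columns by naming them explicitly (same result as df[:, [...]])
measurements = select(df, [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth])

# Keep everything except one column
no_sepal_width = select(df, Not(:SepalWidth))
```

`select` returns a new table and leaves `df` unchanged, which makes it convenient for chaining several transformation steps.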
2. Split and Combine
By Row

[Output: table with columns col_a, col_b]
[Output: table with columns col_a, col_b]
[Output: table with columns col_a, col_b]
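Combining tables by row can be sketched with `vcat`, assuming two toy tables that share the column names `col_a` and `col_b`. The table names `top` and `bottom` and their values are made up for illustration.

```julia
using DataFrames

# Two toy tables with identical column names
top    = DataFrame(col_a = [1, 2], col_b = ["a", "b"])
bottom = DataFrame(col_a = [3, 4], col_b = ["c", "d"])

# vcat appends the rows of `bottom` below the rows of `top`
stacked = vcat(top, bottom)
```

By default `vcat` expects the same set of columns in every table; when the columns differ, the `cols` keyword (for example `cols = :union`) controls how they are reconciled.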
By Column

[Output: table with columns col_a, col_b, col_c, col_d]
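A four-column result like the one above could come from placing two tables side by side with `hcat`. This is a sketch under that assumption; the tables `left` and `right` and their values are hypothetical.

```julia
using DataFrames

# Two toy tables with the same number of rows and distinct column names
left  = DataFrame(col_a = [1, 2], col_b = ["a", "b"])
right = DataFrame(col_c = [3.0, 4.0], col_d = [true, false])

# hcat places the columns of `right` next to the columns of `left`
wide = hcat(left, right)
```

`hcat` matches rows purely by position, so it is only appropriate when both tables are already in the same row order. When rows must be matched by a key, a join is the safer tool.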
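[Output: table with columns id, col_a, col_b]
[Output: table with columns id, col_a, col_b]
[Output: table with columns id, col_a, col_b]
[Output: table with columns id, col_a, col_b]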
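Tables that share an `id` column are usually combined with joins. The sketch below assumes the four results correspond to the four standard join types in DataFrames.jl; that mapping is a guess, and the input tables `a` and `b` are invented for illustration.

```julia
using DataFrames

# Toy inputs: ids 2 and 3 appear in both tables
a = DataFrame(id = [1, 2, 3], col_a = ["x", "y", "z"])
b = DataFrame(id = [2, 3, 4], col_b = [20, 30, 40])

inner = innerjoin(a, b, on = :id)  # only ids present in both tables
left  = leftjoin(a, b, on = :id)   # every id from `a`; col_b is missing where unmatched
right = rightjoin(a, b, on = :id)  # every id from `b`; col_a is missing where unmatched
outer = outerjoin(a, b, on = :id)  # every id from either table
```

The row order of a join result is not something to rely on, so sort by `id` afterwards if the order matters.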
3. Group by
[Output: table with columns SepalLength, SepalWidth, PetalLength, PetalWidth, Species]
[Output: table with columns Species, SepalLength_mean]
[Output: table with columns Species, SepalLength_mean, SepalWidth_mean]
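Column names such as `SepalLength_mean` are what `combine` generates automatically when a function is applied per group. A minimal sketch, assuming `groupby` on `Species` followed by `combine` with `mean`; the four-row table is a stand-in for iris with illustrative values.

```julia
using DataFrames, Statistics

# Stand-in for iris with two rows per species (illustrative values)
df = DataFrame(
    Species     = ["setosa", "setosa", "virginica", "virginica"],
    SepalLength = [5.0, 5.2, 6.4, 6.6],
    SepalWidth  = [3.4, 3.6, 3.0, 3.2],
)

# Split the table into one group per species
gdf = groupby(df, :Species)

# One aggregate per group; the result column is named SepalLength_mean
mean_length = combine(gdf, :SepalLength => mean)

# Broadcasting .=> applies the same function to several columns at once
mean_both = combine(gdf, [:SepalLength, :SepalWidth] .=> mean)
```

To choose the output name explicitly, extend the pair with a third element, for example `:SepalLength => mean => :avg_length`.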
4. Sorting
[Output: table with columns Species, SepalLength_mean, SepalWidth_mean]
[Output: table with columns Species, SepalLength_mean, SepalWidth_mean]
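Sorting a summary table like this one can be sketched with `sort`, once ascending and once descending. Which column and direction the two results above use is an assumption, and the values in `means` are illustrative rather than the real iris means.

```julia
using DataFrames

# Toy summary table (illustrative values)
means = DataFrame(
    Species          = ["setosa", "versicolor", "virginica"],
    SepalLength_mean = [5.0, 6.0, 6.5],
    SepalWidth_mean  = [3.5, 2.5, 3.0],
)

# Ascending order by one column
ascending = sort(means, :SepalWidth_mean)

# Descending order by the same column
descending = sort(means, :SepalWidth_mean, rev = true)
```

`sort` returns a new table; `sort!` takes the same arguments and reorders the table in place.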
5. Transforming between long and wide tables
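[Output: table with columns SepalLength, SepalWidth, PetalLength, PetalWidth, Species]
[Output: table with columns SepalLength, SepalWidth, PetalLength, PetalWidth, Species, id]
[Output: table with columns Species, id, variable, value]
[Output: table with columns id, Species, SepalLength, SepalWidth, PetalLength, PetalWidth]
[Output: table with columns SepalLength, SepalWidth, PetalLength, PetalWidth, Species, id]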
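The column sequences above are consistent with a round trip through `stack` (wide to long) and `unstack` (long to wide). The sketch below assumes that workflow and uses a two-row, two-measurement stand-in for iris with illustrative values.

```julia
using DataFrames

# Stand-in for iris (illustrative values)
df = DataFrame(
    SepalLength = [5.1, 7.0],
    SepalWidth  = [3.5, 3.2],
    Species     = ["setosa", "versicolor"],
)

# Add a row identifier so each observation can be found again after reshaping
df.id = collect(1:nrow(df))

# Wide -> long: one row per (observation, measurement) pair.
# Columns not listed as measurements (Species, id) are kept as identifiers.
long = stack(df, [:SepalLength, :SepalWidth])

# Long -> wide: row keys first, then one column per distinct `variable`
wide = unstack(long, [:id, :Species], :variable, :value)

# Restore the original column order
restored = select(wide, names(df))
```

Without the `id` column, two flowers of the same species would be indistinguishable in the long table and `unstack` could not rebuild one row per flower.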
6. Missing values
[Output: table with columns id, col_a, col_b]
Find NA values
[Output: table with columns id, col_a, col_b]
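In Julia, NA values are represented by `missing`. A minimal sketch of locating them, assuming a toy table with the same column names as above; the values are invented for illustration.

```julia
using DataFrames

# Toy table with gaps in col_a and col_b
df = DataFrame(
    id    = 1:4,
    col_a = [1, missing, 3, missing],
    col_b = ["x", "y", missing, "w"],
)

# Element-wise test: true where col_a is missing
mask = ismissing.(df.col_a)

# Rows in which col_a is missing
rows_missing_a = df[mask, :]

# Number of missing values in each column
counts = Dict(name => count(ismissing, df[!, name]) for name in names(df))
```

Comparisons such as `df.col_a .== missing` do not work for this purpose, because any comparison with `missing` itself yields `missing`; `ismissing` is the reliable test.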
Fill NA values
[Output: table with columns id, col_a, col_b]
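One common way to fill missing values is `coalesce`, which returns its first non-missing argument. The sketch below assumes constant fill values (`0` and `"unknown"`), which are arbitrary choices for illustration, as is the toy table.

```julia
using DataFrames

# Toy table with one gap per column
df = DataFrame(
    id    = 1:3,
    col_a = [1, missing, 3],
    col_b = ["x", missing, "z"],
)

# Work on a copy so the original table keeps its missing values
filled = copy(df)

# Broadcast coalesce over each column to substitute a default
filled.col_a = coalesce.(filled.col_a, 0)
filled.col_b = coalesce.(filled.col_b, "unknown")
```

A constant is the simplest fill strategy. Filling with a column statistic works the same way, for example `coalesce.(df.col_a, round(Int, mean(skipmissing(df.col_a))))` with `Statistics` loaded.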
Drop NA values
[Output: table with columns id, col_a, col_b]
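Dropping rows that contain missing values can be sketched with `dropmissing`, assuming a toy table with the same column names as above and invented values.

```julia
using DataFrames

# Toy table: row 2 has a gap in col_a, row 3 has a gap in col_b
df = DataFrame(
    id    = 1:4,
    col_a = [1, missing, 3, 4],
    col_b = ["x", "y", missing, "w"],
)

# Drop every row that has a missing value in any column
complete = dropmissing(df)

# Drop rows only when the named column is missing
only_a = dropmissing(df, :col_a)
```

`dropmissing` returns a new table, and by default the checked columns no longer allow `missing` in the result. `dropmissing!` does the same in place.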