Chapter 2. Loading Data in Julia
1. Load common datasets
First, we need some sample data. For convenience, we can install a common package that bundles well-known datasets:
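The original code cell is not preserved here, so the following is only a minimal sketch. It assumes the package in question is RDatasets (the categorical last column in the output below is consistent with its iris dataset) and that the variable is called iris; both are assumptions.

```julia
# Assumption: the "common package" is RDatasets; install it once with Pkg.
using Pkg
Pkg.add(["RDatasets", "DataFrames"])

using RDatasets, DataFrames

# Load the classic iris dataset that ships with RDatasets
iris = dataset("datasets", "iris")

# Show the first five rows
first(iris, 5)
```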
5×5 DataFrame
 Row │ Float64  Float64  Float64  Float64  Cat…
─────┼───────────────────────────────────────────
   1 │     5.1      3.5      1.4      0.2  setosa
   2 │     4.9      3.0      1.4      0.2  setosa
   3 │     4.7      3.2      1.3      0.2  setosa
   4 │     4.6      3.1      1.5      0.2  setosa
   5 │     5.0      3.6      1.4      0.2  setosa
Here, we use first() to see the first several rows of the data frame.
2. Load *.csv files locally
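A local CSV file can be read with the CSV.jl package, as in the sketch below. To keep the sketch runnable on its own, it first writes a tiny three-row file to a temporary directory; the file name and the column names are assumptions for illustration, and in practice you would pass the path of your existing file.

```julia
using CSV, DataFrames

# Write a small sample file so the example runs by itself.
# The path and the column names are hypothetical.
path = joinpath(mktempdir(), "iris_sample.csv")
write(path, """
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
""")

# Read the CSV file into a DataFrame
iris = CSV.read(path, DataFrame)

# Show the first three rows
first(iris, 3)
```

With only three short rows, the inferred string type may differ from the String15 shown for the full file.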
3×5 DataFrame
 Row │ Float64  Float64  Float64  Float64  String15
─────┼──────────────────────────────────────────────
   1 │     5.1      3.5      1.4      0.2  setosa
   2 │     4.9      3.0      1.4      0.2  setosa
   3 │     4.7      3.2      1.3      0.2  setosa
3. Load datasets online
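One way to read a CSV file from the web is to download it to a temporary file and pass that path to CSV.read. The sketch below assumes a publicly hosted copy of the iris data in the seaborn-data repository; the original page may have used a different source, and the helper name load_csv_from_url is hypothetical.

```julia
using CSV, DataFrames, Downloads

# Hypothetical helper: download a CSV file and parse it into a DataFrame.
# Downloads.download returns the path of the temporary file it wrote.
load_csv_from_url(url) = CSV.read(Downloads.download(url), DataFrame)

# Assumed URL; replace it with the address of your own dataset.
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
iris = load_csv_from_url(url)

# Show the first three rows
first(iris, 3)
```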
3×5 DataFrame
 Row │ Float64  Float64  Float64  Float64  String15
─────┼──────────────────────────────────────────────
   1 │     5.1      3.5      1.4      0.2  setosa
   2 │     4.9      3.0      1.4      0.2  setosa
   3 │     4.7      3.2      1.3      0.2  setosa
4. Create a data frame from scratch
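A data frame can be built directly from column vectors with the DataFrame constructor. In the sketch below, the values match the output shown, but the column names (group, id, name) are hypothetical, since the original names are not preserved here.

```julia
using DataFrames

# Each keyword argument becomes one column; all columns need the same length.
# Column names are assumptions for illustration.
df = DataFrame(
    group = ["A", "B", "C"],
    id    = [1, 2, 3],
    name  = fill("Rongxin", 3),   # repeat the same string for every row
)
```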
3×3 DataFrame
 Row │ String  Int64  String
─────┼────────────────────────
   1 │ A           1  Rongxin
   2 │ B           2  Rongxin
   3 │ C           3  Rongxin
Selecting Data in Julia
1. Indexing a subset
We can select a subset with a pair of row and column indexes. For example, to select the first two rows with all columns:
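Row and column indexing can be sketched as follows. To stay self-contained, the example builds a three-row stand-in for the iris data using values that appear in the outputs on this page; the column names are assumptions.

```julia
using DataFrames

# Small stand-in for the full iris data frame (column names are assumed)
iris = DataFrame(
    sepal_length = [5.1, 4.9, 6.3],
    sepal_width  = [3.5, 3.0, 3.3],
    petal_length = [1.4, 1.4, 6.0],
    petal_width  = [0.2, 0.2, 2.5],
    species      = ["setosa", "setosa", "virginica"],
)

# Rows 1 to 2, all columns; `:` means "everything" along that axis
first_two = iris[1:2, :]
```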
2×5 DataFrame
 Row │ Float64  Float64  Float64  Float64  String15
─────┼──────────────────────────────────────────────
   1 │     5.1      3.5      1.4      0.2  setosa
   2 │     4.9      3.0      1.4      0.2  setosa
2. Select by column names
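Columns can be picked by name, either with indexing or with select. The sketch below uses a small stand-in for the iris data; the column names sepal_width and petal_length are assumptions inferred from the values in the output.

```julia
using DataFrames

# Small stand-in for the full iris data frame (column names are assumed)
iris = DataFrame(
    sepal_length = [5.1, 4.9, 6.3],
    sepal_width  = [3.5, 3.0, 3.3],
    petal_length = [1.4, 1.4, 6.0],
    petal_width  = [0.2, 0.2, 2.5],
    species      = ["setosa", "setosa", "virginica"],
)

# All rows, two named columns, using indexing
by_index = iris[:, [:sepal_width, :petal_length]]

# The same selection with select(), which returns a new data frame
by_select = select(iris, :sepal_width, :petal_length)
```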
150×2 DataFrame (125 rows omitted)
 Row │ Float64  Float64
─────┼──────────────────
   1 │     3.5      1.4
   2 │     3.0      1.4
   3 │     3.2      1.3
   4 │     3.1      1.5
   5 │     3.6      1.4
   6 │     3.9      1.7
   7 │     3.4      1.4
   8 │     3.4      1.5
   9 │     2.9      1.4
  10 │     3.1      1.5
  11 │     3.7      1.5
  12 │     3.4      1.6
  13 │     3.0      1.4
   ⋮ │       ⋮        ⋮
 139 │     3.0      4.8
 140 │     3.1      5.4
 141 │     3.1      5.6
 142 │     3.1      5.1
 143 │     2.7      5.1
 144 │     3.2      5.9
 145 │     3.3      5.7
 146 │     3.0      5.2
 147 │     2.5      5.0
 148 │     3.0      5.2
 149 │     3.4      5.4
 150 │     3.0      5.1
A powerful feature is that we can use a regular expression directly to select columns. For instance, if we only care about the columns whose names end with length, we can:
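DataFrames.jl accepts a Regex as a column selector, so the selection described above can be sketched as below. The stand-in data and its column names are assumptions.

```julia
using DataFrames

# Small stand-in for the full iris data frame (column names are assumed)
iris = DataFrame(
    sepal_length = [5.1, 4.9, 6.3],
    sepal_width  = [3.5, 3.0, 3.3],
    petal_length = [1.4, 1.4, 6.0],
    petal_width  = [0.2, 0.2, 2.5],
    species      = ["setosa", "setosa", "virginica"],
)

# Keep every column whose name ends with "length"; `$` anchors the match at the end
by_regex = iris[:, r"length$"]
```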
150×2 DataFrame (125 rows omitted)
 Row │ Float64  Float64
─────┼──────────────────
   1 │     5.1      1.4
   2 │     4.9      1.4
   3 │     4.7      1.3
   4 │     4.6      1.5
   5 │     5.0      1.4
   6 │     5.4      1.7
   7 │     4.6      1.4
   8 │     5.0      1.5
   9 │     4.4      1.4
  10 │     4.9      1.5
  11 │     5.4      1.5
  12 │     4.8      1.6
  13 │     4.8      1.4
   ⋮ │       ⋮        ⋮
 139 │     6.0      4.8
 140 │     6.9      5.4
 141 │     6.7      5.6
 142 │     6.9      5.1
 143 │     5.8      5.1
 144 │     6.8      5.9
 145 │     6.7      5.7
 146 │     6.7      5.2
 147 │     6.3      5.0
 148 │     6.5      5.2
 149 │     6.2      5.4
 150 │     5.9      5.1
3. Conditional filtering
In data analysis, we often want to subset a data frame according to a condition.
In this case, we can define a condition, e.g., find the rows whose species is virginica, as in the following lines:
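A condition like this can be expressed as a Boolean mask over the rows, or with filter. The sketch below uses a small stand-in for the iris data; the column name species comes from the text, while the other column names are assumptions.

```julia
using DataFrames

# Small stand-in for the full iris data frame (column names are assumed)
iris = DataFrame(
    sepal_length = [5.1, 4.9, 6.3],
    sepal_width  = [3.5, 3.0, 3.3],
    petal_length = [1.4, 1.4, 6.0],
    petal_width  = [0.2, 0.2, 2.5],
    species      = ["setosa", "setosa", "virginica"],
)

# Element-wise comparison (note the dot) gives one true/false value per row
condition = iris.species .== "virginica"

# Keep only the rows where the condition is true, with all columns
virginica = iris[condition, :]

# Equivalent form using filter with a column => predicate pair
virginica_alt = filter(:species => ==("virginica"), iris)
```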
50×5 DataFrame (25 rows omitted)
 Row │ Float64  Float64  Float64  Float64  String15
─────┼───────────────────────────────────────────────
   1 │     6.3      3.3      6.0      2.5  virginica
   2 │     5.8      2.7      5.1      1.9  virginica
   3 │     7.1      3.0      5.9      2.1  virginica
   4 │     6.3      2.9      5.6      1.8  virginica
   5 │     6.5      3.0      5.8      2.2  virginica
   6 │     7.6      3.0      6.6      2.1  virginica
   7 │     4.9      2.5      4.5      1.7  virginica
   8 │     7.3      2.9      6.3      1.8  virginica
   9 │     6.7      2.5      5.8      1.8  virginica
  10 │     7.2      3.6      6.1      2.5  virginica
  11 │     6.5      3.2      5.1      2.0  virginica
  12 │     6.4      2.7      5.3      1.9  virginica
  13 │     6.8      3.0      5.5      2.1  virginica
   ⋮ │       ⋮        ⋮        ⋮        ⋮      ⋮
  39 │     6.0      3.0      4.8      1.8  virginica
  40 │     6.9      3.1      5.4      2.1  virginica
  41 │     6.7      3.1      5.6      2.4  virginica
  42 │     6.9      3.1      5.1      2.3  virginica
  43 │     5.8      2.7      5.1      1.9  virginica
  44 │     6.8      3.2      5.9      2.3  virginica
  45 │     6.7      3.3      5.7      2.5  virginica
  46 │     6.7      3.0      5.2      2.3  virginica
  47 │     6.3      2.5      5.0      1.9  virginica
  48 │     6.5      3.0      5.2      2.0  virginica
  49 │     6.2      3.4      5.4      2.3  virginica
  50 │     5.9      3.0      5.1      1.8  virginica
Now that you know how to load data frames and select the parts you are interested in, it's time to learn how to transform your data and calculate your variables.