I recently hit a performance wall in my data analysis pipeline: Python and R were too slow, even with parallelization. To address this, I began learning Julia.
Along the way, I discovered a way to use all three major data analysis languages in a single notebook. We can now combine machine learning in Python, fast data preparation in Julia, and result visualization with R's ggplot2.
Prerequisites
Python, R, Julia installed
Jupyter Notebook installed
Install Julia kernel:
Enter Julia REPL
For example, to add a Julia kernel to Jupyter that starts with multiple threads:
using IJulia  # installkernel is provided by the IJulia package
installkernel("Julia (4 threads)", env=Dict("JULIA_NUM_THREADS"=>"4"))
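Once a notebook is opened with the new kernel, it is worth confirming that the thread setting took effect. The following is a minimal sketch that uses only Julia's standard `Base.Threads` module; the array name `squares` and the toy loop are illustrative and not part of the original notebook.

```julia
using Base.Threads

# Number of threads this Julia session was started with.
# In the "Julia (4 threads)" kernel this is expected to be 4,
# because the kernel sets JULIA_NUM_THREADS=4 in its environment.
println("threads: ", nthreads())

# Illustrative workload: every iteration writes to its own slot,
# so the loop needs no locking.
squares = zeros(Int, 8)
@threads for i in 1:8
    squares[i] = i^2
end
println(squares)
```

If `nthreads()` reports 1, the notebook is most likely still running on the default Julia kernel rather than the one registered above.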
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
using RCall  # provides the R"""...""" string macro

R"""
library(ggplot2)
library(dplyr)

data = head(mtcars, 30)

# 1/ add text with geom_text, use nudge to nudge the text
ggplot(data, aes(x=wt, y=mpg)) +
  geom_point() +  # Show dots
  geom_text(
    label=rownames(data),
    nudge_x = 0.25, nudge_y = 0.25,
    check_overlap = TRUE
  )
"""
┌ Warning: RCall.jl:
│ Attaching package: ‘dplyr’
│
│ The following objects are masked from ‘package:stats’:
│
│     filter, lag
│
│ The following objects are masked from ‘package:base’:
│
│     intersect, setdiff, setequal, union
│
└ @ RCall ~/.julia/packages/RCall/0ggIQ/src/io.jl:172
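The cell above plots R's built-in mtcars data. To plot data prepared in Julia instead, the values first have to cross into the R session. The following is a minimal sketch, assuming RCall's `@rput` and `@rget` macros and a working R installation; the variable names `weights` and `doubled` are hypothetical and do not come from the original notebook.

```julia
using RCall

# Data prepared on the Julia side (illustrative values).
weights = [1.0, 2.0, 3.0]

# Copy the Julia variable into the R session under the same name.
@rput weights

# Work with it in R as an ordinary numeric vector.
R"doubled <- weights * 2"

# Copy the R result back into a Julia variable of the same name.
@rget doubled
println(doubled)
```

The same pattern extends to data frames: a Julia table sent with `@rput` can then be passed to `ggplot()` inside an `R"""..."""` cell.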