Learning Machine Learning (1) – Machine Learning Journal Club

This section of our website hosts a list of useful resources for getting started in Machine Learning or learning more about specific topics. Whenever MLJC starts working on a new project, new learning resources about the field are made available. This material is also intended to serve as a syllabus of the skills required to join our group in its work. Motivated and enthusiastic beginners are always welcome in MLJC.

Index

1. Get started
- Mathematics
- Python
- Julia
2. Basic courses in ML
- Course “How to tackle a ML competition” (Italian)
- Deep learning (https://www.youtube.com/playlist?list=PLqYmG7hTraZCDxZ44o4p3N5Anz3lLRVZF)
3. Advanced topics:
- Natural Language Processing
- Medical AI
- Theoretical ML
- Scientific Machine Learning
- Signal Processing

Get started

Doing machine learning means writing software that trains a mathematical model to solve a problem. Nowadays, increasingly high-level libraries are available for a large number of programming languages, and they allow even newcomers to run sophisticated models. To go deeper, however, it is still necessary to learn some concepts of mathematical analysis, linear algebra and statistics.

Mathematics

Mathematical analysis
Neural networks are at the core of ML and are the basic building block of a wide spectrum of applications. It is not possible to understand the backpropagation algorithm, error estimation and the basic concepts of statistics without being comfortable with derivatives and integrals. A large amount of material for studying mathematical analysis is available, for example https://www.coursera.org/learn/introduction-to-calculus? (Available for free)
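To give an idea of where derivatives enter the picture, here is a minimal sketch in plain Python. It assumes a toy "network" with a single weight, a sigmoid activation and a squared-error loss (all illustrative choices of ours, not a real architecture). The gradient is computed with the chain rule, which is the idea behind backpropagation, and then checked against a numerical derivative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Squared error of a one-weight toy model: prediction = sigmoid(w * x)."""
    return (sigmoid(w * x) - y) ** 2

def loss_gradient(w, x, y):
    """dL/dw via the chain rule: dL/dp * dp/dz * dz/dw."""
    p = sigmoid(w * x)
    dL_dp = 2.0 * (p - y)      # derivative of the squared error
    dp_dz = p * (1.0 - p)      # derivative of the sigmoid
    dz_dw = x                  # derivative of z = w * x with respect to w
    return dL_dp * dp_dz * dz_dw

def numerical_gradient(w, x, y, eps=1e-6):
    """Central finite difference, used only to check the analytic result."""
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2.0 * eps)

# Made-up values for the weight, the input and the target.
w, x, y = 0.5, 2.0, 1.0
analytic = loss_gradient(w, x, y)
numeric = numerical_gradient(w, x, y)
print(f"analytic: {analytic:.6f}, numeric: {numeric:.6f}")

# A few gradient-descent steps: move the weight against the gradient.
learning_rate = 0.5
for _ in range(20):
    w -= learning_rate * loss_gradient(w, x, y)
print(f"loss after training: {loss(w, x, y):.4f}")
```

Real libraries compute these derivatives automatically, but the calculation they perform is the same chain rule applied to many more weights.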

We suggest having a look at the very clear and interesting YouTube videos by 3Blue1Brown, which help to visualize the main ideas of calculus. (The animations are generated with Manim, a Python library written by the author of the videos: https://github.com/3b1b/manim)

For a more rigorous approach we suggest some books used in the Physics and Engineering faculties of the University and the Polytechnic of Turin:

- Mathematical Analysis I - Claudio Canuto, Anita Tabacco - Springer Verlag, 2008
- Mathematical Analysis II - Claudio Canuto, Anita Tabacco - Springer Verlag, 2008

Linear algebra and geometry

If calculus is fundamental to understanding the operations at the core of ML algorithms, the theory of vector spaces and matrices is the language spoken by machine learning.
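As a small illustration of this "language", the sketch below uses NumPy to write one layer of a neural network as a matrix-vector product plus a bias. The weights, the bias and the inputs are made-up numbers chosen only for the example.

```python
import numpy as np

# Hypothetical toy layer: 3 inputs -> 2 outputs.
W = np.array([[0.2, -0.5, 1.0],
              [1.5, 0.3, -0.7]])    # weight matrix, shape (2, 3)
b = np.array([0.1, -0.2])           # bias vector, shape (2,)
x = np.array([1.0, 2.0, 3.0])       # one input sample, shape (3,)

# Single sample: matrix-vector product plus bias.
y = W @ x + b

# Batch of samples: each row of X is one sample, so we multiply by W transposed.
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])     # shape (2, 3)
Y = X @ W.T + b                     # shape (2, 2), one output row per sample

print(y)
print(Y)
```

The first row of `Y` coincides with `y`, because the first row of `X` is the same sample `x`: processing a batch is just the same linear map applied to every row at once.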

We suggest this very rigorous and complete textbook by Profs. Abbena, Fino and Gianella of the University of Turin, which can be found by searching on Google (the PDF is publicly hosted on the website of the University of Turin):

- Geometria e Algebra Lineare I - E. Abbena, A.M. Fino, G.M. Gianella - Aracne

Statistics

Can you describe what doing machine learning means in only one sentence? Probably not, but the closest one could be “doing huge fits”: fitting training data with a complex, multivariable, non-linear function and trying to extrapolate a prediction from it. Fitting data is a non-trivial process studied by statistics, so it is fundamental to have some skills in this subject to approach machine learning effectively. This article explores some interesting books for learning statistics: https://towardsdatascience.com/5-free-books-to-learn-statistics-for-data-science-768d27b8215

It is not necessary to become a statistician to do machine learning: knowing the theory of distributions, error estimation, data interpolation techniques and the basic statistical tests could be enough!
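The idea of "doing fits" can be shown in miniature with the simplest possible model, a straight line. The sketch below is only an illustration: the data are made up, and the function `fit_line` is a hypothetical helper that applies the standard least-squares formulas for the slope and the intercept.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up "training data", roughly following y = 2x + 1 with some noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)

# Extrapolate: predict the value at a point outside the training data.
prediction = slope * 5.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at x=5: {prediction:.2f}")
```

A machine learning model does conceptually the same thing, with many more parameters and a non-linear function in place of the line.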

Coding

Theory is important, but our association was founded precisely to encourage students to apply the knowledge they have acquired to real problems. Furthermore, writing a piece of code to solve a problem is an excellent test of how well you understand the theory, and it’s very motivating! So let’s start writing code!
The main language for Machine Learning is Python! It’s really easy to learn: it’s a high-level, dynamically typed language with a very user-friendly syntax. One of the best environments to start coding in seconds without installing anything is colab.research.google.com/.

The internet is full of well-written guides for learning the language. We suggest the introductory course by Prof. E. Maina of the University of Turin (http://personalpages.to.infn.it/~maina/didattica/TIF_2020/).

To start doing machine learning we must introduce some basic libraries for working with numbers, statistics, structured data and mathematical methods. In particular we need:

Numpy (https://numpy.org/learn/)
Scipy (https://www.scipy.org/getting-started.html)
Sympy (https://docs.sympy.org/latest/tutorial/index.html)
Matplotlib (https://matplotlib.org/2.0.2/users/pyplot_tutorial.html)
Pandas (https://www.kaggle.com/learn/pandas)

You can start from these articles written by Simone Azeglio (MLJC): https://medium.com/mljcunito/scientific-programming-chapter-3-numpy-scipy-and-matplotlib-8e215b4ffe99

https://medium.com/mljcunito/scientific-programming-chapter-2-1-kung-fu-pandas-a6e715e0753d

A complete course by MLJC:

Lectures by MachineLearningJournalClub

Lectures on Python

Other resources:

https://github.com/jakevdp/PythonDataScienceHandbook

https://jakevdp.github.io/PythonDataScienceHandbook/

We would like to underline that it is impossible to remember all the commands, parameters and functions of these libraries: it is more important to be aware of the available tools and to be able to find the right command quickly in the documentation!
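Part of the documentation is available directly from Python. As a minimal sketch, the snippet below uses the standard library module `statistics` as an arbitrary example: it reads the docstring of a function and lists the public names of the module. The same approach works with the libraries listed above.

```python
import inspect
import statistics

# First line of a function's docstring: a quick reminder of what it does.
doc = inspect.getdoc(statistics.mean)
print(doc.splitlines()[0])

# Public names of a module: a quick way to discover the available tools.
public_names = [name for name in dir(statistics) if not name.startswith("_")]
print(public_names)

# In an interactive session, help(statistics.mean) prints the full documentation.
```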

Another language that should be mentioned is Julia: it is younger, less widespread and less supported than Python, but it offers specific advantages in performance and some exceptional tools for scientific computing. If you are intrigued, please see the “Scientific Machine Learning” section below.

Basic courses

Now all the tools are ready and we can start Learning Machine Learning!
A very well-prepared introductory course is the following, by Prof. Andrew Ng:

https://www.coursera.org/learn/machine-learning

Assignments can be found here: https://github.com/dibgerge/ml-coursera-python-assignments

For this section we also suggest some fascinating videos by 3Blue1Brown that visualize neural networks with advanced animations.

It is not necessary to reinvent the wheel!

It is important to be aware of the algorithms and of the most important supervised and unsupervised architectures, but you won’t be asked to reimplement the models for your application!

The following list presents the most important libraries (Python-based) for deploying applications:

scikit-learn (https://scikit-learn.org/stable/tutorial/index.html)
TensorFlow
Keras
PyTorch

You can explore these articles, extracted from some lectures held by MLJC at the Physics Department of the University of Turin:

https://medium.com/mljcunito/tagged/tensorflow

While we update this portal you can keep on learning on our GitHub: discover how to tackle a ML competition here.