James Barthélémy

Machine learning engineer

Ennery, France

After 25 years of experience in finance, IT and management, I recently graduated as a Machine Learning engineer. Passionate about data science, I now want to put my skills to work on a new challenge in this field.

.NET SQL Oracle MySQL Data Science Machine learning Deep learning Scikit-learn TensorFlow Keras Python PHP Javascript HTML5 CSS3 Git

Projects

Data analysis

Data exploration and cleaning

In this project, I am going to walk you through the end-to-end data analysis process with Python and the OpenFoodFacts Dataset.

The goals are to:

  1. Process the dataset by identifying the relevant variables for future processing
  2. Clean the data by highlighting missing values and by identifying and quantifying possible outliers for each variable
  3. Automate these processing steps to avoid repeating them by hand
  4. Throughout the analysis, produce visualizations to better understand the data
  5. Perform a univariate analysis for each variable of interest, in order to summarize its behavior
  6. Confirm or refute hypotheses using multivariate analysis
  7. Perform the appropriate statistical tests to verify the significance of the results
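
As an illustration of the cleaning steps listed above, here is a minimal sketch with pandas. It is not the project's actual code: the column names and values are made up, and the 1.5 × IQR rule is only one common way to flag outliers, assumed here for the example.

```python
import numpy as np
import pandas as pd


def missing_rate(df: pd.DataFrame) -> pd.Series:
    """Share of missing values per column, from most to least missing."""
    return df.isna().mean().sort_values(ascending=False)


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical toy data: these columns are illustrative, not the real schema.
df = pd.DataFrame({
    "energy_100g": [250.0, 300.0, 280.0, 5000.0, np.nan, 310.0],
    "sugars_100g": [5.0, np.nan, 7.5, 6.0, np.nan, 8.0],
})

rates = missing_rate(df)               # missing share per column
mask = iqr_outliers(df["energy_100g"])  # True where a value looks extreme
print(rates)
print(df[mask])
```

Wrapping each step in a small function is what makes the treatment reusable on a refreshed dataset, which is the point of the automation goal.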

Numpy Pandas Seaborn Matplotlib Scipy

Modelling

Anticipate building consumption needs

Now that human-caused global warming is widely recognized, several major cities around the world are trying to reduce their impact.

This project focuses on the city of Seattle, which aims to be carbon neutral by 2050. Using the meticulous surveys carried out by city officials in 2016, I will first take a close look at the consumption and emissions of non-residential buildings. Then I will test different regression models to predict them. To go further, I will also evaluate how useful the "ENERGY STAR Score" is for forecasting emissions.
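
One way to judge how much a single feature contributes to a forecast is permutation importance on a fitted pipeline. The sketch below uses synthetic data and a Ridge regressor purely as stand-ins (both are assumptions, not the project's data or chosen model). In the real setting, one of the columns could be the ENERGY STAR Score.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: the target depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling and regression live in one pipeline so the scaler is fitted
# on the training split only.
model = Pipeline([("scale", StandardScaler()), ("reg", Ridge(alpha=1.0))])
model.fit(X_train, y_train)

# Shuffle each feature on the held-out split and measure the drop in R^2.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]
print(ranking, result.importances_mean)
```

A feature whose permutation barely changes the score adds little to the forecast, which is the kind of evidence needed to decide whether a costly-to-collect variable is worth keeping.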

Pandas Data analysis Data visualization Scikit-Learn Modelling Pipeline Permutation importance

Clustering

Segment the customers of an e-commerce site

"A good level of customer knowledge allows the company to better know those who contribute to its commercial prosperity, in particular information on their profiles, their needs, their centers of interest and their expectations."

In this project, I will use KMeans clustering to provide actionable customer segments to an e-commerce site. I will then estimate how often the segmentation needs to be updated, based on an analysis of the stability of the segments over time.
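
The two ideas above, choosing the number of segments and checking their stability, can be sketched as follows. This is an illustration under assumptions: the customer features are synthetic blobs, the drift is simulated with random noise, and the silhouette score and adjusted Rand index are used as the selection and stability criteria.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic "customers" at an initial date (hypothetical 2D features).
X_t0, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.8,
    random_state=0,
)

# 1) Pick the number of segments with the best silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_t0)
    scores[k] = silhouette_score(X_t0, labels)
best_k = max(scores, key=scores.get)

# 2) Stability check: the same customers, later, with slightly drifted features.
rng = np.random.default_rng(0)
X_t1 = X_t0 + rng.normal(scale=0.3, size=X_t0.shape)

reference = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_t0)
labels_old_model = reference.predict(X_t1)  # segments from the old model
labels_new_model = KMeans(
    n_clusters=best_k, n_init=10, random_state=0
).fit_predict(X_t1)                         # segments from a refitted model

# ARI close to 1 means both models agree, so no refit is needed yet.
ari = adjusted_rand_score(labels_old_model, labels_new_model)
print(best_k, round(ari, 3))
```

Repeating step 2 over successive periods and watching when the ARI falls below a chosen threshold gives an estimate of the maintenance frequency.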

pandasql Principal component analysis KMeans SSE Silhouette score K elbow Adjusted rand score

Natural language processing

Automatically categorize questions

Stack Overflow is a website offering questions and answers on a wide range of computer programming topics. A member who wants to ask a question must use the dedicated form and fill in a title, the question itself and 1 to 5 tags to categorize it. For experienced users this is not a problem, but new users would benefit from suggested tags related to their question.

The proposed solution is a tag suggestion tool based on the title and content of the question. After an initial pre-processing step, a machine learning model proposes a series of tags depending on the content of the question asked.
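
A tag suggester of this kind can be sketched as a multilabel text classifier. The example below is only an illustration: the eight questions and their tags are invented, and TF-IDF with one-vs-rest logistic regression is an assumed baseline, not necessarily the model retained in the project.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Invented toy corpus: each question has one or more tags.
questions = [
    "merge two pandas dataframes on a column",
    "pandas groupby aggregate sum by column",
    "python list comprehension with condition",
    "javascript promise async await fetch",
    "javascript array map filter reduce",
    "css flexbox center div horizontally",
    "html form input validation css style",
    "javascript dom event listener html button",
]
tags = [
    ["python", "pandas"], ["python", "pandas"], ["python"],
    ["javascript"], ["javascript"], ["css", "html"],
    ["html", "css"], ["javascript", "html"],
]

# Turn the tag lists into a binary indicator matrix (one column per tag).
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(tags)

# Pre-processing (lowercasing, stop-word removal, TF-IDF) and one binary
# classifier per tag, chained in a single pipeline.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    OneVsRestClassifier(LogisticRegression(C=10.0, max_iter=1000)),
)
model.fit(questions, Y)


def suggest_tags(text: str, n: int = 2) -> list:
    """Return the n tags with the highest predicted probability."""
    proba = model.predict_proba([text])[0]
    top = np.argsort(proba)[::-1][:n]
    return [str(binarizer.classes_[i]) for i in top]


suggestions = suggest_tags("How to merge pandas dataframes?")
print(suggestions)
```

Ranking tags by probability instead of applying a fixed threshold guarantees that the tool always proposes something, which suits a suggestion feature aimed at new users.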

Stopwords Tokenization Lemmatization Latent Dirichlet Allocation Bag of words TF-Idf Word Embedding Universal sentence encoder

Computer vision

Classify images using deep learning algorithms

Many computer vision algorithms use a convolutional neural network, or CNN. Like basic feedforward neural networks, CNNs learn from inputs, adjusting their parameters to make a prediction. However, what makes CNNs special is their ability to extract features from images.

In this project, which aims to classify dog images by breed, I will first implement my own CNN inspired by the famous InceptionV1 model. Next, I'll demonstrate how transfer learning outperforms this baseline using other popular pre-trained models.

TensorFlow Keras CNN ImageDataGenerator Multiclass classification Data augmentation

Time series

Long Short-Term Memory Neural Network for Financial Time Series

The emergence of machine learning and, more recently, deep learning has brought a new dimension to time series analysis. Deep learning methods hold a lot of promise for time series forecasting, such as automatically learning temporal dependencies and automatically handling temporal structures (trends and seasonality).

In this project, which implements the attached arXiv paper, I will present the different avenues I explored while developing a prototype that uses an LSTM recurrent neural network to predict the evolution of stock prices.
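
Evaluating such a forecaster requires splits that never train on the future. Here is a minimal sketch of walk-forward cross-validation with scikit-learn's `TimeSeriesSplit`. To keep it short and runnable, a linear regression on lagged prices stands in for the LSTM, and the price series is a simulated random walk. Both are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit


def make_lagged(series: np.ndarray, n_lags: int):
    """Build (X, y) where each row of X holds the n_lags previous values."""
    X = np.column_stack(
        [series[i : len(series) - n_lags + i] for i in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y


# Simulated random-walk "prices" (not real market data).
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=500))

X, y = make_lagged(prices, n_lags=5)

# Each fold trains on an expanding window of the past and tests on the
# block that immediately follows it.
fold_errors = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    fold_errors.append(mean_absolute_error(y[test_idx], predictions))

print([round(e, 3) for e in fold_errors])
```

The same loop works with any estimator exposing `fit` and `predict`, so a recurrent model can be dropped in without changing the evaluation protocol.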

TimeSeriesSplit LSTM Seasonality Bayesian optimization Walk forward cross-validation Voting ensemble

Audio classification

BirdCLEF 2022

Birds are all around us, and just by listening to them we can learn a lot about our surroundings. Ecologists use birds to understand food systems and the health of environments: for example, more woodpeckers in a forest means that there is a lot of dead wood. And because birds communicate and mark their territory with songs and calls, it is often more effective to identify them by sound.

This Kaggle competition proposes to use automatic audio classification to identify bird species by sound. More specifically, it involves developing a model capable of processing continuous audio data and then acoustically recognizing species. To do so, I will first convert the audio into mel spectrograms and explore transfer learning and computer vision models. Next, I'll implement and test dedicated audio CNNs like VGGish and TRILL.

Multilabel classification Imbalanced data Transfer learning Melspectrogram TRILL VGGish Stacking