Intro
I am a data scientist / machine learning engineer working for Ekimetrics in Paris, France.
I lead complex data projects on a wide range of subjects such as NLP, churn and computer
vision.
Previously, I worked at Shell Energy in Coventry, United Kingdom, for over 3 years. As
a data scientist there, I worked in cross-functional teams and learned to build NLP
and churn models and to deploy them in data pipelines using CI/CD and MLOps. Since then,
I've been particularly interested in applying data engineering tools and techniques to
make model predictions available, and in adapting DevOps principles to machine learning
projects.
Originally from Paris, I went to study in Nantes to get a master's degree in
engineering from Ecole Centrale de Nantes and another in business management from Audencia
Business School. I then moved to the United States for a semester at Boston University,
before joining Shell Energy in September 2017.
My data science journey began when I interned at Telecom ParisTech in 2016. I worked
on designing an experiment that records EEG signals of people listening to music, in order
to identify the instrument they were focusing on (a paper was published!).
At Shell Energy, my initial focus was Natural Language Processing, mainly sentiment
analysis and text classification, often using deep learning (in which I have a keen interest).
However, my projects vary in topic, and recently my main focus has been a binary classification
problem where feature engineering is key.
I am also a cloud enthusiast. To understand the cloud better and to be able to use its
services, I started learning AWS a little over a year ago. This website is actually
hosted on S3!
On a more personal note, I love music and I have been practicing the guitar for around
20 years. I particularly enjoy listening to progressive rock styles and jazz, playing
in bands (in rehearsed settings or improvised jams), and writing music (I've written an
album, of which two tracks were recorded!).
Work
GitHub
Link to my GitHub account
Projects on my GitHub include:
- Kaggle's Titanic dataset with Spark on Databricks
(link)
- CatBoost test: how it performs and how it integrates with model inspection tools (e.g. SHAP)
(link)
- Quarantine popups, a Kivy (Python) app that reminds me to exercise when I work from home.
(link)
- This website, for which every push to master triggers a build and publishes it
to S3.
(link)
- Code for a text classifier using deep learning and Keras.
(link)
- treegoat: helper functions for Kaggle competitions or personal projects.
Currently contains an NLP helper class that, when inherited, lets you build custom
TensorFlow/Keras models while it handles the tokenizing and label formatting.
(link)
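The inheritance pattern described for the treegoat NLP helper can be sketched roughly as follows. This is not the actual treegoat code: the class and method names (`TextClassifierBase`, `fit_vocab`, `encode_texts`, `encode_labels`, `build_model`, `MyClassifier`) are hypothetical, and a plain dictionary stands in for the Keras model so that the sketch runs with the standard library only.

```python
from abc import ABC, abstractmethod


class TextClassifierBase(ABC):
    """Hypothetical base class: owns tokenizing and label formatting."""

    def fit_vocab(self, texts, labels):
        # Index 0 is reserved for padding and unknown words.
        words = sorted({w for t in texts for w in t.lower().split()})
        self.word_index = {w: i + 1 for i, w in enumerate(words)}
        self.classes = sorted(set(labels))
        return self

    def encode_texts(self, texts, max_len=5):
        # Map words to integer ids, then truncate or pad to max_len.
        rows = []
        for t in texts:
            ids = [self.word_index.get(w, 0) for w in t.lower().split()][:max_len]
            rows.append(ids + [0] * (max_len - len(ids)))
        return rows

    def encode_labels(self, labels):
        # One-hot rows, in the order of self.classes.
        return [[int(c == label) for c in self.classes] for label in labels]

    @abstractmethod
    def build_model(self, vocab_size, n_classes):
        """Subclasses return their own model (e.g. a Keras model)."""


class MyClassifier(TextClassifierBase):
    def build_model(self, vocab_size, n_classes):
        # Placeholder: a real subclass would build a TensorFlow/Keras model here.
        return {"vocab_size": vocab_size, "n_classes": n_classes}


clf = MyClassifier().fit_vocab(["good product", "bad service"], ["pos", "neg"])
encoded = clf.encode_texts(["good service today"])
one_hot = clf.encode_labels(["pos"])
```

The subclass only has to define the model; the text and label preparation is inherited unchanged.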

Link to my Kaggle profile
Projects on Kaggle:
- Image classification for gender detection using TensorFlow 2, trained on >200,000 images (GPU), 96% accuracy.
(link)
- CatBoost notebook on the Titanic dataset (link)
- Spooky Author Identification (using fastText, no notebook available)
Publications
Workshop on Speech, Music and Mind 2019:
MAD-EEG: an EEG dataset for decoding auditory attention to a target instrument in polyphonic music.
About
LinkedIn
https://www.linkedin.com/in/gabriel-trégoat/
Technology stack
General
Python | Python has been my main programming language since 2016. I use it for creating machine learning models, data engineering and more general applications (web apps, UIs, dashboards...). Pluralsight score for core language: 240 (265 for data analysis).
PyCharm | PyCharm is my IDE of choice. I use it for Python as well as for SQL, shell scripts, and web development.
Bash & Linux | I use Linux on a daily basis and write shell scripts to automate tasks and orchestrate jobs.
Office tools | Microsoft Office suite, Google business suite, LyX (LaTeX editor)
Machine learning toolkit
Scikit-learn | Scikit-learn is one of the main libraries that I use, and I always write my code in accordance with the scikit-learn API.
Algorithms with sklearn | I use algorithms from various libraries in conjunction with scikit-learn (e.g. in a pipeline). These include XGBoost and CatBoost (and I have a CatBoost test on my Kaggle / GitHub account!).
Deep learning | I use Keras and TensorFlow for building deep learning models. I have worked mainly on NLP projects (sentiment analysis, text classification, topic modelling).
Cloud tools | I use AWS SageMaker and its built-in algorithms (e.g. DeepAR), as well as Databricks.
Explainability | My tool of choice for explaining models is SHAP: it helps me explore how a variable influences a model's output, shows how important each variable is, and indicates which variables explain a given prediction and how they influenced it. Other good tools: LIME, ELI5.
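As an illustration of what writing code "in accordance with the scikit-learn API" can look like, here is a minimal sketch of a custom transformer used in a pipeline. The `QuantileClipper` class and the synthetic data are assumptions made for this example, and `LogisticRegression` is only a stand-in for a gradient boosting classifier such as XGBoost or CatBoost, so that the example needs nothing beyond scikit-learn and NumPy.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class QuantileClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clips each feature to quantiles learned in fit."""

    def __init__(self, lower=0.01, upper=0.99):
        # scikit-learn convention: __init__ only stores hyperparameters.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned attributes end with an underscore.
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = Pipeline([
    ("clip", QuantileClipper(lower=0.05, upper=0.95)),
    ("clf", LogisticRegression()),  # stand-in for a boosting classifier
])
model.fit(X, y)
predictions = model.predict(X)
```

Because the transformer follows the `fit` / `transform` convention and inherits from `BaseEstimator`, its hyperparameters are exposed through `get_params()` and it can be tuned like any built-in step.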
Data engineering toolkit
Hadoop | When transforming data, my main tool for heavy lifting is Hadoop, which I use mainly through Impala and Hive SQL. I also use WebHDFS in custom Python libraries for optimised data manipulation and loading outside of a Hadoop cluster.
Pandas | Pandas is my main data manipulation tool in Python, usually once the heavy lifting has been done in a tool optimised for parallel processing (e.g. SQL).
Spark | I occasionally use Spark instead of the Impala SQL + pandas combination. If I'm on a machine learning task, I usually end up calling toPandas().
AWS | I mainly use serverless services such as S3 and Lambda. For example, this website runs on no server that I maintain and is automatically published when I push to GitHub thanks to CodePipeline, which also invalidates the CloudFront cache. I am also pursuing the Cloud Practitioner certification.
Docker | I have basic Docker knowledge that allows me to build an app once and run it everywhere without fighting with Python dependencies or versions.
Joblib | I use joblib as my main parallel computing tool in Python. I also have an interest in Dask.
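A typical joblib usage pattern looks like the sketch below. The `token_count` function and the sample documents are made up for illustration, and `prefer="threads"` is chosen here only to keep the example lightweight; joblib's default process-based backend is usually the better fit for CPU-bound work.

```python
from joblib import Parallel, delayed


def token_count(text):
    """Toy task: count whitespace-separated tokens in one document."""
    return len(text.split())


documents = ["great service", "my bill is wrong again", "thanks"]

# n_jobs=2 runs two workers; results come back in input order.
counts = Parallel(n_jobs=2, prefer="threads")(
    delayed(token_count)(doc) for doc in documents
)
```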
Data Viz
Tableau | I use Tableau for creating dashboards and stories to deliver insights to business stakeholders.
Dash | I use Dash when I need custom data calculations or interactions that I could not get using Tableau.
Education
Master: Data processing applied to imagery, biomedical and audio
After learning the basics of various engineering topics, I specialised in data processing,
mainly out of interest in audio at the time (I am a musician, and I was the sound engineer
at a few concerts). This computer science speciality was heavy in MATLAB and
included machine learning courses, which led me to pursue an internship
and then a career in this field.
Master: business management
I studied business in parallel with my engineering education. After general business courses
on a number of topics such as marketing and finance, I majored in management control. This helped
me understand how operations are run, how all parts of a company tie together and
how I can help with my technical skills.
Business management semester
I went to Boston University (Massachusetts, USA) to improve my business knowledge in
several fields I was interested in and to see how things are done abroad. Among other topics, I
followed a course on startup finance. There, I learned to draft business plans to evaluate
the benefits of a project, sharpening the financial management skills I had acquired at Audencia.
I also studied project management techniques, among which I now follow the Agile methodology.
Contact