Gabriel Tregoat

Welcome! I am a French data scientist working in Paris, France.
On this website, I talk about myself and the technologies I enjoy using, and I present my work. To contact me, please use the contact form!

I wish you a good visit and a lovely day,

Gabriel


Intro

I am a data scientist / machine learning engineer working for Ekimetrics in Paris, France. I lead complex data projects on a wide range of subjects such as NLP, churn and computer vision.

Previously, I worked at Shell Energy in Coventry, United Kingdom, for over 3 years. As a data scientist there, I worked in cross-functional teams and learned about building NLP and churn models and deploying them in data pipelines using CI/CD and MLOps. Since then, I've been particularly interested in applying data engineering tools and techniques to make model predictions available, and in adapting DevOps principles to machine-learning-based projects.

Originally from Paris, I went to study in Nantes to get a master's degree in engineering from Ecole Centrale de Nantes and another in business management from Audencia Business School. I then moved to the United States for a semester at Boston University, before joining Shell Energy in September 2017.

My data science journey began when I interned at Telecom ParisTech in 2016. I worked on designing an experiment that records the EEG signals of people listening to music, in order to retrieve the instrument they were focusing on (a paper was published!).

At Shell Energy, my initial focus was Natural Language Processing, mainly sentiment analysis and text classification, often using deep learning (in which I have a keen interest). However, my projects vary in topic, and recently my main focus has been binary classification in a problem where feature engineering is key.

I am also a cloud enthusiast. To understand it better and to be able to use cloud services, I started learning AWS services a little over a year ago. This website is actually hosted on S3!

On a more personal note, I love music and I have been practicing the guitar for around 20 years. I particularly enjoy listening to progressive rock styles and jazz, playing in bands (in rehearsed settings or improvised jams), and writing music (I've written an album, of which two tracks were recorded!).


Work

Github

Link to my GitHub account

Projects on my GitHub include:

  • Kaggle's Titanic dataset with Spark on Databricks (link)
  • CatBoost test: how it performs and its model inspection integrations (e.g. SHAP) (link)
  • Quarantine popups, a Kivy (Python) app that reminds me to exercise when I work from home. (link)
  • This website, for which every push to master triggers a build and pushes it to S3. (link)
  • Code for a text classifier using deep learning and Keras. (link)
  • treegoat: helper functions for Kaggle competitions or personal projects. Currently contains an NLP helper class that, when inherited, lets you build custom TensorFlow/Keras models while it handles the tokenizing and label formatting. (link)
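The inheritance pattern mentioned for the NLP helper class can be illustrated with a minimal sketch. This is not treegoat's actual code or API: the class names, method names and the whitespace tokenizer below are all hypothetical, and the subclass returns a plain dict where a real one would return a TensorFlow/Keras model. The idea is only that the base class owns tokenizing and label formatting, so a subclass has to define nothing but the model.

```python
from collections import Counter


class TextClassifierBase:
    """Hypothetical base class (not treegoat's real API).

    It handles tokenizing and label formatting; subclasses only
    implement build_model().
    """

    def __init__(self, max_words=1000):
        self.max_words = max_words
        self.word_index = {}   # word -> integer id (0 is reserved for unknown words)
        self.label_index = {}  # label -> integer id

    def fit_vocab(self, texts, labels):
        # Keep the most frequent words and give each one an id starting at 1.
        counts = Counter(word for text in texts for word in text.lower().split())
        most_common = counts.most_common(self.max_words)
        self.word_index = {word: i + 1 for i, (word, _) in enumerate(most_common)}
        self.label_index = {label: i for i, label in enumerate(sorted(set(labels)))}
        return self

    def encode_texts(self, texts):
        # Turn each text into a list of word ids; unknown words map to 0.
        return [[self.word_index.get(w, 0) for w in t.lower().split()] for t in texts]

    def encode_labels(self, labels):
        # One-hot encode each label.
        n = len(self.label_index)
        return [
            [1 if i == self.label_index[label] else 0 for i in range(n)]
            for label in labels
        ]

    def build_model(self, vocab_size, n_classes):
        raise NotImplementedError("subclasses define the model architecture")


class MyClassifier(TextClassifierBase):
    def build_model(self, vocab_size, n_classes):
        # A real subclass would build and return a Keras model here;
        # a dict keeps this sketch free of dependencies.
        return {"vocab_size": vocab_size, "n_classes": n_classes}
```

With this layout, `MyClassifier().fit_vocab(texts, labels)` prepares the vocabulary and label mapping, and `encode_texts` / `encode_labels` produce inputs in the shape a model expects.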

Link to my Kaggle profile

Projects on Kaggle:

  • Image classification for gender detection using TensorFlow 2, trained on >200,000 images (GPU), 96% accuracy. (link)
  • CatBoost notebook on the Titanic dataset (link)
  • Spooky Author Identification (using fastText, no notebook available)

Publications

  • Workshop on Speech, Music and Mind 2019: MAD-EEG: an EEG dataset for decoding auditory attention to a target instrument in polyphonic music.

    About

    LinkedIn

    https://www.linkedin.com/in/gabriel-trégoat/

    Technology stack

    General

    Python has been my main programming language since 2016. I use it for creating machine learning models, data engineering and more general applications (creating web apps, UIs, dashboards...). Pluralsight score for core language: 240 (265 for data analysis).
    PyCharm: my IDE of choice. I use it for Python as well as for SQL, shell scripts, and web development.
    Bash & Linux: I use Linux on a daily basis and write shell scripts to automate tasks and orchestrate jobs.
    Office tools: Microsoft Office suite, Google business suite, LyX (LaTeX editor).

    Machine learning toolkit

    scikit-learn is one of the main libraries that I use, and I always write my code in accordance with the scikit-learn API.
    Algorithms with sklearn: I use algorithms from various libraries in conjunction with scikit-learn (e.g. in a pipeline). These include XGBoost and CatBoost (I have a CatBoost test on my Kaggle and GitHub accounts!).

    Deep learning: I use Keras and TensorFlow for building deep learning models. I have worked mainly on NLP projects (sentiment analysis, text classification, topic modelling).
    Cloud tools: I use AWS SageMaker and its built-in algorithms (e.g. DeepAR), as well as Databricks.
    Explainability: My tool of choice for explaining models is SHAP. It helps me explore how a variable influences a model's output and how important the variables are, and it gives an idea of which variables explain a prediction and how they influenced it. Other good tools: LIME, ELI5.
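To make concrete what "which variables explain a prediction" means, here is a small sketch of the Shapley values that SHAP is built on. It does not use the shap library: it computes exact values by brute-force enumeration of feature coalitions, which only works for a handful of features, and it replaces absent features with a baseline value (one simple convention among several). The function names and the toy model are made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction, by enumerating coalitions.

    Features outside a coalition take their baseline value.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        contributions.append(total)
    return contributions


def toy_model(x):
    # Hypothetical model: a linear part plus an interaction between x[0] and x[2].
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[2]


phi = shapley_values(toy_model, instance=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The contributions in `phi` add up to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features involved in it.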

    Data engineering toolkit

    When transforming data, my main tool for heavy lifting is Hadoop, which I use mainly through Impala and Hive SQL. I also use WebHDFS in custom Python libraries for optimised data manipulation and loading outside of a Hadoop cluster.
    Pandas is my main data manipulation tool when using Python, usually once the heavy lifting has been done in another tool optimised for parallel processing (e.g. SQL).
    I occasionally use Spark instead of the Impala SQL + pandas combination. If I'm on a machine learning task, I usually end up calling toPandas().
    I mainly use serverless AWS services such as S3 and Lambda. For example, this website uses no server that I maintain and is published automatically when I push to GitHub thanks to CodePipeline, which also invalidates the CloudFront cache. I am also pursuing the Cloud Practitioner certification.
    I have basic Docker knowledge that allows me to build an app once and run it everywhere without fighting with Python dependencies or versions.
    I use joblib as my main parallel computing tool when coding in Python. I also have an interest in Dask.
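The cache invalidation step of the website publishing flow described above can be sketched as follows. The site's actual pipeline configuration is not shown here, so this is an assumption about how such a step could look with boto3: the helper names, the default path and the distribution ID are placeholders. Only `create_invalidation` and the shape of its `InvalidationBatch` argument come from the CloudFront API.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch payload that CloudFront's create_invalidation expects."""
    if caller_reference is None:
        # CloudFront requires a unique reference per request; a timestamp is one simple choice.
        caller_reference = str(time.time())
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": caller_reference,
    }


def invalidate_cache(distribution_id, paths=("/*",)):
    """Ask CloudFront to drop its cached copies of the given paths."""
    # Imported lazily so the payload helper above works without AWS dependencies.
    import boto3

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

A deploy step would call something like `invalidate_cache("YOUR_DISTRIBUTION_ID")` after the new files have been uploaded to S3, so that visitors get the fresh build rather than a cached one.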

    Data Viz

    Tableau: I use Tableau for creating dashboards and stories to deliver insights to business stakeholders.
    Dash: I use Dash when I need custom data calculations or interactions that I could not get using Tableau.

    Education

    Master: Data processing applied to imagery, biomedical and audio

    After learning the basics of various engineering topics, I specialised in data processing, mainly out of interest in audio at the time (I am a musician, and I was the sound engineer at a few concerts). This computer science speciality was heavy in Matlab and included machine learning courses, which led me to pursue an internship and then a career in this field.

    Master: business management

    I studied business in parallel to my engineering education. After general business courses on a number of topics such as marketing and finance, I majored in management control. This helped me understand how operations are run, how all parts of a company tie together and how I can help with my technical skills.

    Business management semester

    I went to Boston University (Massachusetts, USA) to improve my business knowledge in several fields I was interested in and to see how things are done abroad. Among other topics, I followed a course on startup finance. There, I learned to draft business plans to evaluate the benefits of a project, building on the financial management skills I acquired at Audencia. I also studied project management techniques; of these, I now follow the Agile methodology.


    Contact


    © Design: HTML5 UP.

    Static website created by me using AWS services