By combining deep academic knowledge with hands-on industry experience, I have specialised in developing solutions that use Machine Learning, Software Engineering, and Big Data technologies. During my career, I have worked on a variety of significant initiatives, including efforts to prevent financial misconduct, detect online radicalisation, and understand the complexities of online user behaviour. My affiliation with institutions such as HSBC and the European Research Council (ERC) reflects my dedication and expertise.


Hassan Saif

Lead Data Scientist with over 15 years' experience in Machine Learning, Computational Social Science, Software Engineering, and Big Data

  • Milton Keynes, UK


PhD in Computational Data Science & NLP

2011 - 2015

The Open University, Knowledge Media Institute, UK

Thesis: Semantic Sentiment Analysis of Microblogs

A study that leveraged Semantic Web techniques together with Machine Learning and NLP to enhance sentiment analysis performance on social media. I designed several methods to enrich both supervised and unsupervised Machine Learning models with the conceptual and contextual semantics of words. Additionally, I developed SentiCircles, a semantic vector representation of words that uses Trigonometry and Euclidean Geometry to better assign sentiment to words based on their contextual and conceptual meanings. The scientific community has made extensive use of SentiCircles to improve both traditional and deep learning models for sentiment analysis.

The doctoral thesis received the International SWSA Distinguished PhD Dissertation Award (ISWC, Kobe, Japan, 2016) and was later published as a book in 2017.

B.Sc. in Computer Science & Artificial Intelligence

2003 - 2008

Damascus University, Damascus, Syria

Studied key courses including Machine Learning, Deep Neural Networks, Programming Languages, Game Theory, Applied Mathematics, and Software Engineering.

Technical Skills

Big Data

  • Strong experience in managing and processing large production databases using on-premise and cloud-based data solutions, including the Hadoop ecosystem (YARN, HDFS, MapReduce, Hive, Avro) and Google Cloud Datastores.
  • Experience with building and managing ETL and data flow pipelines using DBT.
  • Good knowledge of RDBMSs such as MySQL and SQL Server.
  • Familiarity with cloud services, particularly Amazon Web Services and Google Cloud Platform, and their associated analytics and ML capabilities (TFX, GKE, Kubeflow).
  • Experience with building event-driven microservices architectures using Apache Kafka.

Analytics & Machine Learning

  • Strong experience in quantitative and qualitative analytics and ML (pattern recognition, classification, clustering, dimensionality reduction, feature engineering, and model evaluation).
  • Strong experience in Natural Language Processing, Text Mining, Semantic Vector Representations, Topic Modelling, and User Behaviour Analysis.
  • Familiarity with Semantic Web modelling, toolkits, and ontologies, including Semantic Graphs, RDF, OWL, and DBpedia.
  • Experience with ML and Deep Learning frameworks and packages, including scikit-learn, TensorFlow, Keras, TFLearn, Jupyter, Pandas, and NumPy.
  • Data visualisation with Matplotlib, Pyplot, Seaborn, and ggplot2.

Software Design and Engineering

  • Strong experience in Python, and previous knowledge of R, Java, C++, C#, PHP, and JavaScript.
  • Good knowledge of functional and object-oriented programming.
  • Software design, development, testing, deployment, and architecture design.
  • Experience with CI/CD and version control tools, including Git, Gerrit, and Travis.
  • Good knowledge of web development packages and frameworks such as Dash and the MEAN stack (MongoDB, Express, Angular, and Node.js).

International Awards

  • International SWSA Distinguished PhD Dissertation Award, “Semantic Sentiment Analysis of Microblogs”, ISWC, Kobe, Japan, Oct 2016.
  • Best Research Paper Award (Nominee), “Mining Pro-ISIS Radicalisation Signals from Social Media Users”, ICWSM, Cologne, Germany, 2016.
  • Best Journal Paper of the Year (Honourable Mention), “Contextual semantics for sentiment analysis of Twitter”, Information Processing and Management, 2015.
  • Best Research Paper Award, “Adapting sentiment lexicons using contextual semantics for sentiment analysis of Twitter”, 1st Workshop on Semantic Sentiment Analysis, Crete, Greece, 2014.
  • Best Research Paper Award, “Alleviating Data Sparsity for Twitter Sentiment Analysis”, 2nd Workshop on Making Sense of Microposts, Lyon, France, 2012.

Professional Experience

Senior Data Science Manager

2018 - Present

HSBC Group, Research & Development, Chief Compliance Office, London, UK

Lead the design and development of innovative, large-scale, production-ready anti-financial crime systems. This is an interdisciplinary role at the intersection of quantitative analysis/ML, software development, and pipeline infrastructure engineering. I am also responsible for managing multiple cross-jurisdiction analytics teams, as well as preparing and delivering a broad range of technical briefings for senior HSBC executives. Key responsibilities and achievements include:

  • Managing the design, development, and delivery of HSBC's global anti-financial crime ML-based ecosystems.
  • Working with extremely large, complex data sets (30Bn+ records), generating insights, and identifying and modelling historical and newly emerging financial crime risk behaviours and trends.
  • Building an end-to-end NLP-powered ML pipeline for identifying confirmed financial crime risk typologies within HSBC's global population of known bad actors. The pipeline helped deliver auto-generated gold-standard datasets for model training and evaluation.
  • Building ML-based anti-money laundering models that are ~5 times better and detect risk ~0.7 months faster than existing systems.
  • Working jointly with IT and engineering teams on the integration and deployment of large-scale ML pipelines.
  • Leading model performance analysis and evaluating business metrics to generate actionable insights and provide compelling visualisations for both business and technical stakeholders to support evidence-based decision-making processes within the bank.
  • Managing key stakeholder relationships to determine and translate business needs into research requirements and development workflows.

Lead Data Scientist

2017 - 2018

True212, Engineering Dept, London, UK

Led the design and development of project RER (Real-Time Editorial Resource), a recommender system that provides journalists and editors with real-time insights, recommendations, and monitoring services on social media. Main responsibilities included:

  • Designing scalable ML and NLP pipelines (NER, WSD, Topic Modelling, and Semantic Vector Representations) for identifying relevant insights, trends, and breaking news from social media.
  • Using Big Data technologies and event streaming frameworks (Hadoop, Spark, Kafka) along with cloud computing services (AWS, Atlas) to design and build an event-driven, fault-tolerant system architecture for RER's ML framework.
  • Managing the RER software development lifecycle across internal and overseas teams.

ML Research Associate

2015 - 2017

The Open University, Knowledge Media Institute, Milton Keynes, UK

Led and managed two EU-funded research and innovation projects: COMRADES and TRIVALENT:

  • In TRIVALENT, I worked closely with over 20 research teams and policing organisations across Europe to tackle online radicalisation and support counter-terrorism on social media. This involved leveraging techniques from ML, User Behaviour Analysis, and Big Data to build a large-scale ML framework for detecting radicalised users on social media, tracking the divergence of user behaviours during the radicalisation process, and analysing what influences users to adopt a radicalised stance. An evaluation on over 100 million tweets showed that the developed models achieve ~7% higher accuracy than existing models.
  • In COMRADES, I designed a semantically enriched Deep Learning model for crisis-related event detection on social media. The model was shown to outperform existing ML models by up to 3.7%. It is currently used by multiple international humanitarian organisations to identify crisis-related information in social streams.

Research Assistant

2014 - 2015

The Open University, Knowledge Media Institute, Milton Keynes, UK

Leveraged NLP, Machine Learning, Topic Modelling, and Semantic Web techniques to develop prediction models that capture the dynamics of citizens' policy discussions and the spread of polarised sentiment on social media. These models have been used by a wide range of political scientists and governmental organisations (including Members of Parliament in both Germany and the UK) as part of the EU-funded project SENSE4US.

Software Engineer (PT)

2013 - 2013

The Open University, Knowledge Media Institute, Milton Keynes, UK

Developed a web-based framework for assessing the quality of research publications within the Open University (OU) as part of the UK Research Excellence Framework (REF). This framework has been used continuously to assess the research performance of the OU and to inform research grant allocations from multiple Higher Education funding bodies. Built with PHP, JavaScript, MySQL, and SPARQL.