Data Engineer / AI Developer (Python, Airflow, LLMs)

Location: Remote (CET time zone preferred)

About Startup Researcher

Startup Researcher is building the most reliable intelligence platform for private tech investments — combining media, data, and AI.

Our products span:

Media Hub → editorial news & signals on startups, investors, and markets

Data Hub → a live database of startups, investors, and funding rounds

Data Engine → the backend AI system that collects, cleans, and structures the world’s startup data

We’re hiring a dedicated developer for the Data Engine — the core AI-powered pipeline behind our entire ecosystem.


Role Overview

You’ll be the main developer of the Data Engine repository, responsible for designing, maintaining, and evolving our end-to-end ETL pipelines.

You’ll work closely with the founders, experimenting with LLM-based extraction, web scraping, and data enrichment pipelines — and have full freedom to explore, test, and deploy new ideas.

This is a hands-on, highly autonomous, and exploratory role for a builder who loves both Python craftsmanship and AI research.


Responsibilities

* Design and build modular ETL pipelines using Airflow (2.x / 3.x): ingest → normalize → enrich → load.
* Develop custom scrapers for startup-related sources (e.g., news websites, VC portfolios, startup websites…).
* Integrate LLMs (e.g., Gemini) to extract structured information (e.g., funding rounds, acquisitions) from unstructured text.
* Transform raw text data (HTML, RSS, JSON, PDFs) into structured database entities aligned with our Data Hub schema.
* Maintain clean, production-ready code — versioned, tested, and easy to extend for future engineers.
* Collaborate asynchronously: share progress, results, and new ideas for improvement.
* Continuously research new AI/LLM techniques relevant to automated data extraction and summarization.
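The ingest → normalize → enrich → load flow above can be sketched as plain Python stages. This is a minimal illustration only: in the actual Data Engine each stage would be an Airflow task, and the names here (`Article`, its fields, the tagging rule) are hypothetical, not the real Data Hub schema.

```python
from dataclasses import dataclass, field


# Hypothetical record type -- field names are illustrative,
# not the real Data Hub schema.
@dataclass
class Article:
    url: str
    title: str
    body: str
    tags: list[str] = field(default_factory=list)


def ingest(raw_items: list[dict]) -> list[Article]:
    """Turn raw scraped payloads into typed records."""
    return [
        Article(url=i["url"], title=i["title"], body=i.get("body", ""))
        for i in raw_items
    ]


def normalize(articles: list[Article]) -> list[Article]:
    """Collapse whitespace and strip stray padding."""
    for a in articles:
        a.title = " ".join(a.title.split())
        a.body = a.body.strip()
    return articles


def enrich(articles: list[Article]) -> list[Article]:
    """Attach simple tags; a real pipeline might call an LLM here."""
    for a in articles:
        if "funding" in a.body.lower():
            a.tags.append("funding")
    return articles


def load(articles: list[Article]) -> int:
    """Stand-in for writing to the Data Hub; returns the row count."""
    return len(articles)


# Each stage maps naturally to one Airflow task:
#   ingest >> normalize >> enrich >> load
rows = load(enrich(normalize(ingest([
    {"url": "https://example.com/a",
     "title": "  Acme  raises seed ",
     "body": " A funding round. "},
]))))
```

In Airflow 2.x/3.x these functions would typically become `@task`-decorated callables inside a DAG, keeping each stage independently testable.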

Example Projects You’ll Work On

* Build a news scraper that extracts articles from multiple African tech media sites, cleans the content, and loads it into our CMS.
* Create a funding-round detector: use keyword filters + LLM reasoning to identify and structure funding events from news articles, then load the data into the Data Hub.
* Enrich startup profiles by scraping and standardizing external data, then loading it into the Data Hub.
* Implement AI-powered classification of startups by sector, stage, and market using embeddings and few-shot models.
* Build a data validation framework to detect duplicates or inconsistencies across incoming pipelines.


Must-Have Skills

* Excellent Python (async programming, typing, modular design)
* Strong experience with Airflow and ETL orchestration; knowledge of Airflow 3.x is a huge plus.
* Hands-on experience with LLMs / Generative AI — fine-tuning, prompt design, or API integration.
* Web scraping expertise (BeautifulSoup, Requests, Selenium, or async approaches).
* Experience with Git / GitHub workflows, CI/CD, and code versioning best practices.
* Strong understanding of data transformation & normalization principles.
* Comfortable working autonomously, taking ownership of a complex repository.


Bonus Points

* Own projects or demos involving LLMs, embeddings, or data pipelines.
* Familiarity with startup ecosystems — funding rounds, VCs, acquisitions, exits.
* Experience with Docker and PostgreSQL (SQLAlchemy, Alembic).
* Interest in data accuracy, knowledge graphs, or AI-driven data enrichment.

Salary

Per-task / project-based

Location

Rabat, Rabat-Salé-Kénitra, Morocco

Job Overview
Posted: 2 weeks ago
Expires: in 2 weeks
Job type: Contract
Job role: Artificial Intelligence Engineer
Category: IT & Development
Education: Bac+4 (four years of higher education)
Experience: Entry level, < 2 years
Vacancies: 1
