Data Engineer / AI Developer (Python, Airflow, LLMs)


  Location: Remote (preferably CET time zone)


About Startup Researcher

Startup Researcher is building the most reliable intelligence platform for private tech investments — combining media, data, and AI.

Our products span:

* Media Hub → editorial news & signals on startups, investors, and markets
* Data Hub → a live database of startups, investors, and funding rounds
* Data Engine → the backend AI system that collects, cleans, and structures the world’s startup data

We’re hiring a dedicated developer for the Data Engine — the core AI-powered pipeline behind our entire ecosystem.


Role Overview

You’ll be the main developer of the Data Engine repository, responsible for designing, maintaining, and evolving our end-to-end ETL pipelines.

You’ll work closely with the founders, experimenting with LLM-based extraction, web scraping, and data enrichment pipelines — and have full freedom to explore, test, and deploy new ideas.

This is a hands-on, highly autonomous, and exploratory role for a builder who loves both Python craftsmanship and AI research.


Responsibilities
* Design and build modular ETL pipelines using Airflow (2.x / 3.x): ingest → normalize → enrich → load (see the sketch after this list).
* Develop custom scrapers for startup-related sources (e.g., news websites, VC portfolios, startup websites…).
* Integrate LLMs (e.g., Gemini) to extract structured information (e.g., funding rounds, acquisitions) from unstructured text.
* Transform raw text data (HTML, RSS, JSON, PDFs) into structured database entities aligned with our Data Hub schema.
* Maintain clean, production-ready code — versioned, tested, and easy to extend for future engineers.
* Collaborate asynchronously: share progress, results, and new ideas for improvement.
* Continuously research new AI/LLM techniques relevant to automated data extraction and summarization.
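
A minimal sketch of what such a pipeline could look like with the Airflow TaskFlow API — the DAG id, schedule, and task bodies below are illustrative assumptions, not the actual Data Engine code:

```python
# Illustrative only: DAG id, schedule, and task bodies are placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False, tags=["data-engine"])
def startup_news_pipeline():
    """Minimal ingest -> normalize -> enrich -> load flow."""

    @task
    def ingest() -> list[dict]:
        # Pull raw items from a source (RSS feed, scraper, API, ...).
        return [{"url": "https://example.com/post", "html": "<html>...</html>"}]

    @task
    def normalize(raw_items: list[dict]) -> list[dict]:
        # Strip markup, deduplicate, and map fields onto a common schema.
        return [{"url": item["url"], "text": item["html"]} for item in raw_items]

    @task
    def enrich(items: list[dict]) -> list[dict]:
        # Attach structured attributes (e.g., via an LLM or lookup service).
        return [{**item, "entities": []} for item in items]

    @task
    def load(items: list[dict]) -> None:
        # Upsert the structured records into the Data Hub.
        print(f"Loaded {len(items)} records")

    load(enrich(normalize(ingest())))


startup_news_pipeline()
```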


Example Projects You’ll Work On
* Build a news scraper that extracts articles from multiple African tech media sites, cleans the content, and loads it into our CMS.
* Create a funding-round detector: use keyword filters + LLM reasoning to identify and structure funding events from news articles, then load the data into the Data Hub (a rough sketch follows this list).
* Enrich startup profiles by scraping and standardizing data before loading it into the Data Hub.
* Implement AI-powered classification of startups by sector, stage, and market using embeddings and few-shot models.
* Build a data validation framework to detect duplicates or inconsistencies across incoming pipelines.
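
As a rough sketch of the funding-round detector idea, here is one way to combine a keyword pre-filter with an LLM call, assuming the google-generativeai client; the keyword list, prompt, model name, and output schema are assumptions, not the team's actual design:

```python
# Illustrative sketch: keywords, prompt, model name, and schema are assumptions.
import json
import os

import google.generativeai as genai

FUNDING_KEYWORDS = ("raises", "raised", "funding round", "seed", "series a", "series b")

PROMPT = """Extract the funding event from the article below as JSON with keys:
company, amount, currency, round, investors (list). Return null if there is no funding event.

Article:
{article}
"""


def looks_like_funding_news(text: str) -> bool:
    # Cheap keyword pre-filter so only candidate articles reach the LLM.
    lowered = text.lower()
    return any(keyword in lowered for keyword in FUNDING_KEYWORDS)


def extract_funding_event(article: str) -> dict | None:
    if not looks_like_funding_news(article):
        return None
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        PROMPT.format(article=article),
        generation_config={"response_mime_type": "application/json"},
    )
    data = json.loads(response.text)
    # A structured record ready to load into the Data Hub, or None if no event.
    return data or None
```

The keyword pre-filter keeps LLM cost and latency down by only sending plausible candidates to the model.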

Must-Have Skills
* Excellent Python (async programming, typing, modular design).
* Strong experience with Airflow and ETL orchestration; knowledge of Airflow 3.x is a huge plus.
* Hands-on experience with LLMs / Generative AI — fine-tuning, prompt design, or API integration.
* Web scraping expertise (BeautifulSoup, Requests, Selenium, or async approaches).
* Experience with Git / GitHub workflows, CI/CD, and code versioning best practices.
* Strong understanding of data transformation & normalization principles.
* Comfortable working autonomously, taking ownership of a complex repository.

Bonus Points
* Own projects or demos involving LLMs, embeddings, or data pipelines.
* Familiarity with startup ecosystems — funding rounds, VCs, acquisitions, exits.
* Experience with Docker, PostgreSQL (SQLAlchemy, alembic).
* Interest in data accuracy, knowledge graphs, or AI-driven data enrichment.

Salary

Annual basis

Location

Rabat, Rabat-Salé-Kénitra, Morocco

Job Overview
* Posted: 2 weeks ago
* Expires: in 1 week
* Job type: Full-time
* Job role: Full-stack developer
* Category: Artificial intelligence and data
* Education: Bac+5 or higher
* Experience: Entry level (< 2 years)
* Vacancies: 1
