Matchmaker

Describe a research topic, theme, or question in your own words. The tool uses AI to find the most relevant THUAS publications and, through them, the researchers working in that area.

This searches by topic, not by person name. To search researchers by name, use the Research Explorer tab.

Knowledge Graph

Interactive research landscape — Domain → Field → Subfield → Topic → Publication → Researcher

Click to expand: domain → field → subfield → topic → publication → author → co-authors
Domain
Field
Subfield
Topic
Publication
Research Group

Research Explorer

Browse the CERIF research database — publications, researchers, topics, organisations & SQL

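Tabs: Dashboard, Publications, Researchers, Topics, Organisations, SQL
Publications columns: Title, Year, Type, Language
Researchers columns: Name, Publications
Topics columns: Keyword, Publications
Organisations columns: Name, Type, Researchers
SQL: run a query with Ctrl+Enter; Example Queries and Schema are available for reference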

Data Source Evaluation

Overview of investigated data sources, their accessibility, and impact on the matchmaker PoC

Data Source | Domain | Access | Challenge | Impact on PoC
SURF Sharekit | Research | Available | Weak raw keywords, no topic hierarchy in source data | Primary source: 4,400+ THUAS publication records via OAI-PMH API
OpenAlex | Research | Available | ~15% of THUAS publications not found (title matching gaps) | Topic enrichment: standardised 4-level taxonomy + keyword generation via LLM
SharePoint (Lectoraat & Onderzoek) | Research | Partial | Manual export only, limited fields, anonymisation needed | Supplemental: adds internal metadata fields not available in Sharekit
Google Scholar | Research | Not viable | No official API; scraping violates ToS, unreliable at scale | Not used: no reliable production path
ResearchGate | Research | Not viable | No API; requires manual profile reconciliation per researcher | Not used: effort-to-value ratio too high
Osiris | Education | Blocked (FZIT) | Requires FZIT portfolio approval process; not yet arranged | Missing: formal curriculum, course catalogue, learning outcomes
Brightspace | Education | Blocked (FZIT) | Requires FZIT portfolio approval process; governance stalled | Missing: teaching materials, assignments, course content
Internal Employee Data | Identity | Blocked (FZIT) | Requires FZIT portfolio approval process; no structured API route | Missing: staff identity linking across research and education

Architecture & Roadmap

How the system works today, what is still missing, and what could be built next

Current Architecture

Data Harvest

The SURF Sharekit OAI-PMH API delivers ~4,400 THUAS publication records. OpenAlex enriches the records it can match (roughly 85%) with a standardised 4-level topic taxonomy.
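
To illustrate the harvest step, here is a minimal sketch of parsing one page of an OAI-PMH ListRecords response. It assumes the records are delivered as Dublin Core (oai_dc); the metadata format Sharekit actually serves is not stated here. The sample XML, the identifier, and the function name parse_page are invented for illustration. A real harvest would fetch each page over HTTP and keep requesting pages until no resumptionToken is returned.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample page; not real Sharekit data.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example:1</identifier>
        <datestamp>2024-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example publication</dc:title>
          <dc:subject>climate adaptation</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_page(xml_text):
    """Return (records, resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "keywords": [s.text for s in rec.iterfind(".//dc:subject", NS)],
        })
    # An absent or empty token means this was the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


records, next_token = parse_page(SAMPLE_PAGE)
```

The raw dc:subject values correspond to the "weak raw keywords" noted in the data source evaluation, which is why a separate enrichment step follows.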

LLM Classification

Azure OpenAI reviews every publication, scores keyword relevance, and assigns confidence-weighted topics using a 3-step pipeline (OpenAlex → LLM → scoring).

Embedding Index

BAAI/bge-m3 (1024-dim) encodes topics, keywords, and publication titles into L2-normalised vectors for cosine similarity search.
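
A small sketch of why L2 normalisation matters here: once vectors have unit length, cosine similarity reduces to a plain dot product, which is cheaper to compute across a whole index. The 3-dimensional vectors below are toy values standing in for the 1024-dim bge-m3 embeddings, and the helper names are illustrative.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalise(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from unnormalised vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for real embeddings.
a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 0.0]

a_hat, b_hat = l2_normalise(a), l2_normalise(b)

# Both expressions give the same similarity (24/25 for these vectors).
full_cosine = cosine(a, b)
dot_of_unit_vectors = dot(a_hat, b_hat)
```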

CERIF Database

PostgreSQL database structured to the CERIF standard (cfPers, cfResPubl, cfOrgUnit, cfProj) with VIVO ontology views.
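
As a rough illustration of the kind of query the CERIF layout supports, the sketch below counts publications per researcher through a person-to-publication link table. It uses an in-memory SQLite database as a stand-in for PostgreSQL. The entity names cfPers and cfResPubl come from the description above; the link table cfPers_ResPubl follows CERIF naming, while the name and title columns are simplifications assumed for this example (CERIF proper keeps names and titles in separate tables). All rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, assumed schema; the real database has many more columns.
conn.executescript("""
CREATE TABLE cfPers (cfPersId TEXT PRIMARY KEY, name TEXT);
CREATE TABLE cfResPubl (
    cfResPublId TEXT PRIMARY KEY, title TEXT, cfResPublDate TEXT
);
CREATE TABLE cfPers_ResPubl (cfPersId TEXT, cfResPublId TEXT);

INSERT INTO cfPers VALUES ('p1', 'Researcher A'), ('p2', 'Researcher B');
INSERT INTO cfResPubl VALUES
    ('r1', 'Paper one', '2021-05-01'),
    ('r2', 'Paper two', '2023-02-10'),
    ('r3', 'Paper three', '2023-09-30');
INSERT INTO cfPers_ResPubl VALUES
    ('p1', 'r1'), ('p1', 'r2'), ('p2', 'r2'), ('p1', 'r3');
""")

# Publications per researcher, most productive first.
rows = conn.execute("""
SELECT p.name, COUNT(*) AS n_publications
FROM cfPers AS p
JOIN cfPers_ResPubl AS link ON link.cfPersId = p.cfPersId
GROUP BY p.cfPersId, p.name
ORDER BY n_publications DESC, p.name
""").fetchall()
```

The result has the same shape as the Name and Publications columns shown for researchers in the Research Explorer.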

Matchmaker

Topic-based semantic search: user describes a research area → AI finds the most relevant publications → researchers are discovered through their work.

Knowledge Graph

Pre-built graph (12,337 nodes / 16,767 edges) spanning the Domain → Field → Subfield → Topic → Publication → Researcher hierarchy.
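
One way such a graph could be pre-built is sketched below: each classified (publication, researcher) pair contributes one path through the six levels, and shared prefixes collapse into shared nodes. The rows, the node keying by (level, name), and the function names are assumptions for illustration, not the actual build script.

```python
from collections import defaultdict

LEVELS = ["domain", "field", "subfield", "topic", "publication", "researcher"]

# Invented rows: one path through the hierarchy per (publication, researcher).
rows = [
    ("Physical Sciences", "Computer Science", "Artificial Intelligence",
     "Topic Modeling", "pub-1", "res-A"),
    ("Physical Sciences", "Computer Science", "Artificial Intelligence",
     "Topic Modeling", "pub-1", "res-B"),
    ("Physical Sciences", "Computer Science", "Information Systems",
     "Semantic Web", "pub-2", "res-A"),
]


def build_graph(rows):
    """Return (nodes, children) where children maps a node to its child set."""
    nodes = set()
    children = defaultdict(set)
    for row in rows:
        # Key nodes by (level, name) so equal names on different levels differ.
        path = list(zip(LEVELS, row))
        nodes.update(path)
        for parent, child in zip(path, path[1:]):
            children[parent].add(child)
    return nodes, children


nodes, children = build_graph(rows)
edge_count = sum(len(kids) for kids in children.values())


def expand(node):
    """Children revealed when a node is clicked, in a stable order."""
    return sorted(children.get(node, ()))


subfields = expand(("field", "Computer Science"))
```

Because sets deduplicate shared prefixes, a researcher who appears on several publications is a single node with several incoming edges.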

How the Matchmaker Works

1. Query Embedding

The user describes a topic in natural language; bge-m3 encodes it into a 1024-dim vector.

2. Topic Matching

Cosine similarity is computed between the query and every topic in the OpenAlex taxonomy.

3. Keyword Boosts

OpenAlex taxonomy keywords and LLM-curated per-publication keywords can each raise a score by up to 30%.

4. Ranked Results

Publications are ranked by relevance × classification confidence; low-confidence matches are filtered out.
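
The four steps can be sketched as a scoring function. The exact formula is not given here, so several things below are assumptions: boosts are applied multiplicatively, each keyword source contributes in proportion to a match fraction between 0 and 1, and the filter is applied to the boosted relevance using the 0.45 cut-off mentioned in the roadmap. The field names and sample candidates are invented.

```python
from dataclasses import dataclass

MAX_BOOST = 0.30         # "up to 30%" per keyword source
RELEVANCE_CUTOFF = 0.45  # cut-off from the roadmap; how it is applied is assumed


@dataclass
class Candidate:
    title: str
    topic_similarity: float   # cosine similarity of query vs. matched topic
    taxonomy_kw_match: float  # 0..1 match against OpenAlex taxonomy keywords
    llm_kw_match: float       # 0..1 match against LLM-curated keywords
    confidence: float         # classification confidence, 0..1


def relevance(c):
    """Topic similarity nudged upwards by the two keyword boosts."""
    boost = 1.0 + MAX_BOOST * c.taxonomy_kw_match + MAX_BOOST * c.llm_kw_match
    return c.topic_similarity * boost


def rank(candidates):
    """Drop weak matches, then order by relevance x confidence."""
    scored = []
    for c in candidates:
        rel = relevance(c)
        if rel < RELEVANCE_CUTOFF:
            continue
        scored.append((rel * c.confidence, c.title))
    return sorted(scored, reverse=True)


# Invented candidates.
results = rank([
    Candidate("A", 0.60, 1.0, 0.5, 0.9),
    Candidate("B", 0.70, 0.0, 0.0, 0.8),
    Candidate("C", 0.40, 0.0, 0.0, 1.0),
])
```

In this toy run, A overtakes B despite a lower raw similarity because both keyword sources match it, and C is dropped by the cut-off even though its classification confidence is perfect.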

Current limitation: The matchmaker searches by topic only. It finds publications relevant to a research area and surfaces the researchers behind them. It cannot yet search by person name or expertise profile, nor can it answer conversational questions.

What Is Still Missing

Critical

Person Search & Researcher Profiles

The matchmaker cannot search by researcher name or expertise profile. A future version should allow searching for people directly — by name, skills, or research history — not only by topic. This requires embedding researcher profiles alongside publication topics.

Critical

Conversational Queries

Users currently need to formulate topic-style searches. A conversational interface (e.g. “Who at THUAS can help me with a project on climate adaptation in coastal cities?”) would be far more natural. This requires an LLM layer that interprets intent, extracts topics, and synthesises answers from the search results.

Critical

Verification & Evaluation

The ranking algorithm has not been formally evaluated. There is no ground-truth dataset of “correct” matches, no precision/recall metrics, and no user study confirming that results are useful. Evaluation should include: relevance judgements by domain experts, comparison against baseline keyword search, and A/B testing of scoring parameters (thresholds, boost weights).

Critical

No Education Data

Osiris (curriculum, learning outcomes) and Brightspace (course content, materials) are blocked pending FZIT portfolio approval. Without education data, the tool cannot match research to teaching.

Critical

No Identity Bridge

Internal employee data is FZIT-blocked. Staff cannot be linked across research publications and teaching roles without ORCID or employee-ID reconciliation.

Moderate

DB & Embeddings Not Merged

The CERIF PostgreSQL database and the embedding-based search engine run independently. A unified query layer would enable combined structured + semantic search (e.g. “find publications about AI from Faculteit IT & Design since 2022”).

Moderate

VIVO Ontology Partial

VIVO is implemented as 10 SQL views, not a proper triple store. No SPARQL endpoint or RDF export exists yet for interoperability with other research information systems.

Low

~15% OpenAlex Miss Rate

About 15% of THUAS publications are not found in OpenAlex (title matching gaps), leaving them with weaker topic classification.

What Could Be Built Next

Enhancement

People Search

Embed researcher profiles (aggregated from their publications, keywords, and organisational metadata) so users can search for people directly. Enable queries like “who works on machine learning?” returning ranked researcher cards with their top publications.

Enhancement

Conversational Interface

Add an LLM-powered chat layer that interprets natural language questions, calls the matchmaker API internally, and presents a synthesised answer. This would support multi-turn dialogue: “Find me AI researchers” → “Now only from the health faculty” → “Who among them has collaborated with external partners?”

Enhancement

Algorithm Evaluation Framework

Build a test harness with labelled query–result pairs rated by domain experts. Measure precision@k, nDCG, and mean reciprocal rank. Use this to tune scoring thresholds, keyword boost weights, and the 0.45 relevance cut-off with evidence rather than intuition.
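
The three metrics named above have standard definitions, sketched here in plain Python. The graded judgements and publication IDs are invented; in a real harness they would come from the domain experts' ratings. The nDCG variant shown uses linear gain with a log2 discount, which is one common choice among several.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for r in ranked[:k] if r in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none is found."""
    for i, r in enumerate(ranked, start=1):
        if r in relevant:
            return 1.0 / i
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_k(ranked, graded, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [graded.get(r, 0) for r in ranked[:k]]
    ideal = sorted(graded.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Invented expert judgements for one query: publication id -> grade.
graded = {"p1": 2, "p2": 1, "p9": 2}
relevant = set(graded)
ranked = ["p3", "p1", "p7", "p2"]  # what the matchmaker returned

p_at_3 = precision_at_k(ranked, relevant, 3)
rr = reciprocal_rank(ranked, relevant)
ndcg = ndcg_at_k(ranked, graded, 4)
```

Re-running the same labelled queries while varying the boost weights or the cut-off gives the evidence-based comparison described above.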

Enhancement

Education Data Integration

If FZIT access is granted: ingest Osiris (curricula, learning outcomes) and Brightspace (course content). Build an identity bridge linking teaching staff to their research output. Enable cross-domain queries like “which courses cover topics related to this research group?”

Enhancement

Hybrid Structured + Semantic Search

Merge the PostgreSQL database and embedding engine into a single query pipeline. Allow filters (year, faculty, type) to be applied alongside semantic ranking in one query, rather than requiring users to switch between tabs.
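
A minimal sketch of the idea, assuming structured filters are applied first and the surviving publications are then ranked semantically. The record fields, the 2-dimensional toy vectors, and the function name hybrid_search are hypothetical; a production version would push the filters into SQL rather than filter in Python.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    title: str
    year: int
    faculty: str
    vector: list  # L2-normalised embedding (toy 2-dim values here)


def hybrid_search(pubs, query_vec, min_year=None, faculty=None, top_k=5):
    """Apply structured filters, then rank the rest by dot-product similarity."""
    def keep(p):
        if min_year is not None and p.year < min_year:
            return False
        if faculty is not None and p.faculty != faculty:
            return False
        return True

    scored = [
        (sum(a * b for a, b in zip(p.vector, query_vec)), p.title)
        for p in pubs
        if keep(p)
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Invented records.
pubs = [
    Publication("AI in care", 2023, "IT & Design", [1.0, 0.0]),
    Publication("Old AI paper", 2019, "IT & Design", [1.0, 0.0]),
    Publication("Coastal policy", 2023, "IT & Design", [0.0, 1.0]),
    Publication("AI ethics", 2024, "Health", [0.8, 0.6]),
]

hits = hybrid_search(pubs, [1.0, 0.0], min_year=2022, faculty="IT & Design")
```

Filtering before ranking keeps the semantic step small, at the cost of returning fewer than top_k results when the filters are strict.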

Enhancement

Live Data & Incremental Updates

Automate the SURF Sharekit harvest on a schedule. Incrementally update embeddings and graph data when new publications arrive, instead of requiring a full rebuild.
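
A sketch of how an incremental update could decide what to re-embed, assuming the previous harvest stored each record's OAI-PMH identifier and datestamp. The function name plan_update and the sample identifiers are invented. Deleted records are not handled here.

```python
def plan_update(stored, harvested):
    """Compare two {identifier: datestamp} maps.

    Datestamps are ISO 8601 strings in the same format, so plain string
    comparison orders them chronologically.
    Returns (new_ids, changed_ids), both sorted.
    """
    new_ids = [i for i in harvested if i not in stored]
    changed_ids = [
        i for i in harvested
        if i in stored and harvested[i] > stored[i]
    ]
    return sorted(new_ids), sorted(changed_ids)


# Invented state from the previous run and from today's harvest.
stored = {"oai:example:1": "2024-01-15", "oai:example:2": "2024-02-01"}
harvested = {
    "oai:example:1": "2024-01-15",  # unchanged
    "oai:example:2": "2024-03-10",  # updated since last run
    "oai:example:3": "2024-03-12",  # new record
}

new_ids, changed_ids = plan_update(stored, harvested)
to_embed = new_ids + changed_ids
```

Only the records in to_embed would need new embeddings and graph edges; everything else can be reused from the previous build.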