Describe a research topic, theme, or question in your own words. The tool uses AI to find the most relevant THUAS publications and, through them, the researchers working in that area.
This searches by topic, not by person name. To search researchers by name, use the Research Explorer tab.
Interactive research landscape — Domain → Field → Subfield → Topic → Publication → Researcher
Browse the CERIF research database — publications, researchers, topics, organisations & SQL
| Title | Year | Type | Language |
|---|---|---|---|
| Name | Publications |
|---|---|
| Keyword | Publications |
|---|---|
| Name | Type | Researchers |
|---|---|---|
Overview of investigated data sources, their accessibility, and impact on the matchmaker PoC
| Data Source | Domain | Access | Challenge | Impact on PoC |
|---|---|---|---|---|
| SURF Sharekit | Research | Available | Weak raw keywords, no topic hierarchy in source data | Primary source — 4,400+ THUAS publication records via OAI-PMH API |
| OpenAlex | Research | Available | ~15% of THUAS publications not found (title matching gaps) | Topic enrichment — standardised 4-level taxonomy + keyword generation via LLM |
| SharePoint Lectoraat & Onderzoek | Research | Partial | Manual export only, limited fields, anonymisation needed | Supplemental — adds internal metadata fields not available in Sharekit |
| Google Scholar | Research | Not viable | No official API; scraping violates ToS, unreliable at scale | Not used — no reliable production path |
| ResearchGate | Research | Not viable | No API; requires manual profile reconciliation per researcher | Not used — effort-to-value ratio too high |
| Osiris | Education | Blocked — FZIT | Requires FZIT portfolio approval process; not yet arranged | Missing — formal curriculum, course catalogue, learning outcomes |
| Brightspace | Education | Blocked — FZIT | Requires FZIT portfolio approval process; governance stalled | Missing — teaching materials, assignments, course content |
| Internal Employee Data | Identity | Blocked — FZIT | Requires FZIT portfolio approval process; no structured API route | Missing — staff identity linking across research and education |
How the system works today, what is still missing, and what could be built next
SURF Sharekit OAI-PMH API delivers ~4,400 THUAS publication records. OpenAlex enriches each with a standardised 4-level topic taxonomy.
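As a rough illustration of the harvesting step, the sketch below parses one page of an OAI-PMH `ListRecords` response and returns the resumption token used to request the next page. It assumes Dublin Core (`oai_dc`) metadata; the metadata format actually used by Sharekit is not stated here, and `parse_list_records` and `SAMPLE_PAGE` are hypothetical names with made-up sample data.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"id": identifier.strip(), "title": title.strip()})
    # An empty or missing token means this was the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


# Hypothetical sample page, not real Sharekit output.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical publication title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_PAGE)
```

A harvester would call this in a loop, passing each returned token back to the endpoint until the token is `None`.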
Azure OpenAI reviews every publication, scores keyword relevance, and assigns confidence-weighted topics using a 3-step pipeline (OpenAlex → LLM → scoring).
BAAI/bge-m3 (1024-dim) encodes topics, keywords, and publication titles into L2-normalised vectors for cosine similarity search.
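The reason for L2-normalising the vectors is that the dot product of two unit vectors equals their cosine similarity, so search reduces to a plain dot product. The sketch below shows this with toy 3-dimensional vectors standing in for the 1024-dimensional bge-m3 embeddings; the helper names are illustrative only.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from the raw, unnormalised vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors, not real embeddings.
a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 0.0]

similarity_via_dot = dot(l2_normalise(a), l2_normalise(b))
similarity_direct = cosine(a, b)
```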
PostgreSQL database structured to the CERIF standard (cfPers, cfResPubl, cfOrgUnit, cfProj) with VIVO ontology views.
Topic-based semantic search: user describes a research area → AI finds the most relevant publications → researchers are discovered through their work.
Pre-built graph (12,337 nodes / 16,767 edges) spanning Domain → Field → Subfield → Topic → Publication → Researcher hierarchy.
User describes a topic in natural language; bge-m3 encodes it into a 1024-dim vector
Cosine similarity between the query and every topic in the OpenAlex taxonomy
Keyword matches boost the score: OpenAlex taxonomy keywords and LLM-curated per-publication keywords can each raise it by up to 30%
Publications ranked by relevance × classification confidence; low-confidence matches filtered out
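The four steps above can be sketched as a single scoring function. This is only one plausible reading: the multiplicative form of the boost, the keyword overlap fractions, and the `MIN_CONFIDENCE` threshold are assumptions made for illustration, not the tool's actual formula. Only the 30% cap and the relevance × confidence ranking come from the description.

```python
from dataclasses import dataclass

MAX_BOOST = 0.30      # "up to 30%" per keyword source, as described above
MIN_CONFIDENCE = 0.5  # hypothetical threshold for "low-confidence" matches


@dataclass
class Candidate:
    title: str
    topic_similarity: float     # cosine similarity between query and topic
    taxonomy_kw_overlap: float  # assumed 0..1 match against OpenAlex keywords
    curated_kw_overlap: float   # assumed 0..1 match against LLM-curated keywords
    confidence: float           # classification confidence of the topic assignment


def score(c):
    """Relevance (similarity with keyword boosts) times classification confidence."""
    boost = 1.0 + MAX_BOOST * c.taxonomy_kw_overlap + MAX_BOOST * c.curated_kw_overlap
    return c.topic_similarity * boost * c.confidence


def rank(candidates, min_confidence=MIN_CONFIDENCE):
    """Drop low-confidence matches, then sort the rest by descending score."""
    kept = [c for c in candidates if c.confidence >= min_confidence]
    return sorted(kept, key=score, reverse=True)


# Made-up candidates for illustration.
candidates = [
    Candidate("A", 0.80, 0.0, 0.0, 0.9),
    Candidate("B", 0.70, 1.0, 0.5, 0.8),
    Candidate("C", 0.95, 0.0, 0.0, 0.3),  # high similarity, but filtered out
]
ranked = rank(candidates)
```

In this toy example, B overtakes A because of its keyword boosts, and C is dropped despite having the highest raw similarity.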
The matchmaker cannot search by researcher name or expertise profile. A future version should allow searching for people directly — by name, skills, or research history — not only by topic. This requires embedding researcher profiles alongside publication topics.
Users currently need to formulate topic-style searches. A conversational interface (e.g. “Who at THUAS can help me with a project on climate adaptation in coastal cities?”) would be far more natural. This requires an LLM layer that interprets intent, extracts topics, and synthesises answers from the search results.
The ranking algorithm has not been formally evaluated. There is no ground-truth dataset of “correct” matches, no precision/recall metrics, and no user study confirming that results are useful. Evaluation should include: relevance judgements by domain experts, comparison against baseline keyword search, and A/B testing of scoring parameters (thresholds, boost weights).
Osiris (curriculum, learning outcomes) and Brightspace (course content, materials) are blocked behind FZIT portfolio approval. Without education data, the tool cannot match research to teaching.
Internal employee data is FZIT-blocked. Staff cannot be linked across research publications and teaching roles without ORCID or employee-ID reconciliation.
The CERIF PostgreSQL database and the embedding-based search engine run independently. A unified query layer would enable combined structured + semantic search (e.g. “find publications about AI from Faculteit IT & Design since 2022”).
VIVO is implemented as 10 SQL views, not a proper triple store. No SPARQL endpoint or RDF export exists yet for interoperability with other research information systems.
About 15% of THUAS publications are not found in OpenAlex (title matching gaps), leaving them with weaker topic classification.
Embed researcher profiles (aggregated from their publications, keywords, and organisational metadata) so users can search for people directly. Enable queries like “who works on machine learning?” returning ranked researcher cards with their top publications.
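One simple way to build such a profile is to average a researcher's publication vectors and re-normalise the result. The sketch below assumes that approach; the aggregation method, the function names, and the 2-dimensional toy vectors are all illustrative choices, and a real profile would also need to fold in keywords and organisational metadata.

```python
import math


def l2_normalise(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def researcher_profile(publication_vectors):
    """Mean of the unit-length publication vectors, scaled back to unit length."""
    if not publication_vectors:
        raise ValueError("researcher has no publications")
    unit = [l2_normalise(v) for v in publication_vectors]
    dim = len(unit[0])
    mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
    return l2_normalise(mean)


def rank_researchers(query_vec, profiles):
    """Return (name, similarity) pairs sorted by descending cosine similarity."""
    q = l2_normalise(query_vec)
    scored = [(name, sum(x * y for x, y in zip(q, p))) for name, p in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: two hypothetical researchers with two publications each.
profiles = {
    "researcher_1": researcher_profile([[1.0, 0.0], [1.0, 0.0]]),
    "researcher_2": researcher_profile([[0.0, 1.0], [1.0, 1.0]]),
}
ranking = rank_researchers([1.0, 0.0], profiles)
```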
Add an LLM-powered chat layer that interprets natural language questions, calls the matchmaker API internally, and presents a synthesised answer. This would support multi-turn dialogue: “Find me AI researchers” → “Now only from the health faculty” → “Who among them has collaborated with external partners?”
Build a test harness with labelled query–result pairs rated by domain experts. Measure precision@k, nDCG, and mean reciprocal rank. Use this to tune scoring thresholds, keyword boost weights, and the 0.45 relevance cut-off with evidence rather than intuition.
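The three metrics named here are straightforward to compute once expert ratings exist. The sketch below uses the standard definitions; each input list holds the expert relevance grade of the results in ranked order (0 meaning not relevant). As a simplifying assumption, the ideal ordering for nDCG is derived from the judged results only, which presumes that every relevant item appears somewhere in the list.

```python
import math


def precision_at_k(relevances, k):
    """Share of the top-k results judged relevant (grade > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(judged_queries):
    return sum(reciprocal_rank(r) for r in judged_queries) / len(judged_queries)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(r / math.log2(rank + 1) for rank, r in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Made-up judgements for two queries.
query_a = [0, 2, 1]
query_b = [1, 0, 0]
mrr = mean_reciprocal_rank([query_a, query_b])
```

Running these metrics over the same labelled set before and after a parameter change gives a direct comparison of scoring variants.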
If FZIT access is granted: ingest Osiris (curricula, learning outcomes) and Brightspace (course content). Build an identity bridge linking teaching staff to their research output. Enable cross-domain queries like “which courses cover topics related to this research group?”
Merge the PostgreSQL database and embedding engine into a single query pipeline. Allow filters (year, faculty, type) to be applied alongside semantic ranking in one query, rather than requiring users to switch between tabs.
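A minimal in-memory sketch of the combined behaviour: structured filters narrow the candidate set, and the survivors are ordered by semantic score. The `Publication` fields, the `search` signature, and the sample data are hypothetical, and the similarity values are assumed to be precomputed by the embedding engine.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    title: str
    year: int
    faculty: str
    pub_type: str
    similarity: float  # semantic score against the query, assumed precomputed


def search(publications, min_year=None, faculty=None, pub_type=None, limit=10):
    """Apply the optional structured filters, then rank by semantic similarity."""
    def matches(p):
        return ((min_year is None or p.year >= min_year)
                and (faculty is None or p.faculty == faculty)
                and (pub_type is None or p.pub_type == pub_type))

    hits = [p for p in publications if matches(p)]
    hits.sort(key=lambda p: p.similarity, reverse=True)
    return hits[:limit]


# Made-up records for illustration.
catalogue = [
    Publication("P1", 2021, "IT & Design", "article", 0.91),
    Publication("P2", 2023, "IT & Design", "article", 0.74),
    Publication("P3", 2024, "IT & Design", "report", 0.88),
    Publication("P4", 2023, "Health", "article", 0.95),
]
results = search(catalogue, min_year=2022, faculty="IT & Design")
```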
Automate the SURF Sharekit harvest on a schedule. Incrementally update embeddings and graph data when new publications arrive, instead of requiring a full rebuild.
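The core of an incremental update is deciding which records need work. The sketch below compares the datestamps from a fresh harvest against those already indexed and returns the identifiers to re-embed and to remove. The function name and the dictionary layout are assumptions; it relies on datestamps being ISO 8601 UTC strings in a single consistent format, which compare correctly as plain strings.

```python
def plan_incremental_update(harvested, indexed, deleted=()):
    """Decide which records to (re-)embed and which to drop.

    harvested: {record_id: datestamp} from the latest harvest
    indexed:   {record_id: datestamp} for records already embedded
    deleted:   record ids the source reports as deleted
    """
    to_embed = sorted(
        rid for rid, stamp in harvested.items()
        if rid not in indexed or stamp > indexed[rid]
    )
    to_remove = sorted(rid for rid in deleted if rid in indexed)
    return to_embed, to_remove


# Made-up identifiers and datestamps.
indexed = {
    "a": "2024-01-01T00:00:00Z",
    "b": "2024-01-01T00:00:00Z",
    "c": "2024-01-01T00:00:00Z",
}
harvested = {
    "a": "2024-01-01T00:00:00Z",  # unchanged, skipped
    "b": "2024-03-05T10:00:00Z",  # modified, re-embedded
    "d": "2024-03-06T08:00:00Z",  # new, embedded
}
to_embed, to_remove = plan_incremental_update(harvested, indexed, deleted=["c", "z"])
```

Only the returned identifiers would then be passed to the embedding and graph update steps, leaving unchanged records untouched.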