image4D — Knowledge Representation

When Geometry Alone Is Not Enough

Photogrammetry can reconstruct a site with millimetre precision — every stone, every joint, every surface faithfully captured in three dimensions. Yet a point cloud, however dense, remains silent. It records shape but not meaning. It knows where a wall stands, but not when it was built, what it replaced, or how it relates to the structure next to it.

The real challenge is not to measure more accurately, but to connect geometry with knowledge — to make a 3D model that understands what it represents.

Shawbak Castle, Jordan — Crusader fortress in the Ma'an region

Shawbak Castle (Krac de Montréal), Jordan — a Crusader fortress surveyed over 16 years of collaboration between Aix-Marseille Université and the University of Florence

Ontologies: Giving Structure to Knowledge

An ontology is a formal model of a domain of knowledge — a shared vocabulary that defines concepts, their properties, and the relationships between them. In the context of architectural and archaeological survey, two kinds of knowledge must be captured simultaneously: how measurements are made (the photogrammetric process) and what is being measured (the archaeological objects and their history).

Our approach extends the CIDOC Conceptual Reference Model (CRM), the ISO standard for cultural heritage documentation, with domain-specific classes for architectural elements — ashlar blocks, bricks, vaults, stratigraphic units — each linked to its geometric representation through the photogrammetric measurement that produced it.

CIDOC CRM ontology hierarchy — from Thing to architectural and archaeological classes

The ontology hierarchy: CIDOC CRM classes extended with architectural and archaeological concepts — every measured block is both a geometric entity and a knowledge object

Space and Time: the Ontological Fabric of Archaeology

Archaeology is, at its core, a discipline that reads space to reconstruct time. The position of a stone within a wall, the way two courses of masonry meet, the mortar that binds or separates them — these spatial observations are the raw material from which chronological narratives are built. The ontology must therefore capture not only what an object is, but where it sits and when it was placed.

At Shawbak, each wall face (or parement) is decomposed into stratigraphic masonry units (USM). Each USM groups stones that share the same construction phase, material, and technique. The ontology records the spatial relationships between these units — adjacency, superposition, abutment — and from these physical contacts, temporal relationships can be inferred.

Shawbak wall with stratigraphic units colour-coded and spatial relations drawn between blocks

A wall face at Shawbak: each colour represents a stratigraphic masonry unit (USM), and the lines trace the spatial relationships between numbered blocks — the geometric evidence from which chronology is deduced

The key insight is bidirectionality: the ontology does not merely annotate the 3D model after the fact. It is woven into the measurement process itself. When a stone is surveyed, the system records not only its coordinates but also the photogrammetric method, the camera parameters, the operator, and the date — all as first-class knowledge entities linked through formal relations.
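One way to picture measurement metadata as first-class knowledge is a small set of subject-predicate-object statements that can be read in both directions. The sketch below is only an illustration: the function names, the identifier scheme and the predicate names (`measured`, `used_method`, and so on) are hypothetical. They loosely echo CIDOC CRM wording and are not the project's actual schema.

```python
# Hypothetical sketch: a surveyed stone and its measurement stored as triples.
# Predicate names and identifiers are illustrative assumptions.

def measurement_triples(block_id, method, camera, operator, date, coords):
    """Return (subject, predicate, object) statements for one measurement."""
    m = f"measurement/{block_id}"
    return [
        (m, "type", "Measurement"),
        (m, "measured", f"block/{block_id}"),
        (m, "used_method", method),
        (m, "used_camera", camera),
        (m, "carried_out_by", operator),
        (m, "has_date", date),
        (m, "observed_coordinates", ",".join(f"{c:.3f}" for c in coords)),
    ]


def objects_of(triples, subject, predicate):
    """Forward lookup: what does this subject point to?"""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_of(triples, predicate, obj):
    """Reverse lookup: which subjects point to this object?"""
    return [s for s, p, o in triples if p == predicate and o == obj]
```

With statements of this kind, one can start from a measurement and find the block it describes, or start from a block and recover every measurement, method and operator that contributed to it.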

Where Geometry Meets Knowledge

At the heart of this work lies a deceptively simple idea: if you know what you are looking at, you can measure it with fewer observations. A rectangular ashlar block, for instance, can be fully described by a reference plane and an extrusion depth. Rather than scanning every surface, the system uses the ontological description of the object — its morphological class — to constrain the photogrammetric reconstruction.

This is the I-MAGE process (Image processing and Measure Assisted by GEometrical primitive): a priori knowledge about the shape of an object guides and simplifies its 3D survey. A single photograph, combined with the right geometric model, can yield a complete dimensional record. Knowledge informs geometry, and geometry in turn enriches knowledge — a virtuous circle where each discipline strengthens the other.
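The idea of describing an ashlar block by a reference plane and an extrusion depth can be sketched in a few lines. This is not the I-MAGE implementation; it is a minimal geometric illustration that assumes the visible face is given as four 3D corners in order around the rectangle, and that the depth is supplied separately (for example from the morphological class or from a second observation). The function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def extrude_block(face, depth):
    """Build the 8 vertices of a block from its visible face and a depth.

    face  : four 3D corners, ordered around the face.
    depth : signed distance along the face normal; the sign that points
            into the wall depends on the corner ordering.
    Returns (vertices, volume).
    """
    u = _sub(face[1], face[0])
    v = _sub(face[3], face[0])
    n = _cross(u, v)
    face_area = math.sqrt(sum(c * c for c in n))  # |u x v| is the face area
    if face_area == 0:
        raise ValueError("degenerate face: corners are collinear")
    unit_n = tuple(c / face_area for c in n)
    back = [tuple(p[i] + depth * unit_n[i] for i in range(3)) for p in face]
    return list(face) + back, face_area * abs(depth)
```

Four measured corners and one scalar are enough to produce the full solid, which is the sense in which knowing the object's class reduces the number of observations needed.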

The Same Arch, Two Views of Knowledge

The pair of images below illustrates this convergence. On the left, the photogrammetric model renders the arch of Shawbak as the eye would see it — realistic textures, faithful geometry, but every stone is anonymous. On the right, the same arch has been segmented: each block carries its own identity, its own colour, its own entry in the ontological knowledge base. The geometry is identical; the knowledge is radically different.

Photogrammetric 3D model of the Shawbak arch — realistic texture, anonymous geometry

The photogrammetric model: precise geometry, realistic texture — but every stone is anonymous

Segmented 3D model of the Shawbak arch — each stone individually identified and colour-coded

The knowledge-enriched model: each stone individually identified, classified, and linked to the ontology

From Manual Annotation to Automatic Recognition

With over 250,000 photographs accumulated across multiple sites, manual analysis becomes impractical. The challenge is to teach a machine to recognise archaeological objects — stone blocks, architectural elements — directly in the images, and then transfer that recognition into 3D space.

The process begins with collaborative annotation using CVAT (Computer Vision Annotation Tool), an open-source platform originally developed by Intel. Archaeologists and photogrammetrists work together to outline each object instance as a polygon mask, building the training dataset that the neural network will learn from.
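To make the notion of a polygon instance concrete, the sketch below turns one outlined stone into a COCO-style annotation record, a format that CVAT can export and that Detectron2 can read. Whether the project uses this export format is an assumption, and the function name and the example category are hypothetical.

```python
def polygon_to_coco(points, image_id, category_id, annotation_id):
    """Convert one polygon outline into a COCO-style annotation dict.

    points: list of (x, y) pixel coordinates around the object.
    """
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]

    # Shoelace formula for the polygon area in square pixels.
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1

    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,  # e.g. a hypothetical "ashlar_block" class
        "segmentation": [[c for p in pts for c in p]],  # flat x, y list
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": abs(twice_area) / 2.0,
        "iscrowd": 0,
    }
```

Each record keeps the full outline, so the network learns the silhouette of every stone rather than only its bounding box.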

CVAT annotation interface — stone blocks individually outlined on the Shawbak Castle arch

Collaborative annotation in CVAT — each stone block is individually outlined as a polygon instance, building the training dataset for the neural network

The AI Pipeline: Detectron2 & Mask R-CNN

The annotated images feed into Detectron2, Facebook AI Research's open-source framework for object detection and instance segmentation. The model trained here, Mask R-CNN, combines object detection with pixel-level mask prediction: for each object in an image, the network simultaneously predicts a bounding box, a class label, and a precise silhouette mask.

The training pipeline follows a cyclic workflow: Dataset → Annotation (CVAT) → Data Augmentation → Training (Detectron2 / Mask R-CNN) → Testing — and back again, refining the model with each iteration.

Mask R-CNN architecture

From 2D Recognition to 3D Segmentation

The crucial step is the bridge between 2D image analysis and 3D geometry. Objects recognised in individual photographs must be projected onto the photogrammetric 3D model. Because the camera positions and orientations are known from the photogrammetric reconstruction, each pixel mask can be traced back to a precise location in 3D space.
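The 2D-to-3D transfer can be sketched with a plain pinhole camera model: each 3D point is projected into the photograph and inherits the instance label found at that pixel. This is a simplified illustration, not the project's pipeline. It assumes a camera with intrinsics `K`, rotation `R` and translation `t` (world to camera), and it ignores lens distortion and occlusion, which a real system would have to handle. All names are hypothetical.

```python
def project(point, K, R, t):
    """Project a 3D world point to pixel coordinates, or None if behind the camera."""
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


def label_points(points, mask, K, R, t):
    """Assign each 3D point the instance id under its projection.

    mask[row][col] holds an instance id, with 0 meaning background.
    Points that fall outside the image or behind the camera get 0.
    """
    height, width = len(mask), len(mask[0])
    labels = []
    for p in points:
        label = 0
        uv = project(p, K, R, t)
        if uv is not None:
            col, row = int(round(uv[0])), int(round(uv[1]))
            if 0 <= row < height and 0 <= col < width:
                label = mask[row][col]
        labels.append(label)
    return labels
```

In practice the same point is seen in several photographs, so the labels from different views would need to be combined, for example by majority vote; that step is left out here.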

The result is a complete 3D model where every archaeological object is individually identified and segmented — not as an anonymous cluster of points, but as a named, classified, measurable entity linked to the ontological knowledge base.

The 3D model is no longer a static picture. It becomes a dynamic analytical tool: query the knowledge base to display only the blocks belonging to a given construction phase, highlight a specific stratigraphic unit, or colour each stone according to its material type — all from the same underlying model.
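The kind of query described above can be illustrated with a toy in-memory knowledge base. The record fields, phase labels and colour palette are hypothetical placeholders; a real system would query the ontology store rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    usm: str        # stratigraphic masonry unit
    phase: str      # construction phase label
    material: str


def blocks_in_phase(blocks, phase):
    """Select only the blocks attributed to one construction phase."""
    return [b for b in blocks if b.phase == phase]


def blocks_in_usm(blocks, usm):
    """Select the blocks of one stratigraphic unit, e.g. to highlight it."""
    return [b for b in blocks if b.usm == usm]


def colour_by_material(blocks, palette, default="#808080"):
    """Map each block id to a display colour chosen by material type."""
    return {b.block_id: palette.get(b.material, default) for b in blocks}
```

The geometry is never copied or altered; each query only changes which blocks are shown and how they are coloured.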

Reasoning About Time: Allen's Algebra

Archaeology is fundamentally a science of time. The traditional tool for recording temporal relations between stratigraphic units is the Harris Matrix — a directed graph where each node represents a unit of stratification and each edge a "before/after" relation. It is elegant but limited: it treats events as points on a timeline, unable to express duration, overlap, or concurrency.

We replace this point-based model with James F. Allen's interval algebra, a formalism from artificial intelligence that defines thirteen possible relations between two time intervals: precedes, meets, overlaps, starts, during and finishes, the inverse of each of these six, and equals, which is its own inverse. Where the Harris Matrix can only say "A was built before B", Allen's algebra can express that "A was being built while B was being demolished", or that "the construction of C started at the same time as D but finished later".
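When the start and end of two intervals are known, exactly one of the thirteen relations holds between them. The sketch below classifies a pair of intervals; the relation names for the inverses (`preceded-by`, `met-by`, and so on) are one common naming convention and are an assumption here, as is the representation of an interval as a `(start, end)` pair of numbers.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("each interval needs start < end")
    if a2 < b1:
        return "precedes"
    if b2 < a1:
        return "preceded-by"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only partial overlap with four distinct endpoints remains.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

The last example in the paragraph above, where C starts with D but finishes later, corresponds to `allen_relation(C, D)` returning `started-by`.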

From Harris Matrix to Allen's interval algebra — 10 physical relations mapped to 13 temporal relations

From Harris to Allen: 10 physical relations observed in the masonry are translated into 13 possible temporal interval relations — a richer, more expressive temporal language

This is where geometry and knowledge truly converge. The spatial position of each stone — measured by photogrammetry — constrains the temporal reasoning. Two walls that share a bonded joint must be contemporary; a wall that abuts another must be later. The 3D model becomes not just a geometric record but a temporal reasoning engine, where physical contact implies chronological relationship.

By encoding these relations as Qualitative Constraint Networks (QCN), the system can automatically check the consistency of the archaeologist's chronological hypotheses. If a proposed sequence of construction phases contradicts the geometric evidence (for instance, a wall claimed to be earlier is found to abut one claimed to be later), the system detects the inconsistency and signals it.
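A consistency check of this kind can be sketched for a restricted case. A full QCN allows a disjunction of Allen relations on each edge and is usually checked with path consistency and search; the simplified version below assumes every constraint is a single basic relation. It rewrites each one as orderings between interval endpoints, merges endpoints declared equal, and reports an inconsistency if the remaining "strictly before" orderings contain a cycle. Reading a bonded joint as `equals` and an abutment as `precedes` is an assumption made for the example, not the project's actual mapping.

```python
# Endpoint constraints for "A rel B"; inverses are expressed by swapping A and B.
RULES = {
    "precedes": [("Ae", "<", "Bs")],
    "meets":    [("Ae", "=", "Bs")],
    "overlaps": [("As", "<", "Bs"), ("Bs", "<", "Ae"), ("Ae", "<", "Be")],
    "starts":   [("As", "=", "Bs"), ("Ae", "<", "Be")],
    "during":   [("Bs", "<", "As"), ("Ae", "<", "Be")],
    "finishes": [("Bs", "<", "As"), ("Ae", "=", "Be")],
    "equals":   [("As", "=", "Bs"), ("Ae", "=", "Be")],
}


def is_consistent(constraints):
    """constraints: iterable of (interval_a, basic_relation, interval_b)."""
    parent = {}

    def find(x):  # union-find over endpoints declared equal
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    strict = []  # pairs (x, y) meaning x < y
    for a, rel, b in constraints:
        strict += [((a, "s"), (a, "e")), ((b, "s"), (b, "e"))]
        for left, op, right in RULES[rel]:
            x = (a if left[0] == "A" else b, left[1])
            y = (a if right[0] == "A" else b, right[1])
            if op == "=":
                parent[find(x)] = find(y)
            else:
                strict.append((x, y))

    # Graph of "<" between merged endpoints; consistent iff it has no cycle.
    succ, indeg = {}, {}
    for x, y in strict:
        rx, ry = find(x), find(y)
        for r in (rx, ry):
            succ.setdefault(r, set())
            indeg.setdefault(r, 0)
        if ry not in succ[rx]:
            succ[rx].add(ry)
            indeg[ry] += 1

    ready = [n for n, d in indeg.items() if d == 0]
    visited = 0
    while ready:  # Kahn's topological sort
        n = ready.pop()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return visited == len(indeg)
```

For instance, declaring two walls bonded (`equals`) while also claiming that one `precedes` the other makes `is_consistent` return `False`, which is the kind of contradiction the archaeologist would want flagged.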

3D model with stratigraphic units (USM) identified and labelled

Stratigraphic units (USM) represented in 3D — their spatial relationships, measured by photogrammetry, constrain the temporal reasoning through Allen's interval algebra

The 3D model thus becomes an active participant in the interpretive process, not a passive illustration. Space, time, and knowledge are seamlessly integrated — asking "show me everything built by the Ayyubids after the 1261 earthquake" produces an instant, geometrically accurate answer.

When Knowledge Meets Geometry

The work presented here sits at the crossroads of two traditions in artificial intelligence that are too often opposed: the connectionist approach and the symbolic approach.

On the connectionist side, deep neural networks — Mask R-CNN trained via Detectron2 — learn to recognise archaeological objects directly from pixel data. They excel at perception: detecting, segmenting, and classifying stones, blocks, and artefacts across thousands of photographs with a precision that no manual process could match at scale.

On the symbolic side, formal ontologies grounded in the CIDOC CRM and Allen's temporal algebra provide the reasoning framework. They encode domain knowledge, enforce logical consistency, and allow queries that cross the boundary between space and time — "show me every Ayyubid wall that predates the 1261 earthquake" is a question that no neural network alone could answer.

The true power emerges at their intersection. The neural network feeds the ontology with identified objects; the ontology gives those objects meaning, context, and temporal depth. Geometry, measured by photogrammetry, is the bridge between both worlds: it is the raw material that the connectionist system processes and the spatial evidence on which the symbolic system reasons.

Neither approach alone is sufficient. Connectionist AI sees but does not understand; symbolic AI understands but does not see. By weaving them together through the common thread of 3D geometry, we build systems that can both perceive the physical world and reason about its history — bringing us closer to a true digital understanding of cultural heritage.
