How StreetGap Classifies Italian Street Names at Scale

StreetGap is built on a simple question: who gets remembered in Italian street names?

In traditional approaches, classifying an odonym usually means breaking the problem into four tasks:

split the street label into its denominazione urbanistica generica (DUG) and denominazione ufficiale (DUF)
decide whether the DUF is commemorative
decide whether the commemoration refers to a human and classify gender
identify the dedicatee

That decomposition is useful for rule-based, fuzzy, and structured-source methods. The alternative is to hand the full odonym directly to an LLM and ask it to classify the street by category, infer gender when relevant, and identify who or what it is named after. StreetGap experimented with both approaches.

Gender classification is one of the central outputs. Identifying the dedicatee matters too, but it is slightly less central to the project goal than deciding whether a street contributes to the observed gender gap.

At national scale, this means dealing with ambiguity, inconsistent naming, spelling errors, devotional variants, and uneven source quality.

The starting point: open geographic data

StreetGap starts from OpenStreetMap, which provides the street geometries and street names used throughout the project.

It was not the only source considered. A more institutional option was the Archivio Nazionale dei Numeri Civici e delle Strade Urbane (ANNCSU), which is in some respects a more official source for street naming data in Italy.

For StreetGap, though, ANNCSU had a critical limitation: it does not provide the street geometries needed for the project. Those geometries were essential because the main goal was not just to count names, but to make the gender gap visible on a national map people could actually explore.

At the current stage, the working dataset covers roughly 874,000 street records in Italy. Of these, about 832,000 have already been classified in some form, while about 42,000 remain unclassified. Around 478,000 are currently treated as consolidated or otherwise high-confidence classifications.

This is not a hand-curated list of commemorative streets. It is a national-scale classification system operating on raw street labels and geometries.

DUG, DUF, and linkwords

The first operation is structural parsing.

Italian street names are generally composed of two main parts:

DUG, short for denominazione urbanistica generica
DUF, short for denominazione ufficiale

In practice, the DUG is the generic street-type label, while the DUF is the specific part of the name that carries the actual dedication or reference. In Via Giuseppe Garibaldi, for example, Via is the DUG and Giuseppe Garibaldi is the DUF.

StreetGap starts from this split.

The DUG is the generic urban label: Via, Viale, Corso, Piazza, and so on. You can probably name twenty off the top of your head. The official registry at registry.geodati.gov.it/dug lists 87 official DUGs.

That sounds like a finite and manageable vocabulary. In practice it is broader. Real street data also contains many unofficial or local variants such as Campo, Viuzzo, and Ponticello. One count, published on laputa.it, puts the number of generic urban denominations in use in Italy above 300.

There is another complication: some odonyms contain short connector words between the DUG and the DUF. In StreetGap these are treated as linkwords. They are usually prepositions or other short grammatical connectors. They do not form a third top-level component of the odonym, but they do affect parsing because they effectively extend the DUG side of the split.

Example:

Viale = DUG
dei = linkword
Martiri per la Libertà = DUF

These linkwords are not just parsing noise. They can also become useful signals later in classification, including for distinguishing some commemorative patterns and, in some cases, for inferring gendered forms.
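The split described above can be sketched in a few lines of Python. This is a minimal illustration, not StreetGap's actual parser: the function name parse_odonym, the ParsedOdonym tuple, and the two tiny lexicons are assumptions made for the example.

```python
from typing import NamedTuple

# Tiny illustrative lexicons (assumptions): real DUG and linkword lists are far longer.
DUGS = {"via", "viale", "corso", "piazza", "piazzale", "campo", "viuzzo", "ponticello"}
LINKWORDS = {"di", "del", "dello", "della", "dei", "degli", "delle"}


class ParsedOdonym(NamedTuple):
    dug: str
    linkword: str
    duf: str


def parse_odonym(label: str) -> ParsedOdonym:
    """Split a street label into DUG, optional linkword, and DUF."""
    tokens = label.split()
    # Without a recognised DUG, treat the whole label as the DUF.
    if not tokens or tokens[0].lower() not in DUGS:
        return ParsedOdonym("", "", " ".join(tokens))
    dug, rest = tokens[0], tokens[1:]
    linkword = ""
    # A linkword only counts if something follows it.
    if len(rest) > 1 and rest[0].lower() in LINKWORDS:
        linkword, rest = rest[0], rest[1:]
    return ParsedOdonym(dug, linkword, " ".join(rest))


print(parse_odonym("Viale dei Martiri per la Libertà"))
print(parse_odonym("Via Giuseppe Garibaldi"))
```

A rule this naive will mis-split surname-only dedications whose surname begins with a particle, such as Via Di Vittorio, which is one reason the linkword step cannot be purely lexical.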

Once the DUG side is parsed, the project can focus on the real semantic payload: the DUF.

Classifying the DUF

After the DUG/DUF split, the main tasks all sit on the DUF.

1. Commemorative vs non-commemorative

The first question is whether the DUF is commemorative at all.

Many street names are not: they are descriptive, topographic, or otherwise non-dedicatory.

But commemorative does not mean human. A DUF can be commemorative and still point to a date, an event, a value, an institution, or a collective subject rather than to a person.

In practice, StreetGap ended up using a more explicit taxonomy. The commemorative side includes:

person_single
group
place
date
event
other

Everything outside that space is treated as either functional or unknown.
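One way to keep this taxonomy explicit in code is an enumeration. The category values come from the list above; the class name Category and the helper is_commemorative are assumptions for illustration only.

```python
from enum import Enum


class Category(Enum):
    # Commemorative side
    PERSON_SINGLE = "person_single"
    GROUP = "group"
    PLACE = "place"
    DATE = "date"
    EVENT = "event"
    OTHER = "other"
    # Outside the commemorative space
    FUNCTIONAL = "functional"
    UNKNOWN = "unknown"


COMMEMORATIVE = frozenset({
    Category.PERSON_SINGLE, Category.GROUP, Category.PLACE,
    Category.DATE, Category.EVENT, Category.OTHER,
})


def is_commemorative(category: Category) -> bool:
    """True for the six commemorative categories, False for functional/unknown."""
    return category in COMMEMORATIVE
```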

For StreetGap’s main purpose, the critical distinction is not just commemorative vs non-commemorative, but also human vs non-human within the commemorative space. Only the human branch enters the gender analysis.

Still, the non-human side cannot be treated lazily. If a classifier says “not human”, the result has to remain explainable. In practice the system still needs to know, at least roughly, what kind of non-human dedication it is dealing with and why it was excluded from the human-focused path.

This split is already large enough to matter on its own. Of the roughly 832,000 classified streets, about 399,000 are treated as person dedications and about 433,000 as non-person dedications.

2. Human vs non-human, and gender

If the DUF is commemorative, the most important question for StreetGap is whether it refers to a human and, if so, how to classify gender.

This is one of the main outputs of the project. StreetGap is not trying to solve every possible historical or semantic question around a street dedication. It is trying, first of all, to decide whether a commemorative label belongs inside the measurable space of the gender gap.

The current methodology uses a binary female/male model based on sex assigned at birth, as reported in or inferable from reliable public data at scale. This is a methodological simplification tied to data availability and consistency, not a claim that the underlying social reality is exhausted by that model.

On the current person-dedicated subset, the imbalance is large. Roughly:

26,000 streets are classified as dedicated to women
331,000 are classified as dedicated to men
41,000 fall into mixed or unknown cases

Using the roughly 399,000 person dedications as the denominator, that is about 7% women, 83% men, and 10% mixed or unknown.
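The percentages can be reproduced from the rounded figures quoted in this article. The snippet below is only a check of that arithmetic; the variable and function names are made up for the example, and the three counts add up to 398,000 rather than 399,000 because each is rounded.

```python
# Rounded figures quoted in the text.
counts = {"women": 26_000, "men": 331_000, "mixed_or_unknown": 41_000}
PERSON_DEDICATIONS = 399_000


def shares(values: dict, denominator: int) -> dict:
    """Percentage of the denominator for each group, rounded to whole points."""
    return {group: round(100 * n / denominator) for group, n in values.items()}


print(shares(counts, PERSON_DEDICATIONS))
# {'women': 7, 'men': 83, 'mixed_or_unknown': 10}
```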

3. Recipient identification

After that comes recipient identification: who exactly is being commemorated?

This matters for enrichment, aggregation, verification, and normalization. It lets StreetGap collapse variants such as abbreviated forms, surname-only forms, or slightly corrupted spellings into a more coherent public record.
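As a rough illustration of collapsing corrupted spellings, the sketch below uses Python's standard difflib module. This is a stand-in technique chosen for the example, not necessarily what StreetGap uses; the KNOWN list, the function name normalize, and the 0.85 cutoff are all assumptions.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Hypothetical list of already-resolved dedicatees.
KNOWN = ["Giuseppe Garibaldi", "Giuseppe Mazzini", "Anita Garibaldi"]


def normalize(duf: str, known: Sequence[str] = KNOWN, cutoff: float = 0.85) -> Optional[str]:
    """Return the closest known name, or None if nothing is similar enough."""
    by_lower = {name.lower(): name for name in known}
    hits = get_close_matches(duf.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


print(normalize("Giuspepe Garbibaldi"))  # Giuseppe Garibaldi
print(normalize("Marozzo"))              # None
```

String similarity handles slightly corrupted spellings reasonably well, but not surname-only forms: a bare Garibaldi stays below the cutoff for both Giuseppe and Anita Garibaldi, and choosing between them needs other evidence.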

For the project’s main goal, identifying the dedicatee mattered most for human dedications. That is where the question of gender classification is most direct.

In practice, though, the classifiers were also allowed to work beyond the human branch. A non-human result still needed to be meaningful and verifiable. A system that only says “not human” without saying whether the name refers to a date, a place, a concept, a religious form, or something else is much harder to validate.

Same tasks, different techniques

These tasks do not belong to a single technical family.

Once the problem has been decomposed, the same DUF can be approached through:

deterministic rules
fuzzy or heuristic matching
structured-source lookup
LLM delegation through targeted prompts

For deterministic and fuzzy approaches, the DUG/DUF split is usually a necessary preprocessing step. For LLM-based approaches, it does not have to be. The model can be asked to classify the full odonym directly, performing the split and the downstream attribution implicitly inside the same task.

This is important because the system is not organized around one canonical classifier per task. StreetGap tried both decomposition-based methods and direct LLM delegation, then combined multiple routes depending on the case, the ambiguity level, and the kind of evidence available.

Classifier families

Deterministic and fuzzy methods

The first family includes deterministic and fuzzy methods:

DUG lexicons
unofficial DUG extensions
linkwords
title and prefix patterns
known name lists
commemorative vs non-commemorative heuristics

This is where a lot of the pipeline’s discipline comes from. Titles such as San, Santa, Re, or Regina, known first names, and recurring commemorative patterns all provide signals that are cheap, explainable, and often strong enough to guide the next classification step.
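A title lookup of this kind can be very small. The sketch below is hypothetical: the table TITLE_GENDER and the function title_hint are invented for illustration, and the result is only a hint, since a label like Santa Croce starts with a title without naming a person.

```python
from typing import Optional

# Leading titles mapped to a gender hint (illustrative, not exhaustive).
TITLE_GENDER = {
    "san": "male",
    "santo": "male",
    "santa": "female",
    "re": "male",
    "regina": "female",
}


def title_hint(duf: str) -> Optional[str]:
    """Return a gender hint when the DUF opens with a known title."""
    words = duf.split()
    # A title alone, with nothing after it, carries no usable signal.
    if len(words) < 2:
        return None
    return TITLE_GENDER.get(words[0].lower())
```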

Some categories are easier than others:

date: relatively easy to detect with regular expressions across Arabic numerals, Roman numerals, textual forms, day-month combinations, or month-year combinations
place: can be flagged by prefixes such as Lago di or Monte; usually decent precision, lower recall
person_single: may be suggested by known first names, titles, and honorific forms, with obvious ambiguity problems
group: can be signaled by plural linkwords, plural titles, or keywords such as Fratelli
functional: tends to be marked by prefixes such as Svincolo, Ingresso, or Bivio
event: relies on a smaller set of keywords such as Battaglia, with fewer strong deterministic cues
other vs unknown: usually the residue left when no deterministic clue is strong enough
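Several of these cues fit in a short function. The following is a sketch under stated assumptions: the regular expression only covers day-month forms (optionally followed by a year), the keyword and prefix lists contain just the examples mentioned above, and guess_category is a hypothetical name. Returning None stands for the residue where no deterministic clue fires.

```python
import re
from typing import Optional

MONTHS = ("gennaio|febbraio|marzo|aprile|maggio|giugno|luglio|agosto|"
          "settembre|ottobre|novembre|dicembre")
# Day as Arabic numeral, Roman numeral, or "primo", then a month, then an optional year.
DATE_RE = re.compile(r"^(?:\d{1,2}|[ivxlc]+|primo)\s+(?:" + MONTHS + r")(?:\s+\d{4})?$")

FUNCTIONAL_PREFIXES = ("svincolo", "ingresso", "bivio")
PLACE_PREFIXES = ("lago di ", "monte ")
GROUP_KEYWORDS = {"fratelli"}
EVENT_KEYWORDS = {"battaglia"}


def guess_category(duf: str) -> Optional[str]:
    """Return a category suggested by surface cues, or None if nothing matches."""
    text = " ".join(duf.lower().split())  # lowercase and collapse whitespace
    if DATE_RE.match(text):
        return "date"
    if text.startswith(FUNCTIONAL_PREFIXES):
        return "functional"
    if text.startswith(PLACE_PREFIXES):
        return "place"
    words = set(text.split())
    if words & GROUP_KEYWORDS:
        return "group"
    if words & EVENT_KEYWORDS:
        return "event"
    return None
```

With these rules, XX Settembre and 25 Aprile come out as date, Lago di Garda as place, and Fratelli Cervi as group, while a single word like Marozzo returns None and has to be resolved elsewhere.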

Some of the hardest cases are single-word dedications that can look just as much like a surname as like a local toponym. Marozzo is one of many examples.

These cases were difficult not only for deterministic methods, but also for LLMs. The models were often pulled by the same bias the project is trying to measure: it looks like a surname, it appears in a street name, so it must refer to a man. In some cases they went further and hallucinated a full identity and biography, along the lines of “Giuseppe Marozzo, known for his military merits,” with no real evidence behind it.

Review was decisive here. These ambiguous single-word cases are exactly where cross-checking and high-cost review improved quality the most.

Wiki and structured-source matching

The second family uses Wikidata and Wikipedia as evidence sources.

They help in two ways:

they support identity resolution when a DUF is incomplete or ambiguous
they provide structured evidence that a proposed attribution points to a human, and that the proposed identification is coherent

They also support downstream enrichment of public-facing records once a classification is strong enough.

In principle, this looks straightforward. Given a DUF, search for candidate entities and inspect structured properties such as instance of (P31 in Wikidata).

In practice, the candidate space is noisy. Ambiguous names, homonyms, and name reuse create a long tail of false positives. Looking up Giuseppe Garibaldi, for example, can return not only the historical figure but also entities such as the aircraft carrier (Q742121) or the bronze statue in New York (Q16579812).

There is another common failure mode: confusing the dedicatee with institutions or objects that were themselves named after that dedicatee, such as schools, monuments, or other derived entities.
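The filtering step can be illustrated without any network call. In the sketch below, Q5 is the real Wikidata item for human and instance of is property P31, but everything else is a stand-in: the candidate records are hand-written, and identifiers such as Q_PERSON, Q_SHIP_TYPE, and Q_STATUE_TYPE are placeholders, not real Wikidata IDs.

```python
HUMAN = "Q5"  # Wikidata item for "human"

# Hand-written stand-ins for search results; only the two QIDs quoted in the
# text are real, the other identifiers are placeholders.
CANDIDATES = [
    {"id": "Q_PERSON", "label": "Giuseppe Garibaldi", "instance_of": {HUMAN}},
    {"id": "Q742121", "label": "Giuseppe Garibaldi (aircraft carrier)",
     "instance_of": {"Q_SHIP_TYPE"}},
    {"id": "Q16579812", "label": "Giuseppe Garibaldi (statue, New York)",
     "instance_of": {"Q_STATUE_TYPE"}},
]


def human_candidates(candidates: list) -> list:
    """Keep only candidates whose 'instance of' values include human."""
    return [c for c in candidates if HUMAN in c["instance_of"]]


print([c["id"] for c in human_candidates(CANDIDATES)])
# ['Q_PERSON']
```

Filtering on instance of removes ships, statues, and schools, but it does not resolve homonyms among humans, so it narrows the candidate space rather than settling the attribution.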

LLM-based classification

The third family delegates some tasks to language models.

This can be useful when the odonym is messy, abbreviated, weakly structured, or otherwise hard to classify with rules alone. Instead of solving one subproblem at a time, the model can be asked to classify the whole label at once: category, human-vs-non-human status, gender when relevant, and named-after attribution. This helped recover cases such as Piazza Giuspepe Garbibaldi in Gardone Val Trompia, which I later corrected in OpenStreetMap. It also recovered Piazzale Eroe dei Due Mondi in Savona, where Eroe dei Due Mondi (“Hero of Two Worlds”) is one of Garibaldi’s best-known epithets, referring to his campaigns in South America and Europe.

That does not make LLMs the primary truth source. They are one source of evidence among others.
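Treating the model as one evidence source among others implies validating what it returns before anything downstream uses it. The sketch below shows one hypothetical shape for that: the prompt wording, the JSON field names, and the allowed gender labels are assumptions, and no model is actually called.

```python
import json

ALLOWED_CATEGORIES = {"person_single", "group", "place", "date", "event",
                      "other", "functional", "unknown"}
ALLOWED_GENDERS = {"female", "male", "mixed", "unknown", None}


def build_prompt(odonym: str, municipality: str) -> str:
    """Compose a single-shot classification prompt (wording is illustrative)."""
    return (
        f"Classify the Italian street name '{odonym}' in {municipality}.\n"
        "Reply with JSON only, using the keys: category, gender, named_after.\n"
        f"category must be one of: {', '.join(sorted(ALLOWED_CATEGORIES))}.\n"
        "If you are not sure who is commemorated, set named_after to null."
    )


def parse_reply(reply: str) -> dict:
    """Parse and validate a model reply; raise ValueError on anything unexpected."""
    data = json.loads(reply)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    if data.get("category") not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    if data.get("gender") not in ALLOWED_GENDERS:
        raise ValueError(f"unexpected gender: {data.get('gender')!r}")
    return data
```

Rejecting replies that fall outside the agreed vocabulary does nothing about hallucinated biographies, but it keeps malformed output from silently entering the later comparison with rule-based and Wiki-based results.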

Reconciliation

StreetGap is not a serial chain of rules -> Wikidata -> AI -> final label. It is a reconciliation problem.

Different classifiers can produce competing or converging outputs on the same street. Rule-based methods, Wiki-based matching, and LLM-based classification are crossed and compared. Agreement strengthens a result. Conflict weakens it or sends it to further checks. Source quality matters too.
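A weighted vote is one simple way to express this behaviour. The sketch below is an assumption-heavy illustration, not StreetGap's reconciliation logic: the source weights, the 0.6 agreement threshold, and the names Proposal, Reconciled, and reconcile are all invented for the example.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Proposal(NamedTuple):
    source: str        # e.g. "rules", "wiki", "llm"
    label: str         # proposed class, e.g. "male"
    confidence: float  # classifier's own confidence, 0..1


class Reconciled(NamedTuple):
    label: Optional[str]
    share: float        # fraction of weighted evidence behind the winning label
    needs_review: bool


# Hypothetical per-source weights reflecting source quality.
SOURCE_WEIGHT = {"rules": 1.0, "wiki": 1.2, "llm": 0.8}


def reconcile(proposals: List[Proposal], min_share: float = 0.6) -> Reconciled:
    """Combine competing proposals; flag the result when evidence is split."""
    scores = defaultdict(float)
    for p in proposals:
        scores[p.label] += SOURCE_WEIGHT.get(p.source, 0.5) * p.confidence
    total = sum(scores.values())
    if total == 0:
        return Reconciled(None, 0.0, True)
    best = max(scores, key=scores.get)
    share = scores[best] / total
    return Reconciled(best, share, share < min_share)
```

When all three sources propose the same label, the share is 1.0 and nothing is flagged. When a rule says male with confidence 0.6 and a model says female with confidence 0.7, the winning share falls to roughly 0.52 and the record is marked for review.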

This reconciliation layer is where the system becomes more than a simple rules engine and more than a thin wrapper around an LLM.

Confidence is cross-cutting

Confidence is not a late pipeline step. It is produced throughout the system.

Each classifier can emit:

a proposed class
a confidence signal, reliability level, or evidential strength

Confidence is then updated through reconciliation. A rule-based result, a Wiki match, and an LLM output do not all carry the same weight by default, and they do not remain unchanged once they are cross-checked against one another.

Some automatic checks can also move a result toward a more consolidated state. Examples already described in StreetGap’s public methodology include:

the presence of a proper name coherent with the assigned gender and a second identifying element
a match to a Wikidata entry classified as human
the presence of a specific Italian Wikipedia page coherent with the attribution

This is one reason the project distinguishes between the broader classified set and the smaller high-confidence subset of about 478,000 records.

AI supervision

The AI layer is supervised.

Model outputs are compared against already confirmed classifications, and that comparison informs how much weight they get in later decisions.
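The comparison itself can be as simple as an agreement rate over the streets that already have a confirmed label. The function below is a hypothetical sketch: the name agreement_rate and the dictionary layout keyed by street identifier are assumptions, and how the resulting rate is turned into a weight is left open.

```python
from typing import Dict, Optional


def agreement_rate(model_labels: Dict[str, str],
                   confirmed: Dict[str, str]) -> Optional[float]:
    """Share of confirmed streets on which the model gave the same label."""
    shared = [street for street in model_labels if street in confirmed]
    if not shared:
        return None  # nothing to compare against
    hits = sum(model_labels[s] == confirmed[s] for s in shared)
    return hits / len(shared)


# Toy data: street "d" has no confirmed label and is ignored.
model = {"a": "male", "b": "female", "c": "male", "d": "male"}
truth = {"a": "male", "b": "female", "c": "female"}
rate = agreement_rate(model, truth)  # 2 of 3 shared streets agree
```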

Published classifications therefore come from a mix of structured data, automatic rules, AI support, and human review.

Review, manual review, and consolidation

Review is a distinct phase of the system, especially when classifiers disagree.

Some cases can be accepted automatically because the available evidence is aligned and strong enough. Others cannot. When rule-based signals, structured-source lookups, and model outputs produce discordant results, the classification has to be reviewed.

That review is expensive, whether it is done by humans or through automatic techniques. In the harder cases, it means handing the full result set to a stronger model, often with web access, and asking it to search for confirming or disconfirming evidence before the classification can be promoted.

Manual review is one part of that broader review layer, not just a final fallback.

It serves two roles:

a fallback for unresolved or conflicting cases
a consolidation channel that can confirm, correct, or strengthen a classification

A classification is considered consolidated when it has been either manually approved or supported by sufficiently reliable automatic evidence.
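That definition maps naturally onto a small predicate. In the sketch below, the three automatic checks mirror the examples from the public methodology, but the field names, the Evidence class, and the min_checks threshold are assumptions: the text does not say how many checks count as sufficient.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    manually_approved: bool = False
    # Proper name coherent with the assigned gender, plus a second identifying element.
    name_and_second_element: bool = False
    # Match to a Wikidata entry classified as human.
    wikidata_human_match: bool = False
    # Specific Italian Wikipedia page coherent with the attribution.
    itwiki_page_match: bool = False


def is_consolidated(ev: Evidence, min_checks: int = 1) -> bool:
    """Manual approval always consolidates; otherwise count automatic checks."""
    if ev.manually_approved:
        return True
    passed = sum([ev.name_and_second_element,
                  ev.wikidata_human_match,
                  ev.itwiki_page_match])
    return passed >= min_checks  # min_checks is a hypothetical threshold
```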

This matters because street-name attribution does not reduce cleanly to one pass of automation. It mixes language, local history, religion, naming conventions, and source-data noise.

What the public outputs show

The map is the main product. The project exists first of all to make the gender gap visible on a national exploratory interface.

For readers who want the interactive charts, rankings, and aggregate views, the right place is the StreetGap data page.

The basic public output is the count of streets dedicated to women, men, and mixed or unknown cases. On the current person-dedicated subset, that is roughly 26,000 women, 331,000 men, and 41,000 mixed or unknown cases.

Street length is a secondary measure, not the main point of the project. It is still useful because it shows that the imbalance remains very large even when dedications are weighted by geometry rather than by street count.

Across person-dedicated streets:

streets dedicated to women account for roughly 11,000 km
streets dedicated to men account for roughly 110,000 km
mixed or unknown cases account for roughly 17,000 km

The female share rises when weighting by length, from roughly 7% of streets to roughly 8% of kilometres, but only slightly.
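The comparison between the two weightings can be checked directly from the rounded figures in this article. The snippet is only that arithmetic; the names are invented for the example, and each share uses the sum of its three groups as the denominator.

```python
# Rounded figures quoted in the text.
street_count = {"women": 26_000, "men": 331_000, "mixed_or_unknown": 41_000}
length_km = {"women": 11_000, "men": 110_000, "mixed_or_unknown": 17_000}


def female_share(values: dict) -> float:
    """Women's percentage of the total across the three groups."""
    return 100 * values["women"] / sum(values.values())


count_share = female_share(street_count)   # share of streets
length_share = female_share(length_km)     # share of kilometres
print(round(count_share, 1), round(length_share, 1))
# 6.5 8.0
```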

Recipient identification also makes concentration visible: the female side is much narrower and more repetitive than the male one. The details are best explored on the data page, which is where StreetGap publishes the interactive rankings and charts.

Where the data is still fragile

The hardest part of this kind of project is the huge mass of local odonyms.

Italian toponymy is full of street names whose meaning may still be legible in a local context, but is often opaque outside it, and sometimes no longer clear even to the people who live there. In those cases, neither the surface form nor a general-purpose model gives enough evidence. Single-word dedications like Marozzo are only one example of a much broader class.

This is where the system becomes fragile: local references tend to be weakly signaled, poorly represented in structured data, and easy to misread through generic heuristics or model priors.

What these cases really require is targeted toponymic investigation. That is also the direction the project is taking, including attempts to equip LLM-based review with web-search capabilities. The results are promising, but not yet fully mature.

The dataset is useful already, but it still needs validation, revision, and local knowledge to keep those cases from collapsing into generic guesses.

Final point

The difficulty of these local cases is also why the current phase of StreetGap is less about expanding geographic coverage and more about validation, rule refinement, and revision of uncertain cases.

This is not a bug in Italian toponymy. It is one of its strengths. Street naming is a form of territorial memory, so it is natural that many references are local, partial, and legible mainly to the communities that produced them.

The risk is that this memory gets lost, or worse, bent by inherited historical defaults. One of those defaults is central to StreetGap: public space has remembered men far more than women, and that imbalance still shapes the symbolic geography of everyday life.

At street level, the challenge is not just to classify names responsibly. It is to make that inequality visible, so that gender equality can also be discussed as a question of public memory.