Crime doesn't stop during a pandemic. It adapts. When California issued mandatory stay-at-home orders in March 2020, it didn't eliminate property crime in Los Angeles — it redirected it. This is the core idea behind crime displacement theory: when one criminal opportunity closes, offenders shift to another.
This post walks through an analysis I ran on over one million LAPD incident records to test whether that displacement actually happened. Did burglaries drop because homes were occupied? Did vehicle thefts rise because cars sat unattended on empty streets? The data has a clear answer, and it tells a more interesting story than I expected.
The Dataset
The Los Angeles Open Data Portal publishes a rolling crime dataset going back to 2020. The full file contains 1,004,991 reported incidents across 28 columns — temporal, geospatial, and demographic data for every crime the LAPD logged from January 2020 through mid-2025.
For this analysis, I narrowed the scope in two ways:
- Geography: The Van Nuys police division. Restricting to a single division keeps the analysis locally coherent and lets displacement patterns surface without being washed out by city-wide noise.
- Crime types: Burglary and Grand Theft Auto (recorded as "Vehicle – Stolen"). These two crimes are natural candidates for displacement analysis because they target the same general category of property — things people own — but in different environments. They also have higher reporting rates than other property crimes because insurance typically requires a police report.
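The two filters above can be sketched in pandas. This is a minimal illustration rather than the exact code from the analysis: the column names (`AREA NAME`, `Crm Cd Desc`) and the upper-case label strings are assumptions about the export's layout, and the small inline frame stands in for the real file.

```python
import pandas as pd

# Toy stand-in for the full export; column names and labels are assumptions.
raw = pd.DataFrame({
    "AREA NAME": ["Van Nuys", "Van Nuys", "Central", "Van Nuys"],
    "Crm Cd Desc": ["BURGLARY", "VEHICLE - STOLEN", "BURGLARY", "VANDALISM"],
})

TARGET_DIVISION = "Van Nuys"
TARGET_CRIMES = {"BURGLARY", "VEHICLE - STOLEN"}


def filter_scope(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows from the target division that match one of the two crime types."""
    in_division = df["AREA NAME"] == TARGET_DIVISION
    in_crimes = df["Crm Cd Desc"].isin(TARGET_CRIMES)
    return df[in_division & in_crimes].reset_index(drop=True)


scoped = filter_scope(raw)
print(scoped)
```

Matching on exact labels keeps the scope explicit; if the real file splits burglary into several sub-codes, the set of labels would need to be widened accordingly.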
After filtering and cleaning, the working dataset came to 7,854 records spanning January 2020 to December 2024. The average victim age was 41 — consistent with typical home and vehicle ownership demographics. Crimes were distributed across all hours, but the interquartile range ran from 8 AM to 7 PM, concentrated in daylight and early evening.
Cleaning the Data
Raw LAPD data requires some work before it's usable. A few specific issues came up:
Age outliers. A surprising number of victim age records were logged as 0. These aren't actual ages — they're placeholder values for incidents where age wasn't collected or applicable (property-only crimes where there's no direct victim present). Rather than dropping those rows and losing valid temporal data, I imputed them with the median age for that crime type. This preserves row count for time-series analysis while keeping demographic statistics honest.
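A minimal sketch of that imputation step, assuming hypothetical column names `crime_type` and `vict_age` and a handful of made-up rows: zeros are first treated as missing so they don't pull the median down, then each gap is filled with the median of the recorded ages for the same crime type.

```python
import pandas as pd

# Toy records; column names and values are invented for illustration.
df = pd.DataFrame({
    "crime_type": ["BURGLARY", "BURGLARY", "BURGLARY",
                   "VEHICLE - STOLEN", "VEHICLE - STOLEN", "VEHICLE - STOLEN"],
    "vict_age": [30, 50, 0, 20, 40, 0],
})

# Treat the placeholder 0 as "not recorded" (becomes NaN).
age = df["vict_age"].mask(df["vict_age"] == 0)

# Median of the recorded ages within each crime type, broadcast back to every row.
group_median = age.groupby(df["crime_type"]).transform("median")

# Fill only the gaps; recorded ages are left untouched.
df["vict_age_imputed"] = age.fillna(group_median)
print(df)
```

Note that the fill value depends on the crime type of the row being filled, which matters as soon as crime type becomes a prediction target.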
Missing premise descriptions. Thirteen records had no location type logged. Since premise description is central to the displacement analysis — it's how I distinguish a street theft from a residential burglary — those rows were dropped rather than guessed.
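Dropping those rows is a one-liner, but it helps to count what is being removed first. The sketch below uses a hypothetical `premise` column; treating whitespace-only strings as missing is my assumption, since the post only mentions records with no location type logged.

```python
import numpy as np
import pandas as pd

# Toy rows; the `premise` column name is hypothetical.
df = pd.DataFrame({
    "crime_type": ["BURGLARY", "VEHICLE - STOLEN", "BURGLARY"],
    "premise": ["SINGLE FAMILY DWELLING", np.nan, "  "],
})

# Normalise whitespace, then treat empty strings as missing (assumption).
premise = df["premise"].str.strip()
premise = premise.mask(premise == "")

n_missing = int(premise.isna().sum())
cleaned = df[premise.notna()].reset_index(drop=True)
print(f"dropped {n_missing} rows, kept {len(cleaned)}")
```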
Temporal parsing. The raw date column came in as a string. Converting it to a proper datetime object was straightforward, but there's a privacy consideration here that matters: precise crime timestamps, combined with rounded coordinates, can narrow down individual incidents to a specific block at a specific hour. To reduce re-identification risk, I dropped the precise date after extracting the features I needed (month, year, day of week, hour), and I'll come back to this in more detail in the next post.
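The extract-then-drop approach might look like the following. The column names (`DATE OCC`, `TIME OCC`), the date string format, and the idea that the hour comes from a separate integer time column are all assumptions about the raw export, not details confirmed above.

```python
import pandas as pd

# Toy rows; column names and string formats are assumptions.
df = pd.DataFrame({
    "DATE OCC": ["03/19/2020 12:00:00 AM", "07/04/2021 12:00:00 AM"],
    "TIME OCC": [2130, 845],  # 24-hour time stored as an integer, e.g. 845 = 08:45
})

occurred = pd.to_datetime(df["DATE OCC"], format="%m/%d/%Y %I:%M:%S %p")

# Keep only coarse temporal features.
df["year"] = occurred.dt.year
df["month"] = occurred.dt.month
df["day_of_week"] = occurred.dt.day_name()
df["hour"] = df["TIME OCC"] // 100

# Drop the precise source columns once the features are extracted.
df = df.drop(columns=["DATE OCC", "TIME OCC"])
print(df)
```

Passing an explicit `format` avoids ambiguous month/day guessing and makes a malformed row fail loudly instead of being parsed incorrectly.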
The Lockdown Effect
The clearest signal in the data is the temporal divergence during the lockdown period — roughly March 2020 through June 2021.
During that window, monthly vehicle theft incidents in Van Nuys rose sharply, while burglary counts stayed relatively flat. Once lockdown restrictions eased, both crime types converged back toward each other and tracked more closely through 2022 and beyond.
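For readers who want to reproduce the monthly comparison, here is one way to build the two series and the gap between them. The column names and the five toy rows are invented; only the reshaping pattern is the point.

```python
import pandas as pd

# Toy incidents; column names and values are invented.
df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2020, 2020],
    "month": [3, 3, 3, 4, 4],
    "crime_type": ["VEHICLE - STOLEN", "VEHICLE - STOLEN", "BURGLARY",
                   "VEHICLE - STOLEN", "BURGLARY"],
})

# One row per (year, month), one column per crime type, cells are incident counts.
monthly = (
    df.groupby(["year", "month", "crime_type"])
      .size()
      .unstack("crime_type", fill_value=0)
)

# A positive gap means more vehicle thefts than burglaries that month.
monthly["gap"] = monthly["VEHICLE - STOLEN"] - monthly["BURGLARY"]
print(monthly)
```

Plotting the two count columns over time shows the divergence, and the `gap` column gives a single series to watch for the later convergence.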
This pattern is consistent with what displacement theory predicts. When people are home around the clock, residential burglary becomes harder. Occupied homes are louder, riskier, and more likely to result in immediate detection. At the same time, streets and parking lots emptied out during lockdown, and the cars parked on them went unattended for longer stretches than normal. The path of least resistance shifted from residential break-ins to vehicle theft.
It's worth noting what the data doesn't show: an across-the-board increase in criminal behavior. The point isn't that the pandemic created more criminals. It's that criminal effort was redistributed toward different targets. Burglary didn't spike; vehicle theft did. The mix of crimes shifted more than the overall level.
When Crimes Happen
Breaking crimes down by hour of the day reveals two very different behavioral patterns for the two crime types.
Vehicle theft is heavily concentrated in the evening: 5 PM through 11 PM accounts for the majority of incidents. The explanation is straightforward: by evening, cars are parked on residential streets for the night and sit unattended. There's low foot traffic, minimal surveillance, and plenty of time.
Burglary tells a different story. It's remarkably flat across the entire 24-hour cycle. There's no strong nighttime peak, no quiet morning valley. Burglary is a constant background rate, not a time-of-day phenomenon. This likely reflects the variety of burglary contexts — commercial burglaries happen during off-hours, residential ones often target homes that appear unoccupied during the day, and opportunistic entries can happen anytime.
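Comparing the shape of the two hourly distributions is easier with shares than with raw counts, since the crime types differ in volume. A small sketch, using invented rows and hypothetical column names:

```python
import pandas as pd

# Toy incidents; values are invented to illustrate the computation only.
df = pd.DataFrame({
    "crime_type": ["VEHICLE - STOLEN"] * 4 + ["BURGLARY"] * 4,
    "hour": [18, 20, 22, 3, 2, 9, 14, 21],
})

# Row-normalised crosstab: each row sums to 1, giving the share of that
# crime type's incidents that fall in each hour.
hourly_share = pd.crosstab(df["crime_type"], df["hour"], normalize="index")

# Share of each crime type that falls in the 5 PM to 11 PM window.
evening_hours = [h for h in hourly_share.columns if 17 <= h <= 23]
evening_share = hourly_share[evening_hours].sum(axis=1)
print(evening_share)
```

A flat profile shows up as roughly equal shares across columns, while a peaked one concentrates its mass in a few hours.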
For resource allocation purposes, these patterns matter. Increased vehicle theft requires different patrol timing than burglary suppression. Treating them as interchangeable because they're both "property crime" would be a mistake.
Where Crimes Happen
To map the geographic dimension of displacement, I grouped crime premise descriptions into two broad categories: Public Spaces (streets, parking lots, commercial areas) and Residential Zones (single-family homes, apartments, driveways).
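The grouping can be done with a simple keyword lookup. The keyword lists below are hypothetical and far shorter than the real premise vocabulary, so treat this as a sketch of the approach rather than the mapping used in the analysis.

```python
import pandas as pd

# Hypothetical keyword lists; the real premise vocabulary is much larger.
PUBLIC_KEYWORDS = ("STREET", "PARKING", "SIDEWALK", "STORE")
RESIDENTIAL_KEYWORDS = ("DWELLING", "APARTMENT", "DRIVEWAY", "GARAGE")


def zone_for(premise: str) -> str:
    """Map a free-text premise description to a broad zone label."""
    text = premise.upper()
    if any(keyword in text for keyword in RESIDENTIAL_KEYWORDS):
        return "Residential Zone"
    if any(keyword in text for keyword in PUBLIC_KEYWORDS):
        return "Public Space"
    return "Other"  # unmatched descriptions are kept visible, not silently merged


premises = pd.Series(
    ["STREET", "SINGLE FAMILY DWELLING", "PARKING LOT", "DRIVEWAY", "MTA BUS"]
)
zones = premises.map(zone_for)
print(zones.value_counts())
```

Keeping an explicit "Other" bucket makes it easy to check how much of the data the two categories actually cover.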
The resulting spatial scatter plot shows a clear separation. Vehicle theft incidents cluster linearly along the major arterial roads and commercial corridors of Van Nuys — the city grid. Burglaries cluster in the residential blocks nestled between those corridors. The geography of each crime type matches its behavioral logic: vehicles get stolen where vehicles park en masse, and homes get burglarized in the neighborhoods where homes are.
Areas where the two categories overlap — dense mixed-use zones, commercial streets adjacent to residential blocks — show the highest incident concentrations. These are the natural "hot spots" where opportunity for both crime types converges.
Predicting Crime Type from Context
A secondary question in this analysis was whether environmental context alone — time, day, location type, victim age — is enough information to predict which crime type an incident is. This was framed as a binary classification task: given those features, was this a burglary or a vehicle theft?
Two models were trained on a 75/25 train-test split:
- Logistic Regression (baseline): accuracy 91.24%, F1-score 0.9236
- Random Forest (100 trees): accuracy 98.12%, F1-score 0.9842
The Random Forest's biggest improvement was in recall — it reduced false negatives (actual vehicle thefts misclassified as burglaries) from 130 down to 16. The linear model struggled to capture the non-linear interaction between time of day and location type that distinguishes vehicle theft from burglary. The Random Forest handled that complexity better.
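The comparison can be sketched with scikit-learn as follows. Everything about the data here is synthetic: the feature names are hypothetical, and the label is generated from an invented rule (public space and evening hour, with 5% of labels flipped) so that there is an interaction for the models to find. The numbers it prints will not match the ones reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic features with hypothetical names.
X = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "day_of_week": rng.integers(0, 7, n),
    "is_public_space": rng.integers(0, 2, n),
    "vict_age": rng.integers(18, 80, n),
})

# Invented labelling rule: 1 = vehicle theft, 0 = burglary, with 5% label noise.
in_evening = X["hour"].between(17, 23)
y = ((X["is_public_space"] == 1) & in_evening).astype(int).to_numpy()
flip = rng.random(n) < 0.05
y = np.where(flip, 1 - y, y)

# 75/25 split, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred, zero_division=0),
        "false_negatives": int(fn),
    }
    print(name, results[name])
```

Reading false negatives straight from the confusion matrix is what makes a recall comparison like the one above possible; accuracy alone would hide it.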
But here's where it gets interesting — and where I'd caution against reading too much into those numbers.
The Data Leakage Problem
Inspecting the Random Forest's feature importances revealed that victim age was by far the most influential predictor — accounting for nearly half the model's decision weight, far more than location type or hour of day.
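Pulling the importances out of a fitted forest takes only a few lines. The sketch below trains on synthetic data with hypothetical feature names and a toy target, so the ranking it prints says nothing about the real dataset; it only shows the inspection step.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 500

# Synthetic features with hypothetical names.
X = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "is_public_space": rng.integers(0, 2, n),
    "vict_age": rng.integers(18, 80, n),
})
y = (X["is_public_space"] == 1).astype(int)  # toy target for illustration

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, one per feature, summing to 1.
importances = (
    pd.Series(forest.feature_importances_, index=X.columns)
      .sort_values(ascending=False)
)
print(importances)
```

Impurity-based importances can overstate high-cardinality features, so an unexpected ranking is a prompt to investigate, not proof of a problem on its own.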
That's a red flag. Why would victim age be so predictive of crime type?
The answer is the imputation step from earlier. During data cleaning, I replaced the placeholder age value of 0 with the median age for each crime type separately. Burglary victims had one median; vehicle theft victims had another. The model learned this injected signal and used it as a near-perfect shortcut. It wasn't learning the relationship between age and crime type from the real data. It was learning an artifact of my own preprocessing decision.
This is a textbook example of data leakage: information about the target variable getting encoded into a feature during preprocessing. The model's 98% accuracy is real on the training and test sets, but it would almost certainly collapse on genuinely new data where victim ages are unknown or unimputed. A model that accurate because of a preprocessing artifact isn't a model you can trust in production.
The fix is to impute missing ages globally (using the overall median regardless of crime type) rather than crime-type-specifically. That would remove the leak and give a more honest picture of what the model actually learned.
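To see how much signal the imputation strategy alone can inject, here is a small synthetic experiment. All of it is assumption: the age distributions, the 40% missing rate, and the small gap in means are invented, and the forest is trained on the age column only. It compares per-type imputation against the global median proposed above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_per_class = 1000

# Synthetic ages: label 0 = burglary, 1 = vehicle theft. The means are invented.
label = np.repeat([0, 1], n_per_class)
age = np.where(
    label == 0,
    rng.normal(45, 12, label.size),
    rng.normal(38, 12, label.size),
).round().clip(18, 90)

df = pd.DataFrame({"label": label, "age": age})
# Blank out 40% of ages at random, independent of the label.
df.loc[rng.random(len(df)) < 0.4, "age"] = np.nan

# Leaky: fill each gap with the median of the row's own crime type (uses the target).
df["age_by_type"] = df["age"].fillna(df.groupby("label")["age"].transform("median"))
# Proposed fix: one global median that ignores the target.
df["age_global"] = df["age"].fillna(df["age"].median())


def holdout_accuracy(feature: str) -> float:
    """Train a forest on a single age column and score it on a 25% holdout."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[[feature]], df["label"],
        test_size=0.25, random_state=0, stratify=df["label"],
    )
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


acc_by_type = holdout_accuracy("age_by_type")
acc_global = holdout_accuracy("age_global")
print(f"per-type imputation: {acc_by_type:.3f}")
print(f"global imputation:   {acc_global:.3f}")
```

Two caveats on the sketch. For brevity the global median is computed on the full frame; in a real pipeline it is safer to fit it on the training split only. And if the share of missing ages differs between crime types in the real data, a constant fill value can still act as a weaker marker, which is worth checking before trusting the repaired model.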
Ethical Dimensions
Any analysis of crime data that touches on geography and demographics carries ethical weight, and it's worth being direct about it.
Reporting bias. The LAPD dataset only contains crimes that were reported. Property crimes in lower-income neighborhoods are systematically underreported relative to wealthier areas — partly due to distrust of law enforcement, partly because the insurance requirement that drives burglary and vehicle theft reporting is itself income-correlated. The analysis restricts to burglary and vehicle theft specifically because these have higher reporting rates, but the bias doesn't disappear entirely. Any model trained on this data will inherit the reporting gaps in the underlying dataset.
Predictive policing risks. If you took the displacement patterns here and used them to deploy police resources, you'd risk reinforcing existing patrol patterns rather than responding to actual crime distribution. Over-policed neighborhoods generate more arrests and more reported incidents, which then justify further over-policing. Models trained on reported crime can create feedback loops. Resource allocation based on these predictions requires careful human judgment, not automation.
Privacy. Even rounded coordinates, combined with hour and premise type, can narrow incidents down to a small number of real locations. The decision to reduce each date to coarse features (month, year, day of week, hour) rather than keep the precise date, and to group locations into broad categories rather than specific addresses, reflects a deliberate effort to limit re-identification risk for individual victims. Public data is still data about people.
What the Data Confirms
The displacement hypothesis holds up. The lockdown did shift crime. Residential burglary stayed flat while vehicle theft spiked, and the two series converged again as restrictions eased. The geographic separation between public and residential crime types is real and spatially coherent. The hourly patterns for each crime type have different shapes that reflect different criminal logics.
What the data also shows is how quickly a well-intentioned preprocessing decision can corrupt a machine learning model — and how important it is to inspect feature importances before claiming a model is production-ready. A 98% accurate model with a data leakage problem is worse than a 91% accurate model you understand, because it creates false confidence.
If you're working with similar datasets — public incident records, crime data, social service logs — the same questions apply: what's missing from the underlying data, what did your cleaning steps inadvertently encode, and what assumptions does a predictive model inherit from both?