Overview
A dataset of 20,000 procedurally generated personas representing the population of San Francisco. It includes demographics, behavioral patterns, social networks, and mobility profiles for use in urban simulation.
Dataset Composition
Demographics
Distributions statistically aligned with census data for:
- Age and gender
- Household composition
- Income levels
- Employment sectors
- Educational attainment
Behavioral Profiles
Individual characteristics including:
- Daily activity patterns
- Transportation preferences
- Commercial behaviors
- Social interaction frequency
- Routine schedules
Social Networks
Generated social graphs with:
- Family relationships
- Workplace connections
- Community ties
- Friendship networks
- Influence propagation paths
Mobility Patterns
Synthetic movement data:
- Home and work locations
- Daily travel routes
- Mode choice preferences
- Time-of-day patterns
- Weekend behaviors
Generation Methodology
The procedural generation pipeline aligns aggregate statistics with census data while maintaining individual variation. The dataset is privacy-preserving by design, as all individuals are synthetic.
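As a rough illustration of what matching a census marginal while keeping individual variation could look like, the sketch below draws one attribute from a target distribution using a seeded random generator. The age brackets, their shares, and the `sample_personas` function are hypothetical placeholders, not the pipeline's actual inputs or code.

```python
import random
from collections import Counter

# Hypothetical target shares for a single attribute (illustrative numbers only;
# real values would come from census tables).
AGE_BRACKET_SHARES = {"0-17": 0.13, "18-34": 0.30, "35-64": 0.41, "65+": 0.16}

def sample_personas(n, shares, seed=0):
    """Draw n persona stubs whose attribute follows the target shares in expectation."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    brackets = list(shares)
    weights = [shares[b] for b in brackets]
    drawn = rng.choices(brackets, weights=weights, k=n)
    return [{"id": i, "age_bracket": b} for i, b in enumerate(drawn)]

personas = sample_personas(20_000, AGE_BRACKET_SHARES)

# Compare the realized shares with the targets.
counts = Counter(p["age_bracket"] for p in personas)
observed = {b: counts[b] / len(personas) for b in AGE_BRACKET_SHARES}
```

Independent sampling of this kind only matches each marginal on its own. A real pipeline would also need to respect joint structure across attributes, such as age and household composition.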
Applications
Urban Planning
Evaluate infrastructure changes and policy impacts on a representative population. Assess equity and accessibility across demographic groups.
Transportation Modeling
Simulate traffic patterns, transit demand, and mode shift scenarios with a realistic population distribution.
Epidemiology
Model disease spread using realistic social contact networks and mobility patterns.
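To make the contact-network use case concrete, here is a minimal sketch of a discrete-time SIR process running over an adjacency mapping. The `simulate_sir` function, the four-person `contacts` graph, and the one-step infectious period are all assumptions chosen for illustration. They are not part of the dataset or any bundled tooling.

```python
import random

def simulate_sir(contacts, initial_infected, p_transmit, steps, seed=0):
    """Toy SIR model: each infected person may infect susceptible contacts
    with probability p_transmit, then recovers after one step."""
    rng = random.Random(seed)
    infected = set(initial_infected)
    susceptible = set(contacts) - infected
    recovered = set()
    history = [(len(susceptible), len(infected), len(recovered))]
    for _ in range(steps):
        newly_infected = set()
        # Sort so random draws are consumed in a reproducible order.
        for person in sorted(infected):
            for contact in contacts[person]:
                if contact in susceptible and rng.random() < p_transmit:
                    newly_infected.add(contact)
        susceptible -= newly_infected
        recovered |= infected
        infected = newly_infected
        history.append((len(susceptible), len(infected), len(recovered)))
    return history

# Hypothetical chain of four personas: a - b - c - d
contacts = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
history = simulate_sir(contacts, ["a"], p_transmit=1.0, steps=3)
# Each tuple is (susceptible, infected, recovered) at one time step.
```

With the full dataset, the adjacency mapping would be built from each persona's network connections, and transmission probability could vary by tie type (family, workplace, community).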
Public Health
Target interventions and allocate resources based on demographic and behavioral segmentation.
Emergency Response
Plan evacuations and resource distribution while accounting for population heterogeneity.
Format
A JSON dataset with one entry per persona. Each entry includes spatial coordinates, temporal patterns, and network connections. The format is compatible with agent-based modeling frameworks.
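The schema is not spelled out here, so the following is only a sketch of how per-persona entries might be loaded and indexed. Every field name in the sample (`id`, `age`, `home`, `work`, `schedule`, `connections`) is an assumption made for illustration and may not match the actual file.

```python
import json

# Hypothetical two-persona sample; the real field names may differ.
SAMPLE = """
[
  {"id": "p0001", "age": 34,
   "home": [37.7599, -122.4148], "work": [37.7897, -122.4011],
   "schedule": {"leave_home": "08:15", "leave_work": "17:40"},
   "connections": ["p0002"]},
  {"id": "p0002", "age": 61,
   "home": [37.7765, -122.4506], "work": null,
   "schedule": {},
   "connections": ["p0001"]}
]
"""

personas = json.loads(SAMPLE)

# Index by id so network connections can be resolved to full records.
by_id = {p["id"]: p for p in personas}

def neighbors(persona_id):
    """Return the persona records connected to persona_id, skipping unknown ids."""
    return [by_id[c] for c in by_id[persona_id]["connections"] if c in by_id]

# Example query: personas that have a work location.
employed = [p["id"] for p in personas if p["work"] is not None]
```

For the real file, `json.load` on an open file handle would replace `json.loads(SAMPLE)`. The id index is what lets an agent-based model look up a persona's contacts in constant time.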
Validation
Generated distributions are validated against:
- US Census data
- American Community Survey
- SF transportation surveys
- Employment statistics
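The validation metric is not stated, so the sketch below uses total variation distance as one plausible way to compare a generated categorical distribution with a reference distribution. The mode-share numbers and the `total_variation_distance` helper are hypothetical and do not come from the sources listed above.

```python
def total_variation_distance(observed, reference):
    """Half the sum of absolute differences between two categorical
    distributions: 0 means identical, 1 means completely disjoint."""
    categories = set(observed) | set(reference)
    return 0.5 * sum(
        abs(observed.get(c, 0.0) - reference.get(c, 0.0)) for c in categories
    )

# Hypothetical commute mode shares (illustrative numbers only).
reference = {"car": 0.40, "transit": 0.35, "walk": 0.15, "bike": 0.10}
generated = {"car": 0.42, "transit": 0.33, "walk": 0.16, "bike": 0.09}

tvd = total_variation_distance(generated, reference)
```

A small distance suggests that the generated shares track the reference closely for that attribute. It says nothing about joint distributions, which would need a separate check.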