03. Generate Data
The data generator is written in Python. Install Python and run the generator to create the learning database.
Install Python
Python 3.10 or higher is required, with Python 3.14 as the recommended version (used for development and testing of this tutorial).
Check whether it's already installed by running python --version (python3 --version on macOS/Linux).
If Python 3.10.x or higher is displayed, skip to Install Dependencies.
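If you would rather not compare version numbers by eye, the same check can be sketched in Python itself. This is a minimal sketch and not part of the tutorial's generator: the file name check_version.py and the helper meets_minimum are made up for illustration, and only the 3.10 minimum comes from this page.

```python
# check_version.py (hypothetical file name, not shipped with the tutorial)
import sys

MIN_VERSION = (3, 10)  # minimum version stated by this tutorial


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {found}: OK, skip to Install Dependencies")
    else:
        print(f"Python {found}: too old, install 3.10 or higher")
```

Run it with the same interpreter you plan to use for the generator (python, python3, or py), since each command may point at a different installation.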
How to Install Python
Windows
1. Download
Click the Download Python 3.x.x button at python.org/downloads. The 64-bit Windows version is automatically selected.
2. Install
Most Important Step
You must check the "Add python.exe to PATH" checkbox at the bottom of the first installation screen. If you miss this, the python command won't work in the terminal.
Click Install Now to proceed. If "Disable path length limit" appears on the completion screen, click it (prevents long path issues).
3. Verify Installation
Open Command Prompt:
How to Open Command Prompt
- Search for cmd in the Start menu, or
- Win+R → type cmd → Enter
Run python --version. If Python 3.12.x or similar is displayed, the installation was successful.
If python Cannot Be Found
- You may have missed "Add to PATH" during installation. Uninstall and reinstall Python, making sure to check the checkbox.
- Or try py --version. If the Windows Python Launcher is installed, you can also run Python with the py command. In this case, use py instead of python in subsequent commands.
macOS
1. Check Homebrew
If not installed, install it first:
2. Install Python
3. Verify Installation
Use python3
On macOS, use python3 instead of python. Also use pip3 instead of pip in subsequent commands.
Command Line Basics
From this step onward, you'll enter commands in a terminal (or Command Prompt). Even if you're not familiar with the command line, just copy and paste the commands exactly as shown.
| Command | Meaning | Example |
|---|---|---|
| cd folder_name | Navigate to the folder | cd sql-tutorial |
| ls (macOS/Linux) / dir (Windows) | List files in the current folder | |
| Enter | Execute the command | |
How to paste:
- Windows Command Prompt: Right-click
- macOS Terminal: Cmd+V
- Linux Terminal: Ctrl+Shift+V
Install Dependencies
Navigate to the project folder and install the required libraries with pip install -r requirements.txt.
Use pip3 on macOS/Linux
If you get pip: command not found, run pip3 install -r requirements.txt instead.
What is a Virtual Environment (venv)?
A virtual environment creates an isolated Python package space for each project. It's safe because it doesn't affect the system Python. Use it when you encounter Permission denied errors or want to avoid conflicts with system packages.
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
# Install in activated environment
pip install -r requirements.txt
When the virtual environment is activated, (.venv) appears before your prompt.
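The prompt prefix is the quickest signal, but you can also ask Python directly. The sketch below relies on a documented property of venv: inside a virtual environment sys.prefix differs from sys.base_prefix. The function name in_virtual_environment is chosen here for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter:", sys.executable)
```

If this prints False right after activation, the python command is probably resolving to a different installation than the one inside .venv.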
Generate Data
SQLite: Creates output/ecommerce-ko.db (~80 MB, 680K rows). Takes about 20 seconds.
MySQL: Creates schema.sql, data.sql, and procedures.sql in the output/mysql/ directory.
To apply directly to the server, add the --apply option and enter the root password you set in step 02.
PostgreSQL: Creates schema.sql, data.sql, and procedures.sql in the output/postgresql/ directory.
To apply directly to the server, add the --apply option and enter the postgres password you set in step 02.
Use python3 on macOS/Linux
If you get python: command not found, run python3 -m src.cli.generate --size small instead.
Verify Generation
When completed successfully, you'll see output like this:
Data generation complete: 696,771 total records (12.8s)
Exporting to SQLite...
-> ./output/ecommerce-ko.db (80.7 MB)
Export complete (11.2s)
Total elapsed time: 24.0s
Check that files were created in the output/ folder, for example with ls output (macOS/Linux) or dir output (Windows).
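For the SQLite output, you can go one step beyond checking that the file exists and confirm that it contains data. The following is a minimal sketch using Python's built-in sqlite3 module. It assumes the default path output/ecommerce-ko.db mentioned above and makes no assumption about table names; it reads them from sqlite_master. The helper table_row_counts is illustrative and not part of the generator.

```python
import sqlite3
from pathlib import Path

DB_PATH = Path("output/ecommerce-ko.db")  # default output path from this tutorial


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in the SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quoting them is enough here.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


if __name__ == "__main__":
    if not DB_PATH.exists():
        print(f"{DB_PATH} not found; run the generator first")
    else:
        counts = table_row_counts(DB_PATH)
        for name, count in counts.items():
            print(f"{name}: {count:,} rows")
        print(f"total: {sum(counts.values()):,} rows")
```

The printed total may not match the "total records" figure in the generator log exactly, since this sketch counts only rows in tables and the log may count records differently.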
If You Need More Options
Advanced options for changing data size, generating English data, adding noise, custom config files, etc. are covered in Advanced Generator Options.
Manual Application for MySQL / PostgreSQL
If you generated SQL files without --apply, apply them to the server manually:
Troubleshooting
Common Issues
python/pip Command Not Found
- Windows: Try py or py -3
- macOS/Linux: Use python3, pip3
- If PATH isn't set: Reinstall Python and check "Add to PATH"
ModuleNotFoundError
You haven't run pip install -r requirements.txt, or the virtual environment is deactivated.
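To see which packages are actually missing from the current environment, a small script can compare requirements.txt against what is installed. This is a rough sketch under assumptions: it handles only simple requirement lines (name plus optional version specifier), skips option lines such as -r or -e, and the helper names requirement_names and missing_packages are invented for illustration. It uses importlib.metadata from the standard library.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract distribution names from simple requirements.txt lines."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and option lines
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the running interpreter."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        missing = missing_packages(requirement_names(req.read_text(encoding="utf-8")))
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print("requirements.txt not found; run this from the project folder")
```

Run it with the virtual environment activated. If it lists packages as missing there but not outside the environment, the dependencies were installed into the system Python instead of .venv.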
Out of Memory During Generation
--size medium requires about 4GB of memory. Start with --size small.
Connection Error with MySQL/PG --apply
Check if the DB server is running:
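One OS-independent way to test this is to try opening a TCP connection to the server's port. The sketch below assumes the server runs on the local machine and listens on the default ports (3306 for MySQL, 5432 for PostgreSQL); if you chose a different host or port in step 02, adjust the values. A successful connection only shows that something is listening there, not that your credentials are valid.

```python
import socket

# Default ports; change these if the server was configured differently in step 02.
DEFAULT_PORTS = {"MySQL": 3306, "PostgreSQL": 5432}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "accepting connections" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

If the port is not reachable, start the database service first and rerun the generator with --apply.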