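refactor: restructure the project, rename geo_tools to app, and update related references

- Rename the main package from geo_tools to app
- Update import paths in all modules
- Migrate and update test cases
- Add project rules documentation
- Keep existing functionality unchanged; structural changes only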
2026-04-12 19:49:56 +08:00
parent fcb8e1f255
commit db51d41aef
41 changed files with 4132 additions and 808 deletions
--- a/TUTORIAL.md
+++ b/TUTORIAL.md
@@ -0,0 +1,468 @@
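+# Geo-Tools Tutorial
+
+## 1. Installation and Environment Setup
+
+### 1.1 Prerequisites
+
+Before installing Geo-Tools, make sure the following are installed:
+
+- Python 3.8 or later
+- pip (the Python package installer)
+
+### 1.2 Installation
+
+1. **Clone the repository**
+
+   ```bash
+   git clone <repository URL>
+   cd geo_tools
+   ```
+
+2. **Install the dependencies**
+
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+3. **Install Geo-Tools**
+
+   ```bash
+   pip install -e .
+   ```
+
+### 1.3 Verifying the Installation
+
+Run the following Python snippet to verify that Geo-Tools is installed correctly:
+
+```python
+from app.io.readers import read_vector
+print("Geo-Tools installed successfully!")
+```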
+
+## 2. Five-Minute Quick Start
+
+### 2.1 Reading a Vector File
+
+```python
+from app.io.readers import read_vector
+from pathlib import Path
+
+# Read a vector file.
+# This uses the sample GeoJSON data; replace the path with your own file (for example, a Shapefile).
+data_path = Path("data/sample/sample_points.geojson")
+gdf = read_vector(data_path)
+
+# Inspect basic information about the data
+print(f"Shape: {gdf.shape}")
+print(f"Columns: {list(gdf.columns)}")
+print(f"CRS: {gdf.crs}")
+```
+
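+### 2.2 Previewing the Data
+
+```python
+# Preview the first 5 rows
+print("\nData preview:")
+print(gdf.head())
+
+# Alternatively, use the rows parameter to read only the first 5 rows
+print("\nPreview using the rows parameter:")
+gdf_preview = read_vector(data_path, rows=5)
+print(gdf_preview)
+```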
+
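+### 2.3 Creating Buffers
+
+```python
+from app.core.geometry import buffer_geometry
+
+# Create a buffer around each point.
+# The distance is presumably in the units of the data's CRS (degrees for EPSG:4326).
+gdf["buffer"] = gdf.geometry.apply(lambda geom: buffer_geometry(geom, distance=0.1))
+
+# Inspect the result
+print("\nBuffers created!")
+print(f"Original geometry types: {gdf.geometry.geom_type.unique()}")
+print(f"Buffer geometry types: {gdf['buffer'].geom_type.unique()}")
+```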
+
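+### 2.4 Saving the Result
+
+```python
+from app.io.writers import write_vector
+
+# Save as a GeoJSON file.
+# Note: gdf now has two geometry columns, and GeoJSON stores one geometry per
+# feature; if the write fails, keep a single geometry column before saving.
+output_path = Path("output/buffered_points.geojson")
+write_vector(gdf, output_path)
+print(f"\nResult saved to: {output_path}")
+```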
+
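+## 3. Advanced Features
+
+### 3.1 Working with Large Files
+
+#### 3.1.1 Previewing Data with the rows Parameter
+
+When working with a large file, you can pass the `rows` parameter to read only the first few rows. This lets you inspect the data structure quickly without loading the entire file:
+
+```python
+from app.io.readers import read_vector
+from pathlib import Path
+
+# Read only the first 10 rows for a preview
+large_file_path = Path("path/to/large_file.shp")
+gdf_preview = read_vector(large_file_path, rows=10)
+
+print(f"The preview contains {len(gdf_preview)} records")
+print(gdf_preview.head())
+print(gdf_preview.columns)
+```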
+
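+#### 3.1.2 Reading in Chunks with chunk_size
+
+For very large files, you can pass the `chunk_size` parameter to read the file in chunks and process them one at a time, which avoids running out of memory:
+
+```python
+from app.core.geometry import buffer_geometry
+from app.io.readers import read_vector
+from app.io.writers import write_vector
+from pathlib import Path
+
+# Read in chunks of 10,000 records each
+large_file_path = Path("path/to/large_file.shp")
+
+# Process each chunk in a for loop
+for i, chunk in enumerate(read_vector(large_file_path, chunk_size=10000)):
+    print(f"Processing chunk {i+1} with {len(chunk)} records")
+
+    # Do your processing here
+    # For example: compute buffers
+    chunk["buffer"] = chunk.geometry.apply(lambda geom: buffer_geometry(geom, distance=0.1))
+
+    # Save the result for the current chunk
+    output_path = Path(f"output/chunk_{i+1}.geojson")
+    write_vector(chunk, output_path)
+    print(f"Chunk {i+1} saved")
+```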
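+
+To illustrate the idea behind chunked reading, here is a minimal, library-independent sketch. It is not how `read_vector` is implemented; `iter_chunks` is a hypothetical helper that yields fixed-size lists from any iterable, so that only one chunk needs to be held in memory at a time:
+
+```python
+from itertools import islice
+from typing import Iterable, Iterator, List, TypeVar
+
+T = TypeVar("T")
+
+
+def iter_chunks(records: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
+    """Yield successive lists of at most chunk_size items (hypothetical helper)."""
+    if chunk_size <= 0:
+        raise ValueError("chunk_size must be positive")
+    iterator = iter(records)
+    while True:
+        # Pull at most chunk_size items; earlier chunks can be garbage-collected
+        chunk = list(islice(iterator, chunk_size))
+        if not chunk:
+            return
+        yield chunk
+
+
+# 25 stand-in records split into chunks of 10; the last chunk is shorter
+sizes = [len(chunk) for chunk in iter_chunks(range(25), 10)]
+print(sizes)  # [10, 10, 5]
+```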
+
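+### 3.2 Coordinate System Conversion
+
+#### 3.2.1 When Do You Need to Convert Coordinate Systems?
+
+- When you need to compute distances or areas: geographic coordinate systems such as EPSG:4326 use degrees as units, which are unsuitable for direct measurement
+- When you need to overlay your data with datasets that use a different coordinate system
+- When you need to use tools or services that require a specific coordinate system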
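+
+The first point can be made concrete with a little arithmetic. The sketch below assumes a spherical Earth with a mean radius of 6,371 km (an approximation that is not part of Geo-Tools) and shows that the ground length of one degree of longitude shrinks with latitude, which is why lengths and areas expressed in degrees are not comparable across locations:
+
+```python
+import math
+
+EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation
+
+
+def meters_per_degree_longitude(latitude_deg: float) -> float:
+    """Approximate ground length of one degree of longitude at a given latitude."""
+    return math.radians(1.0) * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))
+
+
+for lat in (0, 30, 60):
+    km = meters_per_degree_longitude(lat) / 1000
+    print(f"latitude {lat:>2} deg: 1 degree of longitude is about {km:.1f} km")
+```
+
+At 60 degrees of latitude the value is half of what it is at the equator, so a buffer of 0.1 degrees covers very different ground distances depending on where the data lies.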
+
+#### 3.2.2 How to Convert Coordinate Systems
+
+```python
+from app.io.readers import read_vector
+from app.core.projection import reproject
+from pathlib import Path
+
+# Read the data
+data_path = Path("data/sample/sample_points.geojson")
+gdf = read_vector(data_path)
+print(f"Original CRS: {gdf.crs}")
+
+# Reproject to Web Mercator (EPSG:3857)
+gdf_3857 = reproject(gdf, "EPSG:3857")
+print(f"Reprojected CRS: {gdf_3857.crs}")
+
+# Alternatively, specify the target CRS directly when reading
+gdf_direct = read_vector(data_path, crs="EPSG:3857")
+print(f"CRS specified at read time: {gdf_direct.crs}")
+```
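+
+To give a feel for what such a conversion does numerically, the following sketch applies the standard spherical Web Mercator forward formulas by hand. `lonlat_to_web_mercator` is a hypothetical helper written for illustration only; it is not part of Geo-Tools and is not necessarily how `reproject` works internally:
+
+```python
+import math
+
+WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by EPSG:3857
+
+
+def lonlat_to_web_mercator(lon_deg, lat_deg):
+    """Convert EPSG:4326 degrees to EPSG:3857 meters (hypothetical helper)."""
+    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon_deg)
+    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(
+        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
+    )
+    return x, y
+
+
+x, y = lonlat_to_web_mercator(116.4, 39.9)  # a point near Beijing
+print(f"x = {x:.1f} m, y = {y:.1f} m")
+```
+
+The formula is only usable for latitudes within roughly 85 degrees of the equator, because `y` grows without bound toward the poles.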
+
+### 3.3 Spatial Analysis
+
+#### 3.3.1 Buffer Analysis
+
+```python
+from app.io.readers import read_vector
+from app.io.writers import write_vector
+from app.core.geometry import buffer_geometry
+from pathlib import Path
+
+# Read the data
+data_path = Path("data/sample/sample_points.geojson")
+gdf = read_vector(data_path)
+
+# Create buffers at different distances (presumably in CRS units; degrees for EPSG:4326)
+gdf["buffer_05"] = gdf.geometry.apply(lambda geom: buffer_geometry(geom, distance=0.05))
+gdf["buffer_10"] = gdf.geometry.apply(lambda geom: buffer_geometry(geom, distance=0.1))
+
+# Save the result
+output_path = Path("output/buffers.geojson")
+write_vector(gdf, output_path)
+print(f"Buffer analysis result saved to: {output_path}")
+```
+
+#### 3.3.2 Spatial Overlay Analysis
+
+```python
+from app.io.readers import read_vector
+from app.io.writers import write_vector
+from app.analysis.spatial_ops import overlay
+from pathlib import Path
+
+# Read the two datasets
+points_path = Path("data/sample/sample_points.geojson")
+regions_path = Path("data/sample/sample_regions.geojson")
+
+points = read_vector(points_path)
+regions = read_vector(regions_path)
+
+# Run the spatial overlay (intersection)
+overlay_result = overlay(points, regions, how="intersection")
+print(f"The overlay result contains {len(overlay_result)} records")
+
+# Save the result
+output_path = Path("output/overlay_result.geojson")
+write_vector(overlay_result, output_path)
+print(f"Overlay analysis result saved to: {output_path}")
+```
+
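+#### 3.3.3 Nearest-Neighbor Search
+
+```python
+from app.core.geometry import distance_between
+from app.io.readers import read_vector
+from pathlib import Path
+
+# Read the point dataset
+points_path = Path("data/sample/sample_points.geojson")
+gdf = read_vector(points_path)
+
+# Use the first point as the target
+target_point = gdf.geometry.iloc[0]
+print(f"Target point: {target_point}")
+
+# Exclude the target itself; otherwise it would be its own nearest point at distance 0
+candidates = gdf.iloc[1:].copy()
+
+# Compute the distance from each remaining point to the target
+candidates["distance"] = candidates.geometry.apply(lambda geom: distance_between(geom, target_point))
+
+# Find the nearest point
+nearest_point = candidates.loc[candidates["distance"].idxmin()]
+print(f"Nearest point: {nearest_point.geometry}")
+print(f"Distance: {nearest_point['distance']}")
+```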
+
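+## 4. Troubleshooting
+
+### 4.1 What If a File Cannot Be Opened?
+
+**Possible causes:**
+- The file path does not exist
+- The file path contains Chinese or other special characters
+- The file format is not supported
+- The file is corrupted
+
+**Solutions:**
+1. Check that the file path is correct; prefer an absolute path
+2. Make sure the file path does not contain Chinese or other special characters
+3. Confirm that the file format is supported (GeoJSON, Shapefile, GPKG, GDB, and so on)
+4. Try opening the file in other software to check whether it is corrupted
+
+**Example:**
+
+```python
+from app.io.readers import read_vector
+from pathlib import Path
+
+# Use an absolute path; define it before the try block so the handler can inspect it
+file_path = Path("c:/data/my_shapefile.shp")
+try:
+    gdf = read_vector(file_path)
+    print("File read successfully!")
+except Exception as e:
+    print(f"Failed to read the file: {e}")
+    # Check whether the path exists
+    if not file_path.exists():
+        print("Error: the file path does not exist")
+    # Check the file extension (case-insensitive)
+    if file_path.suffix.lower() not in [".shp", ".geojson", ".gpkg", ".gdb"]:
+        print("Error: the file format may not be supported")
+```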
+
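+### 4.2 What If You Get an "Invalid Geometry" Error?
+
+**Possible causes:**
+- The geometry data is corrupted
+- The geometry self-intersects
+- The geometry is empty
+
+**Solutions:**
+1. Use the `fix_geometry` function to try to repair invalid geometries
+2. Use the `is_valid_geometry` function to check geometry validity
+3. Filter out invalid geometries
+
+**Example:**
+
+```python
+from app.core.geometry import fix_geometry, is_valid_geometry
+from app.io.readers import read_vector
+from pathlib import Path
+
+# Read the data
+data_path = Path("data/sample/sample_points.geojson")
+gdf = read_vector(data_path)
+
+# Check and repair geometries
+print("Checking geometry validity...")
+valid_count = 0
+fixed_count = 0
+geom_col = gdf.geometry.name
+
+for idx, geom in gdf.geometry.items():
+    if is_valid_geometry(geom):
+        valid_count += 1
+    else:
+        # Try to repair
+        fixed_geom = fix_geometry(geom)
+        if fixed_geom is not None:
+            # Assign through the DataFrame so the change is written back
+            gdf.at[idx, geom_col] = fixed_geom
+            fixed_count += 1
+
+print(f"Valid geometries: {valid_count}")
+print(f"Repaired geometries: {fixed_count}")
+print(f"Unrepairable geometries: {len(gdf) - valid_count - fixed_count}")
+
+# Filter out geometries that are still invalid
+gdf_valid = gdf[gdf.geometry.apply(is_valid_geometry)]
+print(f"Geometries remaining after filtering: {len(gdf_valid)}")
+```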
+
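+### 4.3 What If You Run Out of Memory?
+
+**Possible causes:**
+- The file is too large to load into memory all at once
+- Too many temporary objects are created during processing
+
+**Solutions:**
+1. Use the `rows` parameter to preview the data
+2. Use the `chunk_size` parameter to read in chunks
+3. Release memory promptly once the data has been processed
+4. Consider more efficient data structures and algorithms
+
+**Example:**
+
+```python
+import gc
+
+import geopandas as gpd
+import pandas as pd
+
+from app.io.readers import read_vector
+from app.io.writers import write_vector
+from pathlib import Path
+
+# Process a large file in chunks
+large_file = Path("path/to/large_file.shp")
+output_file = Path("output/processed_file.geojson")
+
+# Read and process chunk by chunk
+chunks = []
+for i, chunk in enumerate(read_vector(large_file, chunk_size=10000)):
+    print(f"Processing chunk {i+1}...")
+
+    # Do your processing here
+    # For example: add an area column (use a projected CRS for meaningful values)
+    chunk["area"] = chunk.geometry.area
+
+    chunks.append(chunk)
+
+# Merge all chunks.
+# Note: this holds the full result in memory; if it does not fit,
+# write each chunk to its own file instead, as in section 3.1.2.
+result = gpd.GeoDataFrame(pd.concat(chunks, ignore_index=True))
+
+# Save the result
+write_vector(result, output_file)
+print(f"Processing complete; result saved to: {output_file}")
+
+# Release memory
+del chunks, result
+gc.collect()
+```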
+
+## 5. Complete Example
+
+### 5.1 Land-Use Data Analysis
+
+**Scenario:** Read land-use data, select the farmland, compute the total farmland area, and export the result as a GeoJSON file.
+
+**Steps:**
+
+1. **Read the data**
+
+   ```python
+   from app.io.readers import read_vector
+   from pathlib import Path
+
+   # Read the land-use data
+   landuse_path = Path("data/landuse.shp")
+   # Preview the data structure first
+   landuse_preview = read_vector(landuse_path, rows=10)
+   print("Columns:", list(landuse_preview.columns))
+   print("Land-use types:", landuse_preview["type"].unique())
+
+   # Read the full dataset
+   landuse = read_vector(landuse_path)
+   print(f"Total number of records: {len(landuse)}")
+   ```
+
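+2. **Select the farmland**
+
+   ```python
+   # Assume the farmland type code is "1" or "耕地" (farmland)
+   # Adjust the condition to match your actual data
+   # .copy() avoids modifying a view of landuse when columns are added later
+   farmland = landuse[landuse["type"] == "耕地"].copy()
+   print(f"Number of farmland features: {len(farmland)}")
+   ```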
+
+3. **Compute the total farmland area**
+
+   ```python
+   from app.core.projection import reproject
+
+   # Check the CRS
+   print(f"Original CRS: {farmland.crs}")
+
+   # If the CRS is geographic (not only EPSG:4326), reproject to a projected CRS to get accurate areas
+   if farmland.crs and farmland.crs.is_geographic:
+       # Reproject to a UTM CRS (choose the EPSG code that matches the data's region)
+       farmland_proj = reproject(farmland, "EPSG:32649")  # Example: UTM zone 49N
+       print(f"Reprojected CRS: {farmland_proj.crs}")
+   else:
+       farmland_proj = farmland
+
+   # Compute areas (in square meters, assuming the projected CRS uses meters)
+   farmland_proj["area"] = farmland_proj.geometry.area
+   total_area = farmland_proj["area"].sum()
+   print(f"Total farmland area: {total_area:.2f} square meters")
+   print(f"Total farmland area: {total_area/10000:.2f} hectares")
+   ```
+
+4. **Export the result**
+
+   ```python
+   import pandas as pd
+
+   from app.io.writers import write_vector
+
+   # Export as GeoJSON
+   output_path = Path("output/farmland.geojson")
+   write_vector(farmland_proj, output_path)
+   print(f"Farmland data exported to: {output_path}")
+
+   # Export the area statistics
+   stats = pd.DataFrame({
+       "farmland_count": [len(farmland)],
+       "total_area_sq_m": [total_area],
+       "total_area_ha": [total_area/10000]
+   })
+   stats_path = Path("output/farmland_stats.csv")
+   stats.to_csv(stats_path, index=False, encoding="utf-8-sig")
+   print(f"Statistics exported to: {stats_path}")
+   ```
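+
+Step 3 hardcodes EPSG:32649 as an example. As a rough sketch of how a suitable UTM code could be chosen from a dataset's center point, the hypothetical helper `utm_epsg_for` below uses the standard 6-degree zone rule with WGS 84 codes (326xx for the northern hemisphere, 327xx for the southern). It ignores the special zones around Norway and Svalbard and is not part of Geo-Tools:
+
+```python
+import math
+
+
+def utm_epsg_for(lon_deg: float, lat_deg: float) -> int:
+    """Return the WGS 84 / UTM EPSG code for a point (hypothetical helper)."""
+    if not -180.0 <= lon_deg <= 180.0 or not -80.0 <= lat_deg <= 84.0:
+        raise ValueError("point is outside the UTM coverage area")
+    # Zones are 6 degrees wide and numbered 1 to 60 starting at 180 degrees west
+    zone = min(int(math.floor((lon_deg + 180.0) / 6.0)) + 1, 60)
+    base = 32600 if lat_deg >= 0 else 32700
+    return base + zone
+
+
+# 111 degrees east is the central meridian of zone 49, matching the example above
+print(utm_epsg_for(111.0, 30.0))  # 32649
+```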
+
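+## 6. Summary
+
+Geo-Tools is a geospatial data processing library that provides spatial analysis tools and file I/O helpers. After working through this tutorial, you should be familiar with:
+
+- Basic file reading and writing
+- Chunked processing of large files
+- Coordinate system conversion
+- Common spatial analysis operations
+- Troubleshooting common problems
+
+If you run into problems, refer to the troubleshooting section of this tutorial or consult the project's detailed documentation.
+
+Enjoy using Geo-Tools!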