**Read in: [English](README_en.md) | [中文](README.md)**
# JCS-pub: JointCloud Storage Public Infrastructure + Out-of-the-Box Client
## Project Overview
JointCloud Storage is a storage service model built on a peer-to-peer intercloud collaboration mechanism. It manages storage resources across multiple clouds and provides users with a unified data storage service. The core idea emphasizes the independent and equal status of each cloud, connecting them in a non-intrusive manner. It also promotes intercloud collaboration by integrating computing, storage, and networking resources across clouds to deliver high-quality storage services.
This project aims to turn JointCloud Storage into a public infrastructure that is easy to use for individuals and enterprises alike. With a ready-to-use client, users can quickly access efficient intercloud storage without worrying about deploying additional components. At the same time, the system allows flexible customization of its features to meet diverse needs.
## Evolution Roadmap
<center>
<img src="docs/figs/roadmap.jpg" width=100% /></center>
<img src="docs/figs/roadmap.png" width=100% /></center>
## Features
### 1、Data Migration
- **Cross-Cloud Migration Support**: Enables users to migrate data across multiple cloud storage providers.
- **Policy-Driven Migration Engine**
- Filtering Rules: Migrate files based on size, file extension, or directory path.
- Scheduling Control: Define migration time windows.
- Post-Migration Actions: Choose whether to retain or delete the original data after migration.
- **Migration Efficiency Optimization**: Improve migration performance and reduce bandwidth costs by leveraging the **JointCloud Storage Public Infrastructure**.
### 2、Cross-Cloud Data Storage
- **Unified Multi-Cloud View**: Store data across multiple cloud platforms while presenting a unified view to users, abstracting away multi-cloud complexity.
- **Multi-Level Disaster Recovery**: Provides cloud-level resilience through various redundancy strategies, including erasure coding, replication, and hybrid redundancy (erasure coding + replication).
- **Adaptive Redundancy Strategy**
- **Redundancy Scheme**: Customize the number of replicas, EC data blocks, and coding algorithms.
- **Data Placement**: Configure where each replica or EC block is stored.
- **Cross-Cloud Command Set**: Supports custom commands such as upload, download, random read, cross-cloud scheduling, and cross-cloud repair.
- **Multiple Data Access Methods**
- REST API
- Command-line interface
- FUSE file system
- **Access Efficiency Optimization**: Boost cross-cloud data access speed and reduce traffic costs via the **JointCloud Storage Public Infrastructure**.
### 3、Hybrid Local + Multi-Cloud Storage
- **Unified Hybrid Storage View**: Store data across both **local file systems** and **multiple remote cloud platforms**, while presenting a unified view and hiding underlying complexities.
- **Intelligent Data Collaboration Strategy**: Offers flexible policies for syncing data between local and remote storage.
- **Data Filtering**: Dynamically select files for remote sync based on size, path, or extension.
- **Local Retention**: Configure whether to keep local copies of remote data.
- **Bidirectional Sync**: Independently configure local-to-cloud and cloud-to-local synchronization.
- **Collaboration Efficiency Optimization**: Enhance sync performance and reduce traffic costs using the **JointCloud Storage Public Infrastructure**.
- **Multiple Data Access Methods**
- REST API
- Command-line interface
- FUSE file system
### 4、Ready to Use
- Simply download and install the client — no additional components or configuration needed.
### 5、JointCloud Storage Public Infrastructure
- **Unified Hybrid Storage View**: Supports unified access across **local file systems** and **multiple remote cloud platforms**, abstracting the complexity behind a consistent interface.
- **Dual Operation Modes**:
- **Standalone Mode**: Fully offline usage without any external service dependencies.
- **Infrastructure-Connected Mode**: Operates with public infrastructure support (**enhances performance and reduces bandwidth cost**).
- **Open Source Public Infrastructure**:
- Users can self-deploy or connect to existing public infrastructure.
- Free Public Infrastructure Access:
- To obtain an account, password, and certificate, please email `song-jc@foxmail.com`. The application process is illustrated below.
<center>
<img src="docs/figs/application.png" width=90% /></center>
## Architecture Diagram
<center>
<img src="docs/figs/architecture.png" width=70% /></center>
### 1、Cloud Storage Space
- Before using the JointCloud Storage client, users must prepare their own cloud storage space. This typically refers to object storage buckets or directories in file storage services. Both public cloud storage services and private cloud deployments are supported.
- For tutorials on registering mainstream public cloud storage services, see: [Guide](docs/公有云注册及使用教程.md)
### 2、JointCloud Storage Public Infrastructure
- The public infrastructure consists of multiple proxy nodes. These nodes collaborate with clients to perform cross-cloud operations such as data migration, upload, and download. They also optimize data transfer routes and support concurrent access, improving performance and reducing bandwidth costs. This design prevents the client from becoming a performance bottleneck.
### 3、JointCloud Storage Client
- The JointCloud Storage Client is deployed on the user's server and serves as both a data service gateway and metadata management node.
- Users manage their cloud storage data through various interfaces provided by the client.
- All metadata and cloud storage credentials remain on the client, managed by the user. When a proxy node needs to access a user's cloud storage, the client temporarily grants access permissions as needed.
## Installation
### 1、Third-Party Dependencies
The following components must be installed manually:
- `MySQL`: Version 8.0 or above. Please create a database and user account for the JCS client to use.
### 2、Docker Installation (Recommended)
Currently, only the JCS client is available as a Docker image. The jcsctl tool must be used as a precompiled executable — [click here to download]().
Pull the Docker image
```bash
docker pull jcs:latest
```
If you already have your configuration and certificate files prepared, simply mount the directory containing these files into the container using the `-v` flag, and use the `-c` flag to specify the path to the configuration file (inside the container).
If you haven't prepared the configuration yet and want to generate it using the client's `init` command, you should still mount a local directory using `-v` so that the generated configuration and certificate files can be saved to it.
Here is an example command:
```bash
# Assuming the config files are on the host at /etc/jcsconfs
docker run -v /etc/jcsconfs:/opt/confs \
jcs serve -c /opt/confs/config.json # Note: the config path is inside the container
```
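If the configuration has not been generated yet, the same mount can be used to run the client's `init` command inside the container first. A minimal sketch (the interactive `-it` flags are an assumption, since `init` prompts for input):
```bash
# Run the interactive configuration wizard inside the container;
# answer the prompts with paths under /opt/confs so config.json and
# the certificates are written to the mounted host directory.
docker run -it -v /etc/jcsconfs:/opt/confs \
    jcs init
```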
### 3、Build from Source
Before compiling, make sure you have the following installed:
- `Go`: Version 1.23 or above
- `mage`: A Makefile-like build tool. GitHub repo: [click here](https://github.com/magefile/mage)
After installing dependencies, clone this repository and run the following command in the project root:
```powershell
mage bin
```
After successful execution, a build directory will be created in the project root containing the compiled executables:
- `jcs`: The main JCS client
- `jcsctl`: Command-line tool for the client (add it to your PATH for convenience)
- `coordinator`: Coordination node for the JCS system (can be ignored if using the public JCS infrastructure)
- `hub`: Central node of the JCS system (can also be ignored if using the public infrastructure)
### 4、Install Precompiled Executables
Download and extract the binaries that match your system environment — [click here to download]().
## Usage Guide
### 1、Generate Configuration File
Use the `init` command of the JCS client to start the configuration process and follow the prompts to fill in the required information.
**Note**: If you plan to run the JCS client using the Docker image, all paths in the configuration should refer to locations **inside the container**.
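A minimal sketch of the command, assuming the `jcs` binary is in the current directory:
```bash
# Starts the interactive configuration wizard; answer the prompts to
# produce config.json and the certificate files listed below.
./jcs init
```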
After completing the command, the following files will be generated:
- `config.json`: The full configuration file for the client.
- `ca_cert.pem`: Root certificate for the HTTP service.
- `ca_key.pem`: Root private key for the HTTP service (**must be securely stored by the user**).
- `server_cert.pem`, `server_key.pem`: Server certificate and key signed by the root key.
- `client_cert.pem`, `client_key.pem`: Client certificate and key signed by the root key, used by jcsctl or third-party programs.
All files except `ca_key.pem` will be used during client operation.
The configuration file fields are explained as follows:
```json
{
"hubRPC": {
"rootCA": "" // Path to root certificate for communication with the Hub service
},
"coordinatorRPC": {
"address": "127.0.0.1:5009", // Address of the Coordinator service
"rootCA": "" // Path to root certificate for communication with the Coordinator service, usually same as Hub's rootCA
},
"logger": {
"output": "stdout", // Log output mode: stdout or file
"outputFileName": "client", // Log file name (effective if output is file)
"outputDirectory": "log", // Directory for log files (effective if output is file)
"level": "debug" // Log level: debug, info, warn, error
},
"db": {
"address": "127.0.0.1:3306", // MySQL database address
"account": "root", // Database username
"password": "123456", // Database password
"databaseName": "cloudream" // Database name
},
"connectivity": {
"testInterval": 300 // Interval in seconds to test connectivity with the Hub
},
"downloader": {
"maxStripCacheCount": 100, // Maximum number of EC stripes cached during file reads
"ecStripPrefetchCount": 1 // Number of EC stripes to prefetch during read operations
},
"downloadStrategy": {
"highLatencyHub": 35 // Latency threshold in ms; Hubs above this latency are considered high latency
},
"tickTock": {
"ecFileSizeThreshold": 5242880, // Minimum file size in bytes to apply EC encoding
"accessStatHistoryWeight": 0.8 // Weight of previous day's access count when updating today's access statistics (range 0~1)
},
"http": {
"enabled": true, // Enable HTTP service
"listen": "127.0.0.1:7890", // HTTP service listen address
"rootCA": "", // Path to root certificate
"serverCert": "", // Path to server certificate
"serverKey": "", // Path to server private key
"clientCerts": [], // List of client certificates signed by the root CA
"maxBodySize": 5242880 // Maximum HTTP request body size in bytes
},
"mount": {
"enabled": false, // Enable FUSE mount
"mountPoint": "", // Mount directory path
"gid": 0, // GID for files and folders in the mount directory
"uid": 0, // UID for files and folders in the mount directory
"dataDir": "", // Cache data directory
"metaDir": "", // Metadata directory
"maxCacheSize": 0, // Max cache size in bytes (0 means unlimited; not enforced in real-time)
"attrTimeout": "10s", // OS cache timeout for file attributes
"uploadPendingTime": "30s", // Delay after file modification before upload starts
"cacheActiveTime": "1m", // Time cached data remains active in memory before cleanup
"cacheExpireTime": "1m", // Time cached files remain on disk after being removed from memory (effective only for fully uploaded files)
"scanDataDirInterval": "10m" // Interval to scan the cache directory
},
"accessToken": {
"account": "", // Account for logging into the public JCS system
"password": "" // Password for the public JCS system
}
}
```
**Note**: If manually editing this JSON file, remove all comments as JSON does not support comments.
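With the configuration and certificates in place, the client can be started the same way as in the Docker example above. A minimal sketch, assuming the `jcs` binary and `config.json` are in the current directory:
```bash
./jcs serve -c config.json
```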
### 2、Command Line
`jcsctl` is the command-line tool for managing the JCS client. It requires certificates for authentication, generated during client initialization. The required files are `ca_cert.pem`, `client_cert.pem`, and `client_key.pem`.
When starting, `jcsctl` searches for these files in the following order:
- Paths specified via command-line options `--ca`, `--cert`, and `--key`.
- The directory where the `jcsctl` executable resides.
By default, `jcsctl` attempts to connect to the client at `https://127.0.0.1:7890`. Use the `--endpoint` option to specify a different client address.
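A hedged invocation sketch combining the options above (`<command>` is a placeholder; the available subcommands depend on your `jcsctl` build):
```bash
# Certificates are taken from the explicit paths; without these options,
# jcsctl falls back to the directory containing the executable.
jcsctl --endpoint https://127.0.0.1:7890 \
    --ca ca_cert.pem \
    --cert client_cert.pem \
    --key client_key.pem \
    <command>
```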
### 3、API
See documentation: [Link](docs/JCS_pub_API.md)
## Testing & Evaluation
## Custom Redundancy Strategy
Redundancy strategies have a wide-ranging impact on the read/write process and are tightly coupled with various parts of the codebase. When implementing your own strategy, it's highly recommended to review and reference existing implementations to avoid omissions.
The following explanation is written in an order that prioritizes understanding — it may not reflect the most optimal implementation flow. Please read through the entire section before deciding where to begin.
### 1、Redundancy Transformation
When an object is first uploaded, it exists as a complete file in a single storage space, with a redundancy strategy of `None`. The transformation of its redundancy strategy is handled by the `ChangeRedundancy` task in the JCS client, which is triggered daily at midnight.
The transformation generally involves two steps:
- **Strategy Selection**: The system selects a redundancy strategy for the object based on predefined rules (e.g., object size). Note that transformation is not limited to non-redundant objects — an object can switch between different redundancy modes if needed.
- **Execution**: Based on the chosen strategy and the current one, a transformation plan is generated and executed. If you need to customize the execution plan or instructions, refer to the later section on Custom Instructions.
See the implementation in the `change_redundancy.go` and `redundancy_recover.go` files under: `gitlink.org.cn/cloudream/jcs-pub/client/internal/ticktock`.
### 2、Redundancy Shrinking
Redundancy shrinking uses a simulated annealing algorithm to optimize redundancy from three dimensions: **resilience**, **redundancy level**, and **access efficiency**.
Currently, it supports redundancy shrinking for **replication** and **erasure coding** strategies.
This feature is optional. If needed, refer to the `redundancy_shrink.go` file in: `gitlink.org.cn/cloudream/jcs-pub/client/internal/ticktock`.
### 3、Object Download
Any functionality that involves downloading objects (not just HTTP-based downloads) must be adapted to support the new redundancy strategies. The download process includes:
- **Strategy Selection**
- **Strategy Execution**
The code for **strategy selection** is mainly located in:
`gitlink.org.cn/cloudream/jcs-pub/client/internal/downloader/strategy`
The code for **strategy execution** is distributed across multiple parts of the project, including but not limited to:
`gitlink.org.cn/cloudream/jcs-pub/client/internal/downloader`
We recommend using your IDE's **search** functionality to ensure full coverage when modifying related logic.
## Custom Instructions
Most of the currently implemented instructions can be found in the `gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch2` package and serve as good references during custom instruction development.
### 1、Writing Instruction Logic
Each instruction must implement the following interface:
```go
type Op interface {
Execute(ctx *ExecContext, e *Executor) error
String() string
}
```
- `String()`: Used for debugging, returns a description of the instruction.
- `Execute(ctx *ExecContext, e *Executor) error`: The core execution logic.
- `ctx`: Execution context, which includes a cancelable context (`Context`) and custom values (`Values`) that can be accessed using functions like `GetValueByType` and `SetValueByType` from the `gitlink.org.cn/cloudream/common/pkgs/ioswitch/exec` package.
- `e`: The executor, which provides access to a variable table through functions like `BindVar`, `PutVar`, or generic variants such as `BindVar` and `BindArray` from the `gitlink.org.cn/cloudream/common/pkgs/ioswitch/exec` package.
If `Execute` returns a non-nil error, the entire plan is considered failed and all ongoing instructions will be aborted.
Typical implementation steps:
- **Read Parameters**: Use `BindVar` or related functions to read required inputs. The `VarIDs` should be defined when the instruction is created.
- **Process**: Implement your desired logic.
- **Output Results**: Use `PutVar` or similar to write results back to the variable table for use by subsequent instructions.
If you understand how the module works internally, you are **free to implement any custom logic** as needed.
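For orientation, below is a minimal sketch of a custom instruction. The package placement of `ExecContext`/`Executor` is an assumption based on the description above; use the existing instructions in `ioswitch2` as the authoritative reference.
```go
package myops

import (
	"fmt"

	"gitlink.org.cn/cloudream/common/pkgs/ioswitch/exec"
)

// LogOp is a hypothetical instruction that only prints a message.
// A real instruction would read its inputs with BindVar-style helpers
// and write results back with PutVar before returning.
type LogOp struct {
	Message string `json:"message"` // instruction fields travel as JSON
}

func (o *LogOp) Execute(ctx *exec.ExecContext, e *exec.Executor) error {
	fmt.Println(o.Message)
	return nil // a non-nil error aborts the entire plan
}

func (o *LogOp) String() string {
	return fmt.Sprintf("Log(%s)", o.Message)
}
```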
The data structure used with `PutVar`/`BindVar` must implement:
```go
type VarValue interface {
Clone() VarValue
}
```
**Note**: All data transferred between instructions is serialized/deserialized using JSON. Avoid using non-serializable fields in custom structs.
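A minimal sketch of such a value type (the type name and field are hypothetical, and placing `VarValue` in the `exec` package is an assumption):
```go
// Continuing the hypothetical myops package from the sketch above.

// ChecksumValue is a hypothetical value passed between instructions.
// Only exported, JSON-serializable fields are used, per the note above.
type ChecksumValue struct {
	Sum string `json:"sum"`
}

// Clone returns a copy; a shallow copy is sufficient for plain fields.
func (v *ChecksumValue) Clone() exec.VarValue {
	c := *v
	return &c
}
```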
If you want to pass a stream (`io.ReadCloser`) between instructions, use the built-in `StreamValue` type instead of defining your own, as streams require special handling.
If an instruction produces a stream, consider using `WaitGroup` or similar mechanisms to ensure the stream has been fully consumed before `Execute` returns.
Finally, register your new instruction and data types using `UseOp` and `UseVarValue` from the `gitlink.org.cn/cloudream/common/pkgs/ioswitch/exec` package.
### 2、Writing DAG Nodes for Instructions
The `ioswitch` module expresses execution plans as DAGs (Directed Acyclic Graphs), where each node corresponds to an instruction and edges represent data dependencies.
To define a custom node, implement the following interface:
```go
type Node interface {
Graph() *Graph
SetGraph(graph *Graph)
Env() *NodeEnv
InputStreams() *StreamInputSlots
OutputStreams() *StreamOutputSlots
InputValues() *ValueInputSlots
OutputValues() *ValueOutputSlots
GenerateOp() (exec.Op, error)
}
```
Key concepts:
- Each node has four types of **slots**: input/output for streams and input/output for values.
- Each input slot accepts data from **one** node; output slots can fan out to **multiple** nodes.
- `Env()` specifies the environment where the instruction should run (e.g., Driver, Hub, Any). Most instructions can use Any.
You may embed the `NodeBase` struct from the `gitlink.org.cn/cloudream/common/pkgs/ioswitch/dag` package, which implements all functions **except** `GenerateOp`.
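As a sketch, a node for the hypothetical `LogOp` above could embed `NodeBase` and only provide `GenerateOp` (package aliases are assumptions):
```go
// Continuing the hypothetical myops package; dag refers to
// gitlink.org.cn/cloudream/common/pkgs/ioswitch/dag.

// LogNode is a hypothetical DAG node that emits a LogOp instruction.
type LogNode struct {
	dag.NodeBase // provides every Node method except GenerateOp
	Message      string
}

// GenerateOp turns the node into its executable instruction.
func (n *LogNode) GenerateOp() (exec.Op, error) {
	return &LogOp{Message: n.Message}, nil
}
```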
### 3、Using Custom Instructions
The lifecycle of an execution plan includes:
- Writing `FromTo`
- Parsing `FromTo` into a DAG
- Optimizing the DAG
- Generating Instructions from the DAG
You can extend any of the first three stages to integrate your custom instructions.
The `FromTo` model describes the structure of an execution plan:
- The `From` component defines **where the data originates**.
- The `To` component defines **where the data is intended to go**.
The plan parser uses predefined rules to convert the `FromTo` description into a DAG that captures the data dependencies and logical flow between operations; this DAG is eventually translated into a sequence of executable instructions.
If necessary, you can define your own custom `From` and `To` types, as long as they implement the following interfaces:
```go
type From interface {
GetStreamIndex() StreamIndex
}
type To interface {
// The range of the file stream required by this To node.
// The specific meaning depends on the value of DataIndex:
// If DataIndex == -1, it refers to the entire file range.
// If DataIndex >= 0, it refers to a specific shard (chunk) of the file.
GetRange() math2.Range
GetStreamIndex() StreamIndex
}
```
You'll also need to modify the DAG parsing logic in:
`gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch2/parser/gen`
**Note**: Ensure that your custom DAG nodes implement:
```go
type FromNode interface {
dag.Node
GetFrom() ioswitch2.From
Output() dag.StreamOutputSlot
}
type ToNode interface {
dag.Node
GetTo() ioswitch2.To
Input() dag.StreamInputSlot
}
```
This ensures compatibility with DAG optimization algorithms that rely on these interfaces.
In the optimization phase, you can add your own steps to match certain patterns and replace or merge nodes to improve efficiency or reduce instruction count.
After implementing your optimizer, integrate it into the `Parse` function in:
`gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch2/parser`
Carefully consider the interaction between your logic and other optimization steps.
## License
MulanPSL v2
## Development Team
- Planning & Architecture: Han Bao
- Design & Implementation: Xihua Gong, Jiancong Song, Zhishi Ren, Kaixin Zhang, Junhui Kan
- Advisors: YiJie Wang
- Technical Contact: Jiancong Song (Email: song-jc@foxmail.com)
## Key Features