89 Commits

Author SHA1 Message Date
dailz
5591b67f75 feat(task): auto-inject scheduling params into script template via scheduling_map
Add scheduling_map field to ParameterSchema so Application creators can
declare that a parameter (e.g. NP) maps to a scheduling field (e.g. cpus).
The backend auto-injects the scheduling value into script template variables
before rendering, eliminating duplicate user input. The frontend hides
mapped parameters from the form and injects their values on submit.
2026-04-22 10:26:52 +08:00
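The injection step described in this commit could be sketched as follows. This is a minimal illustration, not the project's code: the `ParameterSchema` fields shown, the `injectSchedulingParams` helper, and the use of plain string maps for template variables and scheduling values are all assumptions.

```go
package main

import "fmt"

// ParameterSchema is a hypothetical, trimmed-down schema entry; only the
// fields needed for this illustration are shown.
type ParameterSchema struct {
	Name          string
	SchedulingMap string // scheduling field this parameter mirrors, e.g. "cpus"
}

// injectSchedulingParams returns a copy of values in which every parameter
// that declares a SchedulingMap is filled from the scheduling settings.
// The caller's map is left untouched.
func injectSchedulingParams(schema []ParameterSchema, values, scheduling map[string]string) map[string]string {
	out := make(map[string]string, len(values)+len(schema))
	for k, v := range values {
		out[k] = v
	}
	for _, p := range schema {
		if p.SchedulingMap == "" {
			continue // ordinary parameter, supplied by the user
		}
		if v, ok := scheduling[p.SchedulingMap]; ok {
			out[p.Name] = v
		}
	}
	return out
}

func main() {
	schema := []ParameterSchema{{Name: "NP", SchedulingMap: "cpus"}, {Name: "INPUT"}}
	vars := injectSchedulingParams(schema,
		map[string]string{"INPUT": "case.dat"},
		map[string]string{"cpus": "8"})
	fmt.Println(vars["NP"], vars["INPUT"])
}
```

In this sketch the scheduling value wins over any user-supplied value for a mapped parameter, which matches the intent of removing duplicate input; whether the real backend does the same is not stated in the commit.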
dailz
435ab285c1 fix(task): prevent RecoverStuckTasks from re-enqueueing in-flight tasks
RecoverStuckTasks scans for tasks whose updated_at is more than 5 minutes
old and re-enqueues them. This incorrectly matched tasks that the worker
was still actively processing (e.g. slow downloads), causing
double-processing.

Add inflight sync.Map to track taskIDs currently inside ProcessTask.
RecoverStuckTasks skips tasks found in inflight. On server restart
inflight is empty (in-memory), so genuinely stuck tasks are still
correctly recovered.

Also: increase taskCh buffer 16→10000, add periodic RecoverStuckTasks
goroutine in TaskPoller (every 5min), and add status guard in
ProcessTask as defense-in-depth against duplicate enqueues.
2026-04-21 17:19:10 +08:00
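The in-flight tracking described above can be illustrated with a small sketch. The `Worker` type and the `FilterRecoverable` helper are hypothetical names chosen for this example; only `sync.Map`, `ProcessTask`, and the idea of skipping in-flight task IDs come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// Worker is a hypothetical stand-in for the task worker.
type Worker struct {
	inflight sync.Map // taskID -> struct{}{} while ProcessTask is running
}

// ProcessTask marks the task as in flight for the duration of the call.
// The deferred Delete also runs if work panics.
func (w *Worker) ProcessTask(taskID int64, work func()) {
	w.inflight.Store(taskID, struct{}{})
	defer w.inflight.Delete(taskID)
	work()
}

// FilterRecoverable drops candidate IDs that are currently in flight, so a
// recovery scan does not re-enqueue tasks that are merely slow.
func (w *Worker) FilterRecoverable(candidates []int64) []int64 {
	var out []int64
	for _, id := range candidates {
		if _, busy := w.inflight.Load(id); busy {
			continue
		}
		out = append(out, id)
	}
	return out
}

func main() {
	w := &Worker{}
	w.ProcessTask(1, func() {
		// Task 1 is in flight here, so only task 2 is recoverable.
		fmt.Println(w.FilterRecoverable([]int64{1, 2}))
	})
	// After ProcessTask returns, both tasks are candidates again.
	fmt.Println(w.FilterRecoverable([]int64{1, 2}))
}
```

Because the map lives only in memory, a freshly started process sees it empty, which is the property the commit relies on to keep recovery working after a restart.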
dailz
8955e513aa fix(upload): add 30s timeout to background chunk cleanup goroutine
The cleanup goroutine used context.Background() with no timeout, so if
MinIO accepted TCP connections but never responded, the goroutine would
block indefinitely. Now uses context.WithTimeout to prevent leaks.
2026-04-21 13:35:21 +08:00
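A bounded cleanup call of the kind this fix describes might look like the sketch below. The helper names (`cleanupWithTimeout`, `neverResponds`, `timedOut`) and the short demo timeout are assumptions for illustration; the commit itself uses a 30s timeout against MinIO.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// demoTimeout is deliberately short so the example finishes quickly.
const demoTimeout = 50 * time.Millisecond

// cleanupWithTimeout runs remove with a context that is cancelled after
// timeout, so a stalled storage backend cannot block the goroutine forever.
func cleanupWithTimeout(timeout time.Duration, remove func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // release the timer even when remove returns early
	if err := remove(ctx); err != nil {
		return fmt.Errorf("chunk cleanup: %w", err)
	}
	return nil
}

// neverResponds simulates a backend that accepts the request but never
// answers; it returns only when the context is done.
func neverResponds(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// timedOut reports whether err was caused by the context deadline.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	err := cleanupWithTimeout(demoTimeout, neverResponds)
	fmt.Println(timedOut(err), err)
}
```

The timeout only helps if the storage call actually honors the context it is given, as context-aware client libraries generally do.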
dailz
f13377ca7d fix(task): return empty string for unknown/empty Slurm states instead of defaulting to running
mapSlurmStateToTaskStatus previously defaulted to 'running' for empty
state arrays and unrecognized states. This was too aggressive — treating
unknown as actively running could cause incorrect status updates when
Slurm returns unexpected or empty state data.

Now empty/unknown states return an empty string, and refreshTaskStatus
skips the update in that case.
2026-04-21 13:23:40 +08:00
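The "unknown means no update" behaviour can be sketched as below. The task status strings (`queued`, `running`, `completed`, `failed`), the choice to look only at the first state, and the `nextStatus` helper are assumptions for this example; the real mapping table is not shown in the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// mapSlurmStateToTaskStatus maps the first Slurm job state to a task status.
// Empty input and unrecognized states yield "" so the caller can skip the
// update instead of guessing.
func mapSlurmStateToTaskStatus(states []string) string {
	if len(states) == 0 {
		return ""
	}
	switch strings.ToUpper(states[0]) {
	case "PENDING":
		return "queued"
	case "RUNNING", "COMPLETING":
		return "running"
	case "COMPLETED":
		return "completed"
	case "FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL", "OUT_OF_MEMORY":
		return "failed"
	default:
		return ""
	}
}

// nextStatus keeps the current status when Slurm reports nothing usable.
func nextStatus(current string, states []string) string {
	if mapped := mapSlurmStateToTaskStatus(states); mapped != "" {
		return mapped
	}
	return current
}

func main() {
	fmt.Println(nextStatus("queued", []string{"RUNNING"}))
	fmt.Println(nextStatus("running", nil))
	fmt.Println(nextStatus("running", []string{"SOMETHING_NEW"}))
}
```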
dailz
b90942de77 fix(task): prevent duplicate Slurm job submission on backend restart
RecoverStuckTasks now skips tasks that already have a slurm_job_id,
and ProcessTask adds a guard before the submitting step to prevent
re-submission even if a task is incorrectly re-enqueued.

Also deprecates POST /api/v1/jobs/submit endpoint (replaced by POST /tasks)
and comments out related handlers and tests.
2026-04-21 10:57:38 +08:00
dailz
4fd331ebd8 feat(application): add Environment field and inject into Slurm job submission 2026-04-21 10:23:31 +08:00
dailz
d9ca9233b3 fix(service): correct CPU/memory mapping and add TRES/memory_used extraction
- Map CPUs to CpusPerTask (not MinimumCpus) for consistent SlurmDBD history
- Add Set:true to memory Uint64NoVal on submission
- Filter number=0 in mapUint64NoValToInt64 to avoid false zeros
- Extract peak memory from Steps.Tres.Requested.Max across all steps
- Add formatTresList, parseGresDetail, extractMemoryFromSteps helpers
- Update mapJobInfo and mapSlurmdbJob with new field mappings

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 17:10:19 +08:00
dailz
d79656c728 feat(model): add resource fields to JobResponse (CPU, memory, TRES, elapsed)
Add CpusPerTask, MemoryPerCpu, MemoryPerNode, MemoryUsed, TRES strings,
GresDetail, and Elapsed fields to capture full Slurm job resource info.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 17:09:52 +08:00
dailz
08ca4da691 fix(service): inject WORK_DIR and map file_ids before param validation
Previously ValidateParams ran before WORK_DIR injection and file_ids mapping,
causing "required parameter missing" errors for auto-handled params. The
execution order is now: inject WORK_DIR, map file_ids to file params, validate, resolve.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 13:41:28 +08:00
dailz
13ce86b1ef feat(mockserver): update mock server for file upload and task defaults
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 10:39:07 +08:00
dailz
9aea1ea710 feat(handler): add task defaults and file_ids support in task submission
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 10:38:59 +08:00
dailz
e90904cedb test(service): add tests for task defaults and job status
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 10:38:49 +08:00
dailz
166ca3092c feat(service): add task defaults, job status, and cluster helpers
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 10:38:41 +08:00
dailz
f894e870ed test(model): add tests for task defaults, job queries, and DTOs
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 10:38:30 +08:00
dailz
db06e99967 feat(model): add task defaults, job queries, and refine file/task DTOs
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-20 10:37:58 +08:00
dailz
9a04874847 feat(cmd/mockserver): add standalone mock server for frontend development
Reuse MockSlurm + MockMinIO + TestEnv wiring pattern to create a standalone
binary that serves all API endpoints with in-memory SQLite and seed data.

- internal/mockserver/server.go: assembly logic (New/Close/Run), option pattern,
  4 accessors for seed data injection
- cmd/mockserver/main.go: CLI flags (--port, --seed, MOCK_PORT), 6 seed jobs
  in all 5 states + 2 seed applications, signal handling

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-17 13:21:13 +08:00
dailz
6a7bde4801 test(service): add tests for WORK_DIR injection and file-type param resolution
- ValidateParams: file/directory type validation tests
- RenderScript: file-type not escaped, WORK_DIR injected without quotes
- ProcessTask: file_id→filename resolution, invalid ID, missing file scenarios

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-16 17:56:42 +08:00
dailz
07ae8ad6cd feat(service): auto-inject WORK_DIR and resolve file-type params in task script rendering
- ProcessTask injects $WORK_DIR only when script template uses it
- File/directory type params: resolves file_id to filename before rendering
- ValidateParams validates file/directory params as valid int64 file IDs
- RenderScript no longer shell-escapes file/directory type values
- Log rendered script before submitting to Slurm for debugging

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-16 17:56:28 +08:00
dailz
29525c3fa9 feat(handler): update GetFile and ListFiles handlers for new FileResponse fields
- GetFile uses new GetFileResponse instead of manual FileResponse construction
- ListFiles handler parses optional user_id query parameter
- Wire FolderStore into FileService in app.go, testenv, and file_test

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-16 16:04:43 +08:00
dailz
1d591efeba feat(model,service): add folder_path and user_id to FileResponse, add user_id filter to ListFiles
- FileResponse gains folder_path ("/" for root) and user_id fields
- folder_id no longer uses omitempty, root files return null
- ListFiles accepts optional userID parameter for filtering by owner
- New buildFileResponse helper populates folder_path from FolderStore
- New GetFileResponse method wraps GetFileMetadata + buildFileResponse

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-16 16:04:22 +08:00
dailz
9092278d26 refactor(test): update test expectations for removed submit route
- Comment out submit route assertions in main_test.go and server_test.go
- Comment out TestTask_OldAPICompatibility in task_test.go
- Update expected route count 31→30 in testenv env_test.go

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-16 15:16:07 +08:00
dailz
7c374f4fd5 refactor(handler,server): disable SubmitApplication endpoint, replaced by POST /tasks
- Comment out SubmitApplication handler method
- Comment out route registration in server.go (interface + router + placeholder)
- Comment out related handler tests

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-16 15:15:55 +08:00
dailz
36d842350c refactor(service): disable SubmitFromApplication fallback, fully replaced by POST /tasks
- Comment out SubmitFromApplication method and its fallback path
- Comment out 5 tests that tested the old direct-submission code
- Remove unused imports after commenting out the method

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-16 15:15:42 +08:00
dailz
b9b2f0d9b4 feat(testutil): add MockSlurm, MockMinIO, TestEnv and 37 integration tests
- mockminio: in-memory ObjectStorage with all 11 methods, thread-safe, SHA256 ETag, Range support
- mockslurm: httptest server with 11 Slurm REST API endpoints, job eviction from active to history queue
- testenv: one-line test environment factory (SQLite + MockSlurm + MockMinIO + all stores/services/handlers + httptest server)
- integration tests: 37 tests covering Jobs(5), Cluster(5), App(6), Upload(5), File(4), Folder(4), Task(4), E2E(1)
- no external dependencies, no existing files modified
2026-04-16 13:23:27 +08:00
dailz
73504f9fdb feat(app): add TaskPoller, wire DI, and add task integration tests 2026-04-15 21:31:17 +08:00
dailz
3f8a680c99 feat(handler): add TaskHandler endpoints and register task routes 2026-04-15 21:31:11 +08:00
dailz
ec64300ff2 feat(service): add TaskService, FileStagingService, and refactor ApplicationService for task submission 2026-04-15 21:31:02 +08:00
dailz
acf8c1d62b feat(store): add TaskStore CRUD and batch query methods for files and blobs 2026-04-15 21:30:51 +08:00
dailz
d46a784efb feat(model): add Task model, DTOs, and status constants for task submission system 2026-04-15 21:30:44 +08:00
dailz
79870333cb fix(service): tolerate concurrent pending-to-uploading status race in UploadChunk
When multiple chunk uploads race on the pending→uploading transition, ignore ErrRecordNotFound from UpdateSessionStatus since another request already completed the update.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 10:27:12 +08:00
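The race tolerance described in this fix could be sketched as follows. The `sessionStore` type is an in-memory stand-in for the real store, `markUploading` is a hypothetical helper, and the local `ErrRecordNotFound` sentinel stands in for the ORM's error of the same name.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrRecordNotFound stands in for the ORM sentinel returned when a
// conditional update matches no row.
var ErrRecordNotFound = errors.New("record not found")

// sessionStore is a hypothetical in-memory session with a single status.
type sessionStore struct {
	mu     sync.Mutex
	status string
}

// UpdateSessionStatus performs the transition only if the current status
// equals from, mimicking "UPDATE ... WHERE status = ?".
func (s *sessionStore) UpdateSessionStatus(from, to string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status != from {
		return ErrRecordNotFound
	}
	s.status = to
	return nil
}

// markUploading attempts pending→uploading and treats "no row matched" as
// success, since a concurrent chunk upload already did the transition.
func markUploading(update func(from, to string) error) error {
	err := update("pending", "uploading")
	if errors.Is(err, ErrRecordNotFound) {
		return nil
	}
	return err
}

func main() {
	s := &sessionStore{status: "pending"}
	fmt.Println(markUploading(s.UpdateSessionStatus)) // first request wins
	fmt.Println(markUploading(s.UpdateSessionStatus)) // loser is tolerated
	fmt.Println(s.status)
}
```

One caveat of this simplified version: it also swallows the error when the session is in some other state entirely, so a stricter variant would re-read the status before continuing.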
dailz
d9a60c3511 fix(model): rename Application table to hpc_applications
Avoid table name collision with other systems.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:32:11 +08:00
dailz
c0176d7764 feat(app): wire file storage DI, cleanup worker, and integration tests
Add DI wiring with graceful MinIO fallback, background cleanup worker for expired sessions and leaked multipart uploads, and end-to-end integration tests.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:23:25 +08:00
dailz
2298e92516 feat(handler): add upload, file, and folder handlers with routes
Add UploadHandler (5 endpoints), FileHandler (4 endpoints), FolderHandler (4 endpoints) with Gin route registration in server.go.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:23:17 +08:00
dailz
f0847d3978 feat(service): add upload, download, file, and folder services
Add UploadService (dedup, chunk lifecycle, ComposeObject), DownloadService (Range support), FileService (ref counting), FolderService (path validation).

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:23:09 +08:00
dailz
a114821615 feat(server): add streaming response helpers for file download
Add ParseRange, StreamFile, StreamRange for full and partial content delivery.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:22:58 +08:00
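For readers unfamiliar with HTTP range handling, a single-range parser might look like the sketch below. The lowercase `parseRange` name and its signature are assumptions and not the project's `ParseRange`; multi-range requests are rejected here for simplicity.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var errInvalidRange = errors.New("invalid or unsatisfiable range")

// parseRange parses a single "bytes=start-end" Range header value against a
// resource of the given size and returns inclusive byte offsets.
func parseRange(header string, size int64) (start, end int64, err error) {
	const prefix = "bytes="
	if !strings.HasPrefix(header, prefix) {
		return 0, 0, errInvalidRange
	}
	spec := strings.TrimPrefix(header, prefix)
	dash := strings.IndexByte(spec, '-')
	if dash < 0 || strings.Contains(spec, ",") {
		return 0, 0, errInvalidRange
	}
	first, last := spec[:dash], spec[dash+1:]

	if first == "" {
		// Suffix form "bytes=-N": the last N bytes of the resource.
		n, perr := strconv.ParseInt(last, 10, 64)
		if perr != nil || n <= 0 {
			return 0, 0, errInvalidRange
		}
		if n > size {
			n = size
		}
		start, end = size-n, size-1
	} else {
		var perr error
		start, perr = strconv.ParseInt(first, 10, 64)
		if perr != nil || start < 0 {
			return 0, 0, errInvalidRange
		}
		end = size - 1 // open-ended form "bytes=start-"
		if last != "" {
			end, perr = strconv.ParseInt(last, 10, 64)
			if perr != nil || end < start {
				return 0, 0, errInvalidRange
			}
			if end > size-1 {
				end = size - 1
			}
		}
	}
	if start >= size {
		return 0, 0, errInvalidRange
	}
	return start, end, nil
}

func main() {
	for _, h := range []string{"bytes=0-99", "bytes=-100", "bytes=500-", "bytes=2000-"} {
		s, e, err := parseRange(h, 1000)
		fmt.Println(h, s, e, err)
	}
}
```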
dailz
bf89de12f0 feat(store): add blob, file, folder, and upload stores
Add BlobStore (ref counting), FileStore (soft delete + pagination), FolderStore (materialized path), UploadStore (idempotent upsert), and update AutoMigrate.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:22:44 +08:00
dailz
c861ff3adf feat(storage): add ObjectStorage interface and MinIO client
Add ObjectStorage interface (11 methods) with MinioClient implementation using minio-go Core.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:22:33 +08:00
dailz
0e4f523746 feat(model): add file storage GORM models and DTOs
Add FileBlob, File, Folder, UploadSession, UploadChunk models with validators.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:22:25 +08:00
dailz
44895214d4 feat(config): add MinIO object storage configuration
Add MinioConfig struct with connection, bucket, chunk size, and session TTL settings.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-15 09:22:18 +08:00
dailz
a65c8762af fix(service): add environment variables and fix work directory permissions for Slurm job submission
Slurm requires environment variables in job submission; without them it returns 'batch job cannot run without an environment'. Also chmod the entire directory path to 0777 to bypass umask, ensuring Slurm and compute node users can write.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-14 13:06:51 +08:00
dailz
32f5792b68 feat(service): pass work directory to Slurm job submission
Add WorkDir to SubmitJobRequest and pass it as CurrentWorkingDirectory to Slurm REST API. Fixes Slurm 500 error when working directory is not specified.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-13 17:12:28 +08:00
dailz
328691adff feat(config): add WorkDirBase for application job working directory
Add WorkDirBase config field for auto-generated job working directories. Pattern: {base}/{app_name}/{timestamp}_{random}/

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-13 17:11:48 +08:00
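The `{base}/{app_name}/{timestamp}_{random}/` pattern could be generated along these lines. The `newWorkDir` helper, the timestamp layout, and the 4-byte hex suffix are assumptions; the commit does not specify them.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"time"
)

// demoTime is a fixed instant used to make the example reproducible.
var demoTime = time.Date(2026, 4, 13, 17, 11, 48, 0, time.UTC)

// newWorkDir builds {base}/{appName}/{timestamp}_{random}. The random
// suffix keeps two submissions in the same second from colliding.
func newWorkDir(base, appName string, now time.Time) (string, error) {
	suffix := make([]byte, 4)
	if _, err := rand.Read(suffix); err != nil {
		return "", fmt.Errorf("generate random suffix: %w", err)
	}
	leaf := now.Format("20060102150405") + "_" + hex.EncodeToString(suffix)
	return filepath.Join(base, appName, leaf), nil
}

func main() {
	dir, err := newWorkDir("/data/jobs", "lammps", demoTime)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dir)
}
```

A production version would likely also sanitize `appName` so that it cannot contain path separators or `..` segments.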
dailz
10bb15e5b2 feat(handler): add Application handler, routes, and wiring
Add ApplicationHandler with CRUD + Submit endpoints. Register 6 routes, wire in app.go, update main_test.go references. 22 handler tests.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-13 17:10:54 +08:00
dailz
d3eb728c2f feat(service): add Application service with parameter validation and script rendering
Add ApplicationService with ValidateParams, RenderScript, SubmitFromApplication. Includes shell escaping, longest-first parameter replacement, and work directory generation. 15 tests.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-13 17:10:09 +08:00
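Longest-first replacement matters when one parameter name is a prefix of another (for example `NP` and `NP_TOTAL`). The sketch below is one way to get that behaviour; the `$NAME` placeholder syntax, single-quote escaping, and both helper names are assumptions rather than the service's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// shellEscape wraps s in single quotes and escapes embedded single quotes,
// so the value is passed to the shell as one literal word.
func shellEscape(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// renderScript replaces $NAME placeholders with escaped values. Names are
// ordered longest first so that $NP cannot match inside $NP_TOTAL.
func renderScript(tmpl string, params map[string]string) string {
	names := make([]string, 0, len(params))
	for name := range params {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if len(names[i]) != len(names[j]) {
			return len(names[i]) > len(names[j])
		}
		return names[i] < names[j] // stable order for equal lengths
	})

	// strings.Replacer compares old strings in argument order and makes a
	// single pass, so inserted values are never substituted a second time.
	pairs := make([]string, 0, 2*len(names))
	for _, name := range names {
		pairs = append(pairs, "$"+name, shellEscape(params[name]))
	}
	return strings.NewReplacer(pairs...).Replace(tmpl)
}

func main() {
	script := renderScript("mpirun -np $NP -total $NP_TOTAL ./solver $INPUT", map[string]string{
		"NP":       "4",
		"NP_TOTAL": "16",
		"INPUT":    "it's a case.dat",
	})
	fmt.Println(script)
}
```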
dailz
4a8153aa6c feat(model): add Application model and store
Add Application and ParameterSchema models with CRUD store. Includes 10 store tests and ParamType constants.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-13 17:08:24 +08:00
dailz
dd8d226e78 refactor: remove JobTemplate production code
Remove all JobTemplate model, store, handler, migrations, and wiring. Replaced by Application Definition system.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-13 17:07:46 +08:00
dailz
2cb6fbecdd feat(service): add pagination to GetJobs endpoint
GetJobs now accepts page/page_size query parameters and returns JobListResponse instead of raw array. Uses in-memory pagination matching GetJobHistory pattern.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-10 15:14:56 +08:00
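In-memory pagination of an already-fetched job list can be sketched as below. The generic `paginate` helper, the default page size of 20, and the clamping rules are assumptions; the commit only states that `page` and `page_size` are accepted and applied in memory.

```go
package main

import "fmt"

const defaultPageSize = 20

// paginate returns the 1-based page of items. Out-of-range pages yield an
// empty slice rather than an error, and non-positive inputs fall back to
// page 1 and the default page size.
func paginate[T any](items []T, page, pageSize int) []T {
	if page < 1 {
		page = 1
	}
	if pageSize < 1 {
		pageSize = defaultPageSize
	}
	// Checking before multiplying avoids integer overflow on huge pages.
	if page-1 > len(items)/pageSize {
		return []T{}
	}
	start := (page - 1) * pageSize
	if start >= len(items) {
		return []T{}
	}
	end := len(items)
	if pageSize < len(items)-start {
		end = start + pageSize
	}
	return items[start:end]
}

func main() {
	jobs := []string{"job-1", "job-2", "job-3", "job-4", "job-5"}
	fmt.Println(paginate(jobs, 1, 2))
	fmt.Println(paginate(jobs, 3, 2))
	fmt.Println(paginate(jobs, 4, 2))
}
```

The total count (here `len(jobs)`) would normally be returned alongside the page so clients can render page controls.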
dailz
35a4017b8e docs(model): add Chinese field comments to all model structs
Add inline comments to SubmitJobRequest, JobListResponse, JobHistoryQuery, JobTemplate, CreateTemplateRequest, and UpdateTemplateRequest fields, consistent with existing cluster.go and JobResponse style.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-10 13:53:54 +08:00
dailz
f4177dd287 feat(service): add GetJob fallback to SlurmDBD history and expand query params
GetJob now falls back to SlurmDBD history when active queue returns 404 or empty jobs. Expand JobHistoryQuery from 7 to 16 filter params (add SubmitTime, Cluster, Qos, Constraints, ExitCode, Node, Reservation, Groups, Wckey).

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-10 13:43:31 +08:00
dailz
b3d787c97b fix(slurm): parse structured errors from non-2xx Slurm API responses
Replace ErrorResponse with SlurmAPIError that extracts structured errors/warnings from JSON body when Slurm returns non-2xx (e.g. 404 with valid JSON). Add IsNotFound helper for fallback logic.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-10 13:43:17 +08:00
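The structured error handling described here might be sketched as follows. The field layout of `SlurmAPIError`, the `SlurmMessage` type, the `parseSlurmError` helper, and the JSON keys (`errors`, `warnings`, `error`, `error_number`, `description`) are assumptions for illustration; only the `SlurmAPIError` and `IsNotFound` names come from the commit message.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// SlurmMessage is an assumed shape for one entry of the errors/warnings arrays.
type SlurmMessage struct {
	Error       string `json:"error"`
	ErrorNumber int    `json:"error_number"`
	Description string `json:"description"`
}

// SlurmAPIError carries the HTTP status plus any structured details that
// could be decoded from the response body.
type SlurmAPIError struct {
	StatusCode int
	Errors     []SlurmMessage
	Warnings   []SlurmMessage
	RawBody    string // kept only when the body is not valid JSON
}

func (e *SlurmAPIError) Error() string {
	if len(e.Errors) > 0 {
		return fmt.Sprintf("slurm API returned %d: %s", e.StatusCode, e.Errors[0].Error)
	}
	return fmt.Sprintf("slurm API returned %d: %s", e.StatusCode, e.RawBody)
}

// parseSlurmError builds a SlurmAPIError from a non-2xx response.
func parseSlurmError(status int, body []byte) *SlurmAPIError {
	apiErr := &SlurmAPIError{StatusCode: status}
	var payload struct {
		Errors   []SlurmMessage `json:"errors"`
		Warnings []SlurmMessage `json:"warnings"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		apiErr.RawBody = string(body)
		return apiErr
	}
	apiErr.Errors, apiErr.Warnings = payload.Errors, payload.Warnings
	return apiErr
}

// IsNotFound reports whether err is (or wraps) a SlurmAPIError with status 404.
func IsNotFound(err error) bool {
	var apiErr *SlurmAPIError
	return errors.As(err, &apiErr) && apiErr.StatusCode == http.StatusNotFound
}

func main() {
	body := []byte(`{"errors":[{"error":"Invalid job id specified","error_number":2017}]}`)
	err := fmt.Errorf("get job: %w", parseSlurmError(http.StatusNotFound, body))
	fmt.Println(IsNotFound(err), err)
}
```

A caller such as GetJob can then use `IsNotFound` to decide whether to fall back to job history instead of failing the request.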