Responsibilities:
Ensure the stable and efficient operation of services across all product lines.
Review the service architecture of each product line, and make design and planning suggestions.
Respond promptly to fault alarms of all kinds and resolve problems quickly to restore business operations.
Track user access experience and continuously optimize the operation and maintenance architecture.
Accumulate best practices for system and application operation and maintenance, and formulate and document operation and maintenance specifications and policies.
Control and optimize costs through technical means, and improve management efficiency through tooling platforms and processes.
Key Qualifications:
Knowledge of Kubernetes, Ceph storage, and routing and switching networks; deep knowledge of at least one of these is preferred.
Excellent learning ability and eagerness to learn new technologies.
Familiar with Linux systems, with more than 5 years of experience in Linux system administration.
3+ years of experience architecting, designing, developing, and implementing cloud solutions on AWS/Azure/GCP.
Proficient in at least one of Shell, Python, or Golang and able to write operation and maintenance scripts; experience developing automated operation and maintenance platforms is preferred.
Good communication and documentation skills, strong self-motivation, and a strong sense of responsibility.
2+ years of experience with cloud integration and cloud automation. Should be familiar with Infrastructure as Code (Terraform, CloudFormation, etc.).
Experience maintaining or contributing to open-source projects; familiar with the agile software development process, CI/CD workflows, ticket management, code review, version control, etc.
Practical experience in managing and leading application reliability practices.
Working understanding of IT service management (Incident, Problem, Change, and Knowledge management).