Senior Site Reliability Engineer
Company: First Horizon
Location: Raleigh
Posted on: March 30, 2026
Job Description:
Weekly Schedule: Monday-Friday, 9am-5pm

We are seeking a Senior Site Reliability Engineer who will be the guardian of our Azure infrastructure reliability. This role focuses on building comprehensive observability platforms, implementing intelligent monitoring systems, and proactively identifying issues before they impact production. You will create the tools and automation that predict, detect, and prevent problems rather than simply reacting to them. Your primary mission is ensuring our Azure infrastructure and applications never surprise us with failures.

The ideal candidate has deep expertise in Azure Monitor, Application Insights, Log Analytics, and KQL, combined with strong scripting skills in Python or PowerShell. You should have 5-7 years of experience implementing observability platforms and a proven track record of preventing incidents through proactive monitoring and automation. You'll work with technologies like Prometheus, Grafana, OpenTelemetry, and Azure services (AKS, App Services, Azure SQL, Cosmos DB) while building self-healing automation and predictive analytics tools that keep our systems healthy.

Key Responsibilities:
• Design and implement a comprehensive observability stack across all Azure resources and applications
• Build intelligent alerting systems with anomaly detection and predictive capabilities to prevent incidents
• Create self-healing automation and auto-remediation tools that resolve issues without human intervention
• Develop internal monitoring platforms, dashboards, and CLI tools for engineering teams
• Write KQL queries and analyze metrics/logs to identify optimization opportunities and predict failures
• Implement continuous resource monitoring for Azure quotas, costs, security posture, and service health
• Build capacity forecasting and trend analysis tools to prevent resource exhaustion
• Reduce alert noise while improving coverage and actionability of monitoring systems
• Participate in a light on-call rotation (prevention-focused approach reduces reactive incidents)

About Us
First Horizon Corporation is a leading regional financial services company, dedicated to helping our clients, communities and associates unlock their full potential with capital and counsel. Headquartered in Memphis, TN, the banking subsidiary First Horizon Bank operates in 12 states across the southern U.S. The Company and its subsidiaries offer commercial, private banking, consumer, small business, wealth and trust management, retail brokerage, capital markets, fixed income, and mortgage banking services. First Horizon has been recognized as one of the nation's best employers by Fortune and Forbes magazines and a Top 10 Most Reputable U.S. Bank.

Benefit Highlights
• Medical with wellness incentives, dental, and vision
• HSA with company match
• Maternity and parental leave
• Tuition reimbursement
• Mentor program
• 401(k) with 6% match
• More: FirstHorizon.com/First-Horizon-National-Corporation/Careers/Our-Benefits
Keywords: First Horizon, Raleigh, Senior Site Reliability Engineer, IT / Software / Systems, Raleigh, North Carolina