Sensor-based remote healthcare monitoring offers a sustainable solution for detecting adverse health events in individuals with long-term conditions, directly in their homes. Traditional anomaly detection methods in noisy, multivariate real-world data often require large labelled datasets, complex AI models, extensive hyperparameter tuning, and frequent retraining to address data drift, limiting their scalability and explainability. Inspired by the simplicity and success of negative sample-free contrastive learning in computer vision, we propose a resource-efficient, self-supervised model that adapts to noise to improve anomaly detection. Our model has outperformed similar algorithms in detecting agitation and fall events in a real-world study of dementia patients. We enhanced model transparency through a ‘spatiotemporal attention map’ that pinpoints anomalies, fostering user trust and encouraging broader adoption. Our scalable, domain-agnostic solution can be applied across diverse healthcare, industrial, and urban environments, aligning with sustainable development goals, particularly in low-resource settings.