How fast is TLS access? I noticed that in the kernel library it involves an array element access on the result of call_trace(). Are there any concerns about array churn, or does DGD optimize this in some way, like it does for status()?