We are seeing errors coming through "file or directory /path/to/my.log does not exist". This is occurring exactly when the log files are being rotated. Looking through the code, the file delete and the file create are each locked, but the entire operation needs to be locked, not the separate parts. But also this won't help anyhow, cos the log-line-write is not locked anyhow :neutral_face:https://github.com/ColdBox/coldbox-platform/blob/development/system/logging/appenders/FileAppender.cfc#L228Am I reading all that correctly?
Actually it also looks to me like there's an issue with the rotation part: you are zipping a "live" log file by the looks, so that could receive more log lines after the zip and before the delete. Meaning data loss. All this stuff needs to be synchronised I think (incl. log-line-writing). And also implemented in a better order?
If I'm reading the code right, this rotating appender is not really fit for purpose I think..?
OK so I think the fix here is that you need to centralise your log file writing (into a SynchronisedLogWriter class or something). I mean it probs should be like this anyhow, rather than burying filesystem operations in the middle of log management business logic, yeah? That’s not great app design, ATM.
Within that: lock everything, and consider which operations need to be handled atomically. EG: the process of rotating a log: copying the log file, starting a new one, zipping the copy, deleting the copy (and this is not the order you currently do this in, but it needs to be I think) needs to be an atom, and it needs to block the log-line-write operations whilst it’s happening. Or even just the “copying the log file, starting a new one” bit treated as an atom so the log-line-writing is not blocked for too long. The lock should be keyed on the full file path, probably.
There’s a risk - given the nature of logging - that the CFML engine is not gonna like how much locking is involved, so I think this needs solid load testing too: something that clearly has not been done before, I think. This stuff fell over on us within hours, as soon as traffic ramped up. And our app is not public, so it’s not like it’s hugely high-throughput. If the CFML locking is not performant enough, I am certain this is a solved problem at Java level. It’s bread-and-butter sorta stuff in the Java world, I reckon..?
Happy to discuss further if that “helps”. You know how to find me
Activity
Show:
Luis Majano November 2, 2023 at 10:41 PM
I have completely rewritten how logging works now with files to make it consistent, simpler, and faster (as per my tests).
Can you please try your tests and report back to me please.
Luis Majano October 16, 2023 at 10:31 AM
Thanks as alwasys for the detailed explanation. I will be trying to tackle this, this week. However, please note I will need your help since debugging and making sure race conditions are not occurring is extremely hard to replciate.
Fixed
Pinned fields
Click on the next to a field label to start pinning.
I’m gonna copy this from Slack:
That’s a bit stream-of-consciousness, sorry.
OK so I think the fix here is that you need to centralise your log file writing (into a SynchronisedLogWriter class or something). I mean it probs should be like this anyhow, rather than burying filesystem operations in the middle of log management business logic, yeah? That’s not great app design, ATM.
Within that: lock everything, and consider which operations need to be handled atomically. EG: the process of rotating a log: copying the log file, starting a new one, zipping the copy, deleting the copy (and this is not the order you currently do this in, but it needs to be I think) needs to be an atom, and it needs to block the log-line-write operations whilst it’s happening. Or even just the “copying the log file, starting a new one” bit treated as an atom so the log-line-writing is not blocked for too long. The lock should be keyed on the full file path, probably.
There’s a risk - given the nature of logging - that the CFML engine is not gonna like how much locking is involved, so I think this needs solid load testing too: something that clearly has not been done before, I think. This stuff fell over on us within hours, as soon as traffic ramped up. And our app is not public, so it’s not like it’s hugely high-throughput. If the CFML locking is not performant enough, I am certain this is a solved problem at Java level. It’s bread-and-butter sorta stuff in the Java world, I reckon..?
Happy to discuss further if that “helps”. You know how to find me