Honestly, each situation needs to be addressed on its own terms. I'm dealing with some of the same issues in my own automation efforts. In development, everything works great ("It works on my machine," says every developer always). Each "false failure" points to some difference between execution in the local development environment and the larger execution context: the scope of the individual test case versus multiple workstation environments, being run in sequence with other test cases, etc.
For us, we have needed to adjust some of our automation techniques to do a bit more "pro-active" work. As we develop, we take into consideration the things we run into. Is the screen resolution of the remote machine the same as the developer machine? Do the tests run faster or slower? What timing issues need to be addressed in these other environments? What environmental factors need to be considered?
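To make that "pro-active" work concrete, one approach is a pre-flight check that compares the execution environment against the baseline the tests were developed on. Here's a minimal Python sketch of the idea; the function and parameter names are hypothetical, and a real suite would query the OS for the actual resolution rather than passing it in:

```python
import platform

def environment_report(expected_resolution, actual_resolution):
    """Compare execution-environment facts against the development baseline
    and return human-readable warnings for any mismatch.

    Resolutions are (width, height) tuples. This is an illustrative helper,
    not any particular tool's API.
    """
    warnings = []
    if actual_resolution != expected_resolution:
        warnings.append(
            "resolution %sx%s differs from baseline %sx%s; "
            "components may land off screen"
            % (actual_resolution + expected_resolution)
        )
    return warnings

# Run the check before the suite starts and log anything suspicious,
# along with the OS we actually landed on.
issues = environment_report((1920, 1080), (1366, 768))
print(platform.system(), issues)
```

Logging this at suite start-up means that when a "false failure" shows up later, the log already tells you whether the environment matched the one the test was written against.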
We've added functions to our code libraries to handle a lot of these situations and have made it a practice to use them while developing test cases. That has mitigated a LOT of our problems. Basically, there are two main categories of "false failures" we deal with. The majority have to do with timing, where the test scripts run faster than the application can keep up with, so we build a lot of these wait methods into our test cases. These are things that will not be added "automatically" if we are recording... we do not rely on record/playback to build our test cases; they are built roughly 75% by manual construction.

The other part of our "false failures" has to do with environmental factors. The machines running the test cases are not as powerful as the machines doing the development, and that contributes somewhat to the timing factors. But screen resolution also plays a role if a component I need to click is not on screen because the resolution is smaller on my execution system. Or the browser is a different version. Or the OS is a different version. Every one of these takes investigation and then a mitigation effort to build out, again, pro-active practices and techniques so that, as we are developing new tests, we are already anticipating these things.
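To give a flavor of the wait methods we mean, here's a minimal Python sketch of a generic polling wait. This is the technique, not any tool's actual API (names like `wait_until` are made up for illustration): instead of a fixed sleep, the test polls a condition with a bounded timeout, so a slow machine gets more time without slowing down a fast one.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll a zero-argument condition until it returns a truthy value or the
    timeout expires. Returns the condition's value, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    return None

# Simulated app state: the "element" becomes ready on the third poll,
# standing in for an application that lags behind the script.
state = {"ready": False, "ticks": 0}

def element_ready():
    state["ticks"] += 1
    if state["ticks"] >= 3:
        state["ready"] = True
    return state["ready"]

found = wait_until(element_ready, timeout=2.0, interval=0.01)
```

The payoff is that the same test passes on both a fast developer box (condition true on the first poll) and an underpowered execution machine (a few more polls), with one explicit upper bound instead of sprinkled sleeps.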
Let me run down your list of things you're seeing and give you my suspicions:
1) "TE not able to click a button because it is invisible" - This sounds like a timing issue. Keep in mind that, by the time the screenshot is taken, the object may be resolved on screen, but by then the error logging has already been triggered. Try increasing the auto-wait timeout on your project to see if that resolves it. If so, you'll need to build better wait logic into your test cases to account for these differences. Another less obvious possibility is that your object identification may need to be adjusted. You may "see" it on screen, but I've seen applications where I've mapped an object that appears to be the one on screen when, in fact, what was mapped was some sort of underlying layer that is not technically "visible".
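That last distinction, an object that exists in the object tree but isn't actually visible on screen, is worth guarding for explicitly. Here's a Python sketch of the idea with a toy stand-in class; the names (`MappedObject`, `safe_click`) are hypothetical, not TestComplete's API, but the check mirrors the exists-versus-visible split most UI tools expose:

```python
class MappedObject:
    """Toy stand-in for a mapped UI object: it can exist in the object
    tree without being visible on screen (e.g. an underlying layer)."""
    def __init__(self, exists, visible_on_screen):
        self.exists = exists
        self.visible_on_screen = visible_on_screen

def safe_click(obj):
    """Click only when the object is both present and actually on screen.
    Returns a status string instead of raising, so the test can log a
    precise reason rather than a generic click failure."""
    if not obj.exists:
        return "not found"
    if not obj.visible_on_screen:
        return "exists but not visible - check mapping or wait longer"
    return "clicked"

# The tricky case from above: the mapping resolved to an underlying layer.
status = safe_click(MappedObject(exists=True, visible_on_screen=False))
```

A log line saying "exists but not visible" immediately tells you whether to suspect the mapping or the timing, instead of leaving both possibilities open.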
2) "skipping some steps" - Everything that is in a test should be executed, so having something not execute means that something in the logic of the test is missing. If you're explicitly logging messages for each step, then yes, the log should show it. But if you're looking for "Event" messages, those won't always log. A project property option is to suppress events unless an error is logged. So it could be that things ARE firing... they're just not logging because there is no need to. What I would suggest is, if you suspect that something is not firing that should be firing, add more explicit debug logging to the test case temporarily, run it again, and look for where things may be going wrong.
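One cheap way to add that explicit step logging without touching every step body is a wrapper that records entry and exit of each step. A Python sketch (the decorator and log list are illustrative, not any tool's built-in facility):

```python
import functools

STEP_LOG = []  # in a real suite this would go to the test tool's log

def logged_step(func):
    """Record entry and exit of every test step explicitly, so a skipped
    step shows up as a gap in the log rather than silently vanishing."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        STEP_LOG.append("ENTER " + func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            STEP_LOG.append("EXIT " + func.__name__)
    return wrapper

@logged_step
def fill_form():
    pass  # hypothetical step body

@logged_step
def submit_form():
    pass  # hypothetical step body

fill_form()
submit_form()
```

If a step truly never ran, its ENTER line is missing; if it ran but raised, you see ENTER with its EXIT still emitted by the `finally`, so you can tell "skipped" apart from "failed mid-step".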
In short, as I mentioned, the mitigation "depends" on the situation. I can only give you generalized suggestions; to actually track down your problems and fix them for your specific situation... right now, only you can do that, because only you have full access. Post your individual problems and situations with logs, sample code, screenshots of object properties, namemappings, etc., and we'll be able to help out. Generally, though, this takes elbow grease. Once you figure out the general patterns, you can build the necessary techniques into your development process to prevent additional false failures.