Forum Discussion
Hi,
We had the same issue several months ago, where about 1/2 of our TC files had become corrupted in this way. Here's the "short" summary of how I resolved it.
The basic problem stems from a file-encoding compatibility issue between Perforce (P4) and TC. I'm not exactly sure what triggered it, but we only saw the issue after adding a 2nd user. The theory was that when TC checks out P4 files that were previously saved using a text editor, there is a chance that the conversion of the CR/LF characters gets messed up in the process and adds an extra "0D" byte to the linefeed.
In your screenshot example, every 2nd line is 'Japanese' because there is likely an extra byte on the linefeed character. That shifts every following byte by one, so the 2-byte UTF-16 pairs on the next line are read out of step and decode to unrelated characters (they mostly land in the CJK range, which is why the text looks Japanese). The next linefeed adds one more byte of offset, which effectively corrects the alignment for the line after it. Also, if I recall, the corrupted files may have had a double byte-order mark (BOM), which was another symptom of the corruption, but I'll explain this later.
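To make the offset easier to see, here is a minimal Python sketch. It assumes the corruption amounts to a single 0D byte inserted in front of every 0A byte in a UTF-16 (LE) file; the three-line sample text is made up for illustration.

```python
# Sketch: how one stray 0D byte shifts UTF-16 (LE) decoding.
# Assumption: the corruption inserts 0D directly before every 0A byte.

original = "ab\r\ncd\r\nef\r\n".encode("utf-16-le")

# Byte-level LF -> CR/LF conversion, applied to data that is really UTF-16.
corrupted = original.replace(b"\x0a", b"\x0d\x0a")

# Decode the damaged bytes as UTF-16 (LE); a leftover odd byte becomes U+FFFD.
decoded = corrupted.decode("utf-16-le", errors="replace")

# Print the code point of every decoded character, one per line.
for ch in decoded:
    print(f"U+{ord(ch):04X}")
```

In the output, "ab" and "ef" decode normally, while the "cd" line in between comes out as code points in the CJK range, which matches the every-2nd-line pattern described above.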
We did report the issue to Perforce (Feb. 2015), and I believe it was planned to be fixed. If you contact their support they hopefully should be aware of the bug. I haven't followed up with them though.
A frustrating part with the problem was that even after reverting a file back in Perforce to a pre-corrupted state (e.g. head revision), it does not solve the problem. This is because of how P4 handles the 'encoding type' file information. But, there is a solution (or at least a workaround)...
Here's what I'd suggest:
PART 1: fix the files
- check everything into P4, then manually back up/archive your Perforce workspace
- download HxD (a good & free hex editor)
- close P4 & TC
- open the corrupted file using HxD (probably *.vbs files for VBScript)
- press Ctrl+R (Find/Replace)
- search for: "0D 00 0D 0A" (no quotes) -- this is the erroneous CR/LF ascii hex values
- Replace with: "0D 00 0A" (no quotes)
** I'm assuming here that your VBScript files were corrupted in the same manner as my JScript files; if that is not the case, this may not work. If it doesn't, post a small sample file and I can look at it.
- set Datatype = Hex-values
- then proceed to replace all
- repeat as needed for each corrupted file
- open the file in a text editor to confirm it is fixed (such as Notepad++); also, pay attention to the file encoding, which is displayed in the bottom-right corner in the N++ status bar.
- also, you may need to manually delete the byte order mark at the beginning of the file (BOM) (sorry I don't remember if this was required); I think I may have had to delete some double BOM's using HxD where there was both UTF-8 and UTF-16 BOM's. Deleting both should reset the encoding to ANSI as default.
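If there are many files, the find/replace steps above could also be scripted instead of done by hand in HxD. This is only a sketch in Python, not something I ran at the time: the function names are made up, the byte values are the ones from the steps above, and it keeps a .bak copy next to each file it changes.

```python
import shutil
from pathlib import Path

# Byte patterns taken from the steps above; adjust them if your files differ.
FIND = bytes.fromhex("0D 00 0D 0A")
REPLACE = bytes.fromhex("0D 00 0A")


def repair_bytes(data: bytes, find: bytes = FIND, replace: bytes = REPLACE) -> tuple[bytes, int]:
    """Return the repaired bytes and the number of replacements made."""
    return data.replace(find, replace), data.count(find)


def repair_file(path: Path) -> int:
    """Repair one file in place, keeping a .bak copy of the original."""
    data = path.read_bytes()
    fixed, hits = repair_bytes(data)
    if hits:
        # Only touch the file (and write a backup) when something matched.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(fixed)
    return hits
```

For example, repair_file(Path("MyUnit.js")) (a hypothetical file name) returns the number of replacements it made. Still open the result in a text editor afterwards to confirm it is fixed.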
PART 2: prevent the problem from recurring
- update P4 clients and server to the latest P4 version (they may have a fix by now, not sure)
- set all your TC Projects Properties' Units Encoding to UTF-8 (or ANSI, but I wouldn't use Auto or UTF-16, because I suspected that 'Auto/UTF-16' may have also been part of the problem).
- then edit each repaired file in TC (any change will do; for example, just add a space somewhere -- but there has to be a distinct change in the file), then save and check-in to P4
PART 3: multi-users in Perforce
- I also suspect that having multiple users (P4-TC) is part of the cause of this issue
- without getting into details in this thread, there is a major limitation in TC with the default P4 integration. By default, every TC user must use the same P4 workspace, but that is virtually useless in a multi-user setup. However, I found a workaround for this as well. Using a text editor, simply delete the workspace connection information in the project suite file (*.pj)(search for 'auxpath' in the file). Once the workspace connection is gone, then you will be able to use the P4 integration in TC as expected.
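Before editing the project suite file by hand, it may help to list where 'auxpath' appears. The sketch below only reads the file and changes nothing. The function name is made up, and it assumes the file can be read as UTF-8 text, which I have not verified for every TC version.

```python
from pathlib import Path


def find_auxpath_lines(path: Path, needle: str = "auxpath") -> list[tuple[int, str]]:
    """Return (line number, text) for each line that mentions the needle."""
    # Assumption: the file is readable as UTF-8; undecodable bytes are replaced.
    text = path.read_text(encoding="utf-8", errors="replace")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle.lower() in line.lower()  # case-insensitive match
    ]
```

Back up the file first, then use the reported line numbers to find the workspace connection information in your text editor.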
Hope that helps,
Brian
So I followed the suggested steps except the last one:
"- also, you may need to manually delete the byte order mark at the beginning of the file (BOM) (sorry I don't remember if this was required); I think I may have had to delete some double BOM's using HxD where there was both UTF-8 and UTF-16 BOM's. Deleting both should reset the encoding to ANSI as default."
I see some statements converted back to English, but I lost some as well. I have attached a file comparison of before and after the hex replacements.
Any hint as to what's missing here?
- brk9394, 10 years ago, Occasional Contributor
Yes, I think I know what the problem is. Our starting points are different -- it is most likely that your corrupted file encoding is different than what mine was.
Mine was UTF-16 (LE). (LE = Little Endian). See attached file for sample (yellow highlighting indicates encoding).
If you look at the first 2 or 3 bytes of your original corrupted file in HxD, it's probably either UTF-8 (hex=EF BB BF), or UTF-16 (BE) (hex=FE FF). It's possible, but very unlikely, that you have one of the other 5 UTF encoding types.
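If you'd rather not read the first bytes by eye, the check can be sketched in Python. It uses the BOM constants from the standard codecs module; the function name and the display labels are my own. It also reports a second BOM sitting directly behind the first, which was one of the symptoms I mentioned earlier.

```python
import codecs

# Known byte-order marks from the standard library. Longer marks come first
# so that UTF-32 (LE) (FF FE 00 00) is not mistaken for UTF-16 (LE) (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 (LE)"),
    (codecs.BOM_UTF32_BE, "UTF-32 (BE)"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 (LE)"),
    (codecs.BOM_UTF16_BE, "UTF-16 (BE)"),
]


def leading_boms(data: bytes) -> list[str]:
    """Name every BOM found back to back at the start of the data."""
    found = []
    pos = 0
    while True:
        for bom, name in BOMS:
            if data.startswith(bom, pos):
                found.append(name)
                pos += len(bom)
                break
        else:
            # No BOM matched at this position, so the text itself starts here.
            return found
```

Call it on the raw bytes of the file, for example leading_boms(Path("MyUnit.vbs").read_bytes()) with pathlib (the file name is hypothetical). An empty list means no BOM was found, and two entries point to a double BOM.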
From my screenshot, and for my encoding type, the actual Find/Replace values should have been to find "0D 00 0D 0A 00" and replace it with "0D 00 0A 00". (Sorry: the values in my original post worked for me, but they were not 100% accurate because they were offset by 1 byte.) Please compare both screenshots to see the Find/Replace difference.
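The pair above can also be derived mechanically, which gives a starting guess for other encodings. This Python sketch assumes the corruption is always one 0D byte inserted directly before the 0A byte of the linefeed. That held for my UTF-16 (LE) files, but for other encodings it is only a guess until the actual hex values have been checked. The function name is made up.

```python
def patterns_for(encoding: str) -> tuple[bytes, bytes]:
    """Return (find, replace) byte patterns for one CR/LF pair in this encoding.

    Assumption: the corruption is a single 0D byte inserted directly before
    the 0A byte of each linefeed.
    """
    good = "\r\n".encode(encoding)            # healthy CR/LF bytes
    bad = good.replace(b"\x0a", b"\x0d\x0a")  # same bytes with the stray 0D
    return bad, good


# Print candidate Find/Replace values in the same hex notation HxD uses.
for enc in ("utf-16-le", "utf-16-be", "utf-8"):
    find, replace = patterns_for(enc)
    print(enc, find.hex(" ").upper(), "->", replace.hex(" ").upper())
```

For "utf-16-le" this reproduces the Find/Replace values given above. The other two lines are only candidates to compare against what HxD actually shows.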
However, based on your encoding, it will probably require a slightly different treatment. But without seeing the hex values, I can't tell you what the Find/Replace values should be.
So I would suggest posting a screenshot of the first few lines of your original corrupted file in HxD, or the original file itself (but be careful with proprietary content, as this is a public forum). Then I can tell you what the Find/Replace values should be for your files.
Regards,
Brian
- mspatel, 10 years ago, Contributor
Hi
Since I messed around with the file I posted earlier, here is another file that was corrupted.
- brk9394, 10 years ago, Occasional Contributor
The BOM looks fine in your file. I'm pretty sure that a find/replace of the hex values "0D 00 0D 0A 00" with "0D 00 0A 00" should do the trick for you. Good luck.