Removing BOM characters from a UTF-8 file using PowerShell (converting UTF-8 with BOM to UTF-8 without BOM)

March 1, 2016 User Software 1

When creating UTF-8 encoded files using PowerShell, for example:


$orcFile = "testFile.txt"

Add-Content -Encoding UTF8 $orcFile "bla bla bla"

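they are created with a BOM (byte order mark), which is a problem when plain UTF-8 without a BOM is needed. This is the behaviour of Windows PowerShell 5.1 and earlier; PowerShell 6 and later write UTF-8 without a BOM by default.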
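If the script creates the file itself, one way to avoid the problem is to bypass Add-Content and write through .NET with an encoder that emits no BOM. This is a minimal sketch, assuming the current location is a file system folder; the variable name $utf8NoBom is only illustrative.

```powershell
# .NET methods do not follow PowerShell's current location, so build a full path
$orcFile = Join-Path (Get-Location).Path "testFile.txt"

# Passing $false to the constructor means "do not emit a BOM"
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# AppendAllText creates the file if it does not exist yet
[System.IO.File]::AppendAllText($orcFile, "bla bla bla" + [Environment]::NewLine, $utf8NoBom)
```

This only helps for files the script writes itself; files that already carry a BOM still need it removed.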

The byte order mark (BOM) is the Unicode character U+FEFF, which UTF-8 encodes as the three bytes EF BB BF. Its appearance as a magic number at the start of a text stream can signal several things to a program consuming the text:

What byte order, or endianness, the text stream is stored in;
The fact that the text stream is Unicode, to a high level of confidence;
Which of several Unicode encodings that text stream is encoded as.
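Since the UTF-8 BOM is the byte sequence EF BB BF at the very start of the file, its presence can be checked by looking at the raw bytes. The sketch below is one way to do that; Test-Utf8Bom is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper: returns $true when the file starts with the bytes EF BB BF
function Test-Utf8Bom {
    param([Parameter(Mandatory = $true)][string]$Path)

    # Resolve to a full file system path because .NET ignores PowerShell's current location
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    [byte[]]$bytes = [System.IO.File]::ReadAllBytes($fullPath)

    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and
            $bytes[1] -eq 0xBB -and
            $bytes[2] -eq 0xBF)
}
```

Calling Test-Utf8Bom "testFile.txt" before and after a conversion shows whether the BOM is really gone. The helper reads the whole file for simplicity, so for very large files it would be cheaper to read only the first three bytes.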

One solution is simply to remove the BOM from the file itself:


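 # Remove the UTF-8 BOM from a file.
 # Get-Content followed by Set-Content re-encodes the text (Set-Content defaults to
 # ANSI in Windows PowerShell), so the file is rewritten through .NET instead.
  $orcFile = (Resolve-Path "testFile.txt").ProviderPath   # .NET needs the full path

  $text = [System.IO.File]::ReadAllText($orcFile)          # the BOM is detected and dropped on read
  $utf8NoBom = New-Object System.Text.UTF8Encoding $false  # $false = do not write a BOM
  [System.IO.File]::WriteAllText($orcFile, $text, $utf8NoBom)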
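Because the BOM is just three bytes at the start of the file, it can also be dropped without decoding the text at all, which leaves every other byte exactly as it was. The following is a sketch of that byte-level approach; Remove-Utf8Bom is a hypothetical helper name.

```powershell
# Hypothetical helper: strips a leading EF BB BF and leaves the rest of the file untouched
function Remove-Utf8Bom {
    param([Parameter(Mandatory = $true)][string]$Path)

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    [byte[]]$bytes = [System.IO.File]::ReadAllBytes($fullPath)

    $hasBom = $bytes.Length -ge 3 -and
              $bytes[0] -eq 0xEF -and
              $bytes[1] -eq 0xBB -and
              $bytes[2] -eq 0xBF
    if (-not $hasBom) { return }   # no BOM, nothing to do

    # Copy everything after the first three bytes into a new array and write it back
    [byte[]]$rest = New-Object byte[] ($bytes.Length - 3)
    [System.Array]::Copy($bytes, 3, $rest, 0, $rest.Length)
    [System.IO.File]::WriteAllBytes($fullPath, $rest)
}
```

Running Remove-Utf8Bom "testFile.txt" rewrites the file in place only when a BOM is found. Working on bytes avoids any dependence on the default encodings of Get-Content and Set-Content, at the cost of loading the whole file into memory.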

Ray Mueller
April 6, 2018 at 9:27 am

Hi,

I’m afraid the file is no longer UTF-8 encoded after using the script.

BR

Ray


Ask Notes

Notes, hacks, hints/software, hardware, life
