Using OCR

If you upload a PDF file or a scanned image to a project, the system automatically recognizes the text in it using Optical Character Recognition (OCR) and converts the file to a Word (DOCX) file. The result depends greatly on the image quality. Before you start translating the resulting file, you can verify that it has been processed correctly and that the text was recognized without errors. In addition, if you are going to translate the document into multiple languages, you can adjust its formatting before translation, so that you don't have to fix the same formatting problems in every translated file at the end of the process.

When uploading a file to a project, select the Check and correct source layout option. This lets you review the OCR results and make any necessary corrections to the document before you start translating.

Note that this option must be selected separately for each file. If you select it, a paintbrush icon appears next to the file name.

Then finish creating the project.

Note: Files marked for layout correction are not processed with any of your linguistic assets (TM or MT) until the layout check is complete. Any statistics generated before then will not include the word count of these files.

On the project page, select your file from the list and click Source layout and text check.

This opens the Source Layout Check page. Download the prepared file from the Work Files section and check that the text has been recognized correctly.

If everything is correct, press the Complete checking of source layout button. If not, correct the file, then press the Upload button in the Work Files area to upload the corrected version. Note that the corrected file can only be uploaded in Word (DOCX) format.

After uploading, press the Complete checking of source layout button. The corrected document is then processed using your linguistic assets and the pre-translation rules you have defined.

Note: Like any other task, the layout check can be assigned to anyone on your team or to a freelancer from the marketplace.

If you notice in the editor that the OCR results are poor, you can also enable the source layout and text check in the document settings after the project has been created.

Select Settings for the document you want to process, and choose Check and correct source layout in the dialog box that opens.

Note that any existing translation of the document will be lost. From this point on, the layout check works the same way as described above.

Translation layout check

A translation is often significantly longer or shorter than the original, which can distort the layout of the completed file. To add a task for checking the layout of the translated file, select the Check and correct post-translation layout option when uploading the original file to your project.

Note: This option is available for all file formats that Smartcat can process, and the task can be assigned to team members or freelancers. This is useful if you offer desktop publishing services to your customers and want to assign that work within Smartcat to take advantage of its collaboration features.

Once the translation is done, select the document in the list and press Post-translation layout check.

Download the translated file from the Work Files area and check the layout.

If everything is correct, press the Save button. If not, correct the file, then press the Upload button in the Work Files area and select the corrected file.

After uploading, press the Save button. The corrected file can then be downloaded from the project page for delivery to the client.