Using Optical Character Recognition (OCR)

If you upload a PDF file or a scanned image to a project, Smartcat automatically recognizes the text in it using Optical Character Recognition (OCR) and converts the file into a Word (DOCX) file.

The result of this process depends greatly on the image quality. Before you start translating, make sure that the file has been processed correctly and its content recognized without errors. In addition, if you need to translate the document into multiple languages, adjusting its formatting before translation saves time: you won’t need to correct the formatting separately for each language at the end of the process.

Checking and correcting the source text and layout

When creating a new project or after uploading a file to an existing project, select the Check and correct the source layout option. This way you will be able to check the OCR results and make the necessary corrections before you start translating.

Note: A post-translation layout check is also often recommended for this type of document. See the Checking the post-translation layout section below for details.

Please note that you have to choose this option separately for each file. The paint roller icon will appear next to the file name if you choose to correct the layout. Once you’re done, complete the project creation.


Note: The files for which you chose to correct the layout will not be processed using any of the linguistic assets (translation memories or machine translation) until the layout verification is completed. Any statistics generated at that point will not include the word count from the files being processed this way.

Once on the project page, choose your file from the list and click Source layout and text check.


This will take you to the Source layout check page. Download the prepared file from the Work files section and make sure it has been properly recognized.


If everything is correct, press the Complete checking of source layout button. If not, make your corrections to the file and press the Upload button in the Work files area to upload the corrected file. Please note that the corrected file may only be uploaded in Word (DOCX) format.


After uploading the corrected file, press the Complete checking of source layout button. The corrected document will then be processed using your linguistic assets and the pretranslation rules that you have defined.

Note: As with any other task, you can assign the layout check to anyone on your team or to freelancers from the Marketplace.

If you notice after creating the project that the OCR results are not ideal, you can still enable the source layout and text check in the document settings.


Click on the gear icon for the document you want to process and choose Check and correct source layout in the dialog box that appears.


Please note that any completed translations will be lost. The process for checking the layout is then the same as described above.

Checking the post-translation layout

A translation is often significantly longer or shorter than the original, which can distort the layout of the completed file. To add a task for checking the layout of the translated file, go to the project settings and add the Layout check stage.


Note: This option can be selected for all formats, and the task can be assigned to team members or freelancers. This is helpful if you offer desktop publishing services to your clients and want to be able to assign that task from within Smartcat to benefit from our collaboration features.

Once an assignee has completed the layout check stage, you will be able to download the resulting file with corrections.


