Details
Assignee
Unassigned
Reporter
Jakub Zaprzałka
Rasa Open Source Version
3.6.20
Rasa SDK Version
3.6.2
Python version
3.10
Operating System
Linux
Difficulty
Easy
Created July 10, 2024 at 12:14 PM
Updated July 10, 2024 at 2:23 PM
I am building a bot for the Polish language with Rasa Open Source. I encountered a bug when running the CLI command rasa data split nlu. The generated split files do not encode non-ASCII UTF-8 characters properly: the data is saved to YAML with Unicode escape sequences (\u0142 instead of ł) and is later read incorrectly by the importer. The error occurs only when the character is inside an entity dict. Polish-specific characters such as ą, ę, ł, ć, ź, ... are represented correctly when they are not part of the entity value dict, but inside the JSON-encoded dict they are written as Unicode escape sequences.

I managed to fix this issue by properly encoding the entity dict when saving it with TrainingDataWriter.

A similar issue is referenced in GitHub issue #7541 (RasaHQ/rasa), which was marked as stale for an earlier version of Rasa.
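
The behaviour described above can be reproduced with Python's standard json module alone. This is a minimal sketch, assuming the entity dict is serialized with json.dumps; the helper format_entity and the sample entity are hypothetical illustrations, not Rasa's actual code. By default json.dumps escapes every non-ASCII character, and passing ensure_ascii=False keeps the characters as they are.

```python
import json


def format_entity(text: str, entity: dict, ensure_ascii: bool) -> str:
    """Render an inline entity annotation as [text]{json-encoded dict}.

    Hypothetical helper for illustration only, not part of Rasa.
    """
    return f"[{text}]{json.dumps(entity, ensure_ascii=ensure_ascii)}"


# Sample entity containing the Polish letter "ł" (U+0142).
entity = {"entity": "city", "value": "Wrocław"}

# Default json.dumps behaviour: non-ASCII characters become \uXXXX escapes.
escaped = format_entity("Wrocławia", entity, ensure_ascii=True)

# With ensure_ascii=False the characters are written unchanged.
readable = format_entity("Wrocławia", entity, ensure_ascii=False)

print(escaped)   # [Wrocławia]{"entity": "city", "value": "Wroc\u0142aw"}
print(readable)  # [Wrocławia]{"entity": "city", "value": "Wrocław"}
```

Only the JSON-encoded dict is affected; the annotated text outside the braces is never passed through json.dumps, which matches the observation that the characters are fine everywhere except in the entity dict. Both strings decode to the same dict with json.loads, so the escaped form is valid JSON on its own; the problem reported here arises when the escaped text is written to YAML and read back by the importer.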