Add dictionary-based post-processing for English word segmentation to fix missing spaces #1574

Closed
pinacle2000 opened this issue Jan 19, 2025 · 4 comments
Labels
enhancement New feature or request

Comments

@pinacle2000

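When I use magic-pdf to convert a scanned English magazine that is not very sharp, the letters and words are mostly recognized correctly, but the spaces between English words are missing. I found quite a few related reports in the Issues: some were resolved, while in others the spaces were reportedly too small to be detected reliably.

Could a post-processing pass be added after recognition that uses a language model or dictionary to re-insert the missing spaces? It could even detect spelling errors to improve the recognition results for English documents.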
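To make the request concrete, here is a minimal sketch of what dictionary-based space restoration could look like. It is not part of magic-pdf and not the maintainers' approach; the function name `restore_spaces`, the toy `VOCAB` set, and the `unknown_penalty` parameter are all assumptions made for illustration. It splits a single run of letters into the fewest dictionary words using dynamic programming, and leaves unrecognized stretches intact as one token.

```python
# Toy vocabulary for illustration only (hypothetical); a real post-processor
# would load a full English word list or a unigram frequency table.
VOCAB = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def restore_spaces(text: str, vocab: set = VOCAB, unknown_penalty: int = 10) -> str:
    """Re-insert spaces into a run of letters using a word list.

    Each dictionary word costs 1 and each character that cannot be covered
    by a dictionary word costs `unknown_penalty`, so the cheapest split
    prefers few, known words.
    """
    n = len(text)
    lowered = text.lower()  # look words up case-insensitively
    max_len = max(len(w) for w in vocab)

    # best[i] = (cost, start index of the last token) for the prefix text[:i]
    best = [(0, 0)] + [(float("inf"), 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = lowered[start:end]
            if piece in vocab:
                cost = best[start][0] + 1
            elif end - start == 1:
                cost = best[start][0] + unknown_penalty
            else:
                continue  # multi-character piece that is not a word
            if cost < best[end][0]:
                best[end] = (cost, start)

    # Walk the back-pointers to recover the tokens in original casing.
    tokens = []
    end = n
    while end > 0:
        start = best[end][1]
        tokens.append(text[start:end])
        end = start
    tokens.reverse()

    # Merge neighbouring unknown characters back into a single token.
    merged = []  # list of (token, is_known) pairs
    for tok in tokens:
        known = tok.lower() in vocab
        if merged and not known and not merged[-1][1]:
            merged[-1] = (merged[-1][0] + tok, False)
        else:
            merged.append((tok, known))
    return " ".join(tok for tok, _ in merged)


print(restore_spaces("Thequickbrownfoxjumpsoverthelazydog"))
```

A pass like this would only be applied to space-free runs of letters in the OCR output; punctuation, digits, and ambiguous splits (where word frequencies or a language model would be needed) are deliberately left out of the sketch.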

@pinacle2000 pinacle2000 added the enhancement New feature or request label Jan 19, 2025
@myhloli
Collaborator

myhloli commented Jan 19, 2025

First, check whether the spaces come out correctly on the Hugging Face or ModelScope demo.

@pinacle2000
Author

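> First, check whether the spaces come out correctly on the Hugging Face or ModelScope demo.

The online demo does not have this problem. Why is that? I deployed it myself on Windows 11 by following the documentation.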

@pinacle2000
Author

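I also tried the downloadable Windows app. The missing-space problem is still there, but the results are somewhat better than with my own deployment. In other words, the app's output quality falls between my self-deployed version and the online demo.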

@myhloli
Collaborator

myhloli commented Jan 19, 2025

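The demo runs a preview version, which includes some newer features. These are generally synced to the official release when the next version ships.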

@myhloli myhloli closed this as completed Jan 20, 2025