乡音苑 Phonemica

Fixing broken audio

Last year when we changed hosts, some of the audio files on the site became corrupted. For most if not all, I have uncorrupted backups. The problem is really just finding the recordings for which the audio was cut short.

This is one such recording which I previously didn’t know about but discovered today when browsing the site. The audio was cut down to 5 seconds of the original ~30. It’s an easy fix, but it can’t be fixed until I find that it needs it.

I’ll continue to go through the site this week and look for such recordings. I know there are a couple more out there.
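Since the backups are intact, one way to narrow the hunt would be to compare file sizes between the backup copy and what's live. What follows is only a rough sketch of that idea, not what we actually run: the function name `find_truncated`, the two directory arguments, and the 95% threshold are all made up for illustration, and it assumes the backup and the live site keep the same folder layout.

```python
from pathlib import Path


def find_truncated(backup_dir, live_dir, tolerance=0.95):
    """List files whose live copy is noticeably smaller than the backup.

    Returns (relative path, backup size, live size) tuples. A live size of 0
    means the file is missing or empty on the live side.
    """
    backup_root, live_root = Path(backup_dir), Path(live_dir)
    suspects = []
    for backup_file in sorted(backup_root.rglob("*")):
        if not backup_file.is_file():
            continue
        # Look for the same relative path under the live directory.
        rel = backup_file.relative_to(backup_root)
        live_file = live_root / rel
        backup_size = backup_file.stat().st_size
        live_size = live_file.stat().st_size if live_file.is_file() else 0
        # Flag anything that lost more than the allowed share of its bytes.
        if live_size < backup_size * tolerance:
            suspects.append((rel.as_posix(), backup_size, live_size))
    return suspects
```

File size is only a stand-in for audio length, so anything this flags would still need a listen before being replaced.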

If you see others, feel free to comment on this post with a link to the story and I’ll be sure to get to it even quicker than if I’m just wandering on my own.

Also fixed:
“81蓝天”, “劳斯莱斯” & “方泽夫”

去年,当我们更换主机的时候,有些录音资料被毁坏了,主要的问题就是这些录音的一些片断丢失了,录音就缩短了。我们已经尽量通过使用备份来恢复这些录音。

比如我们最近浏览整个网站的时候就发现了这样一段缩短了的录音,而之前我们并没有意识到有这样的问题存在。这段录音从原来的30秒缩短成了5秒。这个恢复起来很容易,但是我们得先知道哪些录音发生了这样的问题,才能进行修复。

在接下来的时间, 我们会继续浏览整个网站尽量去发现这些出了问题的录音。我们估计应该还有一些这样的录音没有被发现和修复。

如果您有发现其他这样的录音,请在这个贴子下面留言给我们,并粘贴这个录音的链接。有你们的帮助,一定会比我们在网站上自个儿漫无目的地搜索来得快些。

我们还修复了下面三段录音。

“81蓝天”“劳斯莱斯”“方泽夫”


Site updates

In the past we’ve complained that we weren’t blogging enough. We would remind each other that we needed to do it, but then we’d end up not. We tried to keep the blog going with a rule that anything we wrote, we would have to write in both English and Mandarin. I think that got in the way of us writing, since it meant spending twice as long writing and then many times longer proofreading and making sure things were clear and consistent. I’m getting rid of that rule since it also gets in the way of us telling you what’s going on with the project. I think we have a responsibility to keep people informed, even if that means some things are only given in a single language sometimes. It’s in that spirit that I’m writing this post.

I’ve just finished putting together some big updates to the site, with a few more just around the bend. Consider this the release notes for that. There are a bunch of tiny changes that you might not notice, and I don’t think they need to be mentioned explicitly. Instead I’ll just go over the major points.

Map search & filtering

You can now search for keywords directly on the map page. We had this ages ago but had to ditch it when we changed the server-side infrastructure. It’s back, and not a moment too soon.

For example, now you can see the location of every speaker that talks about fish if that’s what you want to do.

We’ve also changed the appearance of the map, but the current look is temporary. We’ll be getting back to a custom design like we once had, but that’s still a couple of weeks out. We first need to work out the funds to do that.

Navigation improvements

It was taking too much time to get around the site. Now there are dropdown menus, shortcuts, and redundant links throughout to quickly get you wherever you need to be.

There are also dialect- and language-focused drop-down links throughout the site that will filter the map results. We had something similar to this last year but it wasn’t as useful or flexible as the current system.

Another change to how you interact with the site is live filtering and ordering of your search results. You can see this on the main page right now, where the mosaic of most-recent uploads can be re-ordered and filtered. That lets you get to your own recent contributions more quickly and gives you an easier way to filter out stories that might not interest you.

Ultimately, it should never take more than a couple clicks to get anywhere on the site. That’s something we’ve been ignoring, but not any more.

Search stories

This is another feature we had set up quite a while ago but abandoned at some point. You can once again search for words anywhere in the story transcripts, comments, or speaker info. There’s a search box in the top right corner that will take you to a results page like this.

There are still some improvements to be made when searching in Chinese. It’s not very good right now, but we know why and it will be fixed with the upcoming shift toward a more research-friendly data management system, outlined a few paragraphs below.

Language handling improvements

We started out being just limited to Sinitic. We built the whole system with that focus in mind. Then someone sent us a recording of Zhuang and we didn’t have the heart to reject it, so we tried to change things to be more inclusive. It didn’t work well since we were still working with the old system.

That’s fixed now, and you’ll see this more open approach reflected throughout the site. We only have a handful of recordings that are not of Sinitic varieties, but we will be changing that very soon.

Improved Korean site translation

It’s still not perfect, and we’d love to have someone help us fill in the gaps, but we’ve been working on improving the Korean version of the site’s user interface. We’d like to translate it into other languages as well, so if you happen to be a native speaker of something other than Chinese or English and would like to help translate the site, let us know. It’s not a very time-consuming process, and the whole thing could be done in a couple hours tops.

Mobile-friendliness

We did a little of this in the last update but not enough. The site is now much more mobile friendly. There are a couple pages that still need some attention, and there’s no guarantee that your phone will deal with the audio well, but it’s much more likely to work now than it was last week.

Linguistic research compatibility

People are already using the data on the site for their research on dialect diversity, sociolinguistics, phonology, and phonetic and lexical variation, to name just the few areas I’ve specifically been told about. Last year I put together some import/export tools so that the site could more easily interact with applications like ELAN and FLEx. I plan to more properly integrate those in the next month so that they’ll be available to anyone who wants to move data around.

As part of that, we’re also making a couple changes to the level of detail at which the data can be worked with. Right now the smallest unit in the system is a phrase. Until now there has been no system in place for things like part-of-speech (POS) tagging. We’ve talked about it since the beginning, but it always got pushed back. It has recently become a much bigger priority.

Aside from making linguistic analysis easier, this change will also allow us to quickly compile things like word lists and dictionaries, phrase books and other useful study tools for people interested in learning something other than Standard Mandarin. I’ll get into the details of this in another post once things are a little more solidified.
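To give a rough idea of what going below the phrase level could look like, here is a small sketch. None of this is the actual Phonemica data model: the `Token` and `Phrase` classes, the `word_list` helper, and the tag names are hypothetical and only meant to show how POS-tagged words inside a phrase would make something like a word list easy to compile.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str  # the word as transcribed
    pos: str   # part-of-speech tag, e.g. "NOUN" (tag set is an assumption)


@dataclass
class Phrase:
    text: str                                   # the phrase as it exists today
    tokens: list = field(default_factory=list)  # the finer-grained layer


def word_list(phrases, pos=None):
    """Count word forms across phrases, optionally keeping one POS tag only."""
    counts = Counter(
        token.form
        for phrase in phrases
        for token in phrase.tokens
        if pos is None or token.pos == pos
    )
    # Most frequent first; ties keep the order in which words first appeared.
    return counts.most_common()
```

Calling `word_list(phrases, pos="NOUN")` on a set of tagged phrases would then return the nouns with their counts, which is the raw material for a word list or a simple dictionary.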

Cosmetic updates

Things are cleaner. The design has been tightened up considerably. Six months ago we were in a hurry to shift the site to a new infrastructure in order to get things more automated. Because of this we focused solely on the server side of things, and the update to the site design wasn’t given the right attention. It left things ugly and it’s been bugging the hell out of me ever since. I’ve finally gotten around to fixing that. I hope you like the updated look.

Conclusion

If nothing else above matters to you, I hope you at least like that the site isn’t as ugly as it was.

We have more changes planned, and will continue to improve the platform. It’s been a slow few months with Phonemica as we’ve all been incredibly busy with other responsibilities, hence the recent radio silence. That’s over now, and you’ll be seeing a lot more activity from us in the coming months.

Check back here for future updates. We’ll keep you posted.

一直以来,我们被各位关心乡音苑的朋友指出我们没有发布足够的博客。我们会互相提醒我们需要在这方面做些改进,但是最终我们还是没有做到……之前,我们一直尽力遵守一个不成文的规则,即:不管我们在博客中写什么,我们都必须是中英双语的。我个人认为这个规则可能就是阻止我们发布更多博客的原因之一,因为中英双语的发布意味着我们必须花两倍的时间去写,然后花更多的时间去审核我们的中文翻译以确保意思的清晰以及与英文含义的一致。我现在打算不再使用这条规则,因为它使得我无法及时地告诉各位乡音苑的朋友们这个网站所发生的一些变化和更新。我认为我们有责任及时地告知大家这个网站所发生的一切,即使这些信息有时只能以一种语言(英文)的形式传递给大家。正是基于这样的一个考虑,我发布了这个贴子。

我刚刚完成了乡音苑几个大的更新项目,还有一些小的也正在进行中。我打算通过这个贴子把这些更新介绍给大家。一些细枝末节的变化大家可能没有注意到,我也认为没有必要把一些很小的变化特别地跟大家提出。因此,在这里我要向大家介绍的是乡音苑一些主要的更新。

地图检索和筛选

您现在可以直接在我们的“方言地图”上使用关键字进行搜索。我们其实很早以前就有这项功能了,但是由于更换服务器的问题,我们不得不丢弃了这个功能。现在这项功能又恢复了,而且正是时候。

例如,现在你可以在地图上找到所有谈论有关“鱼”的录音故事,如果这就是你想要找的、听的录音。

我们也改变了“方言地图”的样子,但是它现在的样子也还是暂时的。我们会改回原先的定制的地图样子,但是这还得需要几个星期的时间,因为我们得先筹到所需的资金去做这件事。

网站导航的改进

过去浏览整个网站需要比较长的时间。现在,有一些方法可以让你很快地去到任何一个你想去的地方。在整个网站有一些可以下拉的菜单和链接可以让你的浏览更快速。

另外,在整个网站还有关于方言和语言的下拉式菜单可以帮你进一步筛选你通过地图检索到的结果。我们去年就有类似这样的功能了,但是现在的会更好用。

另外一个关于你如何与本网站互动的变化是:实时地筛选和整理你的检索结果。你可以在网站的首页看到这样一项功能。现在最新上传的录音故事的图标可以被重新排序和筛选-以让你更快地去到你在乡音苑最新做出的编辑,同样可以让你更容易地筛选掉你不感兴趣的录音故事。

最终,我们的目的是能让你仅需要两三次的点击就可以到达网站的任何一个地方,这是我们之前忽略的地方,但是现在不再会被忽略了。

检索录音故事

同样还有一些功能我们之前就有,但是在某个时点被移除了。现在这一功能恢复了,你可以在整段录音故事的文字稿、评论或讲述者信息中搜索关键字。在网站的右上角有一个检索栏,输入你要检索的关键字后它就会带你去到这样一个页面。

在中文检索方面还不是非常好用,我们还需要做一些改进。但是我们知道原因在哪里,而且我们会在向一个更容易检索的数据管理系统迈进的同时完成这一改进。我们用以下几段文字来介绍这样一个系统的主要特点:

语言处理能力的提升

我们一开始只是想把这个网站做成是关于汉语方言的,因此网站的建设也是根据这个想法来的。但是,有人给我们发了一段壮语的录音,我们真得不忍心拒绝,所以我们尝试把这个网站做得更具包容性。但是由于我们仍然在老的系统上做着开发和改进,这一功能一直不能运行得很好。
但是现在我们已经解决这一问题。你将在整个网站都看到这一更具开放性的结构。现在我们还只有几个不是属于汉语语言的录音故事,但是很快这一现象就会有改变了。

网站韩语页面翻译的改进

在这一方面我们仍然做得不够好,所以我们希望有人可以帮我们在这一方面拾遗补缺,同时我们一直在努力改进网站的韩语用户使用界面。 我们还想把它翻译成其他的文字。所以如果你的母语是除中文和英文之外的语言,并且也愿意帮乡音苑做一些翻译工作,那么请一定联系我们。这并不需要花你很长的时间,最多两三个小时就能完成。

手机浏览的提升

在上次网站升级时我们在这一方面做了一些努力但是并不够。乡音苑现在可以更好地在手机上进行浏览了。虽然仍然有几个页面需要做一些改进,而且我们也无法保证你使用的手机可以很顺畅地播放录音,但是与前一段时间相比,你用手机浏览乡音苑肯定会有更好的体验。

语言学研究的兼容性

人们已经在使用这个网站上的数据进行他们各方面的研究,包括:方言的多样性,社会语言学,音韵学,语音和词汇变化,这还只是我被明确告知的几个方面。去年,我做了一些导入/导出的工具,使得网站可以更好地与其他的一些应用如ELAN和FLEx进行互动。在接下来的一个月里,我打算进一步整合好这些功能,这样大家就可以使用这些功能来轻松导出并使用数据了。

作为其中的一个环节,我们还在就这些数据可以被使用和分析研究的深度做一些改变。目前,这个系统中数据的最小单位还是一个短句。一直以来,我们没有可以用来做词性标注(POS tagging)的系统。从一开始我们就在讨论这一点,但总是被推后。最近,这个已经成为我们的一个主要工作任务了。

这个改变除了可以帮助诸如语言学研究变得更加容易,同样它还可以让我们快速地把例如字表、字典、词典和其他的有用的学习工具集合在一起,以供对标准汉语之外的语言有学习兴趣的人使用。等时机再成熟一些,我会再发一个贴子跟大家更详细地说一说这些改变。

外观的更新

乡音苑变得更整洁大方了。整个网站设计变得十分紧凑。六个月之前,为了让网站变得更自动化,我们匆匆忙忙地将它迁移到了新的架构上。因此,我们只关注了服务器方面的事情,而对于网站设计的更新则没有给予应有的注意。这让网站变得很难看,也一直让我耿耿于怀。现在我终于把这个问题解决了,希望大家喜欢这个更新后的样子。

总结

对乡音苑更新的介绍大致如上,我希望至少大家会喜欢看到这个网站不再像以前那样糟糕了。
我们仍在计划其他的一些更新,并且会继续改进乡音苑这个平台。由于前几个月我们都忙于一些个人事务,所以乡音苑变得有些沉寂。现在,这个情况结束了,在接下来的几个月里你会看到我们很多的活动。

更多的更新请点击这里。我们会及时通知大家乡音苑的新变化。请继续关注我们。