Fixing broken audio

Last year when we changed hosts, some of the audio files on the site became corrupted. For most if not all, I have uncorrupted backups. The problem is really just finding the recordings for which the audio was cut short.

This is one such recording which I previously didn’t know about but discovered today when browsing the site. The audio was cut down to 5 seconds of the original ~30. It’s an easy fix, but it can’t be fixed until I find that it needs it.

I’ll continue to go through the site this week and look for such recordings. I know there are a couple more out there.
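Since uncorrupted backups exist for most of the files, part of this hunt could probably be automated. What follows is only a rough sketch, not how the site actually does it: it assumes the live audio and the backups sit in two local folders with matching file names, and that a live file is meant to be a byte-for-byte copy of its backup. The function name `find_truncated`, the folder layout, and the 0.9 threshold are all made up for illustration.

```python
from pathlib import Path


def find_truncated(live_dir, backup_dir, min_ratio=0.9):
    """Flag live files that are much smaller than their backup copy.

    Returns a list of (relative path, live size, backup size) tuples.
    Assumes live files should be identical copies of the backups, so a
    noticeably smaller live file is likely to have been cut short.
    """
    live_dir, backup_dir = Path(live_dir), Path(backup_dir)
    suspects = []
    for backup in sorted(backup_dir.rglob("*")):
        if not backup.is_file():
            continue
        rel = backup.relative_to(backup_dir)
        live = live_dir / rel
        backup_size = backup.stat().st_size
        # A live file that is missing entirely counts as size 0,
        # so it gets reported as well.
        live_size = live.stat().st_size if live.is_file() else 0
        if live_size < backup_size * min_ratio:
            suspects.append((rel.as_posix(), live_size, backup_size))
    return suspects
```

Calling something like `find_truncated("site/audio", "backup/audio")` (both paths hypothetical) would list the candidates to restore. If the live files were re-encoded after upload, sizes would differ for innocent reasons and comparing durations would be a better test.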

If you see others, feel free to comment on this post with a link to the story and I’ll be sure to get to it even quicker than if I’m just wandering on my own.

Also fixed:




Site updates

In the past we’ve complained that we weren’t blogging enough. We would remind each other that we needed to do it, but then we’d end up not doing it. We tried to keep the blog going with a rule that anything we wrote, we would have to write in both English and Mandarin. I think that got in the way of us writing, since it meant spending twice as long writing and then several times longer again proofing and making sure things were clear and consistent. I’m getting rid of that rule, since it also gets in the way of us telling you what’s going on with the project. I think we have a responsibility to keep people informed, even if that means some things are only available in a single language sometimes. It’s in that spirit that I’m writing this post.

I’ve just finished putting together some big updates to the site, with a few more just around the bend. Consider this the release notes for that. There are a bunch of tiny changes that you might not notice, and I don’t think they need to be mentioned explicitly. Instead I’ll just go over the major points.

Map search & filtering

You can now search for keywords directly on the map page. We had this ages ago but had to ditch it when we changed the server-side infrastructure. It’s back, and not a moment too soon.

For example, now you can see the location of every speaker that talks about fish if that’s what you want to do.

We’ve also changed the appearance of the map, but this current look is also temporary. We’ll be getting back to a custom design like we once had, but it’s a couple weeks out still. We first need to work out the funds to do that.

Navigation improvements

It was taking too much time to get around the site. Now there are dropdown menus, shortcuts, and redundant links throughout to quickly get you wherever you need to be.

There are also dialect- and language-focused drop-down links throughout the site that will filter the map results. We had something similar to this last year but it wasn’t as useful or flexible as the current system.

Another change to how you interact with the site is live filtering and ordering of your search results. You can see this on the main page right now, where the mosaic of most-recent uploads can now be re-ordered and filtered to let you get to your own recent contributions more quickly, as well as having an easier way to filter out stories that might not interest you.

Ultimately, it should never take more than a couple clicks to get anywhere on the site. That’s something we’ve been ignoring, but not any more.

Search stories

Also something we had set up quite a while ago, but which was also abandoned at some point. You can once again search for words anywhere in the story transcripts, comments, or speaker info. There’s a search box in the top right corner that will take you to a results page like this.

There are still some improvements to be made when searching in Chinese. It’s not very good right now, but we know why, and it will be fixed with the upcoming shift toward a more research-friendly data management system, outlined a few paragraphs below.

Language handling improvements

We started out being just limited to Sinitic. We built the whole system with that focus in mind. Then someone sent us a recording of Zhuang and we didn’t have the heart to reject it, so we tried to change things to be more inclusive. It didn’t work well since we were still working with the old system.

That’s fixed now, and you’ll see this more open approach reflected throughout the site. We only have a handful of recordings that are not of Sinitic varieties, but we will be changing that very soon.

Improved Korean site translation

It’s still not perfect, and we’d love to have someone help us fill in the gaps, but we’ve been working on improving the Korean version of the site’s user interface. We’d like to translate it into other languages as well, so if you happen to be a native speaker of something other than Chinese or English and would like to help translate the site, let us know. It’s not a very time-consuming process, and the whole thing could be done in a couple hours tops.


Mobile friendliness

We did a little of this in the last update but not enough. The site is now much more mobile friendly. There are a couple pages that still need some attention, and there’s no guarantee that your phone will deal with the audio well, but it’s much more likely to work now than it was last week.

Linguistic research compatibility

People are already using the data on the site for their research on dialect diversity, sociolinguistics, phonology, and phonetic and lexical variation, to name just the few areas I’ve specifically been told about. Last year I put together some import/export tools so that the site could more easily interact with applications like ELAN and FLEx. I plan to integrate those more properly in the next month so that they’ll be available to anyone who wants to move data around.

As part of that, we’re also making a couple changes to the level of detail at which the data can be worked with. Right now the smallest unit in the system is a phrase. There’s previously been no system in place for things like POS tagging. We’ve talked about it since the beginning, but it always got pushed back. It has recently become a much bigger priority.

Aside from making linguistic analysis easier, this change will also allow us to quickly compile things like word lists, dictionaries, phrase books and other useful study tools for people interested in learning something other than Standard Mandarin. I’ll get into the details of this in another post once things are a little more solidified.
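To give a feel for what going below the phrase level could look like, here is a small sketch. It is not the site’s actual data model, which isn’t settled yet: the `Token` and `Phrase` classes, the `word_list` helper, and the POS tag names are all invented for illustration. It assumes each phrase carries a list of word-level tokens with a part-of-speech tag, and shows how a word list falls out of that almost for free.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: str  # the word as transcribed
    pos: str   # part-of-speech tag (tag names here are illustrative only)


@dataclass
class Phrase:
    text: str                                          # the full phrase
    tokens: List[Token] = field(default_factory=list)  # its word-level units


def word_list(phrases: List[Phrase],
              pos: Optional[str] = None) -> List[Tuple[str, int]]:
    """Count how often each word occurs, optionally for a single POS tag.

    Returns (word, count) pairs, most frequent first.
    """
    counts = Counter(
        tok.text
        for phrase in phrases
        for tok in phrase.tokens
        if pos is None or tok.pos == pos
    )
    return counts.most_common()


phrases = [
    Phrase("我吃鱼", [Token("我", "PRON"), Token("吃", "VERB"), Token("鱼", "NOUN")]),
    Phrase("鱼好吃", [Token("鱼", "NOUN"), Token("好吃", "ADJ")]),
]
nouns = word_list(phrases, pos="NOUN")
```

With tokens stored this way, a noun list, a frequency dictionary, or a phrase book is just a different filter over the same data.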

Cosmetic updates

Things are cleaner. The design has been tightened up considerably. Six months ago we were in a big hurry to shift the site to a new infrastructure in order to get things more automated. Because of this we focused solely on the server side of things, and the update to the site design wasn’t given the attention it deserved. It left things ugly and it’s been bugging the hell out of me ever since. I’ve finally gotten around to fixing that. I hope you like the updated look.


If nothing else above matters to you, I hope you at least like that the site isn’t as ugly as it was.

We have more changes planned, and will continue to improve the platform. It’s been a slow few months with Phonemica as we’ve all been incredibly busy with other responsibilities, hence the recent radio silence. That’s over now, and you’ll be seeing a lot more activity from us in the coming months.

We’ll keep you posted on new changes to Phonemica, so please keep following us.