Acceptance

If a conference wants to keep the acceptance rate at p out of n submissions and the noise in the review process is q (meaning that a random nq of the papers will be randomly accepted or rejected regardless of the quality):

  • How many submissions are needed for at least one guaranteed publication?
  • How many junk papers should one submit to the conference to get one paper published in expectation?

For a typical good conference in machine learning n≃1000 and p≃25%. My guess for q is around 10%, and I think it is strongly correlated with n.
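Here is a rough sketch of how the two questions could be answered. It rests on my own reading of the noise model, which the note above does not spell out: a paper falls into the noisy fraction with probability q, a noisy decision accepts with probability p (so the overall rate stays at p), junk outside the noisy fraction is always rejected, and submissions are independent. Under those assumptions a junk paper gets in with probability pq, so with p = 25% and q = 10% about 40 junk papers give one expected acceptance. Nothing is strictly guaranteed in a random model, so the first question is recast as a confidence level. All function names are made up for illustration.

```python
import math


def junk_acceptance_probability(p: float, q: float) -> float:
    """Chance that a single junk paper is accepted.

    Assumption (not stated in the post): a paper lands in the noisy
    fraction with probability q, and a noisy decision accepts with
    probability p so that the overall acceptance rate stays at p.
    Outside the noisy fraction a junk paper is always rejected.
    """
    return p * q


def expected_one_acceptance(p: float, q: float) -> float:
    """Number of junk submissions whose expected acceptance count is 1."""
    return 1.0 / junk_acceptance_probability(p, q)


def submissions_for_confidence(p: float, q: float, confidence: float) -> int:
    """Smallest k with P(at least one acceptance) >= confidence.

    Assumes independent decisions; "guaranteed" is replaced here by a
    confidence level strictly below 1.
    """
    a = junk_acceptance_probability(p, q)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - a))


if __name__ == "__main__":
    p, q = 0.25, 0.10  # the post's guesses for a typical ML conference
    print(expected_one_acceptance(p, q))
    print(submissions_for_confidence(p, q, 0.95))
```

If q really grows with n, as guessed above, the same functions can be re-run with a larger q to see how quickly the numbers drop.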

A Well-Documented Hobby

Following the note previously posted here, I have started a website to discuss and exchange ideas on the realization of a more human-like speech synthesizer:

http://mahdi.milanifard.com/tts/

The website is based on CommentPress (a WordPress plugin and theme) and supports detailed commenting for each paragraph of each document. I want to keep this purely as a hobby, and the function to maximize is ultimately the sum of the squared fun we can have with this. If you are at all interested, hop in and let’s move on. Post a comment or two on the website and let me know what you think.

A sad ending to a successful social experiment

I am writing this with a tiny hope that it’s going to be read by someone who has a say on the upcoming changes to Google Reader. To those of you who are not familiar with the product, well, I’m sorry for you. But if you have been using it (and not even extensively like I do), a major change like this might come as a shock. We are about to see the end of a great social experiment, and one that is unlikely to happen again.

I know that a decision like this comes with major studies and after crunching many numbers and estimates. I do not have access to that sort of information, but what I’m about to share here is what, I think, is an important issue that is being neglected. Yes, I as a part of the Reader community know something that a Google CEO or a product manager might not know.

Anyone familiar enough with Google Reader would call it a social network. A network of people who read and share what they think is worth reading, with occasional notes that are on par with, if not better than, the posts themselves. But, in my view, Reader is fundamentally different from other social networks, in that users do not just follow friends or people they know. They often follow people they don’t know, and don’t even care to know. Reader’s “shared items” are not people themselves. They are abstract entities, themes, genres, cohesive collections of feeling-related posts which no machine learning algorithm could put together. People follow shared items, not people; personalities, not names. And you have got to admit, that is different and unique to Reader.

Now the question is, will Google+ fill the gap when/if Reader goes down? Well, I doubt it. You can for sure “follow” anyone you want in Google+, and you can see what they share publicly. Will they be sharing what is worth reading? Or what you care to read? I don’t think so. You will see what they share about their lives, say a photo they took with a friend at a birthday party, if they care to share it publicly, and believe me they do. You will see links to YouTube videos, rants about life, gibberish and all sorts of items you already see on a Facebook public feed today. On a Google+ public stream, you will see people, not a booklet of readable blog posts or news items. Shared items in Reader are relatively free of the shenanigans going on in people’s lives. It’s no fun mixing vodka with wine. It just won’t work.

I don’t expect a post like this to make any difference, though I’m certainly hopeful it would. If enough people think like I do, maybe Google will start to think twice about the decision. But if things go on as planned, well then I’m sure going to miss all those people I never knew.

A more human voice

I have this love-hate relationship with most sci-fi movies. They are often amusing in their futuristic views of science and technology. But nerdy critics like me would be annoyed by the smallest details that are left out of place. Say, a movie taking place in the year 3000, but in which cars are still using gas! But among many other goofs, the one that bugs me the most these days is that computers, far far in the future, are still using speech synthesizers that sound like “Microsoft’s Sam”, now more than a decade old. There are now text-to-speech engines that sound way more human-like than what you heard in your favorite recent sci-fi movie (check out this, or this, or this for some sample demos).

Don’t get me wrong. We still have a long way to go in synthesizing a voice that could pass as that of a human. Such a synthesizer should take into account contextual information about the text and should know where to put the emphasis and pauses in the sentence. Listen to this marvelous piece by Stephen Fry:

Can we build a speech synthesizer that gets close to this? Well, I think so. In fact, I think many pieces are already there: We have tools that can extract contextual information out of simple text (say, HMMs, LDA-based methods, etc.). We have algorithms to find the part of speech for each word (POS taggers), and grammatical analyzers (say, tree-based methods). And finally we know of ways to align spoken words with the corresponding transcribed text, which in turn helps us tag a text with pauses, specific intonation, speed, tone and even emotional content of the voice. The only thing we need to work on here is a supervised learner. It would have as input a large body of features, such as contextual and grammatical information, and would decide where to pause and emphasize. There are already some text-to-speech engines that let you include such control tags inside the text, resulting in astonishingly natural speech synthesis, but with a lot of manual trial and error. I conjecture that an automated method is relatively easy to come up with, even without deep knowledge of speech analysis.
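To make the idea of a supervised learner a bit more concrete, here is a minimal sketch that decides whether a pause should follow a word. Everything in it is my own assumption for illustration: the choice of a plain perceptron (I have not settled on any particular learner), the feature strings, and the four hand-made training examples. In a real system the features would come from POS taggers and parsers, and the labels from speech aligned with its transcript.

```python
from collections import defaultdict


def train_pause_perceptron(examples, epochs=10):
    """Learn feature weights for 'pause after this word?' decisions.

    `examples` is a list of (features, label) pairs, where features is
    a set of strings and label is True when a pause follows the word.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in examples:
            predicted = sum(weights[f] for f in features) > 0
            if predicted != label:
                # Classic perceptron update: nudge weights toward the label.
                step = 1.0 if label else -1.0
                for f in features:
                    weights[f] += step
    return weights


def predict_pause(weights, features):
    """True if the learned weights vote for a pause after the word."""
    return sum(weights.get(f, 0.0) for f in features) > 0


# Hand-made toy data (hypothetical); real labels would come from
# aligned speech and text.
TOY_EXAMPLES = [
    ({"pos=NOUN", "next=,"}, True),
    ({"pos=DET", "next=word"}, False),
    ({"pos=VERB", "next=."}, True),
    ({"pos=ADJ", "next=word"}, False),
]

if __name__ == "__main__":
    w = train_pause_perceptron(TOY_EXAMPLES)
    print(predict_pause(w, {"pos=ADJ", "next=,"}))
```

The same shape would work for emphasis or intonation tags: only the labels and the feature extraction change.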

I might start, as a hobby (as if I don’t get enough machine learning already), to look into an implementation of these ideas. It is very likely that people have already looked at this problem. But I’m mostly emphasizing the fact that supervised learning can be a great game changer here. Let’s see how it pans out…

The vanishing smart, a downward spiral

Here is a scientific hypothesis: statistically speaking (across the globe), the number of children one has, is inversely proportional to one’s IQ level. This is a statistical statement, of course, and does not talk about individual cases (e.g. my advisor, a particularly smart computer scientist, has four kids). This hypothesis has been confirmed again and again in different countries and even on the global scale (read the Wikipedia article), and of course the obvious consequence is a drop in the general IQ level in each generation.

I tend to believe that this “downward spiral in the entire population” tends to get worse over time, for two reasons: One is the obvious effect of the discussed inverse relationship, suggesting an exponentially faster growth in the lower IQ sections of the population, and the other is the ever-increasing unwillingness of smarter people to have large families. Now this is yet to be studied and confirmed, but is nevertheless important for us to notice.

If I am right in this belief, knowing that the resources on the planet are limited, I would (naively) expect the world population to peak (in our lifetimes) and then restructure. One way for a non-trivial smart section of the population to survive would then be a situation in which lower IQ sections have lower survival rates (or otherwise the smart have access to more resources per capita). Massive economic and social gaps, if properly aligned with intelligence levels, could actually “help”. But this is not the way our societies are structured right now. Smart mathematicians and theoretical physicists are not even well-paid, let alone the richest. This is very unlikely to change. A more likely scenario is for us to change the way we reproduce…

I’ve heard from particularly smart friends of mine that it’s cruel to bring a child into a world full of suffering. I, for one, tend to agree with this. But then again, if you are smart, it is cruel to mankind in general not to reproduce. Moral obligations often collide like this. But for the moment, go donate your sperm or egg. If one is to be born anyway, at least let them be smart.

Shame on Sony

I am a PS3 owner, and my account and possibly credit card information were compromised in a recent attack on the PS Network. My Gmail account was accessed about a week after the attack. I still check my credit card usage regularly to make sure it is not being used by others. I was pissed at Sony for storing all the information and even the passwords in plain text. They did not inform us for a couple of weeks after the attack, and their service has been down since then.

These are stupid and rookie mistakes that any decent software developer would have avoided. No password should ever be stored in plain text. In fact, there should not be a way for you as the owner of the service to recover the password, let alone some hacker with limited resources.
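For the record, here is roughly what storing a password without being able to recover it looks like. This is a minimal sketch using only Python’s standard library, not a claim about what Sony’s code does or should look like; the function names and the iteration count are my own assumptions, and a production system would follow current guidance on the algorithm and work factor.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest from the attempt and compare in constant time."""
    attempt = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(attempt, digest)


if __name__ == "__main__":
    salt, digest = hash_password("correct horse")
    print(verify_password("correct horse", salt, digest))
    print(verify_password("wrong guess", salt, digest))
```

The service only ever keeps the salt and the digest, so neither its owner nor someone who steals the database can read the password back; they can only test guesses, one slow hash at a time.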

Now everyone makes mistakes, I would give them that. They were stupid, had some moron do their work and now they should fire whoever is responsible, fix their software and move on. But today, I hear this: Sony hacked again, over a million accounts compromised.

And here is what the hackers had to say:

SonyPictures.com was owned by a very simple SQL injection, one of the most primitive and common vulnerabilities… What’s worse is that every bit of data we took wasn’t encrypted. Sony stored over 1,000,000 passwords of its customers in plain text, which means it’s just a matter of taking it. This is disgraceful and insecure… This is an embarrassment to Sony.

So it seems that their fucked up practice of storing passwords in plain text is not unique to the PS network, and it has not been fixed since the first attack. Give me the code and I’ll do it in 10 minutes! And SQL injection?! Are you kidding me?! Is this really Sony we are talking about? Is this also a “very sophisticated” attack as they claimed for the one in April?! I just have to agree with the hackers:

Why do you put such faith in a company that allows itself to become open to these simple attacks?
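To show how simple the SQL injection fix is, here is a small sketch using an in-memory SQLite database. The table, the user and the attack string are all made up for illustration and have nothing to do with the actual SonyPictures.com code. The vulnerable version pastes user input into the SQL text; the safe version passes it as a parameter.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with one made-up user, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    """Vulnerable: user input is pasted into the SQL text."""
    query = "SELECT email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    """Parameterized: the driver passes the input as data, not as SQL."""
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    attack = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, attack))  # the OR clause matches every row
    print(find_user_safe(conn, attack))    # no user has that literal name
```

With the parameterized query the attack string is just an odd-looking name that matches nobody.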

A board game for the AI class

I have TA’ed the AI course at McGill a couple of times now. Each semester, we have a tournament at the end of the course between AI agents playing a board game. Implementing a player that uses minimax with alpha-beta pruning is the minimum requirement, and of course students are encouraged to use other methods to improve their agents.
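For readers who have not seen it, this is roughly what that minimum requirement looks like. It is only a sketch on a toy game tree with made-up leaf scores, not the course’s reference player and not tied to any particular game: a node is either a number or a list of children, and the optional `visited` list is my own addition to make the pruning visible.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of a toy game tree with alpha-beta pruning.

    A node is either a number (leaf score for the maximizing player)
    or a list of child nodes. `visited`, if given, collects the leaves
    that were actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizer above will never allow this line
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizer above already has something better
            break
    return best


if __name__ == "__main__":
    tree = [[3, 5], [2, 9]]  # made-up scores, two plies deep
    seen = []
    print(alphabeta(tree, visited=seen), seen)
```

On this tree the leaf 9 is never evaluated: once the second branch shows a 2, the maximizer already knows the first branch (worth 3) is better. A real agent would replace the nested lists with move generation and a heuristic evaluation at the depth limit.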

This year we decided to come up with a new game, as the one we used in previous semesters, Breakthrough, is now over a decade old with a lot of code and heuristic information available for it on the web. I invented a new checkers-style board game, called “Pushers”. A description of the game with an implementation is available here. I was really pleased with the game in the end and suggest using it for similar courses if you are trying to run a tournament.

These are what I found interesting about Pushers while running the tournament:

  • The branching factor is up to 24. The game averages around 50 to 60 moves, and the best players could only go 10 ply deep. This means that a combination of good pruning and a good heuristic is needed.
  • For this game, alpha-beta pruning works great at reducing the effective branching factor. Some students achieved almost double the search depth (the theoretical maximum gain for alpha-beta pruning) with good move ordering.
  • There are many good heuristics, most of which are intuitive and can be grasped by playing the game.
  • Most of the typical methods for two-player zero-sum board games proved to be useful with Pushers. These include negamax, NegaScout, transposition tables with Zobrist hashing, the “killer” heuristic, and iterative deepening for time management and move ordering. Some students went even further with methods that I had not heard of.
  • Some students managed to find good features and learn linear scoring functions using TD(0), TDLeaf(0), and genetic algorithms with surprisingly few games. Given the level of machine learning knowledge among undergraduate students, I was amazed by some of their work.
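Of the methods in the list above, Zobrist hashing is the easiest to show in a few lines. The sketch below is generic and not taken from any student’s agent: the 8×8 board, the two piece kinds and the board representation are my own assumptions rather than the Pushers rules, and the usual extra key for the side to move is left out for brevity.

```python
import random

ROWS, COLS = 8, 8            # assumed board size, not taken from the Pushers rules
PIECES = ("white", "black")  # assumed piece kinds

rng = random.Random(2011)    # fixed seed so the table is reproducible
# One random 64-bit key per (row, column, piece kind).
ZOBRIST = {
    (r, c, piece): rng.getrandbits(64)
    for r in range(ROWS)
    for c in range(COLS)
    for piece in PIECES
}


def full_hash(board: dict) -> int:
    """Hash a board given as {(row, col): piece} by XOR-ing all keys."""
    h = 0
    for (r, c), piece in board.items():
        h ^= ZOBRIST[(r, c, piece)]
    return h


def move_hash(h: int, piece: str, src: tuple, dst: tuple) -> int:
    """Update a hash incrementally for a non-capturing move."""
    h ^= ZOBRIST[(src[0], src[1], piece)]  # remove the piece from its old square
    h ^= ZOBRIST[(dst[0], dst[1], piece)]  # add it on its new square
    return h


if __name__ == "__main__":
    board = {(0, 0): "white", (7, 7): "black"}
    h = full_hash(board)
    print(move_hash(h, "white", (0, 0), (1, 0)))
```

Because the update is two XORs per move, the hash can be carried along during the search and used as the key of a transposition table (for example a dict mapping the hash to a stored depth and score), so positions reached through different move orders are searched only once.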

I ran a full tournament of all possible games (two for each pair of players) with 80 students. The top two players almost tied, with a difference of only one in the win count out of 158 games. The interesting point was that one of them used many methods to go deep in the search tree, while the other focused more on the heuristic, with algorithmic approaches based on dynamic programming. In general, I felt those who worked harder got better results.
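The 158 is just the schedule arithmetic: each of the 80 players meets the other 79 twice. A small sketch of such a schedule follows; treating the two games per pair as the two possible orders of who moves first is my assumption here, and the function name is made up.

```python
from itertools import permutations


def round_robin_games(num_players: int) -> list:
    """All ordered pairs (first, second): two games per pair of players."""
    return list(permutations(range(num_players), 2))


if __name__ == "__main__":
    games = round_robin_games(80)
    # Count the games that involve player 0, as either first or second mover.
    per_player = sum(1 for first, second in games if 0 in (first, second))
    print(len(games), per_player)
```

That comes to 80 × 79 = 6320 games in total, which is why running the tournament is best left to a script.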

Let me know if you need more information on the game and if you are going to use it for your courses.

Heartless bastard

Explaining a disadvantage of his AI player on a board game, an undergrad student writes in his project report:

“Being a soulless machine [...] it would uncaringly defeat even the most pitiful droopy-eyed armless child, playing in the final minutes of its shortened life.”

Reminded me of how my brother purposefully loses to my niece and nephew in card games :)

Then I should be the grandpa

From an undergrad project report on implementing an AI player for a board game:

“We saw our player return its first move, win against other machine players and later win against ourself for the first time, with a certain joy and proudness which is not unlike the one occasioned by parenthood.”

I wish there were more of these in my 600-page pile of TA work.

Note to Self – Honest Research

The process of writing every computer science paper involves a phase in which one (manually) searches the space of free parameters of the empirical experiments to get the best-looking results! Strictly speaking, this is science, as the results are reproducible and probably (statistically) significant. Any honest researcher, though, should comment on this exhaustive search phase in the paper, discussing the cases where things do NOT work as well as those in which the method works.