SXSW music downloader, crawler, and calendar helper for music discovery.

SXSW Crawler

This is a set of scripts for crawling SXSW and getting music for all the bands. The site format changes every year, different sites change their APIs, and shit breaks. There is no guarantee this will work.

However, it's an excellent starting place for music discovery.

I recommend running this code in mid-February and doing your music listening from February through March or so. It takes a long time to listen to 1500-2000 songs.

Artists are usually nailed down by then.

What's new?

Here on 2/1/2018, I don't see SXSW posting the artist event times anymore. They claim that showtimes will be available in Feb 2018, but they're not available yet.

The stage1 and stage2 code will focus on getting as much artist information as possible, in order to download music. I'll have to write new code (stage3?) to get event data when it comes up.


I think we've developed an excellent way to discover music at a festival that regularly hosts over 1500 bands. We're going to take a bit of a big-data approach here, and process music faster than any A&R person can.

Much credit for this process goes to jwz who wrote youtubedown and worked on this with me throughout the last 5 years of SXSW.

The process works like this:

  1. Crawl SXSW. Get all of the HTML for music events during the festival.
  2. Break the work queue down by type (soundcloud, raw mp3, youtube)
  3. Download all of the songs in each type using different mechanisms
  4. Feed into iTunes (manual, just drag the folder into iTunes)
  5. Rate with iTunesRater (
  6. Take that library and convert it into a schedule (./
  7. Take that schedule (as an ICS file) and put it into your phone to have during the event.
  8. Go see some damn music.
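Step 2 above (breaking the work queue down by type) can be sketched roughly as follows. This is only an illustration, not the code in this repository: the function names `classify_url` and `split_queue` are hypothetical, and it assumes the queue is a plain list of media URLs.

```python
from collections import defaultdict

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def classify_url(url):
    """Return 'soundcloud', 'youtube', 'mp3', or 'unknown' for a media URL.

    Hypothetical helper: the real scripts may classify links differently.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "soundcloud.com" or host.endswith(".soundcloud.com"):
        return "soundcloud"
    if host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com"):
        return "youtube"
    if parsed.path.lower().endswith(".mp3"):
        return "mp3"
    return "unknown"


def split_queue(urls):
    """Group URLs by type so each type can use its own download mechanism."""
    buckets = defaultdict(list)
    for url in urls:
        buckets[classify_url(url)].append(url)
    return dict(buckets)
```

Each bucket can then be handed to the downloader that suits it (direct HTTP for raw mp3, youtubedown for youtube, soundscrape for soundcloud).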

Caveats: I am not responsible for what you do with these scripts. Most of the music is copyrighted and you shouldn't steal it. Please don't abuse the bandwidth of any of the sites or services involved here.


Python 2.7 required.

FFMPEG required to convert video to mp3.


  easy_install requests
  easy_install soundcloud
  easy_install mutagen
  easy_install lxml
  easy_install ID3 -- or id3-py-1.2/ included in this directory
  easy_install fuzzy   # for sxsw to ical fuzzy matching

If you want to download soundcloud files you will also need Python > 3.0 installed and soundscrape from

youtubedown (get from

  • Make sure to get a current version of this. It should be in your $PATH

You will also need valid soundcloud API keys. Get them from Soundcloud and put them in a file called make sure the file looks like this:


get_sc_data will use them as part of the soundcloud "best song" determination.


Run the crawl to get data. You should only have to do this once.

  # Crawl the site!

This will create data/queue.txt which everything else will key off of.

  # parse the data set for possible downloads

This will parse the HTML event files and log to determine where the audio files are. Now it's time to download.


Fortunately, SXSW still posts raw MP3s. We have some work to do to get the files named correctly and the ID3 tags right, but it's doable...

  # Get SXSW mp3 files

Now, you should have a big, fat directory (music/sx) full of mp3 files. Run "rename_mp" to rename them from "xxxx.mp3" to "artist - title.mp3" with proper ID3 tags.

The rename script will try to derive the proper artist and title name from the SXSW web pages. If it can't do that it'll fall back to the MP3 ID3 information.

If that doesn't work at all, we'll leave the file alone and you'll be stuck with the nnnn.mp3 filename, but hopefully not. At that point, you might want to resort to either exiftool or iTunes to resolve these issues for you.
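The fallback order described above (SXSW page data first, then ID3 tags, then leave the file alone) might look something like this. It is a sketch under assumptions, not the actual rename script: `choose_filename` and `clean` are hypothetical names, and the page and ID3 data are assumed to arrive as simple dicts with `artist` and `title` keys.

```python
import re


def clean(text):
    """Drop characters that are unsafe in filenames and collapse whitespace."""
    text = re.sub(r'[\\/:*?"<>|]', "", text or "")
    return re.sub(r"\s+", " ", text).strip()


def choose_filename(page_info, id3_info, original_name):
    """Pick 'artist - title.mp3' from the first source that has both fields.

    page_info and id3_info are hypothetical dicts, e.g.
    {"artist": "...", "title": "..."}; either may be empty.
    """
    for source in (page_info, id3_info):
        artist = clean(source.get("artist"))
        title = clean(source.get("title"))
        if artist and title:
            return "%s - %s.mp3" % (artist, title)
    # Nothing usable: keep the nnnn.mp3 name untouched.
    return original_name
```

Trying the sources in a fixed order keeps the behaviour predictable: a half-filled SXSW page never produces a partial name, it just falls through to the next source.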

Now, get the other file types. Historically, youtube and soundcloud make up a small fraction of the artists available from SXSW.


We'll download any youtube link and convert it into an mp3 using ffmpeg and youtubedown.


Music outputs to music/yt
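For reference, the video-to-mp3 step can be sketched as building an ffmpeg command line. This is an assumption about how one might do it, not necessarily what the scripts here do: the helper name `ffmpeg_mp3_command` is hypothetical, and the quality setting (`-q:a 2`) is just a reasonable default.

```python
import os


def ffmpeg_mp3_command(video_path, out_dir="music/yt"):
    """Build an ffmpeg argument list that extracts audio as mp3.

    -vn drops the video stream; libmp3lame with -q:a 2 is a
    high-quality VBR setting (an illustrative choice).
    """
    base = os.path.splitext(os.path.basename(video_path))[0]
    mp3_path = os.path.join(out_dir, base + ".mp3")
    return [
        "ffmpeg", "-i", video_path,
        "-vn",
        "-codec:a", "libmp3lame",
        "-q:a", "2",
        mp3_path,
    ]


# To actually run it (requires ffmpeg in your $PATH):
#   import subprocess
#   subprocess.check_call(ffmpeg_mp3_command("Band - Song.mp4"))
```

Passing the command as a list rather than a shell string avoids quoting problems with band names that contain spaces or punctuation.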


Make sure you've got your API keys set up as previously described in the Installation section.


Please note that we are now using soundscrape and python3 to get soundcloud files. The prior solution no longer works thanks to Soundcloud API changes.

You also need to know that soundcloud is not issuing new API keys, and that the built-in keys inside soundscrape are maxed out at 15,000 downloads a day across all users of soundscrape. If you edit the soundscrape code ( and replace them with your valid keys, this limit will go away and it might work. Otherwise soundscrape will throw 429 errors all day and you can't download.

See also this soundscrape issue about the 429 error: Miserlou/SoundScrape#203

Music outputs to music/sc

More about the files


(2013, 2014) It used to be that they hosted MP3s for all of the bands on the SXSW sites for review. These days there are maybe 40-60 songs on the SXSW site, and the rest are on youtube or elsewhere. But we'll download those directly.

2016 Update: SXSW seems to be hosting most of their music on their own again. Yay!


We can download about 90-100% of youtube files provided youtubedown can break the ridiculous obfuscation that youtube applies to their files. We might miss a few, but we get very, very close.


Far more complicated, but still possible. For soundcloud artists, SXSW lists artist data rather than specific songs, but we want to hear them to know if we should bother going to the show.

We need to find their most popular song and download it. Assumption: "Most Popular" is the hit song that might sell you on the band. (Who knows!)
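A minimal sketch of the "most popular song" pick, assuming you already have a list of track records for an artist. The helper name `most_popular` is hypothetical, and the use of a `playback_count` field as the popularity measure is an assumption about the track metadata, not a description of what get_sc_data actually does.

```python
def most_popular(tracks):
    """Return the track dict with the highest play count, or None.

    Assumes each track is a dict that may carry a numeric
    'playback_count'; tracks without one are ignored.
    """
    playable = [t for t in tracks if t.get("playback_count") is not None]
    if not playable:
        return None
    return max(playable, key=lambda t: t["playback_count"])


tracks = [
    {"title": "Demo", "playback_count": 120},
    {"title": "The Hit", "playback_count": 98000},
    {"title": "Unlisted"},  # no play count available
]
best = most_popular(tracks)
```

Play count is only a proxy for "the song that might sell you on the band", which is the same caveat the assumption above already carries.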


Run this ONLY AFTER has finished. This will build the sc metadata catalog.

What do I do after the downloads finish?

Import them all into iTunes. (see "The Process" above.)

Make a calendar

After you've imported, rated, and listened to all of the songs, go make your personal calendar.

./ -h

I usually rate 2 stars, and 3 stars if I really want to go. Rarely, if ever, do I rate 4 or 5 stars unless the band is amazing. Note that the sxsw_to_ical script processes ALL bands in iCal. It does not pick off just the SXSW ratings. If you rate multiple songs for a single band, it will use the HIGHEST rating you've given to that band.
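The "highest rating wins" rule can be illustrated like this. It is a sketch only: `best_rating_per_artist` is a hypothetical name, the input is assumed to be simple (artist, stars) pairs, and matching artists by lowercased name is a simplification (the real script uses fuzzy matching, per the install notes).

```python
def best_rating_per_artist(rated_songs):
    """Map each artist to the highest star rating among their songs.

    rated_songs is an iterable of (artist, stars) pairs; songs with
    0 stars (unrated) never create an entry.
    """
    best = {}
    for artist, stars in rated_songs:
        key = artist.strip().lower()
        if stars > best.get(key, 0):
            best[key] = stars
    return best


ratings = best_rating_per_artist([
    ("Band A", 2),
    ("band a", 3),   # same band, second song rated higher
    ("Band B", 1),
    ("Band C", 0),   # unrated, ignored
])
```

With this input, Band A ends up at 3 stars because the higher of its two song ratings is kept.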

Other Files

There is a bunch of other junk in here that frankly, I forget what they do.

  • - apparently this was used to dedupe the crawl. dead?
  • - I think I used this to rename the downloaded music.
  • - Maybe I used this to find more artists in the crawl
  • - ??
  • - tries to fixup MP3
  • - remove spaces from youtube files
  • - clean up soundcloud filenames
  • - first part of the fetcher

